Library Catalog

Bibliographic Details
Main Authors:	Dinkel, Heinrich; Sun, Xingwei; Li, Gang; Mei, Jiahao; Niu, Yadong; Liu, Jizhong; Li, Xiyang; Liao, Yifan; Zhou, Jiahao; Zhang, Junbo; Luan, Jian
Format:	Preprint
Published:	2026
Subjects:	Sound; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2602.23765

Similar Items

Dasheng AudioGen: A Unified Model for Generating Coherent Audio Scenes from Text
by: Mei, Jiahao, et al.
Published: (2026)

MiDashengLM: Efficient Audio Understanding with General Audio Captions
by: Dinkel, Heinrich, et al.
Published: (2025)

ACAVCaps: Enabling large-scale training for fine-grained and diverse audio understanding
by: Niu, Yadong, et al.
Published: (2026)

GLAP: General contrastive audio-text pretraining across domains and languages
by: Dinkel, Heinrich, et al.
Published: (2025)

Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio Question Answering
by: Li, Gang, et al.
Published: (2025)

MECAT: A Multi-Experts Constructed Benchmark for Fine-Grained Audio Understanding Tasks
by: Niu, Yadong, et al.
Published: (2025)

Efficient Speech Enhancement via Embeddings from Pre-trained Generative Audioencoders
by: Sun, Xingwei, et al.
Published: (2025)

X-ARES: A Comprehensive Framework for Assessing Audio Encoder Performance
by: Zhang, Junbo, et al.
Published: (2025)

The Interspeech 2026 Audio Encoder Capability Challenge for Large Audio Language Models
by: Dinkel, Heinrich, et al.
Published: (2026)

Scaling up masked audio encoder learning for general audio classification
by: Dinkel, Heinrich, et al.
Published: (2024)

Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding
by: Liu, Jizhong, et al.
Published: (2024)

Bridging Language Gaps in Audio-Text Retrieval
by: Yan, Zhiyong, et al.
Published: (2024)

The ICME 2025 Audio Encoder Capability Challenge
by: Zhang, Junbo, et al.
Published: (2025)

Streaming Audio Transformers for Online Audio Tagging
by: Dinkel, Heinrich, et al.
Published: (2023)

GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
by: Yu, Jiahao, et al.
Published: (2023)

A unified multimodal understanding and generation model for cross-disciplinary scientific research
by: Yang, Xiaomeng, et al.
Published: (2026)

Gauge flux generations of weakly magnetized Dirac spin liquid in a kagomé lattice
by: Pan, Si-Yu, et al.
Published: (2025)

Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability
by: Li, Ning, et al.
Published: (2025)

Enhancing Few-Shot Stock Trend Prediction with Large Language Models
by: Deng, Yiqi, et al.
Published: (2024)

TEAdapter: Supply abundant guidance for controllable text-to-music generation
by: Zou, Jialing, et al.
Published: (2024)

Efficient 3D Content Reconstruction and Generation
by: Li, Jiahao
Published: (2026)

ScenDi: 3D-to-2D Scene Diffusion Cascades for Urban Generation
by: Guo, Hanlei, et al.
Published: (2026)

Moreover. Are we just not clever enough to understand the mind?
Published: (1999)

Audio Dialogues: Dialogues dataset for audio and music understanding
by: Goel, Arushi, et al.
Published: (2024)

asvspoof2015 in WebDataset Format
by: Yadong, Niu
Published: (2025)

vocalimitationset in WebDataset Format
by: Yadong, Niu
Published: (2025)

Natural language is not enough: Benchmarking multi-modal generative AI for Verilog generation
by: Chang, Kaiyan, et al.
Published: (2024)

Adalbert Stifter
by: Dinkel, Benjamin
Published: (2025)

Finite‐Time Inverse Optimal Control for Low‐Order Stochastic Nonlinear Systems With Time‐Varying Orders
by: Mengmeng Jiang, et al.
Published: (2025)

UniMIC: Token-Based Multimodal Interactive Coding for Human-AI Collaboration
by: Mao, Qi, et al.
Published: (2025)

A unified physics-informed generative operator framework for general inverse problems
by: Bao, Gang, et al.
Published: (2025)

Are LLMs Reliable Rankers? Rank Manipulation via Two-Stage Token Optimization
by: Xing, Tiancheng, et al.
Published: (2025)

Source code for "Urban expansion reconfigures the landscape of a neglected zoonosis by advancing the peri-urban fringe"
by: Li, Xiyang
Published: (2026)

Heterogeneous Federated Fine-Tuning with Parallel One-Rank Adaptation
by: Zhang, Zikai, et al.
Published: (2026)

Direct numerical simulation of out-scale-actuated spanwise wall oscillation in turbulent boundary layers
by: Zhang, Jizhong, et al.
Published: (2026)

A Moldable, Tough Mineral‐Dominated Nanocomposite as a Recyclable Structural Material
by: Yadong Yu, et al.
Published: (2025)

Q2A: Querying Implicit Fully Continuous Feature Pyramid to Align Features for Medical Image Segmentation
by: Yu, Jiahao, et al.
Published: (2024)

A Survey on GUI Agents with Foundation Models Enhanced by Reinforcement Learning
by: Li, Jiahao, et al.
Published: (2025)

One stabilization is not enough for contractible 4-manifolds
by: Kang, Sungkyung
Published: (2022)

idekerlab/cellmaps_coembedding: Add umap generation
by: Joanna, et al.
Published: (2025)