| Main Authors: | Wang, Hualei, Li, Yiming, Ma, Shuo, Liu, Hong, Wang, Xiangdong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.11039 |
Similar Items
Leveraging Language Model Capabilities for Sound Event Detection
by: Wang, Hualei, et al.
Published: (2023)
Bridging Language Gaps in Audio-Text Retrieval
by: Yan, Zhiyong, et al.
Published: (2024)
Focus Then Listen: Exploring Plug-and-Play Audio Enhancer for Noise-Robust Large Audio Language Models
by: Yin, Han, et al.
Published: (2026)
Audio-Thinker: Guiding Audio Language Model When and How to Think via Reinforcement Learning
by: Wu, Shu, et al.
Published: (2025)
Are Audio-Language Models Listening? Audio-Specialist Heads for Adaptive Audio Steering
by: Glazer, Neta, et al.
Published: (2026)
Aligned Better, Listen Better for Audio-Visual Large Language Models
by: Guo, Yuxin, et al.
Published: (2025)
Can Audio Large Language Models Verify Speaker Identity?
by: Ren, Yiming, et al.
Published: (2025)
SpotSound: Enhancing Large Audio-Language Models with Fine-Grained Temporal Grounding
by: Sun, Luoyi, et al.
Published: (2026)
Learning When to Think While Listening in Large Audio-Language Models
by: Song, Zhiyuan, et al.
Published: (2026)
DIFFA: Large Language Diffusion Models Can Listen and Understand
by: Zhou, Jiaming, et al.
Published: (2025)
AudioBench: A Universal Benchmark for Audio Large Language Models
by: Wang, Bin, et al.
Published: (2024)
ChronosAudio: A Comprehensive Long-Audio Benchmark for Evaluating Audio-Large Language Models
by: Luo, Kaiwen, et al.
Published: (2026)
Resonate: Reinforcing Text-to-Audio Generation via Online Feedback from Large Audio Language Models
by: Li, Xiquan, et al.
Published: (2026)
AudioKV: KV Cache Eviction in Efficient Large Audio Language Models
by: Wang, Yuxuan, et al.
Published: (2026)
AudioRAG+: Feedback-driven Retrieval-augmented Audio Generation with Large Audio Language Models
by: Zhao, Junqi, et al.
Published: (2025)
HalluAudio: A Comprehensive Benchmark for Hallucination Detection in Large Audio-Language Models
by: Zhao, Feiyu, et al.
Published: (2026)
FLAM: Frame-Wise Language-Audio Modeling
by: Wu, Yusong, et al.
Published: (2025)
Direct Simultaneous Translation Activation for Large Audio-Language Models
by: Zhang, Pei, et al.
Published: (2025)
Temporal Contrastive Decoding: A Training-Free Method for Large Audio-Language Models
by: Li, Yanda, et al.
Published: (2026)
A Survey of Large Audio Language Models: Generalization, Trustworthiness, and Outlook
by: Luo, Kaiwen, et al.
Published: (2026)
Audio Super-Resolution with Latent Bridge Models
by: Li, Chang, et al.
Published: (2025)
SAR-LM: Symbolic Audio Reasoning with Large Language Models
by: Taheri, Termeh, et al.
Published: (2025)
The Interspeech 2026 Audio Encoder Capability Challenge for Large Audio Language Models
by: Dinkel, Heinrich, et al.
Published: (2026)
Frame-Level Internal Tool Use for Temporal Grounding in Audio LMs
by: An, Joesph, et al.
Published: (2026)
Listen, Pause, and Reason: Toward Perception-Grounded Hybrid Reasoning for Audio Understanding
by: Wang, Jieyi, et al.
Published: (2026)
Unlocking Large Audio-Language Models for Interactive Language Learning
by: Liu, Hongfu, et al.
Published: (2026)
AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models
by: Li, Kai, et al.
Published: (2025)
Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding
by: Liu, Jizhong, et al.
Published: (2024)
Towards Fine-grained Temporal Perception: Post-Training Large Audio-Language Models with Audio-Side Time Prompt
by: Shi, Yanfeng, et al.
Published: (2026)
Can Large Language Models Understand Spatial Audio?
by: Tang, Changli, et al.
Published: (2024)
LLM-Codec: Neural Audio Codec Meets Language Model Objectives
by: Chung, Ho-Lam, et al.
Published: (2026)
Enhancing Temporal Understanding in Audio Question Answering for Large Audio Language Models
by: Sridhar, Arvind Krishna, et al.
Published: (2024)
Advancing Singlish Understanding: Bridging the Gap with Datasets and Multimodal Models
by: Wang, Bin, et al.
Published: (2025)
Thinking with Sound: Audio Chain-of-Thought Enables Multimodal Reasoning in Large Audio-Language Models
by: Xiong, Zhen, et al.
Published: (2025)
SEA-Spoof: Bridging The Gap in Multilingual Audio Deepfake Detection for South-East Asian
by: Wu, Jinyang, et al.
Published: (2025)
Words at Play: Benchmarking Audio Pun Understanding in Large Audio-Language Models
by: Su, Yuchen, et al.
Published: (2026)
StyleBreak: Revealing Alignment Vulnerabilities in Large Audio-Language Models via Style-Aware Audio Jailbreak
by: Li, Hongyi, et al.
Published: (2025)
Beyond Single-Audio: Advancing Multi-Audio Processing in Audio Large Language Models
by: Chen, Yiming, et al.
Published: (2024)
AudioStory: Generating Long-Form Narrative Audio with Large Language Models
by: Guo, Yuxin, et al.
Published: (2025)
Audio-CoT: Exploring Chain-of-Thought Reasoning in Large Audio Language Model
by: Ma, Ziyang, et al.
Published: (2025)