| Main Authors: | Luo, Jiale, Liang, Xiaoyu, Hu, Haoji |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.25179 |
Similar Items
Towards Audio Token Compression in Large Audio Language Models
by: Bhati, Saurabhchand, et al.
Published: (2025)
HeadRouter: Dynamic Head-Weight Routing for Task-Adaptive Audio Token Pruning in Large Audio Language Models
by: He, Peize, et al.
Published: (2026)
When Silence Matters: The Impact of Irrelevant Audio on Text Reasoning in Large Audio-Language Models
by: Li, Chen-An, et al.
Published: (2025)
From Text to Talk: Audio-Language Model Needs Non-Autoregressive Joint Training
by: Liu, Tianqiao, et al.
Published: (2025)
Flatter Tokens are More Valuable for Speculative Draft Model Training
by: Fan, Jiaming, et al.
Published: (2026)
Tokenization Matters! Degrading Large Language Models through Challenging Their Tokenization
by: Wang, Dixuan, et al.
Published: (2024)
Causal Tracing of Audio-Text Fusion in Large Audio Language Models
by: Chen, Wei-Chih, et al.
Published: (2026)
Evaluating Robustness of Large Audio Language Models to Audio Injection: An Empirical Study
by: Hou, Guanyu, et al.
Published: (2025)
MERaLiON-AudioLLM: Bridging Audio and Language with Large Language Models
by: He, Yingxu, et al.
Published: (2024)
AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models
by: Li, Kai, et al.
Published: (2025)
SeaLLMs-Audio: Large Audio-Language Models for Southeast Asia
by: Liu, Chaoqun, et al.
Published: (2025)
Words at Play: Benchmarking Audio Pun Understanding in Large Audio-Language Models
by: Su, Yuchen, et al.
Published: (2026)
On the Audio Hallucinations in Large Audio-Video Language Models
by: Nishimura, Taichi, et al.
Published: (2024)
IFEval-Audio: Benchmarking Instruction-Following Capability in Audio-based Large Language Models
by: Gao, Yiming, et al.
Published: (2025)
MiMo-Audio: Audio Language Models are Few-Shot Learners
by: Core Team, et al.
Published: (2025)
Dynamic Token Reduction during Generation for Vision Language Models
by: Liang, Xiaoyu, et al.
Published: (2025)
AudioBERT: Audio Knowledge Augmented Language Model
by: Ok, Hyunjong, et al.
Published: (2024)
Audio Jailbreaks in Large Audio-Language Models: Taxonomy, Attack-Defense Analysis, and Cost-Aware Evaluation
by: Feng, Bo-Han, et al.
Published: (2026)
Sparse Tokens Suffice: Jailbreaking Audio Language Models via Token-Aware Gradient Optimization
by: Fang, Zheng, et al.
Published: (2026)
AHELM: A Holistic Evaluation of Audio-Language Models
by: Lee, Tony, et al.
Published: (2025)
ALARM: Audio-Language Alignment for Reasoning Models
by: Grinberg, Petr, et al.
Published: (2026)
AudioBench: A Universal Benchmark for Audio Large Language Models
by: Wang, Bin, et al.
Published: (2024)
SonicBench: Dissecting the Physical Perception Bottleneck in Large Audio Language Models
by: Sun, Yirong, et al.
Published: (2026)
Beyond Single-Audio: Advancing Multi-Audio Processing in Audio Large Language Models
by: Chen, Yiming, et al.
Published: (2024)
Nudging Hidden States: Training-Free Model Steering for Chain-of-Thought Reasoning in Large Audio-Language Models
by: Ieong, Lok-Lam, et al.
Published: (2026)
Pay More Attention To Audio: Mitigating Imbalance of Cross-Modal Attention in Large Audio Language Models
by: Wang, Junyu, et al.
Published: (2025)
Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities
by: Ghosh, Sreyan, et al.
Published: (2025)
Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models
by: Goel, Arushi, et al.
Published: (2025)
When Audio and Text Disagree: Revealing Text Bias in Large Audio-Language Models
by: Wang, Cheng, et al.
Published: (2025)
Thinking with Sound: Audio Chain-of-Thought Enables Multimodal Reasoning in Large Audio-Language Models
by: Xiong, Zhen, et al.
Published: (2025)
VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
by: Long, Zuwei, et al.
Published: (2025)
Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model
by: Huang, Ailin, et al.
Published: (2025)
From Token to Token Pair: Efficient Prompt Compression for Large Language Models in Clinical Prediction
by: Zhu, Mingcheng, et al.
Published: (2026)
Enhancing Temporal Understanding in Audio Question Answering for Large Audio Language Models
by: Sridhar, Arvind Krishna, et al.
Published: (2024)
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
by: Ye, Zhen, et al.
Published: (2024)
Reducing Object Hallucination in Large Audio-Language Models via Audio-Aware Decoding
by: Hsu, Tzu-wen, et al.
Published: (2025)
Audio-CoT: Exploring Chain-of-Thought Reasoning in Large Audio Language Model
by: Ma, Ziyang, et al.
Published: (2025)
Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding
by: Liu, Jizhong, et al.
Published: (2024)
All That Glitters Is Not Audio: Rethinking Text Priors and Audio Reliance in Audio-Language Evaluation
by: Foo, Leonardo Haw-Yang, et al.
Published: (2026)
Benchmarking Open-ended Audio Dialogue Understanding for Large Audio-Language Models
by: Gao, Kuofeng, et al.
Published: (2024)