| Main Authors: | Yin, Han, Choi, Jung-Woo |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2509.13148 |
Similar Items
EvA: An Evidence-First Audio Understanding Paradigm for LALMs
by: Xie, Xinyuan, et al.
Published: (2026)
Focus Then Listen: Exploring Plug-and-Play Audio Enhancer for Noise-Robust Large Audio Language Models
by: Yin, Han, et al.
Published: (2026)
ChildVox: A Speech, Audio, and Large Audio-Language Model Benchmark in Understanding and Characterizing Sound across Childhood
by: Feng, Tiantian, et al.
Published: (2026)
Can Large Language Models Understand Spatial Audio?
by: Tang, Changli, et al.
Published: (2024)
Words at Play: Benchmarking Audio Pun Understanding in Large Audio-Language Models
by: Su, Yuchen, et al.
Published: (2026)
Exploring Fine-Tuning of Large Audio Language Models for Spoken Language Understanding under Limited Speech Data
by: Choi, Youngwon, et al.
Published: (2025)
Benchmarking Open-ended Audio Dialogue Understanding for Large Audio-Language Models
by: Gao, Kuofeng, et al.
Published: (2024)
AudioJudge: Understanding What Works in Large Audio Model Based Speech Evaluation
by: Manakul, Potsawee, et al.
Published: (2025)
CAF-Score: Calibrating CLAP with LALMs for Reference-free Audio Captioning Evaluation
by: Lee, Insung, et al.
Published: (2026)
KoALa-Bench: Evaluating Large Audio Language Models on Korean Speech Understanding and Faithfulness
by: Kim, Jinyoung, et al.
Published: (2026)
Audio Large Language Models Can Be Descriptive Speech Quality Evaluators
by: Chen, Chen, et al.
Published: (2025)
VoiceGiraffe: A Benchmark for Extreme Long-Context Audio-Language Understanding
by: Ye, Jashin, et al.
Published: (2026)
Enhancing Temporal Understanding in Audio Question Answering for Large Audio Language Models
by: Sridhar, Arvind Krishna, et al.
Published: (2024)
ChronosAudio: A Comprehensive Long-Audio Benchmark for Evaluating Audio-Large Language Models
by: Luo, Kaiwen, et al.
Published: (2026)
Extending Audio Context for Long-Form Understanding in Large Audio-Language Models
by: Chaichana, Yuatyong, et al.
Published: (2025)
The World is Not Mono: Enabling Spatial Understanding in Large Audio-Language Models
by: You, Yuhuan, et al.
Published: (2026)
Audio Entailment: Assessing Deductive Reasoning for Audio Understanding
by: Deshmukh, Soham, et al.
Published: (2024)
Who Can Withstand Chat-Audio Attacks? An Evaluation Benchmark for Large Audio-Language Models
by: Yang, Wanqi, et al.
Published: (2024)
DeFT-Mamba: Universal Multichannel Sound Separation and Polyphonic Audio Classification
by: Lee, Dongheon, et al.
Published: (2024)
A Sensitivity Analysis of Multi-Event Audio Grounding in Audio LLMs
by: Lee, Taehan, et al.
Published: (2026)
SALM: Spatial Audio Language Model with Structured Embeddings for Understanding and Editing
by: Hu, Jinbo, et al.
Published: (2025)
AudioScene: Integrating Object-Event Audio into 3D Scenes
by: Yuan, Shuaihang, et al.
Published: (2025)
Audio-Language Datasets of Scenes and Events: A Survey
by: Wijngaard, Gijs, et al.
Published: (2024)
Representation-Regularized Convolutional Audio Transformer for Audio Understanding
by: Han, Bing, et al.
Published: (2026)
TinyMU: A Compact Audio-Language Model for Music Understanding
by: Li, Xiquan, et al.
Published: (2026)
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
by: Ghosh, Sreyan, et al.
Published: (2024)
AudioChat: Unified Audio Storytelling, Editing, and Understanding with Transfusion Forcing
by: Chen, William, et al.
Published: (2026)
UALM: Unified Audio Language Model for Understanding, Generation and Reasoning
by: Tian, Jinchuan, et al.
Published: (2025)
HalluAudio: A Comprehensive Benchmark for Hallucination Detection in Large Audio-Language Models
by: Zhao, Feiyu, et al.
Published: (2026)
DIFFA-2: A Practical Diffusion Large Language Model for General Audio Understanding
by: Zhou, Jiaming, et al.
Published: (2026)
Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities
by: Ghosh, Sreyan, et al.
Published: (2025)
Audio-Mind: An Auditable Agentic Framework for Audio Understanding
by: Wang, Yucheng, et al.
Published: (2026)
AudioBench: A Universal Benchmark for Audio Large Language Models
by: Wang, Bin, et al.
Published: (2024)
AudioMarathon: A Comprehensive Benchmark for Long-Context Audio Understanding and Efficiency in Audio LLMs
by: He, Peize, et al.
Published: (2025)
AudioSAE: Towards Understanding of Audio-Processing Models with Sparse AutoEncoders
by: Aparin, Georgii, et al.
Published: (2026)
AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models
by: Li, Kai, et al.
Published: (2025)
Do Audio-Language Models Understand Linguistic Variations?
by: Selvakumar, Ramaneswaran, et al.
Published: (2024)
MUGEN: Evaluating and Improving Multi-audio Understanding of Large Audio-Language Models
by: Yang, Chih-Kai, et al.
Published: (2026)
When Scaling Fails: Mitigating Audio Perception Decay of LALMs via Multi-Step Perception-Aware Reasoning
by: Mao, Ruixiang, et al.
Published: (2026)
AudioRAG+: Feedback-driven Retrieval-augmented Audio Generation with Large Audio Language Models
by: Zhao, Junqi, et al.
Published: (2025)