| Main Authors: | Sridhar, Arvind Krishna, Guo, Yinyi, Visser, Erik |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.14666 |
Similar Items
Spatial Audio Question Answering and Reasoning on Dynamic Source Movements
by: Sridhar, Arvind Krishna, et al.
Published: (2026)
Enhancing Temporal Understanding in Audio Question Answering for Large Audio Language Models
by: Sridhar, Arvind Krishna, et al.
Published: (2024)
LongAudio-RAG: Event-Grounded Question Answering over Multi-Hour Long Audio
by: Vakada, Naveen, et al.
Published: (2026)
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
by: Ghosh, Sreyan, et al.
Published: (2024)
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
by: Sakshi, S, et al.
Published: (2024)
AudioMarathon: A Comprehensive Benchmark for Long-Context Audio Understanding and Efficiency in Audio LLMs
by: He, Peize, et al.
Published: (2025)
Extending Audio Context for Long-Form Understanding in Large Audio-Language Models
by: Chaichana, Yuatyong, et al.
Published: (2025)
Benchmarking Open-ended Audio Dialogue Understanding for Large Audio-Language Models
by: Gao, Kuofeng, et al.
Published: (2024)
Resource-Efficient Reference-Free Evaluation of Audio Captions
by: Mahfuz, Rehana, et al.
Published: (2024)
AHA: Aligning Large Audio-Language Models for Reasoning Hallucinations via Counterfactual Hard Negatives
by: Chen, Yanxi, et al.
Published: (2025)
Audio Jailbreaks in Large Audio-Language Models: Taxonomy, Attack-Defense Analysis, and Cost-Aware Evaluation
by: Feng, Bo-Han, et al.
Published: (2026)
Fish Audio S2 Technical Report
by: Liao, Shijia, et al.
Published: (2026)
MUGEN: Evaluating and Improving Multi-audio Understanding of Large Audio-Language Models
by: Yang, Chih-Kai, et al.
Published: (2026)
Audio ControlNet for Fine-Grained Audio Generation and Editing
by: Zhu, Haina, et al.
Published: (2026)
Revealing the Role of Audio Channels in ASR Performance Degradation
by: Huang, Kuan-Tang, et al.
Published: (2025)
SloPal: A 60-Million-Word Slovak Parliamentary Corpus with Aligned Speech and Fine-Tuned ASR Models
by: Božík, Erik, et al.
Published: (2025)
ORCA: Open-ended Response Correctness Assessment for Audio Question Answering
by: Sedláček, Šimon, et al.
Published: (2025)
CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models
by: Ghosh, Sreyan, et al.
Published: (2023)
Audio-Reasoner: Improving Reasoning Capability in Large Audio Language Models
by: Xie, Zhifei, et al.
Published: (2025)
MedMosaic: A Challenging Large Scale Benchmark of Diverse Medical Audio
by: Rajgarhia, Harshit, et al.
Published: (2026)
EchoDistill: Alignment Noisy-to-Clean Self-Distillation for Robust Audio LLMs
by: Lin, Liang, et al.
Published: (2026)
CAF-Score: Calibrating CLAP with LALMs for Reference-free Audio Captioning Evaluation
by: Lee, Insung, et al.
Published: (2026)
Multi-class Decoding of Attended Speaker Direction Using Electroencephalogram and Audio Spatial Spectrum
by: Zhang, Yuanming, et al.
Published: (2024)
Comprehensive Audio Query Handling System with Integrated Expert Models and Contextual Understanding
by: Naveen, Vakada, et al.
Published: (2024)
Bona fide Cross Testing Reveals Weak Spot in Audio Deepfake Detection Systems
by: Kwok, Chin Yuen, et al.
Published: (2025)
MECAT: A Multi-Experts Constructed Benchmark for Fine-Grained Audio Understanding Tasks
by: Niu, Yadong, et al.
Published: (2025)
Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation
by: Yan, Canxiang, et al.
Published: (2025)
FLM-Audio: Natural Monologues Improves Native Full-Duplex Chatbots via Dual Training
by: Yao, Yiqun, et al.
Published: (2025)
Do Compact SSL Backbones Matter for Audio Deepfake Detection? A Controlled Study with RAPTOR
by: Kulkarni, Ajinkya, et al.
Published: (2026)
CMDAR: A Chinese Multi-scene Dynamic Audio Reasoning Benchmark with Diverse Challenges
by: Li, Hui, et al.
Published: (2025)
KoALa-Bench: Evaluating Large Audio Language Models on Korean Speech Understanding and Faithfulness
by: Kim, Jinyoung, et al.
Published: (2026)
Harmonic Reasoning in Large Language Models
by: Kruspe, Anna
Published: (2024)
All That Glitters Is Not Audio: Rethinking Text Priors and Audio Reliance in Audio-Language Evaluation
by: Foo, Leonardo Haw-Yang, et al.
Published: (2026)
Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction
by: Huang, Ailin, et al.
Published: (2025)
BAT: Learning to Reason about Spatial Sounds with Large Language Models
by: Zheng, Zhisheng, et al.
Published: (2024)
SightSound-R1: Cross-Modal Reasoning Distillation from Vision to Audio Language Models
by: Wang, Qiaolin, et al.
Published: (2025)
Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models
by: Goel, Arushi, et al.
Published: (2025)
AudioBERT: Audio Knowledge Augmented Language Model
by: Ok, Hyunjong, et al.
Published: (2024)
Aligning Audio Captions with Human Preferences
by: Hegde, Kartik, et al.
Published: (2025)
MAEB: Massive Audio Embedding Benchmark
by: Assadi, Adnan El, et al.
Published: (2026)