| Main Authors: | Sridhar, Arvind Krishna, Guo, Yinyi, Visser, Erik |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.16334 |
Similar Items
Spatial Audio Motion Understanding and Reasoning
by: Sridhar, Arvind Krishna, et al.
Published: (2025)
Enhancing Temporal Understanding in Audio Question Answering for Large Audio Language Models
by: Sridhar, Arvind Krishna, et al.
Published: (2024)
LongAudio-RAG: Event-Grounded Question Answering over Multi-Hour Long Audio
by: Vakada, Naveen, et al.
Published: (2026)
Resource-Efficient Reference-Free Evaluation of Audio Captions
by: Mahfuz, Rehana, et al.
Published: (2024)
RAMoEA-QA: Hierarchical Specialization for Robust Respiratory Audio Question Answering
by: Bertolino, Gaia A., et al.
Published: (2026)
Aligning Audio Captions with Human Preferences
by: Hegde, Kartik, et al.
Published: (2025)
ORCA: Open-ended Response Correctness Assessment for Audio Question Answering
by: Sedláček, Šimon, et al.
Published: (2025)
Multi-Domain Audio Question Answering Benchmark Toward Acoustic Content Reasoning
by: Yang, Chao-Han Huck, et al.
Published: (2025)
Comprehensive Audio Query Handling System with Integrated Expert Models and Contextual Understanding
by: Naveen, Vakada, et al.
Published: (2024)
OWL: Geometry-Aware Spatial Reasoning for Audio Large Language Models
by: Biswas, Subrata, et al.
Published: (2025)
SHMamba: Structured Hyperbolic State Space Model for Audio-Visual Question Answering
by: Yang, Zhe, et al.
Published: (2024)
DGFNet: End-to-End Audio-Visual Source Separation Based on Dynamic Gating Fusion
by: Yu, Yinfeng, et al.
Published: (2025)
AQAScore: Evaluating Semantic Alignment in Text-to-Audio Generation via Audio Question Answering
by: Kuan, Chun-Yi, et al.
Published: (2026)
AQUALLM: Audio Question Answering Data Generation Using Large Language Models
by: Behera, Swarup Ranjan, et al.
Published: (2023)
Rebellion: Noise-Robust Reasoning Training for Audio Reasoning Models
by: Huang, Tiansheng, et al.
Published: (2025)
Audio-Maestro: Enhancing Large Audio-Language Models with Tool-Augmented Reasoning
by: Lee, Kuan-Yi, et al.
Published: (2025)
Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio Question Answering
by: Li, Gang, et al.
Published: (2025)
DSpAST: Disentangled Representations for Spatial Audio Reasoning with Large Language Models
by: Wilkinghoff, Kevin, et al.
Published: (2025)
The World is Not Mono: Enabling Spatial Understanding in Large Audio-Language Models
by: You, Yuhuan, et al.
Published: (2026)
Audio-Guided Dynamic Modality Fusion with Stereo-Aware Attention for Audio-Visual Navigation
by: Li, Jia, et al.
Published: (2025)
AQUA-Bench: Beyond Finding Answers to Knowing When There Are None in Audio Question Answering
by: Kuan, Chun-Yi, et al.
Published: (2026)
Audio Spatially-Guided Fusion for Audio-Visual Navigation
by: Zhou, Xinyu, et al.
Published: (2026)
Audio Source Separation in Reverberant Environments using $β$-divergence based Nonnegative Factorization
by: Fakhry, Mahmoud, et al.
Published: (2026)
AuTAgent: A Reinforcement Learning Framework for Tool-Augmented Audio Reasoning
by: Tong, Siqian, et al.
Published: (2026)
Interpretable All-Type Audio Deepfake Detection with Audio LLMs via Frequency-Time Reinforcement Learning
by: Xie, Yuankun, et al.
Published: (2026)
Are Audio-Language Models Listening? Audio-Specialist Heads for Adaptive Audio Steering
by: Glazer, Neta, et al.
Published: (2026)
ViSAGe: Video-to-Spatial Audio Generation
by: Kim, Jaeyeon, et al.
Published: (2025)
Patient-Level Multimodal Question Answering from Multi-Site Auscultation Recordings
by: Wu, Fan, et al.
Published: (2026)
Spatial-Aware Conditioned Fusion for Audio-Visual Navigation
by: Wu, Shaohang, et al.
Published: (2026)
In-the-wild Audio Spatialization with Flexible Text-guided Localization
by: Pan, Tianrui, et al.
Published: (2025)
DegDiT: Controllable Audio Generation with Dynamic Event Graph Guided Diffusion Transformer
by: Liu, Yisu, et al.
Published: (2025)
Eureka-Audio: Triggering Audio Intelligence in Compact Language Models
by: Zhang, Dan, et al.
Published: (2026)
AudioMoG: Guiding Audio Generation with Mixture-of-Guidance
by: Wang, Junyou, et al.
Published: (2025)
DialogGraph-LLM: Graph-Informed LLMs for End-to-End Audio Dialogue Intent Recognition
by: Liu, HongYu, et al.
Published: (2025)
AudioRouter: Data Efficient Audio Understanding via RL based Dual Reasoning
by: Chen, Liyang, et al.
Published: (2026)
The Sonar Moment: Benchmarking Audio-Language Models in Audio Geo-Localization
by: Zhang, Ruixing, et al.
Published: (2026)
TTMBA: Towards Text To Multiple Sources Binaural Audio Generation
by: He, Yuxuan, et al.
Published: (2025)
MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows
by: Li, Xiquan, et al.
Published: (2025)
AudioMotionBench: Evaluating Auditory Motion Perception in Audio LLMs
by: Sun, Zhe, et al.
Published: (2025)
Stable Audio 3
by: Evans, Zach, et al.
Published: (2026)