| Main Authors: | Seth, Ashish, Kumar, Sonal, Selvakumar, Ramaneswaran, Anand, Nishit, Tyagi, Utkarsh, Seetharaman, Prem, Duraiswami, Ramani, Manocha, Dinesh |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.29263 |
Similar Items
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
by: Sakshi, S, et al.
Published: (2024)
Do Audio-Language Models Understand Linguistic Variations?
by: Selvakumar, Ramaneswaran, et al.
Published: (2024)
TSPE: Task-Specific Prompt Ensemble for Improved Zero-Shot Audio Classification
by: Anand, Nishit, et al.
Published: (2024)
PAT: Parameter-Free Audio-Text Aligner to Boost Zero-Shot Audio Classification
by: Seth, Ashish, et al.
Published: (2024)
CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models
by: Ghosh, Sreyan, et al.
Published: (2023)
EGOILLUSION: Benchmarking Hallucinations in Egocentric Video Understanding
by: Seth, Ashish, et al.
Published: (2025)
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
by: Ghosh, Sreyan, et al.
Published: (2024)
MultiVox: A Benchmark for Evaluating Voice Assistants for Multimodal Interactions
by: Selvakumar, Ramaneswaran, et al.
Published: (2025)
SILA: Signal-to-Language Augmentation for Enhanced Control in Text-to-Audio Generation
by: Kumar, Sonal, et al.
Published: (2024)
EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning
by: Seth, Ashish, et al.
Published: (2024)
Do Audio-Visual Large Language Models Really See and Hear?
by: Selvakumar, Ramaneswaran, et al.
Published: (2026)
RECAP: Retrieval-Augmented Audio Captioning
by: Ghosh, Sreyan, et al.
Published: (2023)
ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds
by: Ghosh, Sreyan, et al.
Published: (2024)
Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models
by: Goel, Arushi, et al.
Published: (2025)
TAC: Timestamped Audio Captioning
by: Kumar, Sonal, et al.
Published: (2026)
LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition
by: Ghosh, Sreyan, et al.
Published: (2024)
Generative Audio Extension and Morphing
by: Seetharaman, Prem, et al.
Published: (2026)
Room Impulse Response Synthesis via Differentiable Feedback Delay Networks for Efficient Spatial Audio Rendering
by: Gerami, Armin, et al.
Published: (2025)
Taming Audio VAEs via Target-KL Regularization
by: Seetharaman, Prem, et al.
Published: (2026)
Analytical Exploration of Spatial Audio Cues: A Differentiable Multi-Sphere Scattering Model
by: Galougah, Siminfar Samakoush, et al.
Published: (2026)
Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music
by: Ghosh, Sreyan, et al.
Published: (2026)
AV-RIR: Audio-Visual Room Impulse Response Estimation
by: Ratnarajah, Anton, et al.
Published: (2023)
Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities
by: Ghosh, Sreyan, et al.
Published: (2025)
Audiocards: Structured Metadata Improves Audio Language Models For Sound Design
by: Sridhar, Sripathi, et al.
Published: (2026)
AudioChat: Unified Audio Storytelling, Editing, and Understanding with Transfusion Forcing
by: Chen, William, et al.
Published: (2026)
Learning Illumination Control in Diffusion Models
by: Anand, Nishit, et al.
Published: (2026)
FLAM: Frame-Wise Language-Audio Modeling
by: Wu, Yusong, et al.
Published: (2025)
Biomimetic Frontend for Differentiable Audio Processing
by: Famularo, Ruolan Leslie, et al.
Published: (2024)
Code Drift: Towards Idempotent Neural Audio Codecs
by: O'Reilly, Patrick, et al.
Published: (2024)
Multi-Domain Audio Question Answering Benchmark Toward Acoustic Content Reasoning
by: Yang, Chao-Han Huck, et al.
Published: (2025)
The Rhythm In Anything: Audio-Prompted Drums Generation with Masked Language Modeling
by: O'Reilly, Patrick, et al.
Published: (2025)
SPUR: A Plug-and-Play Framework for Integrating Spatial Audio Understanding and Reasoning into Large Audio-Language Models
by: Sakshi, S, et al.
Published: (2025)
ABEX: Data Augmentation for Low-Resource NLU via Expanding Abstract Descriptions
by: Ghosh, Sreyan, et al.
Published: (2024)
CoDa: Constrained Generation based Data Augmentation for Low-Resource NLP
by: Evuru, Chandra Kiran Reddy, et al.
Published: (2024)
Sketch2Sound: Controllable Audio Generation via Time-Varying Signals and Sonic Imitations
by: García, Hugo Flores, et al.
Published: (2024)
Do Vision-Language Models Understand Compound Nouns?
by: Kumar, Sonal, et al.
Published: (2024)
ProSE: Diffusion Priors for Speech Enhancement
by: Kumar, Sonal, et al.
Published: (2025)
MMAU-Pro: A Challenging and Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence
by: Kumar, Sonal, et al.
Published: (2025)
Applying Automatic Differentiation to Optimize Differential Microphone Array Designs
by: Galougah, Siminfar Samakoush, et al.
Published: (2024)
HalluAudio: A Comprehensive Benchmark for Hallucination Detection in Large Audio-Language Models
by: Zhao, Feiyu, et al.
Published: (2026)