:: Library Catalog

Bibliographic Details
Main Authors:	Seth, Ashish; Kumar, Sonal; Selvakumar, Ramaneswaran; Anand, Nishit; Tyagi, Utkarsh; Seetharaman, Prem; Duraiswami, Ramani; Manocha, Dinesh
Format:	Preprint
Published:	2026
Subjects:	Sound
Online Access:	https://arxiv.org/abs/2603.29263

Similar Items

MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
by: Sakshi, S, et al.
Published: (2024)

Do Audio-Language Models Understand Linguistic Variations?
by: Selvakumar, Ramaneswaran, et al.
Published: (2024)

TSPE: Task-Specific Prompt Ensemble for Improved Zero-Shot Audio Classification
by: Anand, Nishit, et al.
Published: (2024)

PAT: Parameter-Free Audio-Text Aligner to Boost Zero-Shot Audio Classification
by: Seth, Ashish, et al.
Published: (2024)

CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models
by: Ghosh, Sreyan, et al.
Published: (2023)

EGOILLUSION: Benchmarking Hallucinations in Egocentric Video Understanding
by: Seth, Ashish, et al.
Published: (2025)

GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
by: Ghosh, Sreyan, et al.
Published: (2024)

MultiVox: A Benchmark for Evaluating Voice Assistants for Multimodal Interactions
by: Selvakumar, Ramaneswaran, et al.
Published: (2025)

SILA: Signal-to-Language Augmentation for Enhanced Control in Text-to-Audio Generation
by: Kumar, Sonal, et al.
Published: (2024)

EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning
by: Seth, Ashish, et al.
Published: (2024)

Do Audio-Visual Large Language Models Really See and Hear?
by: Selvakumar, Ramaneswaran, et al.
Published: (2026)

RECAP: Retrieval-Augmented Audio Captioning
by: Ghosh, Sreyan, et al.
Published: (2023)

ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds
by: Ghosh, Sreyan, et al.
Published: (2024)

Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models
by: Goel, Arushi, et al.
Published: (2025)

TAC: Timestamped Audio Captioning
by: Kumar, Sonal, et al.
Published: (2026)

LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition
by: Ghosh, Sreyan, et al.
Published: (2024)

Generative Audio Extension and Morphing
by: Seetharaman, Prem, et al.
Published: (2026)

Room Impulse Response Synthesis via Differentiable Feedback Delay Networks for Efficient Spatial Audio Rendering
by: Gerami, Armin, et al.
Published: (2025)

Taming Audio VAEs via Target-KL Regularization
by: Seetharaman, Prem, et al.
Published: (2026)

Analytical Exploration of Spatial Audio Cues: A Differentiable Multi-Sphere Scattering Model
by: Galougah, Siminfar Samakoush, et al.
Published: (2026)

Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music
by: Ghosh, Sreyan, et al.
Published: (2026)

AV-RIR: Audio-Visual Room Impulse Response Estimation
by: Ratnarajah, Anton, et al.
Published: (2023)

Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities
by: Ghosh, Sreyan, et al.
Published: (2025)

Audiocards: Structured Metadata Improves Audio Language Models For Sound Design
by: Sridhar, Sripathi, et al.
Published: (2026)

AudioChat: Unified Audio Storytelling, Editing, and Understanding with Transfusion Forcing
by: Chen, William, et al.
Published: (2026)

Learning Illumination Control in Diffusion Models
by: Anand, Nishit, et al.
Published: (2026)

FLAM: Frame-Wise Language-Audio Modeling
by: Wu, Yusong, et al.
Published: (2025)

Biomimetic Frontend for Differentiable Audio Processing
by: Famularo, Ruolan Leslie, et al.
Published: (2024)

Code Drift: Towards Idempotent Neural Audio Codecs
by: O'Reilly, Patrick, et al.
Published: (2024)

Multi-Domain Audio Question Answering Benchmark Toward Acoustic Content Reasoning
by: Yang, Chao-Han Huck, et al.
Published: (2025)

The Rhythm In Anything: Audio-Prompted Drums Generation with Masked Language Modeling
by: O'Reilly, Patrick, et al.
Published: (2025)

SPUR: A Plug-and-Play Framework for Integrating Spatial Audio Understanding and Reasoning into Large Audio-Language Models
by: Sakshi, S, et al.
Published: (2025)

ABEX: Data Augmentation for Low-Resource NLU via Expanding Abstract Descriptions
by: Ghosh, Sreyan, et al.
Published: (2024)

CoDa: Constrained Generation based Data Augmentation for Low-Resource NLP
by: Evuru, Chandra Kiran Reddy, et al.
Published: (2024)

Sketch2Sound: Controllable Audio Generation via Time-Varying Signals and Sonic Imitations
by: García, Hugo Flores, et al.
Published: (2024)

Do Vision-Language Models Understand Compound Nouns?
by: Kumar, Sonal, et al.
Published: (2024)

ProSE: Diffusion Priors for Speech Enhancement
by: Kumar, Sonal, et al.
Published: (2025)

MMAU-Pro: A Challenging and Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence
by: Kumar, Sonal, et al.
Published: (2025)

Applying Automatic Differentiation to Optimize Differential Microphone Array Designs
by: Galougah, Siminfar Samakoush, et al.
Published: (2024)

HalluAudio: A Comprehensive Benchmark for Hallucination Detection in Large Audio-Language Models
by: Zhao, Feiyu, et al.
Published: (2026)