:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	Galougah, Siminfar Samakoush, Pulijala, Pranav, Duraiswami, Ramani
Format:	Preprint
Published:	2026
Subjects:	Sound
Online Access:	https://arxiv.org/abs/2603.02205
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

Applying Automatic Differentiation to Optimize Differential Microphone Array Designs
by: Galougah, Siminfar Samakoush, et al.
Published: (2024)

AURA: A Fine-Grained Benchmark and Decomposed Metric for Audio-Visual Reasoning
by: Galougah, Siminfar Samakoush, et al.
Published: (2025)

Spectrum Coexistence, Network Dimensioning, and Cell-Free Architectures in 5G and 5G-Advanced Wireless Networks
by: Galougah, Siminfar Samakoush
Published: (2026)

Room Impulse Response Synthesis via Differentiable Feedback Delay Networks for Efficient Spatial Audio Rendering
by: Gerami, Armin, et al.
Published: (2025)

Biomimetic Frontend for Differentiable Audio Processing
by: Famularo, Ruolan Leslie, et al.
Published: (2024)

Audio Hallucination Attacks: Probing the Reliability of Large Audio Language Models
by: Seth, Ashish, et al.
Published: (2026)

TSPE: Task-Specific Prompt Ensemble for Improved Zero-Shot Audio Classification
by: Anand, Nishit, et al.
Published: (2024)

RECAP: Retrieval-Augmented Audio Captioning
by: Ghosh, Sreyan, et al.
Published: (2023)

MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
by: Sakshi, S, et al.
Published: (2024)

ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds
by: Ghosh, Sreyan, et al.
Published: (2024)

GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
by: Ghosh, Sreyan, et al.
Published: (2024)

CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models
by: Ghosh, Sreyan, et al.
Published: (2023)

Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models
by: Goel, Arushi, et al.
Published: (2025)

ORCA: Open-ended Response Correctness Assessment for Audio Question Answering
by: Sedláček, Šimon, et al.
Published: (2025)

Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music
by: Ghosh, Sreyan, et al.
Published: (2026)

Multi-Domain Audio Question Answering Benchmark Toward Acoustic Content Reasoning
by: Yang, Chao-Han Huck, et al.
Published: (2025)

Incorporating Spatial Cues in Modular Speaker Diarization for Multi-channel Multi-party Meetings
by: Wang, Ruoyu, et al.
Published: (2024)

Bootstrapping Audio-Visual Segmentation by Strengthening Audio Cues
by: Chen, Tianxiang, et al.
Published: (2024)

Cinematic Audio Source Separation Using Visual Cues
by: Zhang, Kang, et al.
Published: (2026)

RAVSS: Robust Audio-Visual Speech Separation in Multi-Speaker Scenarios with Missing Visual Cues
by: Pan, Tianrui, et al.
Published: (2024)

MUKA: Multi Kernel Audio Adaptation Of Audio-Language Models
by: Bensaid, Reda, et al.
Published: (2026)

Auptimize: Optimal Placement of Spatial Audio Cues for Extended Reality
by: Cho, Hyunsung, et al.
Published: (2024)

SpA2V: Harnessing Spatial Auditory Cues for Audio-driven Spatially-aware Video Generation
by: Pham, Kien T., et al.
Published: (2025)

Sonic4D: Spatial Audio Generation for Immersive 4D Scene Exploration
by: Xie, Siyi, et al.
Published: (2025)

AV-EMO-Reasoning: Benchmarking Emotional Reasoning Capabilities in Omni-modal LLMS with Audio-visual Cues
by: Zhou, Dingkun, et al.
Published: (2025)

A Comprehensive Corpus of Biomechanically Constrained Piano Chords: Generation, Analysis, and Implications for Voicing and Psychoacoustics
by: Ramani, Mahesh
Published: (2026)

MMEDIT: A Unified Framework for Multi-Type Audio Editing via Audio Language Model
by: Tao, Ye, et al.
Published: (2025)

Audio-Visual Speech Enhancement in Noisy Environments via Emotion-Based Contextual Cues
by: Hussain, Tassadaq, et al.
Published: (2024)

NablAFx: A Framework for Differentiable Black-box and Gray-box Modeling of Audio Effects
by: Comunità, Marco, et al.
Published: (2025)

A Lightweight and Real-Time Binaural Speech Enhancement Model with Spatial Cues Preservation
by: Wang, Jingyuan, et al.
Published: (2024)

A Sensitivity Analysis of Multi-Event Audio Grounding in Audio LLMs
by: Lee, Taehan, et al.
Published: (2026)

SIREN: Spatially-Informed Reconstruction of Binaural Audio with Vision
by: Song, Mingyeong, et al.
Published: (2026)

Fundamental Survey on Neuromorphic Based Audio Classification
by: Basu, Amlan, et al.
Published: (2025)

Video-Robin: Autoregressive Diffusion Planning for Intent-Grounded Video-to-Music Generation
by: Lokegaonkar, Vaibhavi, et al.
Published: (2026)

Exploring Audio Cues for Enhanced Test-Time Video Model Adaptation
by: Zeng, Runhao, et al.
Published: (2025)

BreathNet: Generalizable Audio Deepfake Detection via Breath-Cue-Guided Feature Refinement
by: Ye, Zhe, et al.
Published: (2026)

A Lightweight Fourier-based Network for Binaural Speech Enhancement with Spatial Cue Preservation
by: Lu, Xikun, et al.
Published: (2025)

WTFormer: A Wavelet Conformer Network for MIMO Speech Enhancement with Spatial Cues Peservation
by: Han, Lu, et al.
Published: (2025)

Can Large Language Models Understand Spatial Audio?
by: Tang, Changli, et al.
Published: (2024)

Universal Spatial Audio Transcoder
by: Sagasti, Amaia, et al.
Published: (2024)