| Main Authors: | Galougah, Siminfar Samakoush, Pulijala, Pranav, Duraiswami, Ramani |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.02205 |
Similar Items
Applying Automatic Differentiation to Optimize Differential Microphone Array Designs
by: Galougah, Siminfar Samakoush, et al.
Published: (2024)
AURA: A Fine-Grained Benchmark and Decomposed Metric for Audio-Visual Reasoning
by: Galougah, Siminfar Samakoush, et al.
Published: (2025)
Spectrum Coexistence, Network Dimensioning, and Cell-Free Architectures in 5G and 5G-Advanced Wireless Networks
by: Galougah, Siminfar Samakoush
Published: (2026)
Room Impulse Response Synthesis via Differentiable Feedback Delay Networks for Efficient Spatial Audio Rendering
by: Gerami, Armin, et al.
Published: (2025)
Biomimetic Frontend for Differentiable Audio Processing
by: Famularo, Ruolan Leslie, et al.
Published: (2024)
Audio Hallucination Attacks: Probing the Reliability of Large Audio Language Models
by: Seth, Ashish, et al.
Published: (2026)
TSPE: Task-Specific Prompt Ensemble for Improved Zero-Shot Audio Classification
by: Anand, Nishit, et al.
Published: (2024)
RECAP: Retrieval-Augmented Audio Captioning
by: Ghosh, Sreyan, et al.
Published: (2023)
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
by: Sakshi, S, et al.
Published: (2024)
ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds
by: Ghosh, Sreyan, et al.
Published: (2024)
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
by: Ghosh, Sreyan, et al.
Published: (2024)
CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models
by: Ghosh, Sreyan, et al.
Published: (2023)
Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models
by: Goel, Arushi, et al.
Published: (2025)
ORCA: Open-ended Response Correctness Assessment for Audio Question Answering
by: Sedláček, Šimon, et al.
Published: (2025)
Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music
by: Ghosh, Sreyan, et al.
Published: (2026)
Multi-Domain Audio Question Answering Benchmark Toward Acoustic Content Reasoning
by: Yang, Chao-Han Huck, et al.
Published: (2025)
Incorporating Spatial Cues in Modular Speaker Diarization for Multi-channel Multi-party Meetings
by: Wang, Ruoyu, et al.
Published: (2024)
Bootstrapping Audio-Visual Segmentation by Strengthening Audio Cues
by: Chen, Tianxiang, et al.
Published: (2024)
Cinematic Audio Source Separation Using Visual Cues
by: Zhang, Kang, et al.
Published: (2026)
RAVSS: Robust Audio-Visual Speech Separation in Multi-Speaker Scenarios with Missing Visual Cues
by: Pan, Tianrui, et al.
Published: (2024)
MUKA: Multi Kernel Audio Adaptation Of Audio-Language Models
by: Bensaid, Reda, et al.
Published: (2026)
Auptimize: Optimal Placement of Spatial Audio Cues for Extended Reality
by: Cho, Hyunsung, et al.
Published: (2024)
SpA2V: Harnessing Spatial Auditory Cues for Audio-driven Spatially-aware Video Generation
by: Pham, Kien T., et al.
Published: (2025)
Sonic4D: Spatial Audio Generation for Immersive 4D Scene Exploration
by: Xie, Siyi, et al.
Published: (2025)
AV-EMO-Reasoning: Benchmarking Emotional Reasoning Capabilities in Omni-modal LLMS with Audio-visual Cues
by: Zhou, Dingkun, et al.
Published: (2025)
A Comprehensive Corpus of Biomechanically Constrained Piano Chords: Generation, Analysis, and Implications for Voicing and Psychoacoustics
by: Ramani, Mahesh
Published: (2026)
MMEDIT: A Unified Framework for Multi-Type Audio Editing via Audio Language Model
by: Tao, Ye, et al.
Published: (2025)
Audio-Visual Speech Enhancement in Noisy Environments via Emotion-Based Contextual Cues
by: Hussain, Tassadaq, et al.
Published: (2024)
NablAFx: A Framework for Differentiable Black-box and Gray-box Modeling of Audio Effects
by: Comunità, Marco, et al.
Published: (2025)
A Lightweight and Real-Time Binaural Speech Enhancement Model with Spatial Cues Preservation
by: Wang, Jingyuan, et al.
Published: (2024)
A Sensitivity Analysis of Multi-Event Audio Grounding in Audio LLMs
by: Lee, Taehan, et al.
Published: (2026)
SIREN: Spatially-Informed Reconstruction of Binaural Audio with Vision
by: Song, Mingyeong, et al.
Published: (2026)
Fundamental Survey on Neuromorphic Based Audio Classification
by: Basu, Amlan, et al.
Published: (2025)
Video-Robin: Autoregressive Diffusion Planning for Intent-Grounded Video-to-Music Generation
by: Lokegaonkar, Vaibhavi, et al.
Published: (2026)
Exploring Audio Cues for Enhanced Test-Time Video Model Adaptation
by: Zeng, Runhao, et al.
Published: (2025)
BreathNet: Generalizable Audio Deepfake Detection via Breath-Cue-Guided Feature Refinement
by: Ye, Zhe, et al.
Published: (2026)
A Lightweight Fourier-based Network for Binaural Speech Enhancement with Spatial Cue Preservation
by: Lu, Xikun, et al.
Published: (2025)
WTFormer: A Wavelet Conformer Network for MIMO Speech Enhancement with Spatial Cues Peservation
by: Han, Lu, et al.
Published: (2025)
Can Large Language Models Understand Spatial Audio?
by: Tang, Changli, et al.
Published: (2024)
Universal Spatial Audio Transcoder
by: Sagasti, Amaia, et al.
Published: (2024)