| Main Authors: | Sridhar, Sripathi, Seetharaman, Prem, Nieto, Oriol, Cartwright, Mark, Salamon, Justin |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.13835 |
Similar Items
Generative Audio Extension and Morphing
by: Seetharaman, Prem, et al.
Published: (2026)
SILA: Signal-to-Language Augmentation for Enhanced Control in Text-to-Audio Generation
by: Kumar, Sonal, et al.
Published: (2024)
Sketch2Sound: Controllable Audio Generation via Time-Varying Signals and Sonic Imitations
by: García, Hugo Flores, et al.
Published: (2024)
Compositional Audio Representation Learning
by: Sridhar, Sripathi, et al.
Published: (2024)
AudioChat: Unified Audio Storytelling, Editing, and Understanding with Transfusion Forcing
by: Chen, William, et al.
Published: (2026)
Mix2Morph: Learning Sound Morphing from Noisy Mixes
by: Chu, Annie, et al.
Published: (2026)
FLAM: Frame-Wise Language-Audio Modeling
by: Wu, Yusong, et al.
Published: (2025)
Video-Guided Foley Sound Generation with Multimodal Controls
by: Chen, Ziyang, et al.
Published: (2024)
TAC: Timestamped Audio Captioning
by: Kumar, Sonal, et al.
Published: (2026)
PromptSep: Generative Audio Separation via Multimodal Prompting
by: Wen, Yutong, et al.
Published: (2025)
Augment, Drop & Swap: Improving Diversity in LLM Captions for Efficient Music-Text Representation Learning
by: Manco, Ilaria, et al.
Published: (2024)
Taming Audio VAEs via Target-KL Regularization
by: Seetharaman, Prem, et al.
Published: (2026)
Audio Hallucination Attacks: Probing the Reliability of Large Audio Language Models
by: Seth, Ashish, et al.
Published: (2026)
The Rhythm In Anything: Audio-Prompted Drums Generation with Masked Language Modeling
by: O'Reilly, Patrick, et al.
Published: (2025)
ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds
by: Ghosh, Sreyan, et al.
Published: (2024)
Code Drift: Towards Idempotent Neural Audio Codecs
by: O'Reilly, Patrick, et al.
Published: (2024)
Fusing Audio and Metadata Embeddings Improves Language-based Audio Retrieval
by: Primus, Paul, et al.
Published: (2024)
Enhancing Temporal Understanding in Audio Question Answering for Large Audio Language Models
by: Sridhar, Arvind Krishna, et al.
Published: (2024)
Expressive Range Characterization of Open Text-to-Audio Models
by: Morse, Jonathan, et al.
Published: (2025)
EmotionCaps: Enhancing Audio Captioning Through Emotion-Augmented Data Generation
by: Manivannan, Mithun, et al.
Published: (2024)
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
by: Ghosh, Sreyan, et al.
Published: (2024)
CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models
by: Ghosh, Sreyan, et al.
Published: (2023)
First-Shot Unsupervised Anomalous Sound Detection With Unknown Anomalies Estimated by Metadata-Assisted Audio Generation
by: Zhang, Hejing, et al.
Published: (2023)
Sound-VECaps: Improving Audio Generation with Visual Enhanced Captions
by: Yuan, Yi, et al.
Published: (2024)
ChildVox: A Speech, Audio, and Large Audio-Language Model Benchmark in Understanding and Characterizing Sound across Childhood
by: Feng, Tiantian, et al.
Published: (2026)
Thinking with Sound: Audio Chain-of-Thought Enables Multimodal Reasoning in Large Audio-Language Models
by: Xiong, Zhen, et al.
Published: (2025)
AudioRAG+: Feedback-driven Retrieval-augmented Audio Generation with Large Audio Language Models
by: Zhao, Junqi, et al.
Published: (2025)
TW-Sound580K: A Regional Audio-Text Dataset with Verification-Guided Curation for Localized Audio-Language Modeling
by: Xie, Hao-Hui, et al.
Published: (2026)
Audio Flamingo Sound-CoT Technical Report: Improving Chain-of-Thought Reasoning in Sound Understanding
by: Kong, Zhifeng, et al.
Published: (2025)
SpotSound: Enhancing Large Audio-Language Models with Fine-Grained Temporal Grounding
by: Sun, Luoyi, et al.
Published: (2026)
SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer
by: Wang, Helin, et al.
Published: (2024)
Spatial Audio Question Answering and Reasoning on Dynamic Source Movements
by: Sridhar, Arvind Krishna, et al.
Published: (2026)
Region-Specific Audio Tagging for Spatial Sound
by: Zhao, Jinzheng, et al.
Published: (2025)
AudioGS: Spectrogram-Based Audio Gaussian Splatting for Sound Field Reconstruction
by: Bi, Chunhao, et al.
Published: (2026)
Spatial Audio Motion Understanding and Reasoning
by: Sridhar, Arvind Krishna, et al.
Published: (2025)
BTS: Bridging Text and Sound Modalities for Metadata-Aided Respiratory Sound Classification
by: Kim, June-Woo, et al.
Published: (2024)
Improving Sound Source Localization with Joint Slot Attention on Image and Audio
by: Kim, Inho, et al.
Published: (2025)
The Sounds of Home: A Speech-Removed Residential Audio Dataset for Sound Event Detection
by: Bibbó, Gabriel, et al.
Published: (2024)
MUKA: Multi Kernel Audio Adaptation Of Audio-Language Models
by: Bensaid, Reda, et al.
Published: (2026)
AudioToolAgent: An Agentic Framework for Audio-Language Models
by: Wijngaard, Gijs, et al.
Published: (2025)