| Main Authors: | Chen, Liyang, Chen, Hongkai, Cai, Yujun, Li, Sifan, Ye, Qingwen, Wang, Yiwei |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.10439 |
Similar Items
Audio Entailment: Assessing Deductive Reasoning for Audio Understanding
by: Deshmukh, Soham, et al.
Published: (2024)
Representation-Regularized Convolutional Audio Transformer for Audio Understanding
by: Han, Bing, et al.
Published: (2026)
Streaming Audio Transformers for Online Audio Tagging
by: Dinkel, Heinrich, et al.
Published: (2023)
Efficient Autoregressive Audio Modeling via Next-Scale Prediction
by: Qiu, Kai, et al.
Published: (2024)
MiDashengLM: Efficient Audio Understanding with General Audio Captions
by: Dinkel, Heinrich, et al.
Published: (2025)
AudioRAG: A Challenging Benchmark for Audio Reasoning and Information Retrieval
by: Lin, Jingru, et al.
Published: (2026)
Tri-Ergon: Fine-grained Video-to-Audio Generation with Multi-modal Conditions and LUFS Control
by: Li, Bingliang, et al.
Published: (2024)
AHAMask: Reliable Task Specification for Large Audio Language Models without Instructions
by: Guo, Yiwei, et al.
Published: (2025)
Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio?
by: Xie, Yuankun, et al.
Published: (2024)
AudioTurbo: Fast Text-to-Audio Generation with Rectified Diffusion
by: Zhao, Junqi, et al.
Published: (2025)
DreamAudio: Customized Text-to-Audio Generation with Diffusion Models
by: Yuan, Yi, et al.
Published: (2025)
Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models
by: Song, Zirui, et al.
Published: (2025)
Audio Mamba: Pretrained Audio State Space Model For Audio Tagging
by: Lin, Jiaju, et al.
Published: (2024)
Audio-Mind: An Auditable Agentic Framework for Audio Understanding
by: Wang, Yucheng, et al.
Published: (2026)
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
by: Ghosh, Sreyan, et al.
Published: (2024)
Harder or Different? Understanding Generalization of Audio Deepfake Detection
by: Müller, Nicolas M., et al.
Published: (2024)
FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion
by: Chen, Shunian, et al.
Published: (2025)
Inference-time Scaling for Diffusion-based Audio Super-resolution
by: Jin, Yizhu, et al.
Published: (2025)
Who Can Withstand Chat-Audio Attacks? An Evaluation Benchmark for Large Audio-Language Models
by: Yang, Wanqi, et al.
Published: (2024)
MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models
by: Gong, Yitian, et al.
Published: (2026)
Audio Atlas: Visualizing and Exploring Audio Datasets
by: Lanzendörfer, Luca A., et al.
Published: (2024)
Leveraging Self-supervised Audio Representations for Data-Efficient Acoustic Scene Classification
by: Cai, Yiqiang, et al.
Published: (2024)
Audio Spatially-Guided Fusion for Audio-Visual Navigation
by: Zhou, Xinyu, et al.
Published: (2026)
Prosody-Adaptable Audio Codecs for Zero-Shot Voice Conversion via In-Context Learning
by: Zhao, Junchuan, et al.
Published: (2025)
From Contrast to Commonality: Audio Commonality Captioning for Enhanced Audio-Text Cross-modal Understanding in Multimodal LLMs
by: Jia, Yuhang, et al.
Published: (2025)
EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning
by: Kim, Jaeyeon, et al.
Published: (2024)
AudioMarathon: A Comprehensive Benchmark for Long-Context Audio Understanding and Efficiency in Audio LLMs
by: He, Peize, et al.
Published: (2025)
When Scaling Fails: Mitigating Audio Perception Decay of LALMs via Multi-Step Perception-Aware Reasoning
by: Mao, Ruixiang, et al.
Published: (2026)
Audio Deepfake Attribution: An Initial Dataset and Investigation
by: Yan, Xinrui, et al.
Published: (2022)
Towards Spatial Audio Understanding via Question Answering
by: Sudarsanam, Parthasaarathy, et al.
Published: (2025)
Hyperdimensional Intelligent Sensing for Efficient Real-Time Audio Processing on Extreme Edge
by: Yun, Sanggeon, et al.
Published: (2025)
Prototype based Masked Audio Model for Self-Supervised Learning of Sound Event Detection
by: Cai, Pengfei, et al.
Published: (2024)
CORD: Bridging the Audio-Text Reasoning Gap via Weighted On-policy Cross-modal Distillation
by: Hu, Jing, et al.
Published: (2026)
Understanding Pedestrian Movement Using Urban Sensing Technologies: The Promise of Audio-based Sensors
by: Han, Chaeyeon, et al.
Published: (2024)
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
by: Erol, Mehmet Hamza, et al.
Published: (2024)
AudioScene: Integrating Object-Event Audio into 3D Scenes
by: Yuan, Shuaihang, et al.
Published: (2025)
Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations
by: Yadav, Sarthak, et al.
Published: (2024)
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
by: Sakshi, S, et al.
Published: (2024)
Disentangling Reasoning in Large Audio-Language Models for Ambiguous Emotion Prediction
by: Yu, Xiaofeng, et al.
Published: (2026)
Can Large Language Models Understand Spatial Audio?
by: Tang, Changli, et al.
Published: (2024)