| Main Authors: | Sun, Yirong; Chen, Yanjun; Qiu, Xin; Zhang, Gang; Chen, Hongyu; Wu, Daokuan; Li, Chengming; Yang, Min; Zhu, Dawei; Zhang, Wei; Shen, Xiaoyu |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.11039 |
Similar Items
LLaSO: A Foundational Framework for Reproducible Research in Large Language and Speech Model
by: Sun, Yirong, et al.
Published: (2025)
Sonic: Shifting Focus to Global Audio Perception in Portrait Animation
by: Ji, Xiaozhong, et al.
Published: (2024)
Sonic4D: Spatial Audio Generation for Immersive 4D Scene Exploration
by: Xie, Siyi, et al.
Published: (2025)
AudioBench: A Universal Benchmark for Audio Large Language Models
by: Wang, Bin, et al.
Published: (2024)
ISA-Bench: Benchmarking Instruction Sensitivity for Large Audio Language Models
by: Li, Bohan, et al.
Published: (2025)
AudioCapBench: Quick Evaluation on Audio Captioning across Sound, Music, and Speech
by: Qiu, Jielin, et al.
Published: (2026)
SonicSense: Object Perception from In-Hand Acoustic Vibration
by: Liu, Jiaxun, et al.
Published: (2024)
AudioMotionBench: Evaluating Auditory Motion Perception in Audio LLMs
by: Sun, Zhe, et al.
Published: (2025)
RSA-Bench: Benchmarking Audio Large Models in Real-World Acoustic Scenarios
by: Zhang, Yibo, et al.
Published: (2026)
Resonate: Reinforcing Text-to-Audio Generation via Online Feedback from Large Audio Language Models
by: Li, Xiquan, et al.
Published: (2026)
VCB Bench: An Evaluation Benchmark for Audio-Grounded Large Language Model Conversational Agents
by: Hu, Jiliang, et al.
Published: (2025)
LongCat-Audio-Codec: An Audio Tokenizer and Detokenizer Solution Designed for Speech Large Language Models
by: Zhao, Xiaohan, et al.
Published: (2025)
AudioCodecBench: A Comprehensive Benchmark for Audio Codec Evaluation
by: Wang, Lu, et al.
Published: (2025)
VidAudio-Bench: Benchmarking V2A and VT2A Generation across Four Audio Categories
by: Zhang, Qian, et al.
Published: (2026)
OmniSonic: Towards Universal and Holistic Audio Generation from Video and Text
by: Pian, Weiguo, et al.
Published: (2026)
Enhancing Speech Large Language Models with Prompt-Aware Mixture of Audio Encoders
by: Shan, Weiqiao, et al.
Published: (2025)
HalluAudio: A Comprehensive Benchmark for Hallucination Detection in Large Audio-Language Models
by: Zhao, Feiyu, et al.
Published: (2026)
ChronosAudio: A Comprehensive Long-Audio Benchmark for Evaluating Audio-Large Language Models
by: Luo, Kaiwen, et al.
Published: (2026)
AudioRAG+: Feedback-driven Retrieval-augmented Audio Generation with Large Audio Language Models
by: Zhao, Junqi, et al.
Published: (2025)
Audio-VLA: Adding Contact Audio Perception to Vision-Language-Action Model for Robotic Manipulation
by: Wei, Xiangyi, et al.
Published: (2025)
AudioCIL: A Python Toolbox for Audio Class-Incremental Learning with Multiple Scenes
by: Xu, Qisheng, et al.
Published: (2024)
Jailbreak-AudioBench: In-Depth Evaluation and Analysis of Jailbreak Threats for Large Audio Language Models
by: Cheng, Hao, et al.
Published: (2025)
AHA: Aligning Large Audio-Language Models for Reasoning Hallucinations via Counterfactual Hard Negatives
by: Chen, Yanxi, et al.
Published: (2025)
Causal Tracing of Audio-Text Fusion in Large Audio Language Models
by: Chen, Wei-Chih, et al.
Published: (2026)
AudioKV: KV Cache Eviction in Efficient Large Audio Language Models
by: Wang, Yuxuan, et al.
Published: (2026)
Audio-Visual Speech Separation via Bottleneck Iterative Network
by: Zhang, Sidong, et al.
Published: (2025)
Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding
by: Liu, Jizhong, et al.
Published: (2024)
A Survey of Large Audio Language Models: Generalization, Trustworthiness, and Outlook
by: Luo, Kaiwen, et al.
Published: (2026)
Can Large Language Models Understand Spatial Audio?
by: Tang, Changli, et al.
Published: (2024)
The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Language Models
by: Chen, Yanjun, et al.
Published: (2024)
Beyond Transcription: Unified Audio Schema for Perception-Aware AudioLLMs
by: Zhang, Linhao, et al.
Published: (2026)
Sketch2Sound: Controllable Audio Generation via Time-Varying Signals and Sonic Imitations
by: García, Hugo Flores, et al.
Published: (2024)
EgoSonics: Generating Synchronized Audio for Silent Egocentric Videos
by: Rai, Aashish, et al.
Published: (2024)
Towards Fine-grained Temporal Perception: Post-Training Large Audio-Language Models with Audio-Side Time Prompt
by: Shi, Yanfeng, et al.
Published: (2026)
Dissecting Performance Degradation in Audio Source Separation under Sampling Frequency Mismatch
by: Imamura, Kanami, et al.
Published: (2026)
PitchBench: Measuring Pitch Hearing in Audio-Language Models
by: Dujardin, Milan Liessens, et al.
Published: (2026)
Beyond Single-Audio: Advancing Multi-Audio Processing in Audio Large Language Models
by: Chen, Yiming, et al.
Published: (2024)
Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models
by: Song, Zirui, et al.
Published: (2025)
ProAV-DiT: A Projected Latent Diffusion Transformer for Efficient Synchronized Audio-Video Generation
by: Sun, Jiahui, et al.
Published: (2025)
From Coarse to Fine: Recursive Audio-Visual Semantic Enhancement for Speech Separation
by: Xue, Ke, et al.
Published: (2025)