Saved in:
| Main Authors: | Ieong, Lok-Lam, Chen, Chia-Chien, Yang, Chih-Kai, Huang, Yu-Han, Cheng, An-Yu, Lee, Hung-yi |
|---|---|
| Format: | Preprint |
| Published: |
2026
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.14636 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Similar Items
SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Information
by: Yang, Chih-Kai, et al.
Published: (2025)
by: Yang, Chih-Kai, et al.
Published: (2025)
Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey
by: Yang, Chih-Kai, et al.
Published: (2025)
by: Yang, Chih-Kai, et al.
Published: (2025)
Can Large Audio-Language Models Truly Hear? Tackling Hallucinations with Multi-Task Assessment and Stepwise Audio Reasoning
by: Kuan, Chun-Yi, et al.
Published: (2024)
by: Kuan, Chun-Yi, et al.
Published: (2024)
AudioLens: A Closer Look at Auditory Attribute Perception of Large Audio-Language Models
by: Yang, Chih-Kai, et al.
Published: (2025)
by: Yang, Chih-Kai, et al.
Published: (2025)
Enhancing Automatic Chord Recognition through LLM Chain-of-Thought Reasoning
by: Chang, Chih-Cheng, et al.
Published: (2025)
by: Chang, Chih-Cheng, et al.
Published: (2025)
Audio-CoT: Exploring Chain-of-Thought Reasoning in Large Audio Language Model
by: Ma, Ziyang, et al.
Published: (2025)
by: Ma, Ziyang, et al.
Published: (2025)
CodecFake: Enhancing Anti-Spoofing Models Against Deepfake Audios from Codec-Based Speech Synthesis Systems
by: Wu, Haibin, et al.
Published: (2024)
by: Wu, Haibin, et al.
Published: (2024)
Teaching Audio-Aware Large Language Models What Does Not Hear: Mitigating Hallucinations through Synthesized Negative Samples
by: Kuan, Chun-Yi, et al.
Published: (2025)
by: Kuan, Chun-Yi, et al.
Published: (2025)
AudioGenie-Reasoner: A Training-Free Multi-Agent Framework for Coarse-to-Fine Audio Deep Reasoning
by: Rong, Yan, et al.
Published: (2025)
by: Rong, Yan, et al.
Published: (2025)
How Contrastive Decoding Enhances Large Audio Language Models?
by: Lin, Tzu-Quan, et al.
Published: (2026)
by: Lin, Tzu-Quan, et al.
Published: (2026)
MUGEN: Evaluating and Improving Multi-audio Understanding of Large Audio-Language Models
by: Yang, Chih-Kai, et al.
Published: (2026)
by: Yang, Chih-Kai, et al.
Published: (2026)
MI-Fuse: Label Fusion for Unsupervised Domain Adaptation with Closed-Source Large-Audio Language Model
by: Huang, Hsiao-Ying, et al.
Published: (2025)
by: Huang, Hsiao-Ying, et al.
Published: (2025)
A Preliminary Exploration with GPT-4o Voice Mode
by: Lin, Yu-Xiang, et al.
Published: (2025)
by: Lin, Yu-Xiang, et al.
Published: (2025)
Interpretable Audio Editing Evaluation via Chain-of-Thought Difference-Commonality Reasoning with Multimodal LLMs
by: Jia, Yuhang, et al.
Published: (2025)
by: Jia, Yuhang, et al.
Published: (2025)
All That Glitters Is Not Audio: Rethinking Text Priors and Audio Reliance in Audio-Language Evaluation
by: Foo, Leonardo Haw-Yang, et al.
Published: (2026)
by: Foo, Leonardo Haw-Yang, et al.
Published: (2026)
Speech-Copilot: Leveraging Large Language Models for Speech Processing via Task Decomposition, Modularization, and Program Generation
by: Kuan, Chun-Yi, et al.
Published: (2024)
by: Kuan, Chun-Yi, et al.
Published: (2024)
Do Prompts Really Prompt? Exploring the Prompt Understanding Capability of Whisper
by: Yang, Chih-Kai, et al.
Published: (2024)
by: Yang, Chih-Kai, et al.
Published: (2024)
Investigating Safety Vulnerabilities of Large Audio-Language Models Under Speaker Emotional Variations
by: Feng, Bo-Han, et al.
Published: (2025)
by: Feng, Bo-Han, et al.
Published: (2025)
VIBE: Voice-Induced open-ended Bias Evaluation for Large Audio-Language Models via Real-World Speech
by: Lin, Yi-Cheng, et al.
Published: (2026)
by: Lin, Yi-Cheng, et al.
Published: (2026)
Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models
by: Kuan, Chun-Yi, et al.
Published: (2024)
by: Kuan, Chun-Yi, et al.
Published: (2024)
ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing
by: Liu, Huadai, et al.
Published: (2025)
by: Liu, Huadai, et al.
Published: (2025)
DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment
by: Lu, Ke-Han, et al.
Published: (2025)
by: Lu, Ke-Han, et al.
Published: (2025)
Zero Resource Code-switched Speech Benchmark Using Speech Utterance Pairs For Multiple Spoken Languages
by: Huang, Kuan-Po, et al.
Published: (2023)
by: Huang, Kuan-Po, et al.
Published: (2023)
Codec-SUPERB: An In-Depth Analysis of Sound Codec Models
by: Wu, Haibin, et al.
Published: (2024)
by: Wu, Haibin, et al.
Published: (2024)
CodecFake+: A Large-Scale Neural Audio Codec-Based Deepfake Speech Dataset
by: Chen, Xuanjun, et al.
Published: (2025)
by: Chen, Xuanjun, et al.
Published: (2025)
SAKE: Towards Editing Auditory Attribute Knowledge of Large Audio-Language Models
by: Yang, Chih-Kai, et al.
Published: (2025)
by: Yang, Chih-Kai, et al.
Published: (2025)
LEMAS: Large A 150K-Hour Large-scale Extensible Multilingual Audio Suite with Generative Speech Models
by: Zhao, Zhiyuan, et al.
Published: (2026)
by: Zhao, Zhiyuan, et al.
Published: (2026)
Fusion of Discrete Representations and Self-Augmented Representations for Multilingual Automatic Speech Recognition
by: Wang, Shih-heng, et al.
Published: (2024)
by: Wang, Shih-heng, et al.
Published: (2024)
Steering Language Model to Stable Speech Emotion Recognition via Contextual Perception and Chain of Thought
by: Zhao, Zhixian, et al.
Published: (2025)
by: Zhao, Zhixian, et al.
Published: (2025)
DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset
by: Du, Jiawei, et al.
Published: (2024)
by: Du, Jiawei, et al.
Published: (2024)
Joint Fullband-Subband Modeling for High-Resolution SingFake Detection
by: Chen, Xuanjun, et al.
Published: (2026)
by: Chen, Xuanjun, et al.
Published: (2026)
How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic Evaluation
by: Lu, Ke-Han, et al.
Published: (2026)
by: Lu, Ke-Han, et al.
Published: (2026)
Analyzing Mitigation Strategies for Catastrophic Forgetting in End-to-End Training of Spoken Language Models
by: Hsiao, Chi-Yuan, et al.
Published: (2025)
by: Hsiao, Chi-Yuan, et al.
Published: (2025)
IMPACT: Iterative Mask-based Parallel Decoding for Text-to-Audio Generation with Diffusion Modeling
by: Huang, Kuan-Po, et al.
Published: (2025)
by: Huang, Kuan-Po, et al.
Published: (2025)
Disentangling Reasoning in Large Audio-Language Models for Ambiguous Emotion Prediction
by: Yu, Xiaofeng, et al.
Published: (2026)
by: Yu, Xiaofeng, et al.
Published: (2026)
AudioEditor: A Training-Free Diffusion-Based Audio Editing Framework
by: Jia, Yuhang, et al.
Published: (2024)
by: Jia, Yuhang, et al.
Published: (2024)
Towards audio language modeling -- an overview
by: Wu, Haibin, et al.
Published: (2024)
by: Wu, Haibin, et al.
Published: (2024)
Can Large Language Models Understand Spatial Audio?
by: Tang, Changli, et al.
Published: (2024)
by: Tang, Changli, et al.
Published: (2024)
Codec-Based Deepfake Source Tracing via Neural Audio Codec Taxonomy
by: Chen, Xuanjun, et al.
Published: (2025)
by: Chen, Xuanjun, et al.
Published: (2025)
EmoSteer-TTS: Fine-Grained and Training-Free Emotion-Controllable Text-to-Speech via Activation Steering
by: Xie, Tianxin, et al.
Published: (2025)
by: Xie, Tianxin, et al.
Published: (2025)
Similar Items
-
SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Information
by: Yang, Chih-Kai, et al.
Published: (2025) -
Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey
by: Yang, Chih-Kai, et al.
Published: (2025) -
Can Large Audio-Language Models Truly Hear? Tackling Hallucinations with Multi-Task Assessment and Stepwise Audio Reasoning
by: Kuan, Chun-Yi, et al.
Published: (2024) -
AudioLens: A Closer Look at Auditory Attribute Perception of Large Audio-Language Models
by: Yang, Chih-Kai, et al.
Published: (2025) -
Enhancing Automatic Chord Recognition through LLM Chain-of-Thought Reasoning
by: Chang, Chih-Cheng, et al.
Published: (2025)