| Main Authors: | Yang, Chih-Kai, Tsai, Yun-Shao, Guo, Yu-Kai, Tsai, Ping-Le, Piao, Yen-Ting, Chen, Hung-Wei, Hsiao, Ting-Lin, Hsu, Yun-Man, Lu, Ke-Han, Lee, Hung-yi |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.09714 |
Similar Items
SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Information
by: Yang, Chih-Kai, et al.
Published: (2025)
Toward Fair Speech Technologies: A Comprehensive Survey of Bias and Fairness in Speech AI
by: Lin, Yi-Cheng, et al.
Published: (2026)
Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey
by: Yang, Chih-Kai, et al.
Published: (2025)
All That Glitters Is Not Audio: Rethinking Text Priors and Audio Reliance in Audio-Language Evaluation
by: Foo, Leonardo Haw-Yang, et al.
Published: (2026)
Do Prompts Really Prompt? Exploring the Prompt Understanding Capability of Whisper
by: Yang, Chih-Kai, et al.
Published: (2024)
SAKE: Towards Editing Auditory Attribute Knowledge of Large Audio-Language Models
by: Yang, Chih-Kai, et al.
Published: (2025)
Continual Test-time Adaptation for End-to-end Speech Recognition on Noisy Speech
by: Lin, Guan-Ting, et al.
Published: (2024)
Towards audio language modeling -- an overview
by: Wu, Haibin, et al.
Published: (2024)
AudioLens: A Closer Look at Auditory Attribute Perception of Large Audio-Language Models
by: Yang, Chih-Kai, et al.
Published: (2025)
MI-Fuse: Label Fusion for Unsupervised Domain Adaptation with Closed-Source Large-Audio Language Model
by: Huang, Hsiao-Ying, et al.
Published: (2025)
Speech-Copilot: Leveraging Large Language Models for Speech Processing via Task Decomposition, Modularization, and Program Generation
by: Kuan, Chun-Yi, et al.
Published: (2024)
ASPIRin: Action Space Projection for Interactivity-Optimized Reinforcement Learning in Full-Duplex Speech Language Models
by: Hsiao, Chi-Yuan, et al.
Published: (2026)
AQAScore: Evaluating Semantic Alignment in Text-to-Audio Generation via Audio Question Answering
by: Kuan, Chun-Yi, et al.
Published: (2026)
CodecFake: Enhancing Anti-Spoofing Models Against Deepfake Audios from Codec-Based Speech Synthesis Systems
by: Wu, Haibin, et al.
Published: (2024)
Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models
by: Kuan, Chun-Yi, et al.
Published: (2024)
Can Large Audio-Language Models Truly Hear? Tackling Hallucinations with Multi-Task Assessment and Stepwise Audio Reasoning
by: Kuan, Chun-Yi, et al.
Published: (2024)
SMILE: Speech Meta In-Context Learning for Low-Resource Language Automatic Speech Recognition
by: Hsu, Ming-Hao, et al.
Published: (2024)
Analyzing Mitigation Strategies for Catastrophic Forgetting in End-to-End Training of Spoken Language Models
by: Hsiao, Chi-Yuan, et al.
Published: (2025)
SUTA-LM: Bridging Test-Time Adaptation and Language Model Rescoring for Robust ASR
by: Huang, Wei-Ping, et al.
Published: (2025)
Teaching Audio-Aware Large Language Models What Does Not Hear: Mitigating Hallucinations through Synthesized Negative Samples
by: Kuan, Chun-Yi, et al.
Published: (2025)
How Contrastive Decoding Enhances Large Audio Language Models?
by: Lin, Tzu-Quan, et al.
Published: (2026)
Parallel Synthesis for Autoregressive Speech Generation
by: Hsu, Po-chun, et al.
Published: (2022)
A Study on Synthesizing Expressive Violin Performances: Approaches and Comparisons
by: Hung, Tzu-Yun, et al.
Published: (2024)
Zero Resource Code-switched Speech Benchmark Using Speech Utterance Pairs For Multiple Spoken Languages
by: Huang, Kuan-Po, et al.
Published: (2023)
Spectral-Aware Low-Rank Adaptation for Speaker Verification
by: Li, Zhe, et al.
Published: (2025)
ALICE: A Multifaceted Evaluation Framework of Large Audio-Language Models' In-Context Learning Ability
by: Piao, Yen-Ting, et al.
Published: (2026)
VIBE: Voice-Induced open-ended Bias Evaluation for Large Audio-Language Models via Real-World Speech
by: Lin, Yi-Cheng, et al.
Published: (2026)
Nudging Hidden States: Training-Free Model Steering for Chain-of-Thought Reasoning in Large Audio-Language Models
by: Ieong, Lok-Lam, et al.
Published: (2026)
SynthCloner: Synthesizer-style Audio Transfer via Factorized Codec with ADSR Envelope Control
by: Liu, Jeng-Yue, et al.
Published: (2025)
How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic Evaluation
by: Lu, Ke-Han, et al.
Published: (2026)
A Preliminary Exploration with GPT-4o Voice Mode
by: Lin, Yu-Xiang, et al.
Published: (2025)
Moises-Light: Resource-efficient Band-split U-Net For Music Source Separation
by: Yun-Ning, et al.
Published: (2025)
AudioRepInceptionNeXt: A lightweight single-stream architecture for efficient audio recognition
by: Lau, Kin Wai, et al.
Published: (2024)
Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models
by: Wu, Haibin, et al.
Published: (2024)
Codec-Based Deepfake Source Tracing via Neural Audio Codec Taxonomy
by: Chen, Xuanjun, et al.
Published: (2025)
ToxicTone: A Mandarin Audio Dataset Annotated for Toxicity and Toxic Utterance Tonality
by: Luo, Yu-Xiang, et al.
Published: (2025)
ASTAR-NTU solution to AudioMOS Challenge 2025 Track1
by: Ritter-Gutierrez, Fabian, et al.
Published: (2025)
PyNeuralFx: A Python Package for Neural Audio Effect Modeling
by: Yeh, Yen-Tung, et al.
Published: (2024)
Hyper Recurrent Neural Network: Condition Mechanisms for Black-box Audio Effect Modeling
by: Yeh, Yen-Tung, et al.
Published: (2024)
Walking Through Uncertainty: An Empirical Study of Uncertainty Estimation for Audio-Aware Large Language Models
by: Kuan, Chun-Yi, et al.
Published: (2026)