| Main Authors: | Wang, Tsai-Ning; Dekker, Herman Teun den; Chen, Lin-Lin; Zeghidour, Neil; Saeed, Aaqib |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.12647 |
Similar Items
Language Models as Semantic Teachers: Post-Training Alignment for Medical Audio Understanding
by: Wang, Tsai-Ning, et al.
Published: (2025)
CaReAQA: A Cardiac and Respiratory Audio Question Answering Model for Open-Ended Diagnostic Reasoning
by: Wang, Tsai-Ning, et al.
Published: (2025)
ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds
by: Ghosh, Sreyan, et al.
Published: (2024)
Aligning Spoken Dialogue Models from User Interactions
by: Wu, Anne, et al.
Published: (2025)
StethoLM: Audio Language Model for Cardiopulmonary Analysis Across Clinical Tasks
by: Wang, Yishan, et al.
Published: (2026)
Simultaneous Speech-to-Speech Translation Without Aligned Data
by: Labiausse, Tom, et al.
Published: (2026)
High-Fidelity Simultaneous Speech-To-Speech Translation
by: Labiausse, Tom, et al.
Published: (2025)
When Silence Matters: The Impact of Irrelevant Audio on Text Reasoning in Large Audio-Language Models
by: Li, Chen-An, et al.
Published: (2025)
TSPE: Task-Specific Prompt Ensemble for Improved Zero-Shot Audio Classification
by: Anand, Nishit, et al.
Published: (2024)
MiMo-Audio: Audio Language Models are Few-Shot Learners
by: Core Team, et al.
Published: (2025)
RespLLM: Unifying Audio and Text with Multimodal LLMs for Generalized Respiratory Health Prediction
by: Zhang, Yuwei, et al.
Published: (2024)
MimicLM: Zero-Shot Voice Imitation through Autoregressive Modeling of Pseudo-Parallel Speech Corpora
by: Feng, Tao, et al.
Published: (2026)
Hearing the Order: Investigating Position Bias in Large Audio-Language Models
by: Lin, Yu-Xiang, et al.
Published: (2025)
Beyond Classification: Towards Speech Emotion Reasoning with Multitask AudioLLMs
by: Zhang, Wenyu, et al.
Published: (2025)
HeadRouter: Dynamic Head-Weight Routing for Task-Adaptive Audio Token Pruning in Large Audio Language Models
by: He, Peize, et al.
Published: (2026)
Conan: A Chunkwise Online Network for Zero-Shot Adaptive Voice Conversion
by: Zhang, Yu, et al.
Published: (2025)
Few-Shot Contrastive Adaptation for Audio Abuse Detection in Low-Resource Indic Languages
by: Sankaran, Aditya Narayan, et al.
Published: (2026)
ToxicTone: A Mandarin Audio Dataset Annotated for Toxicity and Toxic Utterance Tonality
by: Luo, Yu-Xiang, et al.
Published: (2025)
Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization
by: Hu, Yuchen, et al.
Published: (2024)
Continuous Audio Language Models
by: Rouard, Simon, et al.
Published: (2025)
Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model
by: Xue, Jinlong, et al.
Published: (2024)
Zero-Shot Cognitive Impairment Detection from Speech Using AudioLLM
by: Shahin, Mostafa, et al.
Published: (2025)
Audio Contrastive-based Fine-tuning: Decoupling Representation Learning and Classification
by: Wang, Yang, et al.
Published: (2023)
Zero-Shot Text-to-Speech for Vietnamese
by: Vu, Thi, et al.
Published: (2025)
AudioBench: A Universal Benchmark for Audio Large Language Models
by: Wang, Bin, et al.
Published: (2024)
Causal Tracing of Audio-Text Fusion in Large Audio Language Models
by: Chen, Wei-Chih, et al.
Published: (2026)
Moshi: a speech-text foundation model for real-time dialogue
by: Défossez, Alexandre, et al.
Published: (2024)
Assessing Factual Music Comprehension in Large Audio Language Models
by: Lin, Daniel Chenyu, et al.
Published: (2025)
Words at Play: Benchmarking Audio Pun Understanding in Large Audio-Language Models
by: Su, Yuchen, et al.
Published: (2026)
Not that Groove: Zero-Shot Symbolic Music Editing
by: Zhang, Li
Published: (2025)
Beyond Transcription: Unified Audio Schema for Perception-Aware AudioLLMs
by: Zhang, Linhao, et al.
Published: (2026)
How Contrastive Decoding Enhances Large Audio Language Models?
by: Lin, Tzu-Quan, et al.
Published: (2026)
MUGEN: Evaluating and Improving Multi-audio Understanding of Large Audio-Language Models
by: Yang, Chih-Kai, et al.
Published: (2026)
Unified Acoustic Representations for Screening Neurological and Respiratory Pathologies from Voice
by: Piao, Ran, et al.
Published: (2025)
Thinking with Sound: Audio Chain-of-Thought Enables Multimodal Reasoning in Large Audio-Language Models
by: Xiong, Zhen, et al.
Published: (2025)
Classification of Spontaneous and Scripted Speech for Multilingual Audio
by: Elisha, Shahar, et al.
Published: (2024)
Towards Zero-Shot Text-To-Speech for Arabic Dialects
by: Doan, Khai Duy, et al.
Published: (2024)
Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like
by: Kanda, Naoyuki, et al.
Published: (2024)
SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention
by: Li, Junjie, et al.
Published: (2023)
Audio-DeepThinker: Progressive Reasoning-Aware Reinforcement Learning for High-Quality Chain-of-Thought Emergence in Audio Language Models
by: He, Xiang, et al.
Published: (2026)