| Main Authors: | Wang, Tsai-Ning, Chen, Lin-Lin, Zeghidour, Neil, Saeed, Aaqib |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.04847 |
Similar Items
Adaptive Test-Time Scaling for Zero-Shot Respiratory Audio Classification
by: Wang, Tsai-Ning, et al.
Published: (2026)
CaReAQA: A Cardiac and Respiratory Audio Question Answering Model for Open-Ended Diagnostic Reasoning
by: Wang, Tsai-Ning, et al.
Published: (2025)
StethoLM: Audio Language Model for Cardiopulmonary Analysis Across Clinical Tasks
by: Wang, Yishan, et al.
Published: (2026)
RespLLM: Unifying Audio and Text with Multimodal LLMs for Generalized Respiratory Health Prediction
by: Zhang, Yuwei, et al.
Published: (2024)
Towards Fine-grained Temporal Perception: Post-Training Large Audio-Language Models with Audio-Side Time Prompt
by: Shi, Yanfeng, et al.
Published: (2026)
MUGEN: Evaluating and Improving Multi-audio Understanding of Large Audio-Language Models
by: Yang, Chih-Kai, et al.
Published: (2026)
Audio-Maestro: Enhancing Large Audio-Language Models with Tool-Augmented Reasoning
by: Lee, Kuan-Yi, et al.
Published: (2025)
Continuous Audio Language Models
by: Rouard, Simon, et al.
Published: (2025)
The World is Not Mono: Enabling Spatial Understanding in Large Audio-Language Models
by: You, Yuhuan, et al.
Published: (2026)
AudioSAE: Towards Understanding of Audio-Processing Models with Sparse AutoEncoders
by: Aparin, Georgii, et al.
Published: (2026)
Weakly-supervised Audio Separation via Bi-modal Semantic Similarity
by: Mahmud, Tanvir, et al.
Published: (2024)
Audio Mamba: Pretrained Audio State Space Model For Audio Tagging
by: Lin, Jiaju, et al.
Published: (2024)
Eureka-Audio: Triggering Audio Intelligence in Compact Language Models
by: Zhang, Dan, et al.
Published: (2026)
ERIS: Evolutionary Real-world Interference Scheme for Jailbreaking Audio Large Models
by: Zhang, Yibo, et al.
Published: (2025)
Temporal Contrastive Decoding: A Training-Free Method for Large Audio-Language Models
by: Li, Yanda, et al.
Published: (2026)
Are Audio-Language Models Listening? Audio-Specialist Heads for Adaptive Audio Steering
by: Glazer, Neta, et al.
Published: (2026)
HalluAudio: A Comprehensive Benchmark for Hallucination Detection in Large Audio-Language Models
by: Zhao, Feiyu, et al.
Published: (2026)
EvA: An Evidence-First Audio Understanding Paradigm for LALMs
by: Xie, Xinyuan, et al.
Published: (2026)
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
by: Ye, Zhen, et al.
Published: (2024)
The Sonar Moment: Benchmarking Audio-Language Models in Audio Geo-Localization
by: Zhang, Ruixing, et al.
Published: (2026)
EchoDistill: Alignment Noisy-to-Clean Self-Distillation for Robust Audio LLMs
by: Lin, Liang, et al.
Published: (2026)
Rebellion: Noise-Robust Reasoning Training for Audio Reasoning Models
by: Huang, Tiansheng, et al.
Published: (2025)
Training-Efficient Text-to-Music Generation with State-Space Modeling
by: Lee, Wei-Jaw, et al.
Published: (2026)
SCENEBench: An Audio Understanding Benchmark Grounded in Assistive and Industrial Use Cases
by: Iyer, Laya, et al.
Published: (2026)
MOSS-Audio Technical Report
by: Yang, Chen, et al.
Published: (2026)
Generalizable Prompt Tuning for Audio-Language Models via Semantic Expansion
by: Jang, Jaehyuk, et al.
Published: (2026)
VocalParse: Towards Unified and Scalable Singing Voice Transcription with Large Audio Language Models
by: Chen, Yukun, et al.
Published: (2026)
Evaluation of Audio Language Models for Fairness, Safety, and Security
by: Aloufi, Ranya, et al.
Published: (2026)
AudioRouter: Data Efficient Audio Understanding via RL based Dual Reasoning
by: Chen, Liyang, et al.
Published: (2026)
Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models
by: Song, Zirui, et al.
Published: (2025)
Latent-Mark: An Audio Watermark Robust to Neural Resynthesis
by: Chen, Yen-Shan, et al.
Published: (2026)
Extending Audio Context for Long-Form Understanding in Large Audio-Language Models
by: Chaichana, Yuatyong, et al.
Published: (2025)
Benchmarking Open-ended Audio Dialogue Understanding for Large Audio-Language Models
by: Gao, Kuofeng, et al.
Published: (2024)
Quantum-Trained Convolutional Neural Network for Deepfake Audio Detection
by: Lin, Chu-Hsuan Abraham, et al.
Published: (2024)
Representation-Regularized Convolutional Audio Transformer for Audio Understanding
by: Han, Bing, et al.
Published: (2026)
AudioGuard: Toward Comprehensive Audio Safety Protection Across Diverse Threat Models
by: Kang, Mintong, et al.
Published: (2026)
CALM: Class-Conditional Sparse Attention Vectors for Large Audio-Language Models
by: Mehta, Videet, et al.
Published: (2026)
PitchBench: Measuring Pitch Hearing in Audio-Language Models
by: Dujardin, Milan Liessens, et al.
Published: (2026)
UniWhisper: Efficient Continual Multi-task Training for Robust Universal Audio Representation
by: Chen, Yuxuan, et al.
Published: (2026)
SLAP: Scalable Language-Audio Pretraining with Variable-Duration Audio and Multi-Objective Training
by: Mei, Xinhao, et al.
Published: (2026)