| Main Authors: | Hu, Xiaolin, Yuan, Hang, Sang, Xinzhu, Yan, Binbin, Yu, Zhou, Huang, Cong, Chen, Kai |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.04913 |
Similar Items
DialogGraph-LLM: Graph-Informed LLMs for End-to-End Audio Dialogue Intent Recognition
by: Liu, HongYu, et al.
Published: (2025)
TDFNet: An Efficient Audio-Visual Speech Separation Model with Top-down Fusion
by: Pegg, Samuel, et al.
Published: (2024)
Beyond Monologue: Interactive Talking-Listening Avatar Generation with Conversational Audio Context-Aware Kernels
by: Weng, Yuzhe, et al.
Published: (2026)
An Audio-Visual Speech Separation Model Inspired by Cortico-Thalamo-Cortical Circuits
by: Li, Kai, et al.
Published: (2022)
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation
by: Zhang, Qinglin, et al.
Published: (2024)
A Fast and Lightweight Model for Causal Audio-Visual Speech Separation
by: Sang, Wendi, et al.
Published: (2025)
Benchmarking Open-ended Audio Dialogue Understanding for Large Audio-Language Models
by: Gao, Kuofeng, et al.
Published: (2024)
AudioJailbreak: Jailbreak Attacks against End-to-End Large Audio-Language Models
by: Chen, Guangke, et al.
Published: (2025)
LearnAFE: Circuit-Algorithm Co-design Framework for Learnable Audio Analog Front-End
by: Hu, Jinhai, et al.
Published: (2025)
AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models
by: Li, Kai, et al.
Published: (2025)
HalluAudio: A Comprehensive Benchmark for Hallucination Detection in Large Audio-Language Models
by: Zhao, Feiyu, et al.
Published: (2026)
AHAMask: Reliable Task Specification for Large Audio Language Models without Instructions
by: Guo, Yiwei, et al.
Published: (2025)
Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model
by: Huang, Ailin, et al.
Published: (2025)
DGFNet: End-to-End Audio-Visual Source Separation Based on Dynamic Gating Fusion
by: Yu, Yinfeng, et al.
Published: (2025)
SpeakerLM: End-to-End Versatile Speaker Diarization and Recognition with Multimodal Large Language Models
by: Yin, Han, et al.
Published: (2025)
A Framework for Synthetic Audio Conversations Generation using Large Language Models
by: Kyaw, Kaung Myat, et al.
Published: (2024)
Eureka-Audio: Triggering Audio Intelligence in Compact Language Models
by: Zhang, Dan, et al.
Published: (2026)
Audio-Maestro: Enhancing Large Audio-Language Models with Tool-Augmented Reasoning
by: Lee, Kuan-Yi, et al.
Published: (2025)
Nudging Hidden States: Training-Free Model Steering for Chain-of-Thought Reasoning in Large Audio-Language Models
by: Ieong, Lok-Lam, et al.
Published: (2026)
Towards Fine-grained Temporal Perception: Post-Training Large Audio-Language Models with Audio-Side Time Prompt
by: Shi, Yanfeng, et al.
Published: (2026)
SonicSim: A customizable simulation platform for speech processing in moving sound source scenarios
by: Li, Kai, et al.
Published: (2024)
Audio Jailbreaks in Large Audio-Language Models: Taxonomy, Attack-Defense Analysis, and Cost-Aware Evaluation
by: Feng, Bo-Han, et al.
Published: (2026)
Time-Frequency-Based Attention Cache Memory Model for Real-Time Speech Separation
by: Chen, Guo, et al.
Published: (2025)
Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models
by: Goel, Arushi, et al.
Published: (2025)
UniLS: End-to-End Audio-Driven Avatars for Unified Listening and Speaking
by: Chu, Xuangeng, et al.
Published: (2025)
VAPO: End-to-end Slide-Enhanced Speech Recognition with Omni-modal Large Language Models
by: Hu, Rui, et al.
Published: (2025)
Multimodal Emotion Regression with Multi-Objective Optimization and VAD-Aware Audio Modeling for the 10th ABAW EMI Track
by: Huang, Jiawen, et al.
Published: (2026)
Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models
by: Song, Zirui, et al.
Published: (2025)
The World is Not Mono: Enabling Spatial Understanding in Large Audio-Language Models
by: You, Yuhuan, et al.
Published: (2026)
Do Audio-Visual Large Language Models Really See and Hear?
by: Selvakumar, Ramaneswaran, et al.
Published: (2026)
OWL: Geometry-Aware Spatial Reasoning for Audio Large Language Models
by: Biswas, Subrata, et al.
Published: (2025)
VocalParse: Towards Unified and Scalable Singing Voice Transcription with Large Audio Language Models
by: Chen, Yukun, et al.
Published: (2026)
SAKE: Towards Editing Auditory Attribute Knowledge of Large Audio-Language Models
by: Yang, Chih-Kai, et al.
Published: (2025)
MUGEN: Evaluating and Improving Multi-audio Understanding of Large Audio-Language Models
by: Yang, Chih-Kai, et al.
Published: (2026)
Are Audio-Language Models Listening? Audio-Specialist Heads for Adaptive Audio Steering
by: Glazer, Neta, et al.
Published: (2026)
Neuro-MSBG: An End-to-End Neural Model for Hearing Loss Simulation
by: Yuan, Hui-Guan, et al.
Published: (2025)
AudioLens: A Closer Look at Auditory Attribute Perception of Large Audio-Language Models
by: Yang, Chih-Kai, et al.
Published: (2025)
UAF: A Unified Audio Front-end LLM for Full-Duplex Speech Interaction
by: Li, Yadong, et al.
Published: (2026)
Investigating Safety Vulnerabilities of Large Audio-Language Models Under Speaker Emotional Variations
by: Feng, Bo-Han, et al.
Published: (2025)
CALM: Class-Conditional Sparse Attention Vectors for Large Audio-Language Models
by: Mehta, Videet, et al.
Published: (2026)