Saved in:
| Main Authors: | Wang, Yunqiang, Na, Hengyuan, Wu, Di, Hu, Miao, Quan, Guocong |
|---|---|
| Format: | Preprint |
| Published: |
2026
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.09222 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Similar Items
Audio Jailbreaks in Large Audio-Language Models: Taxonomy, Attack-Defense Analysis, and Cost-Aware Evaluation
by: Feng, Bo-Han, et al.
Published: (2026)
by: Feng, Bo-Han, et al.
Published: (2026)
Sparse Tokens Suffice: Jailbreaking Audio Language Models via Token-Aware Gradient Optimization
by: Fang, Zheng, et al.
Published: (2026)
by: Fang, Zheng, et al.
Published: (2026)
Rethinking Multimodal Point Cloud Completion: A Completion-by-Correction Perspective
by: Luo, Wang, et al.
Published: (2025)
by: Luo, Wang, et al.
Published: (2025)
Codec-Robust Attacks on Audio LLMs
by: Roh, Jaechul, et al.
Published: (2026)
by: Roh, Jaechul, et al.
Published: (2026)
Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models
by: Song, Zirui, et al.
Published: (2025)
by: Song, Zirui, et al.
Published: (2025)
AudioJailbreak: Jailbreak Attacks against End-to-End Large Audio-Language Models
by: Chen, Guangke, et al.
Published: (2025)
by: Chen, Guangke, et al.
Published: (2025)
Speech-Audio Compositional Attacks on Multimodal LLMs and Their Mitigation with SALMONN-Guard
by: Yang, Yudong, et al.
Published: (2025)
by: Yang, Yudong, et al.
Published: (2025)
Multilingual and Multi-Accent Jailbreaking of Audio LLMs
by: Roh, Jaechul, et al.
Published: (2025)
by: Roh, Jaechul, et al.
Published: (2025)
AudioMosaic: Contrastive Masked Audio Representation Learning
by: Huang, Hanxun, et al.
Published: (2026)
by: Huang, Hanxun, et al.
Published: (2026)
HarmonicAttack: An Adaptive Cross-Domain Audio Watermark Removal
by: Li, Kexin, et al.
Published: (2025)
by: Li, Kexin, et al.
Published: (2025)
Interpretable All-Type Audio Deepfake Detection with Audio LLMs via Frequency-Time Reinforcement Learning
by: Xie, Yuankun, et al.
Published: (2026)
by: Xie, Yuankun, et al.
Published: (2026)
ERIS: Evolutionary Real-world Interference Scheme for Jailbreaking Audio Large Models
by: Zhang, Yibo, et al.
Published: (2025)
by: Zhang, Yibo, et al.
Published: (2025)
AudioMotionBench: Evaluating Auditory Motion Perception in Audio LLMs
by: Sun, Zhe, et al.
Published: (2025)
by: Sun, Zhe, et al.
Published: (2025)
Protecting Bystander Privacy via Selective Hearing in Audio LLMs
by: Zhan, Xiao, et al.
Published: (2025)
by: Zhan, Xiao, et al.
Published: (2025)
AdvWave: Stealthy Adversarial Jailbreak Attack against Large Audio-Language Models
by: Kang, Mintong, et al.
Published: (2024)
by: Kang, Mintong, et al.
Published: (2024)
Self Voice Conversion as an Attack against Neural Audio Watermarking
by: Özer, Yigitcan, et al.
Published: (2026)
by: Özer, Yigitcan, et al.
Published: (2026)
Jailbreak-AudioBench: In-Depth Evaluation and Analysis of Jailbreak Threats for Large Audio Language Models
by: Cheng, Hao, et al.
Published: (2025)
by: Cheng, Hao, et al.
Published: (2025)
Masked Latent Prediction and Classification for Self-Supervised Audio Representation Learning
by: Quelennec, Aurian, et al.
Published: (2025)
by: Quelennec, Aurian, et al.
Published: (2025)
Full-Frequency Temporal Patching and Structured Masking for Enhanced Audio Classification
by: Makineni, Aditya, et al.
Published: (2025)
by: Makineni, Aditya, et al.
Published: (2025)
Audio-Guided Dynamic Modality Fusion with Stereo-Aware Attention for Audio-Visual Navigation
by: Li, Jia, et al.
Published: (2025)
by: Li, Jia, et al.
Published: (2025)
Sirens' Whisper: Inaudible Near-Ultrasonic Jailbreaks of Speech-Driven LLMs
by: Ling, Zijian, et al.
Published: (2026)
by: Ling, Zijian, et al.
Published: (2026)
MATPAC++: Enhanced Masked Latent Prediction for Self-Supervised Audio Representation Learning
by: Quelennec, Aurian, et al.
Published: (2025)
by: Quelennec, Aurian, et al.
Published: (2025)
Unifying Speech Editing Detection and Content Localization via Prior-Enhanced Audio LLMs
by: Xue, Jun, et al.
Published: (2026)
by: Xue, Jun, et al.
Published: (2026)
Beyond Monologue: Interactive Talking-Listening Avatar Generation with Conversational Audio Context-Aware Kernels
by: Weng, Yuzhe, et al.
Published: (2026)
by: Weng, Yuzhe, et al.
Published: (2026)
Eureka-Audio: Triggering Audio Intelligence in Compact Language Models
by: Zhang, Dan, et al.
Published: (2026)
by: Zhang, Dan, et al.
Published: (2026)
Structured-Noise Masked Modeling for Video, Audio and Beyond
by: Bhowmik, Aritra, et al.
Published: (2025)
by: Bhowmik, Aritra, et al.
Published: (2025)
JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models
by: Peng, Zifan, et al.
Published: (2025)
by: Peng, Zifan, et al.
Published: (2025)
Spectral Masking and Interpolation Attack (SMIA): A Black-box Adversarial Attack against Voice Authentication and Anti-Spoofing Systems
by: Kamel, Kamel, et al.
Published: (2025)
by: Kamel, Kamel, et al.
Published: (2025)
Speech Emotion Recognition via Entropy-Aware Score Selection
by: Chua, ChenYi, et al.
Published: (2025)
by: Chua, ChenYi, et al.
Published: (2025)
Universal Sound Separation with Self-Supervised Audio Masked Autoencoder
by: Zhao, Junqi, et al.
Published: (2024)
by: Zhao, Junqi, et al.
Published: (2024)
Breaking Audio Large Language Models by Attacking Only the Encoder: A Universal Targeted Latent-Space Audio Attack
by: Ziv, Roee, et al.
Published: (2025)
by: Ziv, Roee, et al.
Published: (2025)
PAL: Probing Audio Encoders via LLMs -- Audio Information Transfer into LLMs
by: Alex, Tony, et al.
Published: (2025)
by: Alex, Tony, et al.
Published: (2025)
Spatial-Aware Conditioned Fusion for Audio-Visual Navigation
by: Wu, Shaohang, et al.
Published: (2026)
by: Wu, Shaohang, et al.
Published: (2026)
Hierarchical Semantic Correlation-Aware Masked Autoencoder for Unsupervised Audio-Visual Representation Learning
by: Zeng, Donghuo, et al.
Published: (2026)
by: Zeng, Donghuo, et al.
Published: (2026)
MoE Adapter for Large Audio Language Models: Sparsity, Disentanglement, and Gradient-Conflict-Free
by: Lei, Yishu, et al.
Published: (2026)
by: Lei, Yishu, et al.
Published: (2026)
Speaker Verification with Speech-Aware LLMs: Evaluation and Augmentation
by: Thebaud, Thomas, et al.
Published: (2026)
by: Thebaud, Thomas, et al.
Published: (2026)
Domain-Agnostic Causal-Aware Audio Transformer for Infant Cry Classification
by: Owino, Geofrey, et al.
Published: (2025)
by: Owino, Geofrey, et al.
Published: (2025)
OWL: Geometry-Aware Spatial Reasoning for Audio Large Language Models
by: Biswas, Subrata, et al.
Published: (2025)
by: Biswas, Subrata, et al.
Published: (2025)
Towards Explicit Acoustic Evidence Perception in Audio LLMs for Speech Deepfake Detection
by: Guo, Xiaoxuan, et al.
Published: (2026)
by: Guo, Xiaoxuan, et al.
Published: (2026)
JASTIN: Aligning LLMs for Zero-Shot Audio and Speech Evaluation via Natural Language Instructions
by: Zhang, Leying, et al.
Published: (2026)
by: Zhang, Leying, et al.
Published: (2026)
Similar Items
-
Audio Jailbreaks in Large Audio-Language Models: Taxonomy, Attack-Defense Analysis, and Cost-Aware Evaluation
by: Feng, Bo-Han, et al.
Published: (2026) -
Sparse Tokens Suffice: Jailbreaking Audio Language Models via Token-Aware Gradient Optimization
by: Fang, Zheng, et al.
Published: (2026) -
Rethinking Multimodal Point Cloud Completion: A Completion-by-Correction Perspective
by: Luo, Wang, et al.
Published: (2025) -
Codec-Robust Attacks on Audio LLMs
by: Roh, Jaechul, et al.
Published: (2026) -
Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models
by: Song, Zirui, et al.
Published: (2025)