| Main Authors: | Sadhu, Shanmuka, Wang, Weiran |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.10391 |
Similar Items
EMO-TTA: Improving Test-Time Adaptation of Audio-Language Models for Speech Emotion Recognition
by: Shi, Jiacheng, et al.
Published: (2025)
Learning Linearity in Audio Consistency Autoencoders via Implicit Regularization
by: Torres, Bernardo, et al.
Published: (2025)
Representation-Regularized Convolutional Audio Transformer for Audio Understanding
by: Han, Bing, et al.
Published: (2026)
DegDiT: Controllable Audio Generation with Dynamic Event Graph Guided Diffusion Transformer
by: Liu, Yisu, et al.
Published: (2025)
Context and Transcripts Improve Detection of Deepfake Audios of Public Figures
by: Gao, Chongyang, et al.
Published: (2026)
SupCLAP: Controlling Optimization Trajectory Drift in Audio-Text Contrastive Learning with Support Vector Regularization
by: Luo, Jiehui, et al.
Published: (2025)
AudioScene: Integrating Object-Event Audio into 3D Scenes
by: Yuan, Shuaihang, et al.
Published: (2025)
DialogGraph-LLM: Graph-Informed LLMs for End-to-End Audio Dialogue Intent Recognition
by: Liu, HongYu, et al.
Published: (2025)
AudioMotionBench: Evaluating Auditory Motion Perception in Audio LLMs
by: Sun, Zhe, et al.
Published: (2025)
AudioMoG: Guiding Audio Generation with Mixture-of-Guidance
by: Wang, Junyou, et al.
Published: (2025)
Task Decoding based on Eye Movements using Synthetic Data Augmentation
by: Sadhu, Shanmuka, et al.
Published: (2025)
Eureka-Audio: Triggering Audio Intelligence in Compact Language Models
by: Zhang, Dan, et al.
Published: (2026)
Are Audio-Language Models Listening? Audio-Specialist Heads for Adaptive Audio Steering
by: Glazer, Neta, et al.
Published: (2026)
Audio-Guided Dynamic Modality Fusion with Stereo-Aware Attention for Audio-Visual Navigation
by: Li, Jia, et al.
Published: (2025)
Interpretable All-Type Audio Deepfake Detection with Audio LLMs via Frequency-Time Reinforcement Learning
by: Xie, Yuankun, et al.
Published: (2026)
A Neural Model for Contextual Biasing Score Learning and Filtering
by: Huang, Wanting, et al.
Published: (2025)
MOSS-Audio Technical Report
by: Yang, Chen, et al.
Published: (2026)
AudioCapBench: Quick Evaluation on Audio Captioning across Sound, Music, and Speech
by: Qiu, Jielin, et al.
Published: (2026)
WESR: Scaling and Evaluating Word-level Event-Speech Recognition
by: Yang, Chenchen, et al.
Published: (2026)
MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows
by: Li, Xiquan, et al.
Published: (2025)
The Sonar Moment: Benchmarking Audio-Language Models in Audio Geo-Localization
by: Zhang, Ruixing, et al.
Published: (2026)
AAT: Adapting Audio Transformer for Various Acoustics Recognition Tasks
by: Liang, Yun, et al.
Published: (2024)
PhyAVBench: A Challenging Audio Physics-Sensitivity Benchmark for Physically Grounded Text-to-Audio-Video Generation
by: Xie, Tianxin, et al.
Published: (2025)
Stable Audio 3
by: Evans, Zach, et al.
Published: (2026)
Audio-Maestro: Enhancing Large Audio-Language Models with Tool-Augmented Reasoning
by: Lee, Kuan-Yi, et al.
Published: (2025)
AudioSAE: Towards Understanding of Audio-Processing Models with Sparse AutoEncoders
by: Aparin, Georgii, et al.
Published: (2026)
AudioGuard: Toward Comprehensive Audio Safety Protection Across Diverse Threat Models
by: Kang, Mintong, et al.
Published: (2026)
HalluAudio: A Comprehensive Benchmark for Hallucination Detection in Large Audio-Language Models
by: Zhao, Feiyu, et al.
Published: (2026)
MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition
by: Wang, He, et al.
Published: (2024)
HuBERT-VIC: Improving Noise-Robust Automatic Speech Recognition of Speech Foundation Model via Variance-Invariance-Covariance Regularization
by: Ahn, Hyebin, et al.
Published: (2025)
Text Prompt is Not Enough: Sound Event Enhanced Prompt Adapter for Target Style Audio Generation
by: Xiong, Chenxu, et al.
Published: (2024)
FLM-Audio: Natural Monologues Improves Native Full-Duplex Chatbots via Dual Training
by: Yao, Yiqun, et al.
Published: (2025)
Self Voice Conversion as an Attack against Neural Audio Watermarking
by: Özer, Yigitcan, et al.
Published: (2026)
SCENEBench: An Audio Understanding Benchmark Grounded in Assistive and Industrial Use Cases
by: Iyer, Laya, et al.
Published: (2026)
AT-ADD: All-Type Audio Deepfake Detection Challenge Evaluation Plan
by: Xie, Yuankun, et al.
Published: (2026)
AudioMosaic: Contrastive Masked Audio Representation Learning
by: Huang, Hanxun, et al.
Published: (2026)
EvA: An Evidence-First Audio Understanding Paradigm for LALMs
by: Xie, Xinyuan, et al.
Published: (2026)
Codec-Robust Attacks on Audio LLMs
by: Roh, Jaechul, et al.
Published: (2026)
AudioCodecBench: A Comprehensive Benchmark for Audio Codec Evaluation
by: Wang, Lu, et al.
Published: (2025)
Prototype based Masked Audio Model for Self-Supervised Learning of Sound Event Detection
by: Cai, Pengfei, et al.
Published: (2024)