| Main Authors: | Sun, Yulin, Xu, Qisheng, Su, Yi, Zhu, Qian, Dou, Yong, Liu, Xinwang, Xu, Kele |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.15429 |
Similar Items
AudioSet-EV: an AudioSet-derived distribution of Emergency Vehicle Siren sounds
by: Giacomelli, Stefano, et al.
Published: (2025)
BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data
by: Xu, Xuenan, et al.
Published: (2023)
Hierarchical Label Propagation: A Model-Size-Dependent Performance Booster for AudioSet Tagging
by: Tuncay, Ludovic, et al.
Published: (2025)
Audio-Language Models for Audio-Centric Tasks: A Systematic Survey
by: Su, Yi, et al.
Published: (2025)
AudioCIL: A Python Toolbox for Audio Class-Incremental Learning with Multiple Scenes
by: Xu, Qisheng, et al.
Published: (2024)
Unify Variables in Neural Scaling Laws for General Audio Representations via Embedding Effective Rank
by: Deng, Xuyao, et al.
Published: (2025)
AudioSetMix: Enhancing Audio-Language Datasets with LLM-Assisted Augmentations
by: Xu, David
Published: (2024)
Reject Threshold Adaptation for Open-Set Model Attribution of Deepfake Audio
by: Yan, Xinrui, et al.
Published: (2024)
Contrastive Learning-based Chaining-Cluster for Multilingual Voice-Face Association
by: Chen, Wuyang, et al.
Published: (2024)
Open-Set Source Tracing of Audio Deepfake Systems
by: Klein, Nicholas, et al.
Published: (2025)
Discrete Audio Representations for Automated Audio Captioning
by: Tian, Jingguang, et al.
Published: (2025)
Generalizable Audio Deepfake Detection via Latent Space Refinement and Augmentation
by: Huang, Wen, et al.
Published: (2025)
MRSAudio: A Large-Scale Multimodal Recorded Spatial Audio Dataset with Refined Annotations
by: Guo, Wenxiang, et al.
Published: (2025)
MMEDIT: A Unified Framework for Multi-Type Audio Editing via Audio Language Model
by: Tao, Ye, et al.
Published: (2025)
Whisper-AuT: Domain-Adapted Audio Encoder for Efficient Audio-LLM Training
by: Qiu, Jielin, et al.
Published: (2026)
AudioEval: Automatic Dual-Perspective and Multi-Dimensional Evaluation of Text-to-Audio-Generation
by: Wang, Hui, et al.
Published: (2025)
How to Label Resynthesized Audio: The Dual Role of Neural Audio Codecs in Audio Deepfake Detection
by: Xiao, Yixuan, et al.
Published: (2026)
The Sonar Moment: Benchmarking Audio-Language Models in Audio Geo-Localization
by: Zhang, Ruixing, et al.
Published: (2026)
ATIR: Towards Audio-Text Interleaved Contextual Retrieval
by: Zhao, Tong, et al.
Published: (2026)
Delayed Commitment for Representation Readiness in Stage-wise Audio-Visual Learning
by: Xu, Xinmeng, et al.
Published: (2026)
Class-Incremental Learning for Multi-Label Audio Classification
by: Mulimani, Manjunath, et al.
Published: (2024)
Robust LLM-based Audio-Visual Speech Recognition with Sparse Modality Alignment and Visual Unit-Guided Refinement
by: Su, Fei, et al.
Published: (2026)
EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer
by: Hai, Jiarui, et al.
Published: (2024)
Region-Specific Audio Tagging for Spatial Sound
by: Zhao, Jinzheng, et al.
Published: (2025)
PPPR: Portable Plug-in Prompt Refiner for Text to Audio Generation
by: Shi, Shuchen, et al.
Published: (2024)
A Sensitivity Analysis of Multi-Event Audio Grounding in Audio LLMs
by: Lee, Taehan, et al.
Published: (2026)
HoliAntiSpoof: Audio LLM for Holistic Speech Anti-Spoofing
by: Xu, Xuenan, et al.
Published: (2026)
MUKA: Multi Kernel Audio Adaptation Of Audio-Language Models
by: Bensaid, Reda, et al.
Published: (2026)
AudioGS: Spectrogram-Based Audio Gaussian Splatting for Sound Field Reconstruction
by: Bi, Chunhao, et al.
Published: (2026)
ChronosAudio: A Comprehensive Long-Audio Benchmark for Evaluating Audio-Large Language Models
by: Luo, Kaiwen, et al.
Published: (2026)
ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling
by: Yang, Dongchao, et al.
Published: (2025)
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
by: Yang, Dongchao, et al.
Published: (2023)
Refining Knowledge Transfer on Audio-Image Temporal Agreement for Audio-Text Cross Retrieval
by: Tsubaki, Shunsuke, et al.
Published: (2024)
Unmute the Patch Tokens: Rethinking Probing in Multi-Label Audio Classification
by: Rauch, Lukas, et al.
Published: (2025)
SRC-gAudio: Sampling-Rate-Controlled Audio Generation
by: Li, Chenxing, et al.
Published: (2024)
Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM?
by: Rouditchenko, Andrew, et al.
Published: (2025)
Mel-Refine: A Plug-and-Play Approach to Refine Mel-Spectrogram in Audio Generation
by: Guo, Hongming, et al.
Published: (2024)
BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics
by: Rauch, Lukas, et al.
Published: (2024)
DeepAudio-V1:Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation
by: Zhang, Haomin, et al.
Published: (2025)
Eureka-Audio: Triggering Audio Intelligence in Compact Language Models
by: Zhang, Dan, et al.
Published: (2026)