| Main Authors: | Tan, Zhenxiong, Ma, Xinyin, Fang, Gongfan, Wang, Xinchao |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2407.10468 |
Similar Items
AFSS: Artifact-Focused Self-Synthesis for Mitigating Bias in Audio Deepfake Detection
by: Nguyen-Le, Hai-Son, et al.
Published: (2026)
Inference-time Scaling for Diffusion-based Audio Super-resolution
by: Jin, Yizhu, et al.
Published: (2025)
Accelerating Autoregressive Speech Synthesis Inference With Speech Speculative Decoding
by: Lin, Zijian, et al.
Published: (2025)
MuseControlLite: Multifunctional Music Generation with Lightweight Conditioners
by: Tsai, Fang-Duo, et al.
Published: (2025)
DreamAudio: Customized Text-to-Audio Generation with Diffusion Models
by: Yuan, Yi, et al.
Published: (2025)
AudioTurbo: Fast Text-to-Audio Generation with Rectified Diffusion
by: Zhao, Junqi, et al.
Published: (2025)
Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations
by: Yadav, Sarthak, et al.
Published: (2024)
Audio Codec Augmentation for Robust Collaborative Watermarking of Speech Synthesis
by: Juvela, Lauri, et al.
Published: (2024)
RDSinger: Reference-based Diffusion Network for Singing Voice Synthesis
by: Sui, Kehan, et al.
Published: (2024)
Audio-Conditioned Diffusion LLMs for ASR and Deliberation Processing
by: Wang, Mengqi, et al.
Published: (2025)
AudioScene: Integrating Object-Event Audio into 3D Scenes
by: Yuan, Shuaihang, et al.
Published: (2025)
Audio Explanation Synthesis with Generative Foundation Models
by: Akman, Alican, et al.
Published: (2024)
Gibberish is All You Need for Membership Inference Detection in Contrastive Language-Audio Pretraining
by: Cheng, Ruoxi, et al.
Published: (2024)
Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding
by: Nguyen, Tan Dat, et al.
Published: (2024)
Who Can Withstand Chat-Audio Attacks? An Evaluation Benchmark for Large Audio-Language Models
by: Yang, Wanqi, et al.
Published: (2024)
DSpAST: Disentangled Representations for Spatial Audio Reasoning with Large Language Models
by: Wilkinghoff, Kevin, et al.
Published: (2025)
AudioMAE++: learning better masked audio representations with SwiGLU FFNs
by: Yadav, Sarthak, et al.
Published: (2025)
Plug-and-Play Co-Occurring Face Attention for Robust Audio-Visual Speaker Extraction
by: Pan, Zexu, et al.
Published: (2025)
Representation-Regularized Convolutional Audio Transformer for Audio Understanding
by: Han, Bing, et al.
Published: (2026)
Estimating Musical Surprisal from Audio in Autoregressive Diffusion Model Noise Spaces
by: Bjare, Mathias Rose, et al.
Published: (2025)
DiffCSS: Diverse and Expressive Conversational Speech Synthesis with Diffusion Models
by: Wu, Weihao, et al.
Published: (2025)
Audio Mamba: Pretrained Audio State Space Model For Audio Tagging
by: Lin, Jiaju, et al.
Published: (2024)
CoLLAP: Contrastive Long-form Language-Audio Pretraining with Musical Temporal Structure Augmentation
by: Wu, Junda, et al.
Published: (2024)
DGMO: Training-Free Audio Source Separation through Diffusion-Guided Mask Optimization
by: Lee, Geonyoung, et al.
Published: (2025)
Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models
by: Song, Zirui, et al.
Published: (2025)
DreamHead: Learning Spatial-Temporal Correspondence via Hierarchical Diffusion for Audio-driven Talking Head Synthesis
by: Hong, Fa-Ting, et al.
Published: (2024)
GROOT: Generating Robust Watermark for Diffusion-Model-Based Audio Synthesis
by: Liu, Weizhi, et al.
Published: (2024)
UltraEval-Audio: A Unified Framework for Comprehensive Evaluation of Audio Foundation Models
by: Shi, Qundong, et al.
Published: (2026)
Audio Atlas: Visualizing and Exploring Audio Datasets
by: Lanzendörfer, Luca A., et al.
Published: (2024)
Cross-Domain Audio Deepfake Detection: Dataset and Analysis
by: Li, Yuang, et al.
Published: (2024)
LJ-Spoof: A Generatively Varied Corpus for Audio Anti-Spoofing and Synthesis Source Tracing
by: Subramani, Surya, et al.
Published: (2026)
FreeAudio: Training-Free Timing Planning for Controllable Long-Form Text-to-Audio Generation
by: Jiang, Yuxuan, et al.
Published: (2025)
AudioMarathon: A Comprehensive Benchmark for Long-Context Audio Understanding and Efficiency in Audio LLMs
by: He, Peize, et al.
Published: (2025)
AudioRouter: Data Efficient Audio Understanding via RL based Dual Reasoning
by: Chen, Liyang, et al.
Published: (2026)
Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio?
by: Xie, Yuankun, et al.
Published: (2024)
Audio Spatially-Guided Fusion for Audio-Visual Navigation
by: Zhou, Xinyu, et al.
Published: (2026)
Arrange, Inpaint, and Refine: Steerable Long-term Music Audio Generation and Editing via Content-based Controls
by: Lin, Liwei, et al.
Published: (2024)
EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning
by: Kim, Jaeyeon, et al.
Published: (2024)
How Do Neural Spoofing Countermeasures Detect Partially Spoofed Audio?
by: Liu, Tianchi, et al.
Published: (2024)
FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion
by: Chen, Shunian, et al.
Published: (2025)