| Main Authors: | Li, Kai; Luo, Yi |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2409.08514 |
Similar Items
TDFNet: An Efficient Audio-Visual Speech Separation Model with Top-down Fusion
by: Pegg, Samuel, et al.
Published: (2024)
DreamAudio: Customized Text-to-Audio Generation with Diffusion Models
by: Yuan, Yi, et al.
Published: (2025)
AHAMask: Reliable Task Specification for Large Audio Language Models without Instructions
by: Guo, Yiwei, et al.
Published: (2025)
Efficient Autoregressive Audio Modeling via Next-Scale Prediction
by: Qiu, Kai, et al.
Published: (2024)
Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio?
by: Xie, Yuankun, et al.
Published: (2024)
Neural-Enhanced Dynamic Range Compression Inversion: A Hybrid Approach for Restoring Audio Dynamics
by: Sun, Haoran, et al.
Published: (2024)
UltraEval-Audio: A Unified Framework for Comprehensive Evaluation of Audio Foundation Models
by: Shi, Qundong, et al.
Published: (2026)
Audio Mamba: Pretrained Audio State Space Model For Audio Tagging
by: Lin, Jiaju, et al.
Published: (2024)
AudioRole: An Audio Dataset for Character Role-Playing in Large Language Models
by: Li, Wenyu, et al.
Published: (2025)
AudioScene: Integrating Object-Event Audio into 3D Scenes
by: Yuan, Shuaihang, et al.
Published: (2025)
Enhancing Partially Spoofed Audio Localization with Boundary-aware Attention Mechanism
by: Zhong, Jiafeng, et al.
Published: (2024)
SLIM: Style-Linguistics Mismatch Model for Generalized Audio Deepfake Detection
by: Zhu, Yi, et al.
Published: (2024)
Reject Threshold Adaptation for Open-Set Model Attribution of Deepfake Audio
by: Yan, Xinrui, et al.
Published: (2024)
Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models
by: Song, Zirui, et al.
Published: (2025)
Who Can Withstand Chat-Audio Attacks? An Evaluation Benchmark for Large Audio-Language Models
by: Yang, Wanqi, et al.
Published: (2024)
HyperPotter: Spell the Charm of High-Order Interactions in Audio Deepfake Detection
by: Wen, Qing, et al.
Published: (2026)
AudioLens: A Closer Look at Auditory Attribute Perception of Large Audio-Language Models
by: Yang, Chih-Kai, et al.
Published: (2025)
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
by: Erol, Mehmet Hamza, et al.
Published: (2024)
Audio Deepfake Attribution: An Initial Dataset and Investigation
by: Yan, Xinrui, et al.
Published: (2022)
Representation-Regularized Convolutional Audio Transformer for Audio Understanding
by: Han, Bing, et al.
Published: (2026)
Robust Generative Audio Quality Assessment: Disentangling Quality from Spurious Correlations
by: Huang, Kuan-Tang, et al.
Published: (2026)
How Do Neural Spoofing Countermeasures Detect Partially Spoofed Audio?
by: Liu, Tianchi, et al.
Published: (2024)
The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio
by: Xie, Yuankun, et al.
Published: (2024)
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
by: An, Keyu, et al.
Published: (2024)
Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation
by: Wang, Jun, et al.
Published: (2025)
LHQ-SVC: Lightweight and High Quality Singing Voice Conversion Modeling
by: Huang, Yubo, et al.
Published: (2024)
Audio Explanation Synthesis with Generative Foundation Models
by: Akman, Alican, et al.
Published: (2024)
Audio Atlas: Visualizing and Exploring Audio Datasets
by: Lanzendörfer, Luca A., et al.
Published: (2024)
AudioRouter: Data Efficient Audio Understanding via RL based Dual Reasoning
by: Chen, Liyang, et al.
Published: (2026)
MoE Adapter for Large Audio Language Models: Sparsity, Disentanglement, and Gradient-Conflict-Free
by: Lei, Yishu, et al.
Published: (2026)
Decoding Ambiguous Emotions with Test-Time Scaling in Audio-Language Models
by: Jia, Hong, et al.
Published: (2026)
Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio
by: Lu, Yi, et al.
Published: (2024)
Audio Spatially-Guided Fusion for Audio-Visual Navigation
by: Zhou, Xinyu, et al.
Published: (2026)
EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning
by: Kim, Jaeyeon, et al.
Published: (2024)
AND: Audio Network Dissection for Interpreting Deep Acoustic Models
by: Wu, Tung-Yu, et al.
Published: (2024)
FoleyBench: A Benchmark For Video-to-Audio Models
by: Dixit, Satvik, et al.
Published: (2025)
Expressive Range Characterization of Open Text-to-Audio Models
by: Morse, Jonathan, et al.
Published: (2025)
Universal Sound Separation with Self-Supervised Audio Masked Autoencoder
by: Zhao, Junqi, et al.
Published: (2024)
OmniCustom: Sync Audio-Video Customization Via Joint Audio-Video Generation Model
by: Li, Maomao, et al.
Published: (2026)
AudioTurbo: Fast Text-to-Audio Generation with Rectified Diffusion
by: Zhao, Junqi, et al.
Published: (2025)