| Main Authors: | Wang, Xiaopeng, Lu, Yi, Qi, Xin, Wang, Zhiyong, Xie, Yuankun, Shi, Shuchen, Fu, Ruibo |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2406.17801 |
Similar Items
Genuine-Focused Learning using Mask AutoEncoder for Generalized Fake Audio Detection
by: Wang, Xiaopeng, et al.
Published: (2024)
Generalized Fake Audio Detection via Deep Stable Learning
by: Wang, Zhiyong, et al.
Published: (2024)
EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech
by: Qi, Xin, et al.
Published: (2024)
Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0
by: Wang, Zhiyong, et al.
Published: (2024)
Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio
by: Lu, Yi, et al.
Published: (2024)
The FruitShell French synthesis system at the Blizzard 2023 Challenge
by: Qi, Xin, et al.
Published: (2023)
Temporal Variability and Multi-Viewed Self-Supervised Representations to Tackle the ASVspoof5 Deepfake Challenge
by: Xie, Yuankun, et al.
Published: (2024)
A Noval Feature via Color Quantisation for Fake Audio Detection
by: Wang, Zhiyong, et al.
Published: (2024)
DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech
by: Qi, Xin, et al.
Published: (2024)
PPPR: Portable Plug-in Prompt Refiner for Text to Audio Generation
by: Shi, Shuchen, et al.
Published: (2024)
ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
by: Fu, Ruibo, et al.
Published: (2024)
Generalized Source Tracing: Detecting Novel Audio Deepfake Algorithm with Real Emphasis and Fake Dispersion Strategy
by: Xie, Yuankun, et al.
Published: (2024)
The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio
by: Xie, Yuankun, et al.
Published: (2024)
An audio-quality-based multi-strategy approach for target speaker extraction in the MISP 2023 Challenge
by: Han, Runduo, et al.
Published: (2024)
RPRA-ADD: Forgery Trace Enhancement-Driven Audio Deepfake Detection
by: Fu, Ruibo, et al.
Published: (2025)
Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio?
by: Xie, Yuankun, et al.
Published: (2024)
Neural Codec Source Tracing: Toward Comprehensive Attribution in Open-Set Condition
by: Xie, Yuankun, et al.
Published: (2025)
Visual-based spatial audio generation system for multi-speaker environments
by: Liu, Xiaojing, et al.
Published: (2025)
Text Prompt is Not Enough: Sound Event Enhanced Prompt Adapter for Target Style Audio Generation
by: Xiong, Chenxu, et al.
Published: (2024)
SPGISpeech 2.0: Transcribed multi-speaker financial audio for speaker-tagged transcription
by: Grossman, Raymond, et al.
Published: (2025)
Text adaptation for speaker verification with speaker-text factorized embeddings
by: Yang, Yexin, et al.
Published: (2025)
Spectral or spatial? Leveraging both for speaker extraction in challenging data conditions
by: Eisenberg, Aviad, et al.
Published: (2025)
Joint Multi-scale Cross-lingual Speaking Style Transfer with Bidirectional Attention Mechanism for Automatic Dubbing
by: Li, Jingbei, et al.
Published: (2023)
End-to-end multi-channel speaker extraction and binaural speech synthesis
by: Chi, Cheng, et al.
Published: (2024)
Hierarchical speaker representation for target speaker extraction
by: He, Shulin, et al.
Published: (2022)
Non-autoregressive real-time Accent Conversion model with voice cloning
by: Nechaev, Vladimir, et al.
Published: (2024)
How phonemes contribute to deep speaker models?
by: Li, Pengqi, et al.
Published: (2024)
Improving curriculum learning for target speaker extraction with synthetic speakers
by: Liu, Yun, et al.
Published: (2024)
Scaling NVIDIA's Multi-speaker Multi-lingual TTS Systems with Zero-Shot TTS to Indic Languages
by: Arora, Akshit, et al.
Published: (2024)
A Benchmark for Multi-speaker Anonymization
by: Miao, Xiaoxiao, et al.
Published: (2024)
Why disentanglement-based speaker anonymization systems fail at preserving emotions?
by: Gaznepoglu, Ünal Ege, et al.
Published: (2025)
MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation
by: Fu, Ruibo, et al.
Published: (2024)
A robust audio deepfake detection system via multi-view feature
by: Yang, Yujie, et al.
Published: (2024)
Gradient weighting for speaker verification in extremely low Signal-to-Noise Ratio
by: Ma, Yi, et al.
Published: (2024)
A k-space approach to modeling multi-channel parametric array loudspeaker systems
by: Zhuang, Tao, et al.
Published: (2025)
Mel-Refine: A Plug-and-Play Approach to Refine Mel-Spectrogram in Audio Generation
by: Guo, Hongming, et al.
Published: (2024)
Subjective quality evaluation of personalized own voice reconstruction systems
by: Ohlenbusch, Mattes, et al.
Published: (2025)
Zero-shot Cross-lingual Voice Transfer for TTS
by: Biadsy, Fadi, et al.
Published: (2024)
Gender-ambiguous voice generation through feminine speaking style transfer in male voices
by: Koutsogiannaki, Maria, et al.
Published: (2024)
The importance of spatial and spectral information in multiple speaker tracking
by: Beit-On, Hanan, et al.
Published: (2024)