:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Xiaopeng; Lu, Yi; Qi, Xin; Wang, Zhiyong; Xie, Yuankun; Shi, Shuchen; Fu, Ruibo
Format:	Preprint
Published:	2024
Subjects:	Sound; Computation and Language; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2406.17801

Similar Items

Genuine-Focused Learning using Mask AutoEncoder for Generalized Fake Audio Detection
by: Wang, Xiaopeng, et al.
Published: (2024)

Generalized Fake Audio Detection via Deep Stable Learning
by: Wang, Zhiyong, et al.
Published: (2024)

EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech
by: Qi, Xin, et al.
Published: (2024)

Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0
by: Wang, Zhiyong, et al.
Published: (2024)

Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio
by: Lu, Yi, et al.
Published: (2024)

The FruitShell French synthesis system at the Blizzard 2023 Challenge
by: Qi, Xin, et al.
Published: (2023)

Temporal Variability and Multi-Viewed Self-Supervised Representations to Tackle the ASVspoof5 Deepfake Challenge
by: Xie, Yuankun, et al.
Published: (2024)

A Noval Feature via Color Quantisation for Fake Audio Detection
by: Wang, Zhiyong, et al.
Published: (2024)

DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech
by: Qi, Xin, et al.
Published: (2024)

PPPR: Portable Plug-in Prompt Refiner for Text to Audio Generation
by: Shi, Shuchen, et al.
Published: (2024)

ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
by: Fu, Ruibo, et al.
Published: (2024)

Generalized Source Tracing: Detecting Novel Audio Deepfake Algorithm with Real Emphasis and Fake Dispersion Strategy
by: Xie, Yuankun, et al.
Published: (2024)

The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio
by: Xie, Yuankun, et al.
Published: (2024)

An audio-quality-based multi-strategy approach for target speaker extraction in the MISP 2023 Challenge
by: Han, Runduo, et al.
Published: (2024)

RPRA-ADD: Forgery Trace Enhancement-Driven Audio Deepfake Detection
by: Fu, Ruibo, et al.
Published: (2025)

Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio?
by: Xie, Yuankun, et al.
Published: (2024)

Neural Codec Source Tracing: Toward Comprehensive Attribution in Open-Set Condition
by: Xie, Yuankun, et al.
Published: (2025)

Visual-based spatial audio generation system for multi-speaker environments
by: Liu, Xiaojing, et al.
Published: (2025)

Text Prompt is Not Enough: Sound Event Enhanced Prompt Adapter for Target Style Audio Generation
by: Xiong, Chenxu, et al.
Published: (2024)

SPGISpeech 2.0: Transcribed multi-speaker financial audio for speaker-tagged transcription
by: Grossman, Raymond, et al.
Published: (2025)

Text adaptation for speaker verification with speaker-text factorized embeddings
by: Yang, Yexin, et al.
Published: (2025)

Spectral or spatial? Leveraging both for speaker extraction in challenging data conditions
by: Eisenberg, Aviad, et al.
Published: (2025)

Joint Multi-scale Cross-lingual Speaking Style Transfer with Bidirectional Attention Mechanism for Automatic Dubbing
by: Li, Jingbei, et al.
Published: (2023)

End-to-end multi-channel speaker extraction and binaural speech synthesis
by: Chi, Cheng, et al.
Published: (2024)

Hierarchical speaker representation for target speaker extraction
by: He, Shulin, et al.
Published: (2022)

Non-autoregressive real-time Accent Conversion model with voice cloning
by: Nechaev, Vladimir, et al.
Published: (2024)

How phonemes contribute to deep speaker models?
by: Li, Pengqi, et al.
Published: (2024)

Improving curriculum learning for target speaker extraction with synthetic speakers
by: Liu, Yun, et al.
Published: (2024)

Scaling NVIDIA's Multi-speaker Multi-lingual TTS Systems with Zero-Shot TTS to Indic Languages
by: Arora, Akshit, et al.
Published: (2024)

A Benchmark for Multi-speaker Anonymization
by: Miao, Xiaoxiao, et al.
Published: (2024)

Why disentanglement-based speaker anonymization systems fail at preserving emotions?
by: Gaznepoglu, Ünal Ege, et al.
Published: (2025)

MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation
by: Fu, Ruibo, et al.
Published: (2024)

A robust audio deepfake detection system via multi-view feature
by: Yang, Yujie, et al.
Published: (2024)

Gradient weighting for speaker verification in extremely low Signal-to-Noise Ratio
by: Ma, Yi, et al.
Published: (2024)

A k-space approach to modeling multi-channel parametric array loudspeaker systems
by: Zhuang, Tao, et al.
Published: (2025)

Mel-Refine: A Plug-and-Play Approach to Refine Mel-Spectrogram in Audio Generation
by: Guo, Hongming, et al.
Published: (2024)

Subjective quality evaluation of personalized own voice reconstruction systems
by: Ohlenbusch, Mattes, et al.
Published: (2025)

Zero-shot Cross-lingual Voice Transfer for TTS
by: Biadsy, Fadi, et al.
Published: (2024)

Gender-ambiguous voice generation through feminine speaking style transfer in male voices
by: Koutsogiannaki, Maria, et al.
Published: (2024)

The importance of spatial and spectral information in multiple speaker tracking
by: Beit-On, Hanan, et al.
Published: (2024)