:: Library Catalog

Bibliographic Details
Main Authors:	Huang, Jiawei; Zhang, Chen; Ren, Yi; Jiang, Ziyue; Ye, Zhenhui; Liu, Jinglin; He, Jinzheng; Yin, Xiang; Zhao, Zhou
Format:	Preprint
Published:	2024
Subjects:	Sound; Artificial Intelligence; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2408.04708

Similar Items

Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis
by: Jiang, Ziyue, et al.
Published: (2023)

TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control
by: Zhang, Yu, et al.
Published: (2024)

StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching
by: Yao, Jixun, et al.
Published: (2024)

Discl-VC: Disentangled Discrete Tokens and In-Context Learning for Controllable Zero-Shot Voice Conversion
by: Wang, Kaidi, et al.
Published: (2025)

MeanVC: Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows
by: Ma, Guobin, et al.
Published: (2025)

EAD-VC: Enhancing Speech Auto-Disentanglement for Voice Conversion with IFUB Estimator and Joint Text-Guided Consistent Learning
by: Liang, Ziqi, et al.
Published: (2024)

ReFlow-VC: Zero-shot Voice Conversion Based on Rectified Flow and Speaker Feature Optimization
by: Ren, Pengyu, et al.
Published: (2025)

VC-ENHANCE: Speech Restoration with Integrated Noise Suppression and Voice Conversion
by: Byun, Kyungguen, et al.
Published: (2024)

SRC4VC: Smartphone-Recorded Corpus for Voice Conversion Benchmark
by: Saito, Yuki, et al.
Published: (2024)

DualVC 2: Dynamic Masked Convolution for Unified Streaming and Non-Streaming Voice Conversion
by: Ning, Ziqian, et al.
Published: (2023)

PseudoVC: Improving One-shot Voice Conversion with Pseudo Paired Data
by: Cao, Songjun, et al.
Published: (2025)

Pureformer-VC: Non-parallel Voice Conversion with Pure Stylized Transformer Blocks and Triplet Discriminative Training
by: Yao, Wenhan, et al.
Published: (2025)

ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Speed
by: Chen, Meiying, et al.
Published: (2022)

Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy
by: Ma, Linhan, et al.
Published: (2024)

Takin-VC: Expressive Zero-Shot Voice Conversion via Adaptive Hybrid Content Encoding and Enhanced Timbre Modeling
by: Yang, Yuguang, et al.
Published: (2024)

HybridVC: Efficient Voice Style Conversion with Text and Audio Prompts
by: Niu, Xinlei, et al.
Published: (2024)

Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching
by: Zuo, Jialong, et al.
Published: (2025)

Enhancing Expressive Voice Conversion with Discrete Pitch-Conditioned Flow Matching Model
by: Zuo, Jialong, et al.
Published: (2025)

Zero-shot Cross-lingual Voice Transfer for TTS
by: Biadsy, Fadi, et al.
Published: (2024)

SynthVC: Leveraging Synthetic Data for End-to-End Low Latency Streaming Voice Conversion
by: Guo, Zhao, et al.
Published: (2025)

CoDiff-VC: A Codec-Assisted Diffusion Model for Zero-shot Voice Conversion
by: Li, Yuke, et al.
Published: (2024)

AdaptVC: High Quality Voice Conversion with Adaptive Learning
by: Kim, Jaehun, et al.
Published: (2025)

StreamVC: Real-Time Low-Latency Voice Conversion
by: Yang, Yang, et al.
Published: (2024)

O_O-VC: Synthetic Data-Driven One-to-One Alignment for Any-to-Any Voice Conversion
by: Tu, Huu Tuong, et al.
Published: (2025)

Residual Speaker Representation for One-Shot Voice Conversion
by: Xu, Le, et al.
Published: (2023)

EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion
by: Joglekar, Advait, et al.
Published: (2025)

SelfVC: Voice Conversion With Iterative Refinement using Self Transformations
by: Neekhara, Paarth, et al.
Published: (2023)

Chain-Talker: Chain Understanding and Rendering for Empathetic Conversational Speech Synthesis
by: Hu, Yifan, et al.
Published: (2025)

MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis
by: Jiang, Ziyue, et al.
Published: (2025)

Pureformer-VC: Non-parallel One-Shot Voice Conversion with Pure Transformer Blocks and Triplet Discriminative Training
by: Yao, Wenhan, et al.
Published: (2024)

QR-VC: Leveraging Quantization Residuals for Linear Disentanglement in Zero-Shot Voice Conversion
by: Sim, Youngjun, et al.
Published: (2024)

SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention
by: Li, Junjie, et al.
Published: (2023)

USM-VC: Mitigating Timbre Leakage with Universal Semantic Mapping Residual Block for Voice Conversion
by: Li, Na, et al.
Published: (2025)

Spatial Voice Conversion: Voice Conversion Preserving Spatial Information and Non-target Signals
by: Seki, Kentaro, et al.
Published: (2024)

Zero-Shot Sing Voice Conversion: built upon clustering-based phoneme representations
by: Zhou, Wangjin, et al.
Published: (2024)

Complex-Cycle-Consistent Diffusion Model for Monaural Speech Enhancement
by: Li, Yi, et al.
Published: (2024)

In This Environment, As That Speaker: A Text-Driven Framework for Multi-Attribute Speech Conversion
by: Jin, Jiawei, et al.
Published: (2025)

Converting Anyone's Voice: End-to-End Expressive Voice Conversion with a Conditional Diffusion Model
by: Du, Zongyang, et al.
Published: (2024)

JoyVoice: Long-Context Conditioning for Anthropomorphic Multi-Speaker Conversational Synthesis
by: Yu, Fan, et al.
Published: (2025)

Cross-lingual Text-To-Speech with Flow-based Voice Conversion for Improved Pronunciation
by: Ellinas, Nikolaos, et al.
Published: (2022)