| Main Authors: | Liu, Yisi, Lee, Nicholas, Anumanchipalli, Gopala |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.20113 |
Similar Items
RT-VC: Real-Time Zero-Shot Voice Conversion with Speech Articulatory Coding
by: Liu, Yisi, et al.
Published: (2025)
Fast, High-Quality and Parameter-Efficient Articulatory Synthesis using Differentiable DSP
by: Liu, Yisi, et al.
Published: (2024)
Seeing Your Speech Style: A Novel Zero-Shot Identity-Disentanglement Face-based Voice Conversion
by: Rong, Yan, et al.
Published: (2024)
Voiced-Aware Style Extraction and Style Direction Adjustment for Expressive Text-to-Speech
by: Kim, Nam-Gyu
Published: (2025)
ReStyle-TTS: Relative and Continuous Style Control for Zero-Shot Speech Synthesis
by: Li, Haitao, et al.
Published: (2026)
Spotlight-TTS: Spotlighting the Style via Voiced-Aware Style Extraction and Style Direction Adjustment for Expressive Text-to-Speech
by: Kim, Nam-Gyu, et al.
Published: (2025)
Voice "Cloning" is Style Transfer
by: Zhou, Kaitlyn, et al.
Published: (2026)
YingMusic-SVC: Real-World Robust Zero-Shot Singing Voice Conversion with Flow-GRPO and Singing-Specific Inductive Biases
by: Chen, Gongyu, et al.
Published: (2025)
Controllable Singing Style Conversion with Boundary-Aware Information Bottleneck
by: Hu, Zhetao, et al.
Published: (2026)
VoicePrompter: Robust Zero-Shot Voice Conversion with Voice Prompt and Conditional Flow Matching
by: Choi, Ha-Yeong, et al.
Published: (2025)
Towards Better Disentanglement in Non-Autoregressive Zero-Shot Expressive Voice Conversion
by: Akti, Seymanur, et al.
Published: (2025)
Prosody-Adaptable Audio Codecs for Zero-Shot Voice Conversion via In-Context Learning
by: Zhao, Junchuan, et al.
Published: (2025)
QR-VC: Leveraging Quantization Residuals for Linear Disentanglement in Zero-Shot Voice Conversion
by: Sim, Youngjun, et al.
Published: (2024)
TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models
by: Low, Chetwin, et al.
Published: (2025)
Zero-Shot Voice Conversion via Content-Aware Timbre Ensemble and Conditional Flow Matching
by: Pan, Yu, et al.
Published: (2024)
Discl-VC: Disentangled Discrete Tokens and In-Context Learning for Controllable Zero-Shot Voice Conversion
by: Wang, Kaidi, et al.
Published: (2025)
Unsupervised TTS Acoustic Modeling for TTS with Conditional Disentangled Sequential VAE
by: Lian, Jiachen, et al.
Published: (2022)
Fed-PISA: Federated Voice Cloning via Personalized Identity-Style Adaptation
by: Wang, Qi, et al.
Published: (2025)
AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-Speech Synthesis
by: Luo, Dan, et al.
Published: (2025)
VStyle: A Benchmark for Voice Style Adaptation with Spoken Instructions
by: Zhan, Jun, et al.
Published: (2025)
Music Style Transfer With Diffusion Model
by: Huang, Hong, et al.
Published: (2024)
HQ-SVC: Towards High-Quality Zero-Shot Singing Voice Conversion in Low-Resource Scenarios
by: Bai, Bingsong, et al.
Published: (2025)
StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching
by: Yao, Jixun, et al.
Published: (2024)
Coding Speech through Vocal Tract Kinematics
by: Cho, Cheol Jun, et al.
Published: (2024)
R2-SVC: Towards Real-World Robust and Expressive Zero-shot Singing Voice Conversion
by: Zheng, Junjie, et al.
Published: (2025)
Takin-VC: Expressive Zero-Shot Voice Conversion via Adaptive Hybrid Content Encoding and Enhanced Timbre Modeling
by: Yang, Yuguang, et al.
Published: (2024)
Vevo: Controllable Zero-Shot Voice Imitation with Self-Supervised Disentanglement
by: Zhang, Xueyao, et al.
Published: (2025)
Defense Against Synthetic Speech: Real-Time Detection of RVC Voice Conversion Attacks
by: Chinchmalatpure, Prajwal, et al.
Published: (2025)
X-Voice: Enabling Everyone to Speak 30 Languages via Zero-Shot Cross-Lingual Voice Cloning
by: Xu, Rixi, et al.
Published: (2026)
Contextual Biasing to Improve Domain-specific Custom Vocabulary Audio Transcription without Explicit Fine-Tuning of Whisper Model
by: Lall, Vishakha, et al.
Published: (2024)
VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing
by: Anastassiou, Philip, et al.
Published: (2024)
DAFMSVC: One-Shot Singing Voice Conversion with Dual Attention Mechanism and Flow Matching
by: Chen, Wei, et al.
Published: (2025)
DAST: A Dual-Stream Voice Anonymization Attacker with Staged Training
by: Arefeen, Ridwan, et al.
Published: (2026)
TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control
by: Zhang, Yu, et al.
Published: (2024)
EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion
by: Joglekar, Advait, et al.
Published: (2025)
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
by: Shi, Yemin, et al.
Published: (2025)
Remix the Timbre: Diffusion-Based Style Transfer Across Polyphonic Stems
by: Chen, Leduo, et al.
Published: (2026)
SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis
by: Qian, Jiale, et al.
Published: (2026)
LAPS-Diff: A Diffusion-Based Framework for Singing Voice Synthesis With Language Aware Prosody-Style Guided Learning
by: Dhar, Sandipan, et al.
Published: (2025)
DiffStyleTTS: Diffusion-based Hierarchical Prosody Modeling for Text-to-Speech with Diverse and Controllable Styles
by: Liu, Jiaxuan, et al.
Published: (2024)