Saved in:
| Main Authors: | Li, Ruiqi, Zhang, Yu, Pan, Changhao, Lei, Ke, Yin, Xiang, Yang, Cheng |
|---|---|
| Format: | Preprint |
| Published: |
2026
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.30993 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Similar Items
TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control
by: Zhang, Yu, et al.
Published: (2024)
by: Zhang, Yu, et al.
Published: (2024)
Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody Modeling
by: Jiang, Yuepeng, et al.
Published: (2024)
by: Jiang, Yuepeng, et al.
Published: (2024)
REF-VC: Robust, Expressive and Fast Zero-Shot Voice Conversion with Diffusion Transformers
by: Jiang, Yuepeng, et al.
Published: (2025)
by: Jiang, Yuepeng, et al.
Published: (2025)
Takin-VC: Expressive Zero-Shot Voice Conversion via Adaptive Hybrid Content Encoding and Enhanced Timbre Modeling
by: Yang, Yuguang, et al.
Published: (2024)
by: Yang, Yuguang, et al.
Published: (2024)
Everyone-Can-Sing: Zero-Shot Singing Voice Synthesis and Conversion with Speech Reference
by: Dai, Shuqi, et al.
Published: (2025)
by: Dai, Shuqi, et al.
Published: (2025)
Generative Expressive Conversational Speech Synthesis
by: Liu, Rui, et al.
Published: (2024)
by: Liu, Rui, et al.
Published: (2024)
SF-Speech: Straightened Flow for Zero-Shot Voice Clone
by: Li, Xuyuan, et al.
Published: (2024)
by: Li, Xuyuan, et al.
Published: (2024)
Debatts: Zero-Shot Debating Text-to-Speech Synthesis
by: Huang, Yiqiao, et al.
Published: (2024)
by: Huang, Yiqiao, et al.
Published: (2024)
StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching
by: Yao, Jixun, et al.
Published: (2024)
by: Yao, Jixun, et al.
Published: (2024)
TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis
by: Zhang, Yu, et al.
Published: (2025)
by: Zhang, Yu, et al.
Published: (2025)
Text-aware and Context-aware Expressive Audiobook Speech Synthesis
by: Guo, Dake, et al.
Published: (2024)
by: Guo, Dake, et al.
Published: (2024)
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis
by: Jiang, Ziyue, et al.
Published: (2023)
by: Jiang, Ziyue, et al.
Published: (2023)
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis
by: Jiang, Ziyue, et al.
Published: (2025)
by: Jiang, Ziyue, et al.
Published: (2025)
UltraVoice: Scaling Fine-Grained Style-Controlled Speech Conversations for Spoken Dialogue Models
by: Tu, Wenming, et al.
Published: (2025)
by: Tu, Wenming, et al.
Published: (2025)
Towards Better Disentanglement in Non-Autoregressive Zero-Shot Expressive Voice Conversion
by: Akti, Seymanur, et al.
Published: (2025)
by: Akti, Seymanur, et al.
Published: (2025)
Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer
by: Lei, Ke, et al.
Published: (2026)
by: Lei, Ke, et al.
Published: (2026)
Synthetic Singers: A Review of Deep-Learning-based Singing Voice Synthesis Approaches
by: Pan, Changhao, et al.
Published: (2026)
by: Pan, Changhao, et al.
Published: (2026)
Incremental Disentanglement for Environment-Aware Zero-Shot Text-to-Speech Synthesis
by: Lu, Ye-Xin, et al.
Published: (2024)
by: Lu, Ye-Xin, et al.
Published: (2024)
ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
by: Zhu, Han, et al.
Published: (2025)
by: Zhu, Han, et al.
Published: (2025)
VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing
by: Anastassiou, Philip, et al.
Published: (2024)
by: Anastassiou, Philip, et al.
Published: (2024)
Boosting Multi-Speaker Expressive Speech Synthesis with Semi-supervised Contrastive Learning
by: Zhu, Xinfa, et al.
Published: (2023)
by: Zhu, Xinfa, et al.
Published: (2023)
FlexSpeech: Towards Stable, Controllable and Expressive Text-to-Speech
by: Ma, Linhan, et al.
Published: (2025)
by: Ma, Linhan, et al.
Published: (2025)
StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion
by: Wang, Zhichao, et al.
Published: (2024)
by: Wang, Zhichao, et al.
Published: (2024)
Zero-Shot Voice Conversion via Content-Aware Timbre Ensemble and Conditional Flow Matching
by: Pan, Yu, et al.
Published: (2024)
by: Pan, Yu, et al.
Published: (2024)
Robust Singing Voice Transcription Serves Synthesis
by: Li, Ruiqi, et al.
Published: (2024)
by: Li, Ruiqi, et al.
Published: (2024)
MultiVerse: Efficient and Expressive Zero-Shot Multi-Task Text-to-Speech
by: Bak, Taejun, et al.
Published: (2024)
by: Bak, Taejun, et al.
Published: (2024)
Improvement Speaker Similarity for Zero-Shot Any-to-Any Voice Conversion of Whispered and Regular Speech
by: Avdeeva, Anastasia, et al.
Published: (2024)
by: Avdeeva, Anastasia, et al.
Published: (2024)
Zero-Shot Duet Singing Voices Separation with Diffusion Models
by: Yu, Chin-Yun, et al.
Published: (2023)
by: Yu, Chin-Yun, et al.
Published: (2023)
Word-Level Emotional Expression Control in Zero-Shot Text-to-Speech Synthesis
by: Wang, Tianrui, et al.
Published: (2025)
by: Wang, Tianrui, et al.
Published: (2025)
ZSDEVC: Zero-Shot Diffusion-based Emotional Voice Conversion with Disentangled Mechanism
by: Chou, Hsing-Hang, et al.
Published: (2024)
by: Chou, Hsing-Hang, et al.
Published: (2024)
HALL-E: Hierarchical Neural Codec Language Model for Minute-Long Zero-Shot Text-to-Speech Synthesis
by: Nishimura, Yuto, et al.
Published: (2024)
by: Nishimura, Yuto, et al.
Published: (2024)
MSceneSpeech: A Multi-Scene Speech Dataset For Expressive Speech Synthesis
by: Yang, Qian, et al.
Published: (2024)
by: Yang, Qian, et al.
Published: (2024)
RT-VC: Real-Time Zero-Shot Voice Conversion with Speech Articulatory Coding
by: Liu, Yisi, et al.
Published: (2025)
by: Liu, Yisi, et al.
Published: (2025)
OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models
by: Zhu, Han, et al.
Published: (2026)
by: Zhu, Han, et al.
Published: (2026)
CONTUNER: Singing Voice Beautifying with Pitch and Expressiveness Condition
by: Wang, Jianzong, et al.
Published: (2024)
by: Wang, Jianzong, et al.
Published: (2024)
SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis
by: Qian, Jiale, et al.
Published: (2026)
by: Qian, Jiale, et al.
Published: (2026)
Train Short, Infer Long: Speech-LLM Enables Zero-Shot Streamable Joint ASR and Diarization on Long Audio
by: Shi, Mohan, et al.
Published: (2025)
by: Shi, Mohan, et al.
Published: (2025)
Zero-Shot Mono-to-Binaural Speech Synthesis
by: Levkovitch, Alon, et al.
Published: (2024)
by: Levkovitch, Alon, et al.
Published: (2024)
MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder
by: Zhang, Bowen, et al.
Published: (2025)
by: Zhang, Bowen, et al.
Published: (2025)
Self-Supervised Singing Voice Pre-Training towards Speech-to-Singing Conversion
by: Li, Ruiqi, et al.
Published: (2024)
by: Li, Ruiqi, et al.
Published: (2024)
Similar Items
-
TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control
by: Zhang, Yu, et al.
Published: (2024) -
Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody Modeling
by: Jiang, Yuepeng, et al.
Published: (2024) -
REF-VC: Robust, Expressive and Fast Zero-Shot Voice Conversion with Diffusion Transformers
by: Jiang, Yuepeng, et al.
Published: (2025) -
Takin-VC: Expressive Zero-Shot Voice Conversion via Adaptive Hybrid Content Encoding and Enhanced Timbre Modeling
by: Yang, Yuguang, et al.
Published: (2024) -
Everyone-Can-Sing: Zero-Shot Singing Voice Synthesis and Conversion with Speech Reference
by: Dai, Shuqi, et al.
Published: (2025)