| Main Authors: | Yang, Yifan, Han, Bing, Wang, Hui, Zhou, Long, Wang, Wei, Cui, Mingyu, Tan, Xu, Chen, Xie |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2509.19928 |
Similar Items
Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody Modeling
by: Jiang, Yuepeng, et al.
Published: (2024)
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
by: Eskimez, Sefik Emre, et al.
Published: (2024)
PRESENT: Zero-Shot Text-to-Prosody Control
by: Lam, Perry, et al.
Published: (2024)
Time-Layer Adaptive Alignment for Speaker Similarity in Flow-Matching Based Zero-Shot TTS
by: Li, Haoyu, et al.
Published: (2025)
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
by: Deng, Wei, et al.
Published: (2025)
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis
by: Jiang, Ziyue, et al.
Published: (2023)
Towards Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training
by: Yang, Yifan, et al.
Published: (2026)
Zero-Shot TTS With Enhanced Audio Prompts: Bsc Submission For The 2026 Wildspoof Challenge TTS Track
by: Giraldo, Jose, et al.
Published: (2026)
Intelli-Z: Toward Intelligible Zero-Shot TTS
by: Jung, Sunghee, et al.
Published: (2024)
No Verifiable Reward for Prosody: Toward Preference-Guided Prosody Learning in TTS
by: Shin, Seungyoun, et al.
Published: (2025)
Prosody-Adaptable Audio Codecs for Zero-Shot Voice Conversion via In-Context Learning
by: Zhao, Junchuan, et al.
Published: (2025)
Hard-Synth: Synthesizing Diverse Hard Samples for ASR using Zero-Shot TTS and LLM
by: Yu, Jiawei, et al.
Published: (2024)
Stable-TTS: Stable Speaker-Adaptive Text-to-Speech Synthesis via Prosody Prompting
by: Han, Wooseok, et al.
Published: (2024)
Disentangling the Prosody and Semantic Information with Pre-trained Model for In-Context Learning based Zero-Shot Voice Conversion
by: Chen, Zhengyang, et al.
Published: (2024)
Combining Masked Language Modeling and Cross-Modal Contrastive Learning for Prosody-Aware TTS
by: Borodin, Kirill, et al.
Published: (2026)
WenetSpeech4TTS: A 12,800-hour Mandarin TTS Corpus for Large Speech Generation Model Benchmark
by: Ma, Linhan, et al.
Published: (2024)
Enhancing Zero-Shot Multi-Speaker TTS with Negated Speaker Representations
by: Jeon, Yejin, et al.
Published: (2024)
VoiceStar: Robust Zero-Shot Autoregressive TTS with Duration Control and Extrapolation
by: Peng, Puyuan, et al.
Published: (2025)
DiaMoE-TTS: A Unified IPA-Based Dialect TTS Framework with Mixture-of-Experts and Parameter-Efficient Zero-Shot Adaptation
by: Chen, Ziqi, et al.
Published: (2025)
Traceable TTS: Toward Watermark-Free TTS with Strong Traceability
by: Zhao, Yuxiang, et al.
Published: (2025)
Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition
by: Chevi, Rendi, et al.
Published: (2024)
Counterfactual Activation Editing for Post-hoc Prosody and Mispronunciation Correction in TTS Models
by: Lee, Kyowoon, et al.
Published: (2025)
ReStyle-TTS: Relative and Continuous Style Control for Zero-Shot Speech Synthesis
by: Li, Haitao, et al.
Published: (2026)
DiffStyleTTS: Diffusion-based Hierarchical Prosody Modeling for Text-to-Speech with Diverse and Controllable Styles
by: Liu, Jiaxuan, et al.
Published: (2024)
Scaling NVIDIA's Multi-speaker Multi-lingual TTS Systems with Zero-Shot TTS to Indic Languages
by: Arora, Akshit, et al.
Published: (2024)
The Codec Language Model-based Zero-Shot Spontaneous Style TTS System for CoVoC Challenge 2024
by: Zhou, Shuoyi, et al.
Published: (2024)
MPE-TTS: Customized Emotion Zero-Shot Text-To-Speech Using Multi-Modal Prompt
by: Wu, Zhichao, et al.
Published: (2025)
HAM-TTS: Hierarchical Acoustic Modeling for Token-Based Zero-Shot Text-to-Speech with Model and Data Scaling
by: Wang, Chunhui, et al.
Published: (2024)
Voice Impression Control in Zero-Shot TTS
by: Fujita, Kenichi, et al.
Published: (2025)
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
by: Li, Yinghao Aaron, et al.
Published: (2024)
An Exploration of ECAPA-TDNN and x-vector Speaker Representations in Zero-shot Multi-speaker TTS
by: Kunešová, Marie, et al.
Published: (2025)
CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech
by: Kim, Jaehyeon, et al.
Published: (2024)
An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS
by: Wang, Xiaofei, et al.
Published: (2024)
FireRedTTS-1S: An Upgraded Streamable Foundation Text-to-Speech System
by: Guo, Hao-Han, et al.
Published: (2025)
IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech
by: Zhou, Siyi, et al.
Published: (2025)
Zero-shot Cross-lingual Voice Transfer for TTS
by: Biadsy, Fadi, et al.
Published: (2024)
Expressive Prompting: Improving Emotion Intensity and Speaker Consistency in Zero-Shot TTS
by: Wang, Haoyu, et al.
Published: (2024)
Position: Towards Responsible Evaluation for Text-to-Speech
by: Yang, Yifan, et al.
Published: (2025)
Chatterbox-Flash: Prior-Calibrated Block Diffusion for Streaming Zero-Shot TTS
by: Seo, Deokjin, et al.
Published: (2026)
DINO-VITS: Data-Efficient Zero-Shot TTS with Self-Supervised Speaker Verification Loss for Noise Robustness
by: Pankov, Vikentii, et al.
Published: (2023)