| Main Authors: | Li, Haoyu, Han, Mingyang, Xi, Yu, Wang, Dongxiao, Wang, Hankun, Shi, Haoxiang, Li, Boyu, Song, Jun, Zheng, Bo, Wang, Shuai, Yu, Kai |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.09995 |
Similar Items
Expressive Prompting: Improving Emotion Intensity and Speaker Consistency in Zero-Shot TTS
by: Wang, Haoyu, et al.
Published: (2024)
Detect, Attend and Extract: Keyword Guided Target Speaker Extraction
by: Li, Haoyu, et al.
Published: (2026)
Enhancing Zero-Shot Multi-Speaker TTS with Negated Speaker Representations
by: Jeon, Yejin, et al.
Published: (2024)
An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS
by: Wang, Xiaofei, et al.
Published: (2024)
A Survey on Speech Large Language Models for Understanding
by: Peng, Jing, et al.
Published: (2024)
What Does the Speaker Embedding Encode?
by: Wang, Shuai, et al.
Published: (2025)
LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec
by: Guo, Yiwei, et al.
Published: (2024)
Traceable TTS: Toward Watermark-Free TTS with Strong Traceability
by: Zhao, Yuxiang, et al.
Published: (2025)
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
by: Chen, Yushen, et al.
Published: (2024)
ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
by: Fu, Ruibo, et al.
Published: (2024)
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
by: Eskimez, Sefik Emre, et al.
Published: (2024)
Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech
by: Wang, Hankun, et al.
Published: (2024)
Flow-TSVAD: Target-Speaker Voice Activity Detection via Latent Flow Matching
by: Chen, Zhengyang, et al.
Published: (2024)
DialoSpeech: Dual-Speaker Dialogue Generation with LLM and Flow Matching
by: Xie, Hanke, et al.
Published: (2025)
Efficient Emotion and Speaker Adaptation in LLM-Based TTS via Characteristic-Specific Partial Fine-Tuning
by: Wang, Tianrui, et al.
Published: (2025)
Measuring Prosody Diversity in Zero-Shot TTS: A New Metric, Benchmark, and Exploration
by: Yang, Yifan, et al.
Published: (2025)
Adaptive Deterministic Flow Matching for Target Speaker Extraction
by: Hsieh, Tsun-An, et al.
Published: (2025)
On the Effectiveness of Acoustic BPE in Decoder-Only TTS
by: Li, Bohan, et al.
Published: (2024)
SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention
by: Li, Junjie, et al.
Published: (2023)
CodecSlime: Temporal Redundancy Compression of Neural Speech Codec via Dynamic Frame Rate
by: Wang, Hankun, et al.
Published: (2025)
E1 TTS: Simple and Fast Non-Autoregressive TTS
by: Liu, Zhijun, et al.
Published: (2024)
AHAMask: Reliable Task Specification for Large Audio Language Models without Instructions
by: Guo, Yiwei, et al.
Published: (2025)
DINO-VITS: Data-Efficient Zero-Shot TTS with Self-Supervised Speaker Verification Loss for Noise Robustness
by: Pankov, Vikentii, et al.
Published: (2023)
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis
by: Jiang, Ziyue, et al.
Published: (2025)
Zero-Shot TTS With Enhanced Audio Prompts: Bsc Submission For The 2026 Wildspoof Challenge TTS Track
by: Giraldo, Jose, et al.
Published: (2026)
Towards Lightweight Speaker Verification via Adaptive Neural Network Quantization
by: Liu, Bei, et al.
Published: (2024)
Intelli-Z: Toward Intelligible Zero-Shot TTS
by: Jung, Sunghee, et al.
Published: (2024)
Towards Flow-Matching-based TTS without Classifier-Free Guidance
by: Liang, Yuzhe, et al.
Published: (2025)
Multi-Level Speaker Representation for Target Speaker Extraction
by: Zhang, Ke, et al.
Published: (2024)
Zero-Shot vs. Few-Shot Multi-Speaker TTS Using Pre-trained Czech SpeechT5 Model
by: Lehečka, Jan, et al.
Published: (2024)
F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization
by: Sun, Xiaohui, et al.
Published: (2025)
DS-TTS: Zero-Shot Speaker Style Adaptation from Voice Clips via Dynamic Dual-Style Feature Modulation
by: Meng, Ming, et al.
Published: (2025)
VoiceStar: Robust Zero-Shot Autoregressive TTS with Duration Control and Extrapolation
by: Peng, Puyuan, et al.
Published: (2025)
Improvement Speaker Similarity for Zero-Shot Any-to-Any Voice Conversion of Whispered and Regular Speech
by: Avdeeva, Anastasia, et al.
Published: (2024)
vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders
by: Guo, Yiwei, et al.
Published: (2024)
TASU2: Controllable CTC Simulation for Alignment and Low-Resource Adaptation of Speech LLMs
by: Peng, Jing, et al.
Published: (2026)
Disentangling the Prosody and Semantic Information with Pre-trained Model for In-Context Learning based Zero-Shot Voice Conversion
by: Chen, Zhengyang, et al.
Published: (2024)
UniVoice: Unifying Autoregressive ASR and Flow-Matching based TTS with Large Language Models
by: Guan, Wenhao, et al.
Published: (2025)
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis
by: Jiang, Ziyue, et al.
Published: (2023)
Improving Data Augmentation-based Cross-Speaker Style Transfer for TTS with Singing Voice, Style Filtering, and F0 Matching
by: Marques, Leonardo B. de M. M., et al.
Published: (2024)