:: Library Catalog

Bibliographic Details
Main Authors:	Li, Haoyu; Han, Mingyang; Xi, Yu; Wang, Dongxiao; Wang, Hankun; Shi, Haoxiang; Li, Boyu; Song, Jun; Zheng, Bo; Wang, Shuai; Yu, Kai
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2511.09995

Similar Items

Expressive Prompting: Improving Emotion Intensity and Speaker Consistency in Zero-Shot TTS
by: Wang, Haoyu, et al.
Published: (2024)

Detect, Attend and Extract: Keyword Guided Target Speaker Extraction
by: Li, Haoyu, et al.
Published: (2026)

Enhancing Zero-Shot Multi-Speaker TTS with Negated Speaker Representations
by: Jeon, Yejin, et al.
Published: (2024)

An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS
by: Wang, Xiaofei, et al.
Published: (2024)

A Survey on Speech Large Language Models for Understanding
by: Peng, Jing, et al.
Published: (2024)

What Does the Speaker Embedding Encode?
by: Wang, Shuai, et al.
Published: (2025)

LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec
by: Guo, Yiwei, et al.
Published: (2024)

Traceable TTS: Toward Watermark-Free TTS with Strong Traceability
by: Zhao, Yuxiang, et al.
Published: (2025)

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
by: Chen, Yushen, et al.
Published: (2024)

ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
by: Fu, Ruibo, et al.
Published: (2024)

E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
by: Eskimez, Sefik Emre, et al.
Published: (2024)

Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech
by: Wang, Hankun, et al.
Published: (2024)

Flow-TSVAD: Target-Speaker Voice Activity Detection via Latent Flow Matching
by: Chen, Zhengyang, et al.
Published: (2024)

DialoSpeech: Dual-Speaker Dialogue Generation with LLM and Flow Matching
by: Xie, Hanke, et al.
Published: (2025)

Efficient Emotion and Speaker Adaptation in LLM-Based TTS via Characteristic-Specific Partial Fine-Tuning
by: Wang, Tianrui, et al.
Published: (2025)

Measuring Prosody Diversity in Zero-Shot TTS: A New Metric, Benchmark, and Exploration
by: Yang, Yifan, et al.
Published: (2025)

Adaptive Deterministic Flow Matching for Target Speaker Extraction
by: Hsieh, Tsun-An, et al.
Published: (2025)

On the Effectiveness of Acoustic BPE in Decoder-Only TTS
by: Li, Bohan, et al.
Published: (2024)

SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention
by: Li, Junjie, et al.
Published: (2023)

CodecSlime: Temporal Redundancy Compression of Neural Speech Codec via Dynamic Frame Rate
by: Wang, Hankun, et al.
Published: (2025)

E1 TTS: Simple and Fast Non-Autoregressive TTS
by: Liu, Zhijun, et al.
Published: (2024)

AHAMask: Reliable Task Specification for Large Audio Language Models without Instructions
by: Guo, Yiwei, et al.
Published: (2025)

DINO-VITS: Data-Efficient Zero-Shot TTS with Self-Supervised Speaker Verification Loss for Noise Robustness
by: Pankov, Vikentii, et al.
Published: (2023)

MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis
by: Jiang, Ziyue, et al.
Published: (2025)

Zero-Shot TTS With Enhanced Audio Prompts: BSC Submission For The 2026 WildSpoof Challenge TTS Track
by: Giraldo, Jose, et al.
Published: (2026)

Towards Lightweight Speaker Verification via Adaptive Neural Network Quantization
by: Liu, Bei, et al.
Published: (2024)

Intelli-Z: Toward Intelligible Zero-Shot TTS
by: Jung, Sunghee, et al.
Published: (2024)

Towards Flow-Matching-based TTS without Classifier-Free Guidance
by: Liang, Yuzhe, et al.
Published: (2025)

Multi-Level Speaker Representation for Target Speaker Extraction
by: Zhang, Ke, et al.
Published: (2024)

Zero-Shot vs. Few-Shot Multi-Speaker TTS Using Pre-trained Czech SpeechT5 Model
by: Lehečka, Jan, et al.
Published: (2024)

F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization
by: Sun, Xiaohui, et al.
Published: (2025)

DS-TTS: Zero-Shot Speaker Style Adaptation from Voice Clips via Dynamic Dual-Style Feature Modulation
by: Meng, Ming, et al.
Published: (2025)

VoiceStar: Robust Zero-Shot Autoregressive TTS with Duration Control and Extrapolation
by: Peng, Puyuan, et al.
Published: (2025)

Improvement Speaker Similarity for Zero-Shot Any-to-Any Voice Conversion of Whispered and Regular Speech
by: Avdeeva, Anastasia, et al.
Published: (2024)

vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders
by: Guo, Yiwei, et al.
Published: (2024)

TASU2: Controllable CTC Simulation for Alignment and Low-Resource Adaptation of Speech LLMs
by: Peng, Jing, et al.
Published: (2026)

Disentangling the Prosody and Semantic Information with Pre-trained Model for In-Context Learning based Zero-Shot Voice Conversion
by: Chen, Zhengyang, et al.
Published: (2024)

UniVoice: Unifying Autoregressive ASR and Flow-Matching based TTS with Large Language Models
by: Guan, Wenhao, et al.
Published: (2025)

Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis
by: Jiang, Ziyue, et al.
Published: (2023)

Improving Data Augmentation-based Cross-Speaker Style Transfer for TTS with Singing Voice, Style Filtering, and F0 Matching
by: Marques, Leonardo B. de M. M., et al.
Published: (2024)