Library Catalog

Bibliographic Details
Main Authors:	Jiang, Ziyue; Ren, Yi; Li, Ruiqi; Ji, Shengpeng; Zhang, Boyang; Ye, Zhenhui; Zhang, Chen; Jionghao, Bai; Yang, Xiaoda; Zuo, Jialong; Zhang, Yu; Liu, Rui; Yin, Xiang; Zhao, Zhou
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing; Machine Learning; Sound
Online Access:	https://arxiv.org/abs/2502.18924

Similar Items

Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis
By: Jiang, Ziyue, et al.
Published: (2023)

MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech
By: Ji, Shengpeng, et al.
Published: (2024)

Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching
By: Zuo, Jialong, et al.
Published: (2025)

Language-Codec: Bridging Discrete Codec Representations and Speech Language Models
By: Ji, Shengpeng, et al.
Published: (2024)

TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models
By: Ji, Shengpeng, et al.
Published: (2023)

ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control
By: Ji, Shengpeng, et al.
Published: (2024)

Speech Watermarking with Discrete Intermediate Representations
By: Ji, Shengpeng, et al.
Published: (2024)

HAM-TTS: Hierarchical Acoustic Modeling for Token-Based Zero-Shot Text-to-Speech with Model and Data Scaling
By: Wang, Chunhui, et al.
Published: (2024)

Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer
By: Wang, Yongqi, et al.
Published: (2023)

CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech
By: Kim, Jaehyeon, et al.
Published: (2024)

Enhancing Expressive Voice Conversion with Discrete Pitch-Conditioned Flow Matching Model
By: Zuo, Jialong, et al.
Published: (2025)

E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
By: Eskimez, Sefik Emre, et al.
Published: (2024)

Intelli-Z: Toward Intelligible Zero-Shot TTS
By: Jung, Sunghee, et al.
Published: (2024)

MSceneSpeech: A Multi-Scene Speech Dataset For Expressive Speech Synthesis
By: Yang, Qian, et al.
Published: (2024)

MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder
By: Zhang, Bowen, et al.
Published: (2025)

Parallel GPT: Harmonizing the Independence and Interdependence of Acoustic and Semantic Information for Zero-Shot Text-to-Speech
By: Xing, Jingyuan, et al.
Published: (2025)

Advanced Zero-Shot Text-to-Speech for Background Removal and Preservation with Controllable Masked Speech Prediction
By: Zhang, Leying, et al.
Published: (2025)

OV-InstructTTS: Towards Open-Vocabulary Instruct Text-to-Speech
By: Ren, Yong, et al.
Published: (2026)

Enhancing Zero-Shot Multi-Speaker TTS with Negated Speaker Representations
By: Jeon, Yejin, et al.
Published: (2024)

ReStyle-TTS: Relative and Continuous Style Control for Zero-Shot Speech Synthesis
By: Li, Haitao, et al.
Published: (2026)

IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
By: Deng, Wei, et al.
Published: (2025)

GSA-TTS : Toward Zero-Shot Speech Synthesis based on Gradual Style Adaptor
By: Lee, Seokgi, et al.
Published: (2025)

StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
By: Li, Yinghao Aaron, et al.
Published: (2024)

VoiceStar: Robust Zero-Shot Autoregressive TTS with Duration Control and Extrapolation
By: Peng, Puyuan, et al.
Published: (2025)

SNIPER Training: Single-Shot Sparse Training for Text-to-Speech
By: Lam, Perry, et al.
Published: (2022)

TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control
By: Zhang, Yu, et al.
Published: (2024)

MPE-TTS: Customized Emotion Zero-Shot Text-To-Speech Using Multi-Modal Prompt
By: Wu, Zhichao, et al.
Published: (2025)

SF-Speech: Straightened Flow for Zero-Shot Voice Clone
By: Li, Xuyuan, et al.
Published: (2024)

Zero-Shot vs. Few-Shot Multi-Speaker TTS Using Pre-trained Czech SpeechT5 Model
By: Lehečka, Jan, et al.
Published: (2024)

ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
By: Gong, Cheng, et al.
Published: (2023)

DiaMoE-TTS: A Unified IPA-Based Dialect TTS Framework with Mixture-of-Experts and Parameter-Efficient Zero-Shot Adaptation
By: Chen, Ziqi, et al.
Published: (2025)

ZeSTA: Zero-Shot TTS Augmentation with Domain-Conditioned Training for Data-Efficient Personalized Speech Synthesis
By: Choi, Youngwon, et al.
Published: (2026)

Scaling NVIDIA's Multi-speaker Multi-lingual TTS Systems with Zero-Shot TTS to Indic Languages
By: Arora, Akshit, et al.
Published: (2024)

FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech Synthesis
By: Guo, Yinlin, et al.
Published: (2024)

Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
By: Anastassiou, Philip, et al.
Published: (2024)

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
By: Ji, Shengpeng, et al.
Published: (2024)

Hard-Synth: Synthesizing Diverse Hard Samples for ASR using Zero-Shot TTS and LLM
By: Yu, Jiawei, et al.
Published: (2024)

DINO-VITS: Data-Efficient Zero-Shot TTS with Self-Supervised Speaker Verification Loss for Noise Robustness
By: Pankov, Vikentii, et al.
Published: (2023)

MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization
By: Li, Ruiqi, et al.
Published: (2024)

The Codec Language Model-based Zero-Shot Spontaneous Style TTS System for CoVoC Challenge 2024
By: Zhou, Shuoyi, et al.
Published: (2024)