:: Library Catalog

Bibliographic Details
Main Authors:	Arata, Chihiro; Kurihara, Kiyoshi
Format:	Preprint
Published:	2026
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2604.01760

Similar Items

Qwen3-TTS Technical Report
by: Hu, Hangrui, et al.
Published: (2026)

TTS-1 Technical Report
by: Atamanenko, Oleg, et al.
Published: (2025)

StepAudio 2.5 Technical Report
by: Lin, Bin, et al.
Published: (2026)

Traceable TTS: Toward Watermark-Free TTS with Strong Traceability
by: Zhao, Yuxiang, et al.
Published: (2025)

Step-Audio-R1.5 Technical Report
by: Zhang, Yuxin, et al.
Published: (2026)

Nord-Parl-TTS: Finnish and Swedish TTS Dataset from Parliament Speech
by: Li, Zirui, et al.
Published: (2025)

How Open is Open TTS? A Practical Evaluation of Open Source TTS Tools
by: Răgman, Teodora, et al.
Published: (2026)

SponTTS: modeling and transferring spontaneous style for TTS
by: Li, Hanzhao, et al.
Published: (2023)

Scalable Controllable Accented TTS
by: Xinyuan, Henry Li, et al.
Published: (2025)

Technical report: Impact of Duration Prediction on Speaker-specific TTS for Indian Languages
by: Pandey, Isha, et al.
Published: (2025)

E1 TTS: Simple and Fast Non-Autoregressive TTS
by: Liu, Zhijun, et al.
Published: (2024)

Zero-Shot TTS with Enhanced Audio Prompts: BSC Submission for the 2026 WildSpoof Challenge TTS Track
by: Giraldo, Jose, et al.
Published: (2026)

WenetSpeech4TTS: A 12,800-hour Mandarin TTS Corpus for Large Speech Generation Model Benchmark
by: Ma, Linhan, et al.
Published: (2024)

E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
by: Eskimez, Sefik Emre, et al.
Published: (2024)

ManaTTS Persian: a recipe for creating TTS datasets for lower resource languages
by: Qharabagh, Mahta Fetrat, et al.
Published: (2024)

Evaluation of preprocessing pipelines in the creation of in-the-wild TTS datasets
by: Di Bernardo, Matías, et al.
Published: (2025)

Qwen3.5-Omni Technical Report
by: Qwen Team
Published: (2026)

Natural Yet Challenging to Detect: Robust In-the-Wild TTS through EMA and Dual-Scoring Prompt Selection -- Submission for WildSpoof 2026 TTS Track
by: Sun, Renhe, et al.
Published: (2026)

QuarkAudio Technical Report
by: Liu, Chengwei, et al.
Published: (2025)

Index-ASR Technical Report
by: Song, Zheshu, et al.
Published: (2025)

MoE-TTS: Enhancing Out-of-Domain Text Understanding for Description-based TTS via Mixture-of-Experts
by: Xue, Heyang, et al.
Published: (2025)

Towards Prosodically Informed Mizo TTS without Explicit Tone Markings
by: Mohanta, Abhijit, et al.
Published: (2026)

Raon-OpenTTS: Open Models and Data for Robust Text-to-Speech
by: Kim, Semin, et al.
Published: (2026)

Towards Flow-Matching-based TTS without Classifier-Free Guidance
by: Liang, Yuzhe, et al.
Published: (2025)

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
by: Chen, Yushen, et al.
Published: (2024)

Enhancing Conversational TTS with Cascaded Prompting and ICL-Based Online Reinforcement Learning
by: Ouyang, Zhicheng, et al.
Published: (2026)

T-Mimi: A Transformer-based Mimi Decoder for Real-Time On-Phone TTS
by: Wu, Haibin, et al.
Published: (2026)

KazEmoTTS: A Dataset for Kazakh Emotional Text-to-Speech Synthesis
by: Abilbekov, Adal, et al.
Published: (2024)

Arabic TTS with FastPitch: Reproducible Baselines, Adversarial Training, and Oversmoothing Analysis
by: Nippert, Lars
Published: (2025)

FNH-TTS: Mixture-of-Experts Duration Modeling for Robust Neural Speech Synthesis
by: Meng, Qingliang, et al.
Published: (2025)

MELA-TTS: Joint transformer-diffusion model with representation alignment for speech synthesis
by: An, Keyu, et al.
Published: (2025)

EE-TTS: Emphatic Expressive TTS with Linguistic Information
by: Zhong, Yi, et al.
Published: (2023)

HiFiTTS-2: A Large-Scale High Bandwidth Speech Dataset
by: Langman, Ryan, et al.
Published: (2025)

Measuring Prosody Diversity in Zero-Shot TTS: A New Metric, Benchmark, and Exploration
by: Yang, Yifan, et al.
Published: (2025)

Accent-VITS: accent transfer for end-to-end TTS
by: Ma, Linhan, et al.
Published: (2023)

A Dataset for Automatic Assessment of TTS Quality in Spanish
by: Welford, Alejandro Sosa, et al.
Published: (2025)

Intelli-Z: Toward Intelligible Zero-Shot TTS
by: Jung, Sunghee, et al.
Published: (2024)

Zero-shot Cross-lingual Voice Transfer for TTS
by: Biadsy, Fadi, et al.
Published: (2024)

Baichuan-Omni-1.5 Technical Report
by: Li, Yadong, et al.
Published: (2025)

Beyond Two-stage Diffusion TTS: Joint Structure and Content Refinement via Jump Diffusion
by: Ai, Jiabao, et al.
Published: (2026)