| Main Authors: | Arata, Chihiro, Kurihara, Kiyoshi |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.01760 |
Similar Items
Qwen3-TTS Technical Report
by: Hu, Hangrui, et al.
Published: (2026)
TTS-1 Technical Report
by: Atamanenko, Oleg, et al.
Published: (2025)
StepAudio 2.5 Technical Report
by: Lin, Bin, et al.
Published: (2026)
Traceable TTS: Toward Watermark-Free TTS with Strong Traceability
by: Zhao, Yuxiang, et al.
Published: (2025)
Step-Audio-R1.5 Technical Report
by: Zhang, Yuxin, et al.
Published: (2026)
Nord-Parl-TTS: Finnish and Swedish TTS Dataset from Parliament Speech
by: Li, Zirui, et al.
Published: (2025)
How Open is Open TTS? A Practical Evaluation of Open Source TTS Tools
by: Răgman, Teodora, et al.
Published: (2026)
SponTTS: modeling and transferring spontaneous style for TTS
by: Li, Hanzhao, et al.
Published: (2023)
Scalable Controllable Accented TTS
by: Xinyuan, Henry Li, et al.
Published: (2025)
Technical report: Impact of Duration Prediction on Speaker-specific TTS for Indian Languages
by: Pandey, Isha, et al.
Published: (2025)
E1 TTS: Simple and Fast Non-Autoregressive TTS
by: Liu, Zhijun, et al.
Published: (2024)
Zero-Shot TTS With Enhanced Audio Prompts: Bsc Submission For The 2026 Wildspoof Challenge TTS Track
by: Giraldo, Jose, et al.
Published: (2026)
WenetSpeech4TTS: A 12,800-hour Mandarin TTS Corpus for Large Speech Generation Model Benchmark
by: Ma, Linhan, et al.
Published: (2024)
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
by: Eskimez, Sefik Emre, et al.
Published: (2024)
ManaTTS Persian: a recipe for creating TTS datasets for lower resource languages
by: Qharabagh, Mahta Fetrat, et al.
Published: (2024)
Evaluation of preprocessing pipelines in the creation of in-the-wild TTS datasets
by: Di Bernardo, Matías, et al.
Published: (2025)
Qwen3.5-Omni Technical Report
by: Qwen Team
Published: (2026)
Natural Yet Challenging to Detect: Robust In-the-Wild TTS through EMA and Dual-Scoring Prompt Selection -- Submission for WildSpoof 2026 TTS Track
by: Sun, Renhe, et al.
Published: (2026)
QuarkAudio Technical Report
by: Liu, Chengwei, et al.
Published: (2025)
Index-ASR Technical Report
by: Song, Zheshu, et al.
Published: (2025)
MoE-TTS: Enhancing Out-of-Domain Text Understanding for Description-based TTS via Mixture-of-Experts
by: Xue, Heyang, et al.
Published: (2025)
Towards Prosodically Informed Mizo TTS without Explicit Tone Markings
by: Mohanta, Abhijit, et al.
Published: (2026)
Raon-OpenTTS: Open Models and Data for Robust Text-to-Speech
by: Kim, Semin, et al.
Published: (2026)
Towards Flow-Matching-based TTS without Classifier-Free Guidance
by: Liang, Yuzhe, et al.
Published: (2025)
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
by: Chen, Yushen, et al.
Published: (2024)
Enhancing Conversational TTS with Cascaded Prompting and ICL-Based Online Reinforcement Learning
by: Ouyang, Zhicheng, et al.
Published: (2026)
T-Mimi: A Transformer-based Mimi Decoder for Real-Time On-Phone TTS
by: Wu, Haibin, et al.
Published: (2026)
KazEmoTTS: A Dataset for Kazakh Emotional Text-to-Speech Synthesis
by: Abilbekov, Adal, et al.
Published: (2024)
Arabic TTS with FastPitch: Reproducible Baselines, Adversarial Training, and Oversmoothing Analysis
by: Nippert, Lars
Published: (2025)
FNH-TTS: Mixture-of-Experts Duration Modeling for Robust Neural Speech Synthesis
by: Meng, Qingliang, et al.
Published: (2025)
MELA-TTS: Joint transformer-diffusion model with representation alignment for speech synthesis
by: An, Keyu, et al.
Published: (2025)
EE-TTS: Emphatic Expressive TTS with Linguistic Information
by: Zhong, Yi, et al.
Published: (2023)
HiFiTTS-2: A Large-Scale High Bandwidth Speech Dataset
by: Langman, Ryan, et al.
Published: (2025)
Measuring Prosody Diversity in Zero-Shot TTS: A New Metric, Benchmark, and Exploration
by: Yang, Yifan, et al.
Published: (2025)
Accent-VITS:accent transfer for end-to-end TTS
by: Ma, Linhan, et al.
Published: (2023)
A Dataset for Automatic Assessment of TTS Quality in Spanish
by: Welford, Alejandro Sosa, et al.
Published: (2025)
Intelli-Z: Toward Intelligible Zero-Shot TTS
by: Jung, Sunghee, et al.
Published: (2024)
Zero-shot Cross-lingual Voice Transfer for TTS
by: Biadsy, Fadi, et al.
Published: (2024)
Baichuan-Omni-1.5 Technical Report
by: Li, Yadong, et al.
Published: (2025)
Beyond Two-stage Diffusion TTS: Joint Structure and Content Refinement via Jump Diffusion
by: Ai, Jiabao, et al.
Published: (2026)