:: Library Catalog

Bibliographic Details
Main Authors:	Hong, Changi; Song, Yoonah; Park, Hwayoung; Bang, Chaewoon; Ku, Dayeon; Lee, Do Hyun; Kim, Hong Kook
Format:	Preprint
Published:	2026
Subjects:	Audio and Speech Processing; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2604.09111

Similar Items

Performance Improvement of Language-Queried Audio Source Separation Based on Caption Augmentation From Large Language Models for DCASE Challenge 2024 Task 9
by: Lee, Do Hyun, et al.
Published: (2024)

DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing
by: Sahipjohn, Neha, et al.
Published: (2024)

DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability
by: Park, Hyun Joon, et al.
Published: (2024)

RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching
by: Park, Hyun Joon, et al.
Published: (2025)

Dub-S2ST: Textless Speech-to-Speech Translation for Seamless Dubbing
by: Choi, Jeongsoo, et al.
Published: (2025)

Raon-OpenTTS: Open Models and Data for Robust Text-to-Speech
by: Kim, Semin, et al.
Published: (2026)

VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models
by: Sung-Bin, Kim, et al.
Published: (2025)

ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer
by: Liu, Huadai, et al.
Published: (2023)

ReFlow-TTS: A Rectified Flow Model for High-fidelity Text-to-Speech
by: Guan, Wenhao, et al.
Published: (2023)

Deep Dubbing: End-to-End Auto-Audiobook System with Text-to-Timbre and Context-Aware Instruct-TTS
by: Dai, Ziqi, et al.
Published: (2025)

FC-TTS: Style and Timbre Control in Zero-Shot Text-to-Speech with Disentangled Speech Representations
by: Lee, Yoonhyung, et al.
Published: (2026)

MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis
by: Guan, Wenhao, et al.
Published: (2023)

Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis
by: Zhou, Kun, et al.
Published: (2024)

Pronunciation Editing for Finnish Speech using Phonetic Posteriorgrams
by: Li, Zirui, et al.
Published: (2025)

WenetSpeech4TTS: A 12,800-hour Mandarin TTS Corpus for Large Speech Generation Model Benchmark
by: Ma, Linhan, et al.
Published: (2024)

Performance improvement of spatial semantic segmentation with enriched audio features and agent-based error correction for DCASE 2025 Challenge Task 4
by: Park, Jongyeon, et al.
Published: (2025)

KazEmoTTS: A Dataset for Kazakh Emotional Text-to-Speech Synthesis
by: Abilbekov, Adal, et al.
Published: (2024)

CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech
by: Kim, Jaehyeon, et al.
Published: (2024)

MathReader: Text-to-Speech for Mathematical Documents
by: Hyeon, Sieun, et al.
Published: (2025)

Nord-Parl-TTS: Finnish and Swedish TTS Dataset from Parliament Speech
by: Li, Zirui, et al.
Published: (2025)

OV-InstructTTS: Towards Open-Vocabulary Instruct Text-to-Speech
by: Ren, Yong, et al.
Published: (2026)

SupertonicTTS: Towards Highly Efficient and Streamlined Text-to-Speech System
by: Kim, Hyeongju, et al.
Published: (2025)

FireRedTTS-1S: An Upgraded Streamable Foundation Text-to-Speech System
by: Guo, Hao-Han, et al.
Published: (2025)

DAIEN-TTS: Disentangled Audio Infilling for Environment-Aware Text-to-Speech Synthesis
by: Lu, Ye-Xin, et al.
Published: (2025)

MunTTS: A Text-to-Speech System for Mundari
by: Gumma, Varun, et al.
Published: (2024)

FireRedTTS: A Foundation Text-To-Speech Framework for Industry-Level Generative Speech Applications
by: Guo, Hao-Han, et al.
Published: (2024)

NanoVoice: Efficient Speaker-Adaptive Text-to-Speech for Multiple Speakers
by: Park, Nohil, et al.
Published: (2024)

Sound event detection based on auxiliary decoder and maximum probability aggregation for DCASE Challenge 2024 Task 4
by: Son, Sang Won, et al.
Published: (2024)

Stable-TTS: Stable Speaker-Adaptive Text-to-Speech Synthesis via Prosody Prompting
by: Han, Wooseok, et al.
Published: (2024)

MahaTTS: A Unified Framework for Multilingual Text-to-Speech Synthesis
by: Singh, Jaskaran, et al.
Published: (2025)

VECL-TTS: Voice identity and Emotional style controllable Cross-Lingual Text-to-Speech
by: Gudmalwar, Ashishkumar, et al.
Published: (2024)

ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
by: Fu, Ruibo, et al.
Published: (2024)

FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech Synthesis
by: Guo, Yinlin, et al.
Published: (2024)

Rethinking Speech Representation Aggregation in Speech Enhancement: A Phonetic Mutual Information Perspective
by: Han, Seungu, et al.
Published: (2026)

LoRP-TTS: Low-Rank Personalized Text-To-Speech
by: Bondaruk, Łukasz, et al.
Published: (2025)

MoE-TTS: Enhancing Out-of-Domain Text Understanding for Description-based TTS via Mixture-of-Experts
by: Xue, Heyang, et al.
Published: (2025)

Speech Codec Probing from Semantic and Phonetic Perspectives
by: Shi, Xuan, et al.
Published: (2026)

SyncVoice: Towards Video Dubbing with Vision-Augmented Pretrained TTS Model
by: Wang, Kaidi, et al.
Published: (2025)

Reconstruction of the Vocal Tract from Speech via Phonetic Representations Using MRI Data
by: Azzouz, Sofiane, et al.
Published: (2026)

Discrete Diffusion for Generative Modeling of Text-Aligned Speech Tokens
by: Ku, Pin-Jui, et al.
Published: (2025)