| Main Authors: | Hong, Changi, Song, Yoonah, Park, Hwayoung, Bang, Chaewoon, Ku, Dayeon, Lee, Do Hyun, Kim, Hong Kook |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.09111 |
Similar Items
Performance Improvement of Language-Queried Audio Source Separation Based on Caption Augmentation From Large Language Models for DCASE Challenge 2024 Task 9
by: Lee, Do Hyun, et al.
Published: (2024)
DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing
by: Sahipjohn, Neha, et al.
Published: (2024)
DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability
by: Park, Hyun Joon, et al.
Published: (2024)
RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching
by: Park, Hyun Joon, et al.
Published: (2025)
Dub-S2ST: Textless Speech-to-Speech Translation for Seamless Dubbing
by: Choi, Jeongsoo, et al.
Published: (2025)
Raon-OpenTTS: Open Models and Data for Robust Text-to-Speech
by: Kim, Semin, et al.
Published: (2026)
VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models
by: Sung-Bin, Kim, et al.
Published: (2025)
ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer
by: Liu, Huadai, et al.
Published: (2023)
ReFlow-TTS: A Rectified Flow Model for High-fidelity Text-to-Speech
by: Guan, Wenhao, et al.
Published: (2023)
Deep Dubbing: End-to-End Auto-Audiobook System with Text-to-Timbre and Context-Aware Instruct-TTS
by: Dai, Ziqi, et al.
Published: (2025)
FC-TTS: Style and Timbre Control in Zero-Shot Text-to-Speech with Disentangled Speech Representations
by: Lee, Yoonhyung, et al.
Published: (2026)
MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis
by: Guan, Wenhao, et al.
Published: (2023)
Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis
by: Zhou, Kun, et al.
Published: (2024)
Pronunciation Editing for Finnish Speech using Phonetic Posteriorgrams
by: Li, Zirui, et al.
Published: (2025)
WenetSpeech4TTS: A 12,800-hour Mandarin TTS Corpus for Large Speech Generation Model Benchmark
by: Ma, Linhan, et al.
Published: (2024)
Performance improvement of spatial semantic segmentation with enriched audio features and agent-based error correction for DCASE 2025 Challenge Task 4
by: Park, Jongyeon, et al.
Published: (2025)
KazEmoTTS: A Dataset for Kazakh Emotional Text-to-Speech Synthesis
by: Abilbekov, Adal, et al.
Published: (2024)
CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech
by: Kim, Jaehyeon, et al.
Published: (2024)
MathReader : Text-to-Speech for Mathematical Documents
by: Hyeon, Sieun, et al.
Published: (2025)
Nord-Parl-TTS: Finnish and Swedish TTS Dataset from Parliament Speech
by: Li, Zirui, et al.
Published: (2025)
OV-InstructTTS: Towards Open-Vocabulary Instruct Text-to-Speech
by: Ren, Yong, et al.
Published: (2026)
SupertonicTTS: Towards Highly Efficient and Streamlined Text-to-Speech System
by: Kim, Hyeongju, et al.
Published: (2025)
FireRedTTS-1S: An Upgraded Streamable Foundation Text-to-Speech System
by: Guo, Hao-Han, et al.
Published: (2025)
DAIEN-TTS: Disentangled Audio Infilling for Environment-Aware Text-to-Speech Synthesis
by: Lu, Ye-Xin, et al.
Published: (2025)
MunTTS: A Text-to-Speech System for Mundari
by: Gumma, Varun, et al.
Published: (2024)
FireRedTTS: A Foundation Text-To-Speech Framework for Industry-Level Generative Speech Applications
by: Guo, Hao-Han, et al.
Published: (2024)
NanoVoice: Efficient Speaker-Adaptive Text-to-Speech for Multiple Speakers
by: Park, Nohil, et al.
Published: (2024)
Sound event detection based on auxiliary decoder and maximum probability aggregation for DCASE Challenge 2024 Task 4
by: Son, Sang Won, et al.
Published: (2024)
Stable-TTS: Stable Speaker-Adaptive Text-to-Speech Synthesis via Prosody Prompting
by: Han, Wooseok, et al.
Published: (2024)
MahaTTS: A Unified Framework for Multilingual Text-to-Speech Synthesis
by: Singh, Jaskaran, et al.
Published: (2025)
VECL-TTS: Voice identity and Emotional style controllable Cross-Lingual Text-to-Speech
by: Gudmalwar, Ashishkumar, et al.
Published: (2024)
ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
by: Fu, Ruibo, et al.
Published: (2024)
FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech Synthesis
by: Guo, Yinlin, et al.
Published: (2024)
Rethinking Speech Representation Aggregation in Speech Enhancement: A Phonetic Mutual Information Perspective
by: Han, Seungu, et al.
Published: (2026)
LoRP-TTS: Low-Rank Personalized Text-To-Speech
by: Bondaruk, Łukasz, et al.
Published: (2025)
MoE-TTS: Enhancing Out-of-Domain Text Understanding for Description-based TTS via Mixture-of-Experts
by: Xue, Heyang, et al.
Published: (2025)
Speech Codec Probing from Semantic and Phonetic Perspectives
by: Shi, Xuan, et al.
Published: (2026)
SyncVoice: Towards Video Dubbing with Vision-Augmented Pretrained TTS Model
by: Wang, Kaidi, et al.
Published: (2025)
Reconstruction of the Vocal Tract from Speech via Phonetic Representations Using MRI Data
by: Azzouz, Sofiane, et al.
Published: (2026)
Discrete Diffusion for Generative Modeling of Text-Aligned Speech Tokens
by: Ku, Pin-Jui, et al.
Published: (2025)