| Main Authors: | Singh, Jaskaran, Chowdhury, Amartya Roy, Prabhakar, Raghav, W, Varshul C. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.14049 |
Similar Items
MunTTS: A Text-to-Speech System for Mundari
by: Gumma, Varun, et al.
Published: (2024)
VoiceCraft-X: Unifying Multilingual, Voice-Cloning Speech Synthesis and Speech Editing
by: Zheng, Zhisheng, et al.
Published: (2025)
CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models
by: Li, Xiang, et al.
Published: (2024)
StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations
by: Liu, Sen, et al.
Published: (2024)
GSA-TTS : Toward Zero-Shot Speech Synthesis based on Gradual Style Adaptor
by: Lee, Seokgi, et al.
Published: (2025)
Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation
by: Di, Xinhan, et al.
Published: (2024)
DiaMoE-TTS: A Unified IPA-Based Dialect TTS Framework with Mixture-of-Experts and Parameter-Efficient Zero-Shot Adaptation
by: Chen, Ziqi, et al.
Published: (2025)
Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition
by: Ma, Ziyang, et al.
Published: (2023)
TTS-PRISM: A Perceptual Reasoning and Interpretable Speech Model for Fine-Grained Diagnosis
by: Wang, Xi, et al.
Published: (2026)
TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch
by: Song, Xingchen, et al.
Published: (2024)
TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer
by: Bataev, Vladimir, et al.
Published: (2025)
HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks
by: Li, Yingting, et al.
Published: (2024)
Indonesian-English Code-Switching Speech Synthesizer Utilizing Multilingual STEN-TTS and Bert LID
by: Handoyo, Ahmad Alfani, et al.
Published: (2024)
Generalized Multilingual Text-to-Speech Generation with Language-Aware Style Adaptation
by: Lou, Haowei, et al.
Published: (2025)
XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model
by: Casanova, Edresson, et al.
Published: (2024)
Speech Recognition Model Improves Text-to-Speech Synthesis using Fine-Grained Reward
by: Wang, Guansu, et al.
Published: (2025)
FMSD-TTS: Few-shot Multi-Speaker Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation
by: Liu, Yutong, et al.
Published: (2025)
A Unified Speech LLM for Diarization and Speech Recognition in Multilingual Conversations
by: Saengthong, Phurich, et al.
Published: (2025)
Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness
by: Feng, Xincan, et al.
Published: (2024)
GOAT-TTS: Expressive and Realistic Speech Generation via A Dual-Branch LLM
by: Song, Yaodong, et al.
Published: (2025)
Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation
by: He, Haorui, et al.
Published: (2024)
SpeechTaxi: On Multilingual Semantic Speech Classification
by: Keller, Lennart, et al.
Published: (2024)
Enhancing Multilingual Voice Toxicity Detection with Speech-Text Alignment
by: Liu, Joseph, et al.
Published: (2024)
A2TTS: TTS for Low Resource Indian Languages
by: Bhadoriya, Ayush Singh, et al.
Published: (2025)
Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond
by: Lee, Beomseok, et al.
Published: (2024)
EE-TTS: Emphatic Expressive TTS with Linguistic Information
by: Zhong, Yi, et al.
Published: (2023)
Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis
by: Zhou, Kun, et al.
Published: (2024)
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
by: Łajszczak, Mateusz, et al.
Published: (2024)
Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction
by: Li, Tianpeng, et al.
Published: (2025)
UtterTune: LoRA-Based Target-Language Pronunciation Edit and Control in Multilingual Text-to-Speech
by: Kato, Shuhei
Published: (2025)
StarVC: A Unified Auto-Regressive Framework for Joint Text and Speech Generation in Voice Conversion
by: Li, Fengjin, et al.
Published: (2025)
Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback
by: Chen, Chen, et al.
Published: (2024)
Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis
by: Yang, Yifan, et al.
Published: (2025)
Evaluating Speech-to-Text x LLM x Text-to-Speech Combinations for AI Interview Systems
by: Allbert, Rumi, et al.
Published: (2025)
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
by: Kawamura, Masaya, et al.
Published: (2024)
Improving Accented Speech Recognition using Data Augmentation based on Unsupervised Text-to-Speech Synthesis
by: Do, Cong-Thanh, et al.
Published: (2024)
Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation
by: Kim, Minsu, et al.
Published: (2023)
DiffStyleTTS: Diffusion-based Hierarchical Prosody Modeling for Text-to-Speech with Diverse and Controllable Styles
by: Liu, Jiaxuan, et al.
Published: (2024)
Configurable Multilingual ASR with Speech Summary Representations
by: Zhu, Harrison, et al.
Published: (2024)
Classification of Spontaneous and Scripted Speech for Multilingual Audio
by: Elisha, Shahar, et al.
Published: (2024)