Bibliographic Details
Main Authors:	Singh, Jaskaran; Chowdhury, Amartya Roy; Prabhakar, Raghav; W, Varshul C.
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing; Computation and Language
Online Access:	https://arxiv.org/abs/2508.14049

Similar Items

MunTTS: A Text-to-Speech System for Mundari
by: Gumma, Varun, et al.
Published: (2024)

VoiceCraft-X: Unifying Multilingual, Voice-Cloning Speech Synthesis and Speech Editing
by: Zheng, Zhisheng, et al.
Published: (2025)

CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models
by: Li, Xiang, et al.
Published: (2024)

StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations
by: Liu, Sen, et al.
Published: (2024)

GSA-TTS: Toward Zero-Shot Speech Synthesis based on Gradual Style Adaptor
by: Lee, Seokgi, et al.
Published: (2025)

Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation
by: Di, Xinhan, et al.
Published: (2024)

DiaMoE-TTS: A Unified IPA-Based Dialect TTS Framework with Mixture-of-Experts and Parameter-Efficient Zero-Shot Adaptation
by: Chen, Ziqi, et al.
Published: (2025)

Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition
by: Ma, Ziyang, et al.
Published: (2023)

TTS-PRISM: A Perceptual Reasoning and Interpretable Speech Model for Fine-Grained Diagnosis
by: Wang, Xi, et al.
Published: (2026)

TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch
by: Song, Xingchen, et al.
Published: (2024)

TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer
by: Bataev, Vladimir, et al.
Published: (2025)

HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks
by: Li, Yingting, et al.
Published: (2024)

Indonesian-English Code-Switching Speech Synthesizer Utilizing Multilingual STEN-TTS and Bert LID
by: Handoyo, Ahmad Alfani, et al.
Published: (2024)

Generalized Multilingual Text-to-Speech Generation with Language-Aware Style Adaptation
by: Lou, Haowei, et al.
Published: (2025)

XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model
by: Casanova, Edresson, et al.
Published: (2024)

Speech Recognition Model Improves Text-to-Speech Synthesis using Fine-Grained Reward
by: Wang, Guansu, et al.
Published: (2025)

FMSD-TTS: Few-shot Multi-Speaker Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation
by: Liu, Yutong, et al.
Published: (2025)

A Unified Speech LLM for Diarization and Speech Recognition in Multilingual Conversations
by: Saengthong, Phurich, et al.
Published: (2025)

Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness
by: Feng, Xincan, et al.
Published: (2024)

GOAT-TTS: Expressive and Realistic Speech Generation via A Dual-Branch LLM
by: Song, Yaodong, et al.
Published: (2025)

Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation
by: He, Haorui, et al.
Published: (2024)

SpeechTaxi: On Multilingual Semantic Speech Classification
by: Keller, Lennart, et al.
Published: (2024)

Enhancing Multilingual Voice Toxicity Detection with Speech-Text Alignment
by: Liu, Joseph, et al.
Published: (2024)

A2TTS: TTS for Low Resource Indian Languages
by: Bhadoriya, Ayush Singh, et al.
Published: (2025)

Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond
by: Lee, Beomseok, et al.
Published: (2024)

EE-TTS: Emphatic Expressive TTS with Linguistic Information
by: Zhong, Yi, et al.
Published: (2023)

Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis
by: Zhou, Kun, et al.
Published: (2024)

BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
by: Łajszczak, Mateusz, et al.
Published: (2024)

Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction
by: Li, Tianpeng, et al.
Published: (2025)

UtterTune: LoRA-Based Target-Language Pronunciation Edit and Control in Multilingual Text-to-Speech
by: Kato, Shuhei
Published: (2025)

StarVC: A Unified Auto-Regressive Framework for Joint Text and Speech Generation in Voice Conversion
by: Li, Fengjin, et al.
Published: (2025)

Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback
by: Chen, Chen, et al.
Published: (2024)

Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis
by: Yang, Yifan, et al.
Published: (2025)

Evaluating Speech-to-Text x LLM x Text-to-Speech Combinations for AI Interview Systems
by: Allbert, Rumi, et al.
Published: (2025)

LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
by: Kawamura, Masaya, et al.
Published: (2024)

Improving Accented Speech Recognition using Data Augmentation based on Unsupervised Text-to-Speech Synthesis
by: Do, Cong-Thanh, et al.
Published: (2024)

Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation
by: Kim, Minsu, et al.
Published: (2023)

DiffStyleTTS: Diffusion-based Hierarchical Prosody Modeling for Text-to-Speech with Diverse and Controllable Styles
by: Liu, Jiaxuan, et al.
Published: (2024)

Configurable Multilingual ASR with Speech Summary Representations
by: Zhu, Harrison, et al.
Published: (2024)

Classification of Spontaneous and Scripted Speech for Multilingual Audio
by: Elisha, Shahar, et al.
Published: (2024)