:: Library Catalog

Bibliographic Details
Main Authors:	Tee, Hitomi Jin Ling; Wang, Chaoren; Zhang, Zijie; Wu, Zhizheng
Format:	Preprint
Published:	2025
Subjects:	Sound; Computation and Language; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2510.26190

Similar Items

Closing the Modality Reasoning Gap for Speech Large Language Models
by: Wang, Chaoren, et al.
Published: (2026)

SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
by: Ao, Junyi, et al.
Published: (2024)

Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment
by: Zhang, Xueyao, et al.
Published: (2025)

Word-wise intonation model for cross-language TTS systems
by: Tomilov, A. A., et al.
Published: (2024)

Neurodyne: Neural Pitch Manipulation with Representation Learning and Cycle-Consistency GAN
by: Gu, Yicheng, et al.
Published: (2025)

TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch
by: Song, Xingchen, et al.
Published: (2024)

EE-TTS: Emphatic Expressive TTS with Linguistic Information
by: Zhong, Yi, et al.
Published: (2023)

Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation
by: He, Haorui, et al.
Published: (2025)

Qwen3-TTS Technical Report
by: Hu, Hangrui, et al.
Published: (2026)

SingVisio: Visual Analytics of Diffusion Model for Singing Voice Conversion
by: Xue, Liumeng, et al.
Published: (2024)

DiaMoE-TTS: A Unified IPA-Based Dialect TTS Framework with Mixture-of-Experts and Parameter-Efficient Zero-Shot Adaptation
by: Chen, Ziqi, et al.
Published: (2025)

SingNet: Towards a Large-Scale, Diverse, and In-the-Wild Singing Voice Dataset
by: Gu, Yicheng, et al.
Published: (2025)

Aliasing-Free Neural Audio Synthesis
by: Gu, Yicheng, et al.
Published: (2025)

JoyTTS: LLM-based Spoken Chatbot With Voice Cloning
by: Zhou, Fangru, et al.
Published: (2025)

Beyond Unified Models: A Service-Oriented Approach to Low Latency, Context Aware Phonemization for Real Time TTS
by: Fetrat, Mahta, et al.
Published: (2025)

Word Level Timestamp Generation for Automatic Speech Recognition and Translation
by: Hu, Ke, et al.
Published: (2025)

Multi-interaction TTS toward professional recording reproduction
by: Kanagawa, Hiroki, et al.
Published: (2025)

RWKVTTS: Yet another TTS based on RWKV-7
by: yueyu, Lin, et al.
Published: (2025)

MunTTS: A Text-to-Speech System for Mundari
by: Gumma, Varun, et al.
Published: (2024)

Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness
by: Feng, Xincan, et al.
Published: (2024)

Transfer the linguistic representations from TTS to accent conversion with non-parallel data
by: Chen, Xi, et al.
Published: (2024)

An investigation of phrase break prediction in an End-to-End TTS system
by: Vadapalli, Anandaswarup
Published: (2023)

A Language Modeling Approach to Diacritic-Free Hebrew TTS
by: Roth, Amit, et al.
Published: (2024)

Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation
by: Di, Xinhan, et al.
Published: (2024)

Noro: Noise-Robust One-shot Voice Conversion with Hidden Speaker Representation Learning
by: He, Haorui, et al.
Published: (2024)

Solla: Towards a Speech-Oriented LLM That Hears Acoustic Context
by: Ao, Junyi, et al.
Published: (2025)

GOAT-TTS: Expressive and Realistic Speech Generation via A Dual-Branch LLM
by: Song, Yaodong, et al.
Published: (2025)

DualCodec: A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec for Speech Generation
by: Li, Jiaqi, et al.
Published: (2025)

Hard-Synth: Synthesizing Diverse Hard Samples for ASR using Zero-Shot TTS and LLM
by: Yu, Jiawei, et al.
Published: (2024)

Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition
by: Chevi, Rendi, et al.
Published: (2024)

HiFi-Glot: High-Fidelity Neural Formant Synthesis with Differentiable Resonant Filters
by: Gu, Yicheng, et al.
Published: (2024)

You Sound a Little Tense: L2 Tailored Clear TTS Using Durational Vowel Properties
by: Tuttösí, Paige, et al.
Published: (2025)

A2TTS: TTS for Low Resource Indian Languages
by: Bhadoriya, Ayush Singh, et al.
Published: (2025)

Can we reconstruct a dysarthric voice with the large speech model Parler TTS?
by: Sanchez, Ariadna, et al.
Published: (2025)

GSA-TTS : Toward Zero-Shot Speech Synthesis based on Gradual Style Adaptor
by: Lee, Seokgi, et al.
Published: (2025)

Leveraging the Interplay Between Syntactic and Acoustic Cues for Optimizing Korean TTS Pause Formation
by: Jeon, Yejin, et al.
Published: (2024)

StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations
by: Liu, Sen, et al.
Published: (2024)

Accent conversion using discrete units with parallel data synthesized from controllable accented TTS
by: Nguyen, Tuan Nam, et al.
Published: (2024)

DiffStyleTTS: Diffusion-based Hierarchical Prosody Modeling for Text-to-Speech with Diverse and Controllable Styles
by: Liu, Jiaxuan, et al.
Published: (2024)

J-CHAT: Japanese Large-scale Spoken Dialogue Corpus for Spoken Dialogue Language Modeling
by: Nakata, Wataru, et al.
Published: (2024)