| Main Authors: | Tee, Hitomi Jin Ling, Wang, Chaoren, Zhang, Zijie, Wu, Zhizheng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.26190 |
Similar Items
Closing the Modality Reasoning Gap for Speech Large Language Models
by: Wang, Chaoren, et al.
Published: (2026)
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
by: Ao, Junyi, et al.
Published: (2024)
Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment
by: Zhang, Xueyao, et al.
Published: (2025)
Word-wise intonation model for cross-language TTS systems
by: Tomilov, A. A., et al.
Published: (2024)
Neurodyne: Neural Pitch Manipulation with Representation Learning and Cycle-Consistency GAN
by: Gu, Yicheng, et al.
Published: (2025)
TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch
by: Song, Xingchen, et al.
Published: (2024)
EE-TTS: Emphatic Expressive TTS with Linguistic Information
by: Zhong, Yi, et al.
Published: (2023)
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation
by: He, Haorui, et al.
Published: (2025)
Qwen3-TTS Technical Report
by: Hu, Hangrui, et al.
Published: (2026)
SingVisio: Visual Analytics of Diffusion Model for Singing Voice Conversion
by: Xue, Liumeng, et al.
Published: (2024)
DiaMoE-TTS: A Unified IPA-Based Dialect TTS Framework with Mixture-of-Experts and Parameter-Efficient Zero-Shot Adaptation
by: Chen, Ziqi, et al.
Published: (2025)
SingNet: Towards a Large-Scale, Diverse, and In-the-Wild Singing Voice Dataset
by: Gu, Yicheng, et al.
Published: (2025)
Aliasing-Free Neural Audio Synthesis
by: Gu, Yicheng, et al.
Published: (2025)
JoyTTS: LLM-based Spoken Chatbot With Voice Cloning
by: Zhou, Fangru, et al.
Published: (2025)
Beyond Unified Models: A Service-Oriented Approach to Low Latency, Context Aware Phonemization for Real Time TTS
by: Fetrat, Mahta, et al.
Published: (2025)
Word Level Timestamp Generation for Automatic Speech Recognition and Translation
by: Hu, Ke, et al.
Published: (2025)
Multi-interaction TTS toward professional recording reproduction
by: Kanagawa, Hiroki, et al.
Published: (2025)
RWKVTTS: Yet another TTS based on RWKV-7
by: yueyu, Lin, et al.
Published: (2025)
MunTTS: A Text-to-Speech System for Mundari
by: Gumma, Varun, et al.
Published: (2024)
Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness
by: Feng, Xincan, et al.
Published: (2024)
Transfer the linguistic representations from TTS to accent conversion with non-parallel data
by: Chen, Xi, et al.
Published: (2024)
An investigation of phrase break prediction in an End-to-End TTS system
by: Vadapalli, Anandaswarup
Published: (2023)
A Language Modeling Approach to Diacritic-Free Hebrew TTS
by: Roth, Amit, et al.
Published: (2024)
Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation
by: Di, Xinhan, et al.
Published: (2024)
Noro: Noise-Robust One-shot Voice Conversion with Hidden Speaker Representation Learning
by: He, Haorui, et al.
Published: (2024)
Solla: Towards a Speech-Oriented LLM That Hears Acoustic Context
by: Ao, Junyi, et al.
Published: (2025)
GOAT-TTS: Expressive and Realistic Speech Generation via A Dual-Branch LLM
by: Song, Yaodong, et al.
Published: (2025)
DualCodec: A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec for Speech Generation
by: Li, Jiaqi, et al.
Published: (2025)
Hard-Synth: Synthesizing Diverse Hard Samples for ASR using Zero-Shot TTS and LLM
by: Yu, Jiawei, et al.
Published: (2024)
Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition
by: Chevi, Rendi, et al.
Published: (2024)
HiFi-Glot: High-Fidelity Neural Formant Synthesis with Differentiable Resonant Filters
by: Gu, Yicheng, et al.
Published: (2024)
You Sound a Little Tense: L2 Tailored Clear TTS Using Durational Vowel Properties
by: Tuttösí, Paige, et al.
Published: (2025)
A2TTS: TTS for Low Resource Indian Languages
by: Bhadoriya, Ayush Singh, et al.
Published: (2025)
Can we reconstruct a dysarthric voice with the large speech model Parler TTS?
by: Sanchez, Ariadna, et al.
Published: (2025)
GSA-TTS : Toward Zero-Shot Speech Synthesis based on Gradual Style Adaptor
by: Lee, Seokgi, et al.
Published: (2025)
Leveraging the Interplay Between Syntactic and Acoustic Cues for Optimizing Korean TTS Pause Formation
by: Jeon, Yejin, et al.
Published: (2024)
StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations
by: Liu, Sen, et al.
Published: (2024)
Accent conversion using discrete units with parallel data synthesized from controllable accented TTS
by: Nguyen, Tuan Nam, et al.
Published: (2024)
DiffStyleTTS: Diffusion-based Hierarchical Prosody Modeling for Text-to-Speech with Diverse and Controllable Styles
by: Liu, Jiaxuan, et al.
Published: (2024)
J-CHAT: Japanese Large-scale Spoken Dialogue Corpus for Spoken Dialogue Language Modeling
by: Nakata, Wataru, et al.
Published: (2024)