| Main Authors: | He, Jiaxu, Wang, Chao, Lian, Jie, Cai, Yuqing, Li, Yongxiang, Duojie, Renzeg, Li, Jie |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.02496 |
Similar Items
GOAT-TTS: Expressive and Realistic Speech Generation via A Dual-Branch LLM
by: Song, Yaodong, et al.
Published: (2025)
Context-Aware Dynamic Chunking for Streaming Tibetan Speech Recognition
by: Wang, Chao, et al.
Published: (2025)
FMSD-TTS: Few-shot Multi-Speaker Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation
by: Liu, Yutong, et al.
Published: (2025)
BoSS: Beyond-Semantic Speech
by: Wang, Qing, et al.
Published: (2025)
A Unified Spoken Language Model with Injected Emotional-Attribution Thinking for Human-like Interaction
by: Wang, Qing, et al.
Published: (2026)
TMD-TTS: A Unified Tibetan Multi-Dialect Text-to-Speech Framework for Ü-Tsang, Amdo and Kham Speech Dataset Generation
by: Liu, Yutong, et al.
Published: (2025)
Tibetan Language and AI: A Comprehensive Survey of Resources, Methods and Challenges
by: Huang, Cheng, et al.
Published: (2025)
TLUE: A Tibetan Language Understanding Evaluation Benchmark
by: Gao, Fan, et al.
Published: (2025)
A2TTS: TTS for Low Resource Indian Languages
by: Bhadoriya, Ayush Singh, et al.
Published: (2025)
HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks
by: Li, Yingting, et al.
Published: (2024)
TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer
by: Bataev, Vladimir, et al.
Published: (2025)
Modeling Sarcastic Speech: Semantic and Prosodic Cues in a Speech Synthesis Framework
by: Li, Zhu, et al.
Published: (2025)
Leveraging LLM and Self-Supervised Training Models for Speech Recognition in Chinese Dialects: A Comparative Analysis
by: Xu, Tianyi, et al.
Published: (2025)
CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models
by: Li, Xiang, et al.
Published: (2024)
Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation
by: Di, Xinhan, et al.
Published: (2024)
Leveraging Large Language Models for Sarcastic Speech Annotation in Sarcasm Detection
by: Li, Zhu, et al.
Published: (2025)
GOAT-SLM: A Spoken Language Model with Paralinguistic and Speaker Characteristic Awareness
by: Chen, Hongjie, et al.
Published: (2025)
WenetSpeech-Chuan: A Large-Scale Sichuanese Corpus with Rich Annotation for Dialectal Speech Processing
by: Dai, Yuhang, et al.
Published: (2025)
GSA-TTS : Toward Zero-Shot Speech Synthesis based on Gradual Style Adaptor
by: Lee, Seokgi, et al.
Published: (2025)
RephraseTTS: Dynamic Length Text based Speech Insertion with Speaker Style Transfer
by: Matiyali, Neeraj, et al.
Published: (2025)
Unsupervised TTS Acoustic Modeling for TTS with Conditional Disentangled Sequential VAE
by: Lian, Jiachen, et al.
Published: (2022)
TELEVAL: A Dynamic Benchmark Designed for Spoken Language Models in Chinese Interactive Scenarios
by: Li, Zehan, et al.
Published: (2025)
Boosting Large Language Model for Speech Synthesis: An Empirical Study
by: Hao, Hongkun, et al.
Published: (2023)
MunTTS: A Text-to-Speech System for Mundari
by: Gumma, Varun, et al.
Published: (2024)
Borderless Long Speech Synthesis
by: Song, Xingchen, et al.
Published: (2026)
DiFlow-TTS: Compact and Low-Latency Zero-Shot Text-to-Speech with Factorized Discrete Flow Matching
by: Nguyen, Ngoc-Son, et al.
Published: (2025)
Adapting Where It Matters: Depth-Aware Adaptation for Efficient Multilingual Speech Recognition in Low-Resource Languages
by: Xiao, Yang, et al.
Published: (2026)
MOSS-TTS Technical Report
by: Gong, Yitian, et al.
Published: (2026)
Ti-Audio: The First Multi-Dialectal End-to-End Speech LLM for Tibetan
by: Wang, Jialing, et al.
Published: (2026)
DiaMoE-TTS: A Unified IPA-Based Dialect TTS Framework with Mixture-of-Experts and Parameter-Efficient Zero-Shot Adaptation
by: Chen, Ziqi, et al.
Published: (2025)
Few-Shot Contrastive Adaptation for Audio Abuse Detection in Low-Resource Indic Languages
by: Sankaran, Aditya Narayan, et al.
Published: (2026)
Learning More with Less: Self-Supervised Approaches for Low-Resource Speech Emotion Recognition
by: Gong, Ziwei, et al.
Published: (2025)
Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness
by: Feng, Xincan, et al.
Published: (2024)
TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch
by: Song, Xingchen, et al.
Published: (2024)
Speechless: Speech Instruction Training Without Speech for Low Resource Languages
by: Dao, Alan, et al.
Published: (2025)
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement
by: Yang, Yifan, et al.
Published: (2024)
Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition
by: Ma, Ziyang, et al.
Published: (2023)
CLiFT-ASR: A Cross-Lingual Fine-Tuning Framework for Low-Resource Taiwanese Hokkien Speech Recognition
by: Sung, Hung-Yang, et al.
Published: (2025)
Cross-lingual Embedding Clustering for Hierarchical Softmax in Low-Resource Multilingual Speech Recognition
by: Yang, Zhengdong, et al.
Published: (2025)
EE-TTS: Emphatic Expressive TTS with Linguistic Information
by: Zhong, Yi, et al.
Published: (2023)