:: Library Catalog

Bibliographic Details
Main Authors:	He, Jiaxu; Wang, Chao; Lian, Jie; Cai, Yuqing; Li, Yongxiang; Duojie, Renzeg; Li, Jie
Format:	Preprint
Published:	2026
Subjects:	Sound; Computation and Language
Online Access:	https://arxiv.org/abs/2605.02496

Similar Items

GOAT-TTS: Expressive and Realistic Speech Generation via A Dual-Branch LLM
by: Song, Yaodong, et al.
Published: (2025)

Context-Aware Dynamic Chunking for Streaming Tibetan Speech Recognition
by: Wang, Chao, et al.
Published: (2025)

FMSD-TTS: Few-shot Multi-Speaker Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation
by: Liu, Yutong, et al.
Published: (2025)

BoSS: Beyond-Semantic Speech
by: Wang, Qing, et al.
Published: (2025)

A Unified Spoken Language Model with Injected Emotional-Attribution Thinking for Human-like Interaction
by: Wang, Qing, et al.
Published: (2026)

TMD-TTS: A Unified Tibetan Multi-Dialect Text-to-Speech Framework for Ü-Tsang, Amdo and Kham Speech Dataset Generation
by: Liu, Yutong, et al.
Published: (2025)

Tibetan Language and AI: A Comprehensive Survey of Resources, Methods and Challenges
by: Huang, Cheng, et al.
Published: (2025)

TLUE: A Tibetan Language Understanding Evaluation Benchmark
by: Gao, Fan, et al.
Published: (2025)

A2TTS: TTS for Low Resource Indian Languages
by: Bhadoriya, Ayush Singh, et al.
Published: (2025)

HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks
by: Li, Yingting, et al.
Published: (2024)

TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer
by: Bataev, Vladimir, et al.
Published: (2025)

Modeling Sarcastic Speech: Semantic and Prosodic Cues in a Speech Synthesis Framework
by: Li, Zhu, et al.
Published: (2025)

Leveraging LLM and Self-Supervised Training Models for Speech Recognition in Chinese Dialects: A Comparative Analysis
by: Xu, Tianyi, et al.
Published: (2025)

CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models
by: Li, Xiang, et al.
Published: (2024)

Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation
by: Di, Xinhan, et al.
Published: (2024)

Leveraging Large Language Models for Sarcastic Speech Annotation in Sarcasm Detection
by: Li, Zhu, et al.
Published: (2025)

GOAT-SLM: A Spoken Language Model with Paralinguistic and Speaker Characteristic Awareness
by: Chen, Hongjie, et al.
Published: (2025)

WenetSpeech-Chuan: A Large-Scale Sichuanese Corpus with Rich Annotation for Dialectal Speech Processing
by: Dai, Yuhang, et al.
Published: (2025)

GSA-TTS : Toward Zero-Shot Speech Synthesis based on Gradual Style Adaptor
by: Lee, Seokgi, et al.
Published: (2025)

RephraseTTS: Dynamic Length Text based Speech Insertion with Speaker Style Transfer
by: Matiyali, Neeraj, et al.
Published: (2025)

Unsupervised TTS Acoustic Modeling for TTS with Conditional Disentangled Sequential VAE
by: Lian, Jiachen, et al.
Published: (2022)

TELEVAL: A Dynamic Benchmark Designed for Spoken Language Models in Chinese Interactive Scenarios
by: Li, Zehan, et al.
Published: (2025)

Boosting Large Language Model for Speech Synthesis: An Empirical Study
by: Hao, Hongkun, et al.
Published: (2023)

MunTTS: A Text-to-Speech System for Mundari
by: Gumma, Varun, et al.
Published: (2024)

Borderless Long Speech Synthesis
by: Song, Xingchen, et al.
Published: (2026)

DiFlow-TTS: Compact and Low-Latency Zero-Shot Text-to-Speech with Factorized Discrete Flow Matching
by: Nguyen, Ngoc-Son, et al.
Published: (2025)

Adapting Where It Matters: Depth-Aware Adaptation for Efficient Multilingual Speech Recognition in Low-Resource Languages
by: Xiao, Yang, et al.
Published: (2026)

MOSS-TTS Technical Report
by: Gong, Yitian, et al.
Published: (2026)

Ti-Audio: The First Multi-Dialectal End-to-End Speech LLM for Tibetan
by: Wang, Jialing, et al.
Published: (2026)

DiaMoE-TTS: A Unified IPA-Based Dialect TTS Framework with Mixture-of-Experts and Parameter-Efficient Zero-Shot Adaptation
by: Chen, Ziqi, et al.
Published: (2025)

Few-Shot Contrastive Adaptation for Audio Abuse Detection in Low-Resource Indic Languages
by: Sankaran, Aditya Narayan, et al.
Published: (2026)

Learning More with Less: Self-Supervised Approaches for Low-Resource Speech Emotion Recognition
by: Gong, Ziwei, et al.
Published: (2025)

Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness
by: Feng, Xincan, et al.
Published: (2024)

TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch
by: Song, Xingchen, et al.
Published: (2024)

Speechless: Speech Instruction Training Without Speech for Low Resource Languages
by: Dao, Alan, et al.
Published: (2025)

GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement
by: Yang, Yifan, et al.
Published: (2024)

Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition
by: Ma, Ziyang, et al.
Published: (2023)

CLiFT-ASR: A Cross-Lingual Fine-Tuning Framework for Low-Resource Taiwanese Hokkien Speech Recognition
by: Sung, Hung-Yang, et al.
Published: (2025)

Cross-lingual Embedding Clustering for Hierarchical Softmax in Low-Resource Multilingual Speech Recognition
by: Yang, Zhengdong, et al.
Published: (2025)

EE-TTS: Emphatic Expressive TTS with Linguistic Information
by: Zhong, Yi, et al.
Published: (2023)