Library Catalog

Bibliographic Details
Main Authors:	Wang, Suhua; Wang, Zifan; Sun, Xiaoxin; Wang, D. J.; Liu, Zhanbo; Li, Xin
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2512.22491

Similar Items

ManWav: The First Manchu ASR Model
by: Seo, Jean, et al.
Published: (2024)

Mergen: The First Manchu-Korean Machine Translation Model Trained on Augmented Data
by: Seo, Jean, et al.
Published: (2023)

Understanding In-Context Machine Translation for Low-Resource Languages: A Case Study on Manchu
by: Pei, Renhao, et al.
Published: (2025)

DiffStyleTTS: Diffusion-based Hierarchical Prosody Modeling for Text-to-Speech with Diverse and Controllable Styles
by: Liu, Jiaxuan, et al.
Published: (2024)

RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching
by: Park, Hyun Joon, et al.
Published: (2025)

FMSD-TTS: Few-shot Multi-Speaker Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation
by: Liu, Yutong, et al.
Published: (2025)

Clip-TTS: Contrastive Text-content and Mel-spectrogram, A High-Quality Text-to-Speech Method based on Contextual Semantic Understanding
by: Liu, Tianyun
Published: (2025)

TMD-TTS: A Unified Tibetan Multi-Dialect Text-to-Speech Framework for Ü-Tsang, Amdo and Kham Speech Dataset Generation
by: Liu, Yutong, et al.
Published: (2025)

Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation
by: Di, Xinhan, et al.
Published: (2024)

Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition
by: Ma, Ziyang, et al.
Published: (2023)

Learning the Manchu Writing System: The Role of Intra‐Symbol Processing in Orthography Acquisition
by: Bai Li, et al.
Published: (2026)

TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer
by: Bataev, Vladimir, et al.
Published: (2025)

AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-Speech Synthesis
by: Luo, Dan, et al.
Published: (2025)

MOSS-Speech: Towards True Speech-to-Speech Models Without Text Guidance
by: Zhao, Xingjian, et al.
Published: (2025)

Autoregressive Diffusion Transformer for Text-to-Speech Synthesis
by: Liu, Zhijun, et al.
Published: (2024)

IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech
by: Zhou, Siyi, et al.
Published: (2025)

DiFlow-TTS: Compact and Low-Latency Zero-Shot Text-to-Speech with Factorized Discrete Flow Matching
by: Nguyen, Ngoc-Son, et al.
Published: (2025)

FLowHigh: Towards Efficient and High-Quality Audio Super-Resolution with Single-Step Flow Matching
by: Yun, Jun-Hak, et al.
Published: (2025)

Shallow Flow Matching for Coarse-to-Fine Text-to-Speech Synthesis
by: Yang, Dong, et al.
Published: (2025)

Evaluating Ethnic Income Gap in China: The Case of Han, Mongol, and Manchu in Liaoning and Inner Mongolia
by: Deng, Xinyan
Published: (2025)

VocalNet: Speech LLM with Multi-Token Prediction for Faster and High-Quality Generation
by: Wang, Yuhao, et al.
Published: (2025)

Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models
by: Wang, Teng, et al.
Published: (2025)

MahaTTS: A Unified Framework for Multilingual Text-to-Speech Synthesis
by: Singh, Jaskaran, et al.
Published: (2025)

Generative Language Models with Retrieval Augmented Generation for Automated Short Answer Scoring
by: Wang, Zifan, et al.
Published: (2024)

DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors
by: Lee, Keon, et al.
Published: (2024)

Finetuning Vision-Language Models as OCR Systems for Low-Resource Languages: A Case Study of Manchu
by: Chung, Yan Hon Michael, et al.
Published: (2025)

TruthFlow: Truthful LLM Generation via Representation Flow Correction
by: Wang, Hanyu, et al.
Published: (2025)

ComCLIP: Training-Free Compositional Image and Text Matching
by: Jiang, Kenan, et al.
Published: (2022)

DiffuSpeech: Silent Thought, Spoken Answer via Unified Speech-Text Diffusion
by: Lou, Yuxuan, et al.
Published: (2026)

MOSS-TTS Technical Report
by: Gong, Yitian, et al.
Published: (2026)

HQA-Attack: Toward High Quality Black-Box Hard-Label Adversarial Attack on Text
by: Liu, Han, et al.
Published: (2024)

SpeechJudge: Towards Human-Level Judgment for Speech Naturalness
by: Zhang, Xueyao, et al.
Published: (2025)

FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching
by: Wang, Hui, et al.
Published: (2025)

DrDiff: Dynamic Routing Diffusion with Hierarchical Attention for Breaking the Efficiency-Quality Trade-off
by: Zhang, Jusheng, et al.
Published: (2025)

IntelliAsk: Learning to Ask High-Quality Research Questions via RLVR
by: Sharma, Karun, et al.
Published: (2026)

Interpretable Discriminative Text Representations via Agreement and Label Disentanglement
by: Wang, Tong, et al.
Published: (2026)

Soundwave: Less is More for Speech-Text Alignment in LLMs
by: Zhang, Yuhao, et al.
Published: (2025)

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
by: Xin, Detai, et al.
Published: (2024)

Towards Linguistically-informed Representations for English as a Second or Foreign Language: Review, Construction and Application
by: Li, Wenxi, et al.
Published: (2026)

DuplexCascade: Full-Duplex Speech-to-Speech Dialogue with VAD-Free Cascaded ASR-LLM-TTS Pipeline and Micro-Turn Optimization
by: Yang, Jianing, et al.
Published: (2026)