| Main Authors: | Wang, Suhua, Wang, Zifan, Sun, Xiaoxin, Wang, D. J., Liu, Zhanbo, Li, Xin |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2512.22491 |
Similar Items
ManWav: The First Manchu ASR Model
by: Seo, Jean, et al.
Published: (2024)
Mergen: The First Manchu-Korean Machine Translation Model Trained on Augmented Data
by: Seo, Jean, et al.
Published: (2023)
Understanding In-Context Machine Translation for Low-Resource Languages: A Case Study on Manchu
by: Pei, Renhao, et al.
Published: (2025)
DiffStyleTTS: Diffusion-based Hierarchical Prosody Modeling for Text-to-Speech with Diverse and Controllable Styles
by: Liu, Jiaxuan, et al.
Published: (2024)
RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching
by: Park, Hyun Joon, et al.
Published: (2025)
FMSD-TTS: Few-shot Multi-Speaker Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation
by: Liu, Yutong, et al.
Published: (2025)
Clip-TTS: Contrastive Text-content and Mel-spectrogram, A High-Quality Text-to-Speech Method based on Contextual Semantic Understanding
by: Liu, Tianyun
Published: (2025)
TMD-TTS: A Unified Tibetan Multi-Dialect Text-to-Speech Framework for Ü-Tsang, Amdo and Kham Speech Dataset Generation
by: Liu, Yutong, et al.
Published: (2025)
Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation
by: Di, Xinhan, et al.
Published: (2024)
Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition
by: Ma, Ziyang, et al.
Published: (2023)
Learning the Manchu Writing System: The Role of Intra‐Symbol Processing in Orthography Acquisition
by: Bai Li, et al.
Published: (2026)
TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer
by: Bataev, Vladimir, et al.
Published: (2025)
AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-Speech Synthesis
by: Luo, Dan, et al.
Published: (2025)
MOSS-Speech: Towards True Speech-to-Speech Models Without Text Guidance
by: Zhao, Xingjian, et al.
Published: (2025)
Autoregressive Diffusion Transformer for Text-to-Speech Synthesis
by: Liu, Zhijun, et al.
Published: (2024)
IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech
by: Zhou, Siyi, et al.
Published: (2025)
DiFlow-TTS: Compact and Low-Latency Zero-Shot Text-to-Speech with Factorized Discrete Flow Matching
by: Nguyen, Ngoc-Son, et al.
Published: (2025)
FLowHigh: Towards Efficient and High-Quality Audio Super-Resolution with Single-Step Flow Matching
by: Yun, Jun-Hak, et al.
Published: (2025)
Shallow Flow Matching for Coarse-to-Fine Text-to-Speech Synthesis
by: Yang, Dong, et al.
Published: (2025)
Evaluating Ethnic Income Gap in China: The Case of Han, Mongol, and Manchu in Liaoning and Inner Mongolia
by: Deng, Xinyan
Published: (2025)
VocalNet: Speech LLM with Multi-Token Prediction for Faster and High-Quality Generation
by: Wang, Yuhao, et al.
Published: (2025)
Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models
by: Wang, Teng, et al.
Published: (2025)
MahaTTS: A Unified Framework for Multilingual Text-to-Speech Synthesis
by: Singh, Jaskaran, et al.
Published: (2025)
Generative Language Models with Retrieval Augmented Generation for Automated Short Answer Scoring
by: Wang, Zifan, et al.
Published: (2024)
DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors
by: Lee, Keon, et al.
Published: (2024)
Finetuning Vision-Language Models as OCR Systems for Low-Resource Languages: A Case Study of Manchu
by: Chung, Yan Hon Michael, et al.
Published: (2025)
TruthFlow: Truthful LLM Generation via Representation Flow Correction
by: Wang, Hanyu, et al.
Published: (2025)
ComCLIP: Training-Free Compositional Image and Text Matching
by: Jiang, Kenan, et al.
Published: (2022)
DiffuSpeech: Silent Thought, Spoken Answer via Unified Speech-Text Diffusion
by: Lou, Yuxuan, et al.
Published: (2026)
MOSS-TTS Technical Report
by: Gong, Yitian, et al.
Published: (2026)
HQA-Attack: Toward High Quality Black-Box Hard-Label Adversarial Attack on Text
by: Liu, Han, et al.
Published: (2024)
SpeechJudge: Towards Human-Level Judgment for Speech Naturalness
by: Zhang, Xueyao, et al.
Published: (2025)
FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching
by: Wang, Hui, et al.
Published: (2025)
DrDiff: Dynamic Routing Diffusion with Hierarchical Attention for Breaking the Efficiency-Quality Trade-off
by: Zhang, Jusheng, et al.
Published: (2025)
IntelliAsk: Learning to Ask High-Quality Research Questions via RLVR
by: Sharma, Karun, et al.
Published: (2026)
Interpretable Discriminative Text Representations via Agreement and Label Disentanglement
by: Wang, Tong, et al.
Published: (2026)
Soundwave: Less is More for Speech-Text Alignment in LLMs
by: Zhang, Yuhao, et al.
Published: (2025)
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
by: Xin, Detai, et al.
Published: (2024)
Towards Linguistically-informed Representations for English as a Second or Foreign Language: Review, Construction and Application
by: Li, Wenxi, et al.
Published: (2026)
DuplexCascade: Full-Duplex Speech-to-Speech Dialogue with VAD-Free Cascaded ASR-LLM-TTS Pipeline and Micro-Turn Optimization
by: Yang, Jianing, et al.
Published: (2026)