Saved in:
| Main Authors: | Yang, Mu, Hansen, John H. L. |
|---|---|
| Format: | Preprint |
| Published: |
2026
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.05977 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Similar Items
Activation Steering for Accent Adaptation in Speech Foundation Models
by: Sun, Jinuo, et al.
Published: (2026)
by: Sun, Jinuo, et al.
Published: (2026)
Zero Shot Text to Speech Augmentation for Automatic Speech Recognition on Low-Resource Accented Speech Corpora
by: Nespoli, Francesco, et al.
Published: (2024)
by: Nespoli, Francesco, et al.
Published: (2024)
MacST: Multi-Accent Speech Synthesis via Text Transliteration for Accent Conversion
by: Inoue, Sho, et al.
Published: (2024)
by: Inoue, Sho, et al.
Published: (2024)
GLOBE: A High-quality English Corpus with Global Accents for Zero-shot Speaker Adaptive Text-to-Speech
by: Wang, Wenbin, et al.
Published: (2024)
by: Wang, Wenbin, et al.
Published: (2024)
AccentBox: Towards High-Fidelity Zero-Shot Accent Generation
by: Zhong, Jinzuomu, et al.
Published: (2024)
by: Zhong, Jinzuomu, et al.
Published: (2024)
Multi-Scale Accent Modeling and Disentangling for Multi-Speaker Multi-Accent Text-to-Speech Synthesis
by: Zhou, Xuehao, et al.
Published: (2024)
by: Zhou, Xuehao, et al.
Published: (2024)
Debatts: Zero-Shot Debating Text-to-Speech Synthesis
by: Huang, Yiqiao, et al.
Published: (2024)
by: Huang, Yiqiao, et al.
Published: (2024)
AccentFold: A Journey through African Accents for Zero-Shot ASR Adaptation to Target Accents
by: Owodunni, Abraham Toluwase, et al.
Published: (2024)
by: Owodunni, Abraham Toluwase, et al.
Published: (2024)
Incremental Disentanglement for Environment-Aware Zero-Shot Text-to-Speech Synthesis
by: Lu, Ye-Xin, et al.
Published: (2024)
by: Lu, Ye-Xin, et al.
Published: (2024)
Zero-Shot Text-to-Speech from Continuous Text Streams
by: Dang, Trung, et al.
Published: (2024)
by: Dang, Trung, et al.
Published: (2024)
SpeechAccentLLM: A Unified Framework for Foreign Accent Conversion and Text to Speech
by: Cheng, Zhuangfei, et al.
Published: (2025)
by: Cheng, Zhuangfei, et al.
Published: (2025)
MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder
by: Zhang, Bowen, et al.
Published: (2025)
by: Zhang, Bowen, et al.
Published: (2025)
Audiobox TTA-RAG: Improving Zero-Shot and Few-Shot Text-To-Audio with Retrieval-Augmented Generation
by: Yang, Mu, et al.
Published: (2024)
by: Yang, Mu, et al.
Published: (2024)
Effects of Speaker Count, Duration, and Accent Diversity on Zero-Shot Accent Robustness in Low-Resource ASR
by: Yong, Zheng-Xin, et al.
Published: (2025)
by: Yong, Zheng-Xin, et al.
Published: (2025)
Bridging the Modality Gap: Softly Discretizing Audio Representation for LLM-based Automatic Speech Recognition
by: Yang, Mu, et al.
Published: (2025)
by: Yang, Mu, et al.
Published: (2025)
Empowering Communication: Speech Technology for Indian and Western Accents through AI-powered Speech Synthesis
by: R, Vinotha, et al.
Published: (2024)
by: R, Vinotha, et al.
Published: (2024)
Zero-Shot Text-to-Speech for Vietnamese
by: Vu, Thi, et al.
Published: (2025)
by: Vu, Thi, et al.
Published: (2025)
Word-Level Emotional Expression Control in Zero-Shot Text-to-Speech Synthesis
by: Wang, Tianrui, et al.
Published: (2025)
by: Wang, Tianrui, et al.
Published: (2025)
Advanced Zero-Shot Text-to-Speech for Background Removal and Preservation with Controllable Masked Speech Prediction
by: Zhang, Leying, et al.
Published: (2025)
by: Zhang, Leying, et al.
Published: (2025)
MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech
by: Ji, Shengpeng, et al.
Published: (2024)
by: Ji, Shengpeng, et al.
Published: (2024)
Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder
by: Melechovsky, Jan, et al.
Published: (2022)
by: Melechovsky, Jan, et al.
Published: (2022)
DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech
by: Melechovsky, Jan, et al.
Published: (2024)
by: Melechovsky, Jan, et al.
Published: (2024)
Adversarial Attacks and Robust Defenses in Speaker Embedding based Zero-Shot Text-to-Speech System
by: Li, Ze, et al.
Published: (2024)
by: Li, Ze, et al.
Published: (2024)
Mixture of LoRA Experts with Multi-Modal and Multi-Granularity LLM Generative Error Correction for Accented Speech Recognition
by: Mu, Bingshen, et al.
Published: (2025)
by: Mu, Bingshen, et al.
Published: (2025)
EmoSteer-TTS: Fine-Grained and Training-Free Emotion-Controllable Text-to-Speech via Activation Steering
by: Xie, Tianxin, et al.
Published: (2025)
by: Xie, Tianxin, et al.
Published: (2025)
Advancing Zero-Shot Open-Set Speech Deepfake Source Tracing
by: Chhibber, Manasi, et al.
Published: (2025)
by: Chhibber, Manasi, et al.
Published: (2025)
DAT-CFTNet: Speech Enhancement for Cochlear Implant Recipients using Attention-based Dual-Path Recurrent Neural Network
by: Mamun, Nursadul, et al.
Published: (2026)
by: Mamun, Nursadul, et al.
Published: (2026)
MMGER: Multi-modal and Multi-granularity Generative Error Correction with LLM for Joint Accent and Speech Recognition
by: Mu, Bingshen, et al.
Published: (2024)
by: Mu, Bingshen, et al.
Published: (2024)
Analyzing the Impact of Accent on English Speech: Acoustic and Articulatory Perspectives
by: Premananth, Gowtham, et al.
Published: (2025)
by: Premananth, Gowtham, et al.
Published: (2025)
Towards Zero-Shot Text-To-Speech for Arabic Dialects
by: Doan, Khai Duy, et al.
Published: (2024)
by: Doan, Khai Duy, et al.
Published: (2024)
ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
by: Zhu, Han, et al.
Published: (2025)
by: Zhu, Han, et al.
Published: (2025)
CosyEdit: Unlocking End-to-End Speech Editing Capability from Zero-Shot Text-to-Speech Models
by: Chen, Junyang, et al.
Published: (2026)
by: Chen, Junyang, et al.
Published: (2026)
CodecMOS-Accent: A MOS Benchmark of Resynthesized and TTS Speech from Neural Codecs Across English Accents
by: Huang, Wen-Chin, et al.
Published: (2026)
by: Huang, Wen-Chin, et al.
Published: (2026)
Improving Accented Speech Recognition using Data Augmentation based on Unsupervised Text-to-Speech Synthesis
by: Do, Cong-Thanh, et al.
Published: (2024)
by: Do, Cong-Thanh, et al.
Published: (2024)
LID Models are Actually Accent Classifiers: Implications and Solutions for LID on Accented Speech
by: Bafna, Niyati, et al.
Published: (2025)
by: Bafna, Niyati, et al.
Published: (2025)
CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech
by: Kim, Jaehyeon, et al.
Published: (2024)
by: Kim, Jaehyeon, et al.
Published: (2024)
Parallel GPT: Harmonizing the Independence and Interdependence of Acoustic and Semantic Information for Zero-Shot Text-to-Speech
by: Xing, Jingyuan, et al.
Published: (2025)
by: Xing, Jingyuan, et al.
Published: (2025)
Accent Conversion in Text-To-Speech Using Multi-Level VAE and Adversarial Training
by: Melechovsky, Jan, et al.
Published: (2024)
by: Melechovsky, Jan, et al.
Published: (2024)
SF-Speech: Straightened Flow for Zero-Shot Voice Clone
by: Li, Xuyuan, et al.
Published: (2024)
by: Li, Xuyuan, et al.
Published: (2024)
Emotion-Aware Prefix: Towards Explicit Emotion Control in Voice Conversion Models
by: Yang, Haoyuan, et al.
Published: (2026)
by: Yang, Haoyuan, et al.
Published: (2026)
Similar Items
-
Activation Steering for Accent Adaptation in Speech Foundation Models
by: Sun, Jinuo, et al.
Published: (2026) -
Zero Shot Text to Speech Augmentation for Automatic Speech Recognition on Low-Resource Accented Speech Corpora
by: Nespoli, Francesco, et al.
Published: (2024) -
MacST: Multi-Accent Speech Synthesis via Text Transliteration for Accent Conversion
by: Inoue, Sho, et al.
Published: (2024) -
GLOBE: A High-quality English Corpus with Global Accents for Zero-shot Speaker Adaptive Text-to-Speech
by: Wang, Wenbin, et al.
Published: (2024) -
AccentBox: Towards High-Fidelity Zero-Shot Accent Generation
by: Zhong, Jinzuomu, et al.
Published: (2024)