Library Catalog

Bibliographic Details
Main Authors:	Yang, Mu; Hansen, John H. L.
Format:	Preprint
Published:	2026
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2603.05977

Similar Items

Activation Steering for Accent Adaptation in Speech Foundation Models
by: Sun, Jinuo, et al.
Published: (2026)

Zero Shot Text to Speech Augmentation for Automatic Speech Recognition on Low-Resource Accented Speech Corpora
by: Nespoli, Francesco, et al.
Published: (2024)

MacST: Multi-Accent Speech Synthesis via Text Transliteration for Accent Conversion
by: Inoue, Sho, et al.
Published: (2024)

GLOBE: A High-quality English Corpus with Global Accents for Zero-shot Speaker Adaptive Text-to-Speech
by: Wang, Wenbin, et al.
Published: (2024)

AccentBox: Towards High-Fidelity Zero-Shot Accent Generation
by: Zhong, Jinzuomu, et al.
Published: (2024)

Multi-Scale Accent Modeling and Disentangling for Multi-Speaker Multi-Accent Text-to-Speech Synthesis
by: Zhou, Xuehao, et al.
Published: (2024)

Debatts: Zero-Shot Debating Text-to-Speech Synthesis
by: Huang, Yiqiao, et al.
Published: (2024)

AccentFold: A Journey through African Accents for Zero-Shot ASR Adaptation to Target Accents
by: Owodunni, Abraham Toluwase, et al.
Published: (2024)

Incremental Disentanglement for Environment-Aware Zero-Shot Text-to-Speech Synthesis
by: Lu, Ye-Xin, et al.
Published: (2024)

Zero-Shot Text-to-Speech from Continuous Text Streams
by: Dang, Trung, et al.
Published: (2024)

SpeechAccentLLM: A Unified Framework for Foreign Accent Conversion and Text to Speech
by: Cheng, Zhuangfei, et al.
Published: (2025)

MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder
by: Zhang, Bowen, et al.
Published: (2025)

Audiobox TTA-RAG: Improving Zero-Shot and Few-Shot Text-To-Audio with Retrieval-Augmented Generation
by: Yang, Mu, et al.
Published: (2024)

Effects of Speaker Count, Duration, and Accent Diversity on Zero-Shot Accent Robustness in Low-Resource ASR
by: Yong, Zheng-Xin, et al.
Published: (2025)

Bridging the Modality Gap: Softly Discretizing Audio Representation for LLM-based Automatic Speech Recognition
by: Yang, Mu, et al.
Published: (2025)

Empowering Communication: Speech Technology for Indian and Western Accents through AI-powered Speech Synthesis
by: R, Vinotha, et al.
Published: (2024)

Zero-Shot Text-to-Speech for Vietnamese
by: Vu, Thi, et al.
Published: (2025)

Word-Level Emotional Expression Control in Zero-Shot Text-to-Speech Synthesis
by: Wang, Tianrui, et al.
Published: (2025)

Advanced Zero-Shot Text-to-Speech for Background Removal and Preservation with Controllable Masked Speech Prediction
by: Zhang, Leying, et al.
Published: (2025)

MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech
by: Ji, Shengpeng, et al.
Published: (2024)

Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder
by: Melechovsky, Jan, et al.
Published: (2022)

DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech
by: Melechovsky, Jan, et al.
Published: (2024)

Adversarial Attacks and Robust Defenses in Speaker Embedding based Zero-Shot Text-to-Speech System
by: Li, Ze, et al.
Published: (2024)

Mixture of LoRA Experts with Multi-Modal and Multi-Granularity LLM Generative Error Correction for Accented Speech Recognition
by: Mu, Bingshen, et al.
Published: (2025)

EmoSteer-TTS: Fine-Grained and Training-Free Emotion-Controllable Text-to-Speech via Activation Steering
by: Xie, Tianxin, et al.
Published: (2025)

Advancing Zero-Shot Open-Set Speech Deepfake Source Tracing
by: Chhibber, Manasi, et al.
Published: (2025)

DAT-CFTNet: Speech Enhancement for Cochlear Implant Recipients using Attention-based Dual-Path Recurrent Neural Network
by: Mamun, Nursadul, et al.
Published: (2026)

MMGER: Multi-modal and Multi-granularity Generative Error Correction with LLM for Joint Accent and Speech Recognition
by: Mu, Bingshen, et al.
Published: (2024)

Analyzing the Impact of Accent on English Speech: Acoustic and Articulatory Perspectives
by: Premananth, Gowtham, et al.
Published: (2025)

Towards Zero-Shot Text-To-Speech for Arabic Dialects
by: Doan, Khai Duy, et al.
Published: (2024)

ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
by: Zhu, Han, et al.
Published: (2025)

CosyEdit: Unlocking End-to-End Speech Editing Capability from Zero-Shot Text-to-Speech Models
by: Chen, Junyang, et al.
Published: (2026)

CodecMOS-Accent: A MOS Benchmark of Resynthesized and TTS Speech from Neural Codecs Across English Accents
by: Huang, Wen-Chin, et al.
Published: (2026)

Improving Accented Speech Recognition using Data Augmentation based on Unsupervised Text-to-Speech Synthesis
by: Do, Cong-Thanh, et al.
Published: (2024)

LID Models are Actually Accent Classifiers: Implications and Solutions for LID on Accented Speech
by: Bafna, Niyati, et al.
Published: (2025)

CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech
by: Kim, Jaehyeon, et al.
Published: (2024)

Parallel GPT: Harmonizing the Independence and Interdependence of Acoustic and Semantic Information for Zero-Shot Text-to-Speech
by: Xing, Jingyuan, et al.
Published: (2025)

Accent Conversion in Text-To-Speech Using Multi-Level VAE and Adversarial Training
by: Melechovsky, Jan, et al.
Published: (2024)

SF-Speech: Straightened Flow for Zero-Shot Voice Clone
by: Li, Xuyuan, et al.
Published: (2024)

Emotion-Aware Prefix: Towards Explicit Emotion Control in Voice Conversion Models
by: Yang, Haoyuan, et al.
Published: (2026)