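Staff View: CLARITY: Contextual Linguistic Adaptation and Accent Retrieval for Dual-Bias Mitigation in Text-to-Speech Generation :: Library Catalog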


Bibliographic Details
Main Authors:	Poon, Crystal Min Hui; Ng, Pai Chet; Miao, Xiaoxiao; Loh, Immanuel Jun Kai; Zhang, Bowen; Song, Haoyu; Mcloughlin, Ian
Format:	Preprint
Published:	2025
Subjects:	Sound; Computation and Language
Online Access:	https://arxiv.org/abs/2511.11104

_version_	1866908837089378304
author	Poon, Crystal Min Hui; Ng, Pai Chet; Miao, Xiaoxiao; Loh, Immanuel Jun Kai; Zhang, Bowen; Song, Haoyu; Mcloughlin, Ian
author_facet	Poon, Crystal Min Hui; Ng, Pai Chet; Miao, Xiaoxiao; Loh, Immanuel Jun Kai; Zhang, Bowen; Song, Haoyu; Mcloughlin, Ian
contents	Instruction-guided text-to-speech (TTS) research has matured to the point where excellent speech quality can be generated on demand. Yet two coupled biases continue to reduce perceived quality. The first is accent bias, where models default towards dominant phonetic patterns. The second is linguistic bias, a misalignment of dialect-specific lexical or cultural content. These biases are interdependent, because authentic accent generation requires both accent fidelity and correctly localized text. We present CLARITY (Contextual Linguistic Adaptation and Retrieval for Inclusive TTS sYnthesis), a backbone-agnostic framework that addresses both biases through dual-signal optimization. First, we apply contextual linguistic adaptation to localize the input text to the target dialect. Second, we propose retrieval-augmented accent prompting (RAAP) to ensure that speech prompts are accent-consistent. We evaluate CLARITY on twelve English accent varieties using both subjective and objective analyses. Results indicate that CLARITY improves accent accuracy and fairness, yielding output of higher perceptual quality. Code and audio samples are available at https://github.com/ICT-SIT/CLARITY.
format	Preprint
id	arxiv_https___arxiv_org_abs_2511_11104
institution	arXiv
publishDate	2025
record_format	arxiv
title	CLARITY: Contextual Linguistic Adaptation and Accent Retrieval for Dual-Bias Mitigation in Text-to-Speech Generation
topic	Sound; Computation and Language
url	https://arxiv.org/abs/2511.11104
