Staff View: Prototype-Based Disentanglement for Controllable Dysarthric Speech Synthesis :: Library Catalog

Bibliographic Details
Main Authors:	Wang, Haoshen; Zhong, Xueli; Lin, Bingbing; Huang, Jia; Pan, Xingduo; Liang, Shengxiang; Wang, Nizhuan; Siok, Wai Ting
Format:	Preprint
Published:	2026
Subjects:	Sound; Computation and Language
Online Access:	https://arxiv.org/abs/2602.08696

_version_	1866918344369635328
author	Wang, Haoshen; Zhong, Xueli; Lin, Bingbing; Huang, Jia; Pan, Xingduo; Liang, Shengxiang; Wang, Nizhuan; Siok, Wai Ting
author_facet	Wang, Haoshen; Zhong, Xueli; Lin, Bingbing; Huang, Jia; Pan, Xingduo; Liang, Shengxiang; Wang, Nizhuan; Siok, Wai Ting
contents	Dysarthric speech exhibits high variability and limited labeled data, posing major challenges for both automatic speech recognition (ASR) and assistive speech technologies. Existing approaches rely on synthetic data augmentation or speech reconstruction, yet often entangle speaker identity with pathological articulation, limiting controllability and robustness. In this paper, we propose ProtoDisent-TTS, a prototype-based disentanglement TTS framework built on a pre-trained text-to-speech backbone that factorizes speaker timbre and dysarthric articulation within a unified latent space. A pathology prototype codebook provides interpretable and controllable representations of healthy and dysarthric speech patterns, while a dual-classifier objective with a gradient reversal layer enforces invariance of speaker embeddings to pathological attributes. Experiments on the TORGO dataset demonstrate that this design enables bidirectional transformation between healthy and dysarthric speech, leading to consistent ASR performance gains and robust, speaker-aware speech reconstruction.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_08696
institution	arXiv
publishDate	2026
record_format	arxiv
title	Prototype-Based Disentanglement for Controllable Dysarthric Speech Synthesis
topic	Sound; Computation and Language
url	https://arxiv.org/abs/2602.08696

Similar Items