Staff View: Enhancement of Dysarthric Speech Reconstruction by Contrastive Learning :: Library Catalog

Bibliographic Details
Main Authors:	Fatemeh, Keshvari; Rahil, Mahdian Toroghi; Hassan, Zareian
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2410.04092

_version_	1866914965638610944
author	Fatemeh, Keshvari; Rahil, Mahdian Toroghi; Hassan, Zareian
author_facet	Fatemeh, Keshvari; Rahil, Mahdian Toroghi; Hassan, Zareian
contents	Dysarthric speech reconstruction is challenging due to its pathological sound patterns. Preserving speaker identity, especially without access to normal speech, is a key challenge. Our proposed approach uses contrastive learning to extract a speaker embedding for reconstruction, and employs XLS-R representations instead of filter banks. The results show improved speech quality, naturalness, intelligibility, speaker identity preservation, and gender consistency for female speakers. Reconstructed speech exhibits MOS improvements of 1.51 and 2.12 and reduces word error rates by 25.45% and 32.1% for speakers with moderate and moderate-severe dysarthria, respectively, as measured with the Jasper speech recognition system. This approach offers promising advancements in dysarthric speech reconstruction.
format	Preprint
id	arxiv_https___arxiv_org_abs_2410_04092
institution	arXiv
publishDate	2024
record_format	arxiv
title	Enhancement of Dysarthric Speech Reconstruction by Contrastive Learning
topic	Audio and Speech Processing
url	https://arxiv.org/abs/2410.04092
