Bibliographic Details
Main Authors: Fatemeh, Keshvari; Rahil, Mahdian Toroghi; Hassan, Zareian
Format: Preprint
Published: 2024
Subjects: Audio and Speech Processing
Online Access: https://arxiv.org/abs/2410.04092
author Fatemeh, Keshvari
Rahil, Mahdian Toroghi
Hassan, Zareian
contents Dysarthric speech reconstruction is challenging because of the pathological sound patterns of dysarthric speech. Preserving speaker identity, especially without access to normal speech, is a key challenge. Our proposed approach uses contrastive learning to extract speaker embeddings for reconstruction and employs XLS-R representations instead of filter banks. The results show improved speech quality, naturalness, intelligibility, speaker identity preservation, and gender consistency for female speakers. For speakers with moderate and moderate-severe dysarthria, respectively, the reconstructed speech improves MOS by 1.51 and 2.12 and reduces the word error rate by 25.45% and 32.1%, as measured with the Jasper speech recognition system. This approach offers promising advancements in dysarthric speech reconstruction.
format Preprint
id arxiv_https___arxiv_org_abs_2410_04092
institution arXiv
publishDate 2024
record_format arxiv
title Enhancement of Dysarthric Speech Reconstruction by Contrastive Learning
topic Audio and Speech Processing
url https://arxiv.org/abs/2410.04092