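Staff View: Voice Conversion for Lombard Speaking Style with Implicit and Explicit Acoustic Feature Conditioning :: Library Catalog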

Bibliographic Details
Main Authors:	Woszczyk, Dominika; Ribeiro, Manuel Sam; Merritt, Thomas; Korzekwa, Daniel
Format:	Preprint
Published:	2025
Subjects:	Sound; Computation and Language; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2507.09310

_version_	1866915385914163200
author	Woszczyk, Dominika; Ribeiro, Manuel Sam; Merritt, Thomas; Korzekwa, Daniel
author_facet	Woszczyk, Dominika; Ribeiro, Manuel Sam; Merritt, Thomas; Korzekwa, Daniel
contents	Text-to-Speech (TTS) systems in Lombard speaking style can improve the overall intelligibility of speech, useful for hearing loss and noisy conditions. However, training those models requires a large amount of data and the Lombard effect is challenging to record due to speaker and noise variability and tiring recording conditions. Voice conversion (VC) has been shown to be a useful augmentation technique to train TTS systems in the absence of recorded data from the target speaker in the target speaking style. In this paper, we are concerned with Lombard speaking style transfer. Our goal is to convert speaker identity while preserving the acoustic attributes that define the Lombard speaking style. We compare voice conversion models with implicit and explicit acoustic feature conditioning. We observe that our proposed implicit conditioning strategy achieves an intelligibility gain comparable to the model conditioned on explicit acoustic features, while also preserving speaker similarity.
format	Preprint
id	arxiv_https___arxiv_org_abs_2507_09310
institution	arXiv
publishDate	2025
record_format	arxiv
title	Voice Conversion for Lombard Speaking Style with Implicit and Explicit Acoustic Feature Conditioning
topic	Sound; Computation and Language; Audio and Speech Processing
url	https://arxiv.org/abs/2507.09310

Similar Items