MARC View :: Library Catalog

Bibliographic Details
Main Authors:	Panariello, Michele; Todisco, Massimiliano; Evans, Nicholas
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2408.04306

_version_	1866911982193475584
author	Panariello, Michele; Todisco, Massimiliano; Evans, Nicholas
author_facet	Panariello, Michele; Todisco, Massimiliano; Evans, Nicholas
contents	Voice anonymisation can be used to help protect speaker privacy when speech data is shared with untrusted others. In most practical applications, while the voice identity should be sanitised, other attributes such as the spoken content should be preserved. There is always a trade-off; all approaches reported thus far sacrifice spoken content for anonymisation performance. We report what is, to the best of our knowledge, the first attempt to actively preserve spoken content in voice anonymisation. We show how the output of an auxiliary automatic speech recognition model can be used to condition the vocoder module of an anonymisation system using a set of learnable embedding dictionaries in order to preserve spoken content. Relative to a baseline approach, and for only a modest cost in anonymisation performance, the technique is successful in decreasing the word error rate computed from anonymised utterances by almost 60%.
format	Preprint
id	arxiv_https___arxiv_org_abs_2408_04306
institution	arXiv
publishDate	2024
record_format	arxiv
title	Preserving spoken content in voice anonymisation with character-level vocoder conditioning
topic	Audio and Speech Processing
url	https://arxiv.org/abs/2408.04306

Similar Items