MARC View: High-Fidelity Neural Phonetic Posteriorgrams :: Library Catalog

Bibliographic Details
Main Authors:	Churchwell, Cameron; Morrison, Max; Pardo, Bryan
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing; Sound
Online Access:	https://arxiv.org/abs/2402.17735

_version_	1866910345742778368
author	Churchwell, Cameron; Morrison, Max; Pardo, Bryan
author_facet	Churchwell, Cameron; Morrison, Max; Pardo, Bryan
contents	A phonetic posteriorgram (PPG) is a time-varying categorical distribution over acoustic units of speech (e.g., phonemes). PPGs are a popular representation in speech generation due to their ability to disentangle pronunciation features from speaker identity, allowing accurate reconstruction of pronunciation (e.g., voice conversion) and coarse-grained pronunciation editing (e.g., foreign accent conversion). In this paper, we demonstrably improve the quality of PPGs to produce a state-of-the-art interpretable PPG representation. We train an off-the-shelf speech synthesizer using our PPG representation and show that high-quality PPGs yield independent control over pitch and pronunciation. We further demonstrate novel uses of PPGs, such as an acoustic pronunciation distance and fine-grained pronunciation control.
format	Preprint
id	arxiv_https___arxiv_org_abs_2402_17735
institution	arXiv
publishDate	2024
record_format	arxiv
title	High-Fidelity Neural Phonetic Posteriorgrams
topic	Audio and Speech Processing; Sound
url	https://arxiv.org/abs/2402.17735
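
The contents field defines a PPG as a time-varying categorical distribution over acoustic units and mentions an acoustic pronunciation distance. The following is a minimal sketch of both ideas, not the authors' implementation. The function names (`to_ppg`, `ppg_distance`) are hypothetical. Averaging a framewise Jensen-Shannon divergence is an assumption made here for illustration, since the record does not say how the distance is computed.

```python
import math


def softmax(logits):
    """Turn one frame of raw scores into a categorical distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_ppg(frame_logits):
    """Map per-frame scores (frames x units) to a PPG: one distribution per frame."""
    return [softmax(frame) for frame in frame_logits]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two categorical distributions."""
    def kl(a, b):
        # Terms with a zero probability in `a` contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def ppg_distance(ppg_a, ppg_b):
    """Illustrative distance (an assumption): mean framewise JS divergence.

    Requires both PPGs to have the same number of frames.
    """
    if len(ppg_a) != len(ppg_b):
        raise ValueError("PPGs must have the same number of frames")
    return sum(js_divergence(p, q) for p, q in zip(ppg_a, ppg_b)) / len(ppg_a)
```

With base-2 logarithms the distance lies between 0 (identical distributions in every frame) and 1 (no overlap in any frame). Comparing utterances of different lengths would need an alignment step first, which this sketch leaves out.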