Bibliographic Details
Main Authors: Churchwell, Cameron; Morrison, Max; Pardo, Bryan
Format: Preprint
Published: 2024
Subjects: Audio and Speech Processing; Sound
Online Access: https://arxiv.org/abs/2402.17735
_version_ 1866910345742778368
author Churchwell, Cameron
Morrison, Max
Pardo, Bryan
author_facet Churchwell, Cameron
Morrison, Max
Pardo, Bryan
contents A phonetic posteriorgram (PPG) is a time-varying categorical distribution over acoustic units of speech (e.g., phonemes). PPGs are a popular representation in speech generation due to their ability to disentangle pronunciation features from speaker identity, allowing accurate reconstruction of pronunciation (e.g., voice conversion) and coarse-grained pronunciation editing (e.g., foreign accent conversion). In this paper, we demonstrably improve the quality of PPGs to produce a state-of-the-art interpretable PPG representation. We train an off-the-shelf speech synthesizer using our PPG representation and show that high-quality PPGs yield independent control over pitch and pronunciation. We further demonstrate novel uses of PPGs, such as an acoustic pronunciation distance and fine-grained pronunciation control.
format Preprint
id arxiv_https___arxiv_org_abs_2402_17735
institution arXiv
publishDate 2024
record_format arxiv
title High-Fidelity Neural Phonetic Posteriorgrams
topic Audio and Speech Processing
Sound
url https://arxiv.org/abs/2402.17735