| Main authors: | Churchwell, Cameron; Morrison, Max; Pardo, Bryan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Audio and Speech Processing; Sound |
| Online access: | https://arxiv.org/abs/2402.17735 |
| author | Churchwell, Cameron; Morrison, Max; Pardo, Bryan |
|---|---|
| contents | A phonetic posteriorgram (PPG) is a time-varying categorical distribution over acoustic units of speech (e.g., phonemes). PPGs are a popular representation in speech generation due to their ability to disentangle pronunciation features from speaker identity, allowing accurate reconstruction of pronunciation (e.g., voice conversion) and coarse-grained pronunciation editing (e.g., foreign accent conversion). In this paper, we demonstrably improve the quality of PPGs to produce a state-of-the-art interpretable PPG representation. We train an off-the-shelf speech synthesizer using our PPG representation and show that high-quality PPGs yield independent control over pitch and pronunciation. We further demonstrate novel uses of PPGs, such as an acoustic pronunciation distance and fine-grained pronunciation control. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2402_17735 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | High-Fidelity Neural Phonetic Posteriorgrams |
| topic | Audio and Speech Processing; Sound |
| url | https://arxiv.org/abs/2402.17735 |
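The abstract defines a PPG as a time-varying categorical distribution over acoustic units such as phonemes. The sketch below illustrates only that definition, under stated assumptions: a hypothetical frame-level phoneme classifier produces logits of shape (frames, phonemes), and a softmax over the phoneme axis turns each frame into a probability distribution. The phoneme inventory, the random logits, and the per-frame Jensen-Shannon comparison are all illustrative choices made here; they are not the paper's model, and the divergence is not the acoustic pronunciation distance the paper proposes.

```python
import numpy as np

# Hypothetical, tiny phoneme inventory (illustration only, not the paper's).
PHONEMES = ["aa", "iy", "s", "t", "<silent>"]

def logits_to_ppg(logits: np.ndarray) -> np.ndarray:
    """Turn frame-level logits of shape (frames, phonemes) into a PPG.

    Each row of the result is a categorical distribution over PHONEMES,
    so the whole matrix is one distribution per frame (time-varying).
    """
    # Subtract the per-frame maximum for numerical stability before exp.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def framewise_js_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Per-frame Jensen-Shannon divergence (natural log) between aligned PPGs.

    Assumed, illustrative comparison only; NOT the paper's proposed distance.
    """
    m = 0.5 * (p + q)
    kl_pm = np.sum(p * np.log((p + eps) / (m + eps)), axis=1)
    kl_qm = np.sum(q * np.log((q + eps) / (m + eps)), axis=1)
    return 0.5 * (kl_pm + kl_qm)

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, len(PHONEMES)))  # 4 frames of made-up logits
ppg = logits_to_ppg(logits)

print(ppg.shape)                          # (4, 5)
print(np.allclose(ppg.sum(axis=1), 1.0))  # True: every frame sums to 1
print(PHONEMES[int(ppg[0].argmax())])     # most likely phoneme in frame 0
print(framewise_js_divergence(ppg, ppg))  # all zeros for identical PPGs
```

Because each frame keeps a full distribution rather than a single hard phoneme label, two PPGs can be compared frame by frame with any divergence between categorical distributions; the Jensen-Shannon divergence is used here only because it is symmetric and bounded by ln 2.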