Staff View: Estimating the Completeness of Discrete Speech Units :: Library Catalog

Bibliographic Details
Main Authors:	Yeh, Sung-Lin; Tang, Hao
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing; Computation and Language
Online Access:	https://arxiv.org/abs/2409.06109

_version_	1866917817641598976
author	Yeh, Sung-Lin; Tang, Hao
author_facet	Yeh, Sung-Lin; Tang, Hao
contents	Representing speech with discrete units has been widely used in speech codec and speech generation. However, there are several unverified claims about self-supervised discrete units, such as disentangling phonetic and speaker information with k-means, or assuming information loss after k-means. In this work, we take an information-theoretic perspective to answer how much information is present (information completeness) and how much information is accessible (information accessibility), before and after residual vector quantization. We show a lower bound for information completeness and estimate completeness on discretized HuBERT representations after residual vector quantization. We find that speaker information is sufficiently present in HuBERT discrete units, and that phonetic information is sufficiently present in the residual, showing that vector quantization does not achieve disentanglement. Our results offer a comprehensive assessment on the choice of discrete units, and suggest that a lot more information in the residual should be mined rather than discarded.
format	Preprint
id	arxiv_https___arxiv_org_abs_2409_06109
institution	arXiv
publishDate	2024
record_format	arxiv
title	Estimating the Completeness of Discrete Speech Units
topic	Audio and Speech Processing; Computation and Language
url	https://arxiv.org/abs/2409.06109

Similar Items