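Staff View: openFEAT: Improving Speaker Identification by Open-set Few-shot Embedding Adaptation with Transformer :: Library Catalog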

Bibliographic Details
Main Authors:	C, Kishan K, Tan, Zhenning, Chen, Long, Jin, Minho, Han, Eunjung, Stolcke, Andreas, Lee, Chul
Format:	Preprint
Published:	2022
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2202.12349

_version_	1866929237254995968
author	C, Kishan K; Tan, Zhenning; Chen, Long; Jin, Minho; Han, Eunjung; Stolcke, Andreas; Lee, Chul
author_facet	C, Kishan K; Tan, Zhenning; Chen, Long; Jin, Minho; Han, Eunjung; Stolcke, Andreas; Lee, Chul
contents	Household speaker identification with few enrollment utterances is an important yet challenging problem, especially when household members share similar voice characteristics and room acoustics. A common embedding space learned from a large number of speakers is not universally applicable for the optimal identification of every speaker in a household. In this work, we first formulate household speaker identification as a few-shot open-set recognition task and then propose a novel embedding adaptation framework to adapt speaker representations from the given universal embedding space to a household-specific embedding space using a set-to-set function, yielding better household speaker identification performance. With our algorithm, Open-set Few-shot Embedding Adaptation with Transformer (openFEAT), we observe that the speaker identification equal error rate (IEER) on simulated households with 2 to 7 hard-to-discriminate speakers is reduced by 23% to 31% relative.
format	Preprint
id	arxiv_https___arxiv_org_abs_2202_12349
institution	arXiv
publishDate	2022
record_format	arxiv
title	openFEAT: Improving Speaker Identification by Open-set Few-shot Embedding Adaptation with Transformer
topic	Audio and Speech Processing
url	https://arxiv.org/abs/2202.12349

Similar Items