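Staff View: Toward noise-robust whisper keyword spotting on headphones with in-earcup microphone and curriculum learning :: Library Catalog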

Bibliographic Details
Main Author:	Yang, Qiaoyu
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing; Sound
Online Access:	https://arxiv.org/abs/2502.00295

_version_	1866913715978240000
author	Yang, Qiaoyu
author_facet	Yang, Qiaoyu
contents	The expanding feature set of modern headphones poses a challenge for the design of their control interface. Users may want to control each feature separately or quickly switch between modes that activate different features. The traditional approach of physical buttons may no longer be feasible when the feature set is large. Keyword spotting with voice commands is a promising solution to the issue. Most existing methods of keyword spotting only support commands spoken in a regular voice. However, a regular voice may not be desirable in quiet places or public settings. In this paper, we investigate the problem of on-device keyword spotting in whisper voice and explore approaches to improve noise robustness. We leverage the inner microphone on noise-cancellation headphones as an additional source of voice input. We also design a curriculum learning strategy that gradually increases the proportion of whisper keywords during training. We demonstrate through experiments that the combination of multi-microphone processing and curriculum learning could improve the F1 score of whisper keyword spotting by up to 15% in noisy conditions.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_00295
institution	arXiv
publishDate	2025
record_format	arxiv
title	Toward noise-robust whisper keyword spotting on headphones with in-earcup microphone and curriculum learning
topic	Audio and Speech Processing; Sound
url	https://arxiv.org/abs/2502.00295

Similar Items