Bibliographic Details
Main Authors: Li, Yuang; Zhang, Min; Su, Chang; Li, Yinglu; Qiao, Xiaosong; Ren, Mengxin; Ma, Miaomiao; Wei, Daimeng; Tao, Shimin; Yang, Hao
Format: Preprint
Published: 2023
Subjects: Artificial Intelligence; Computation and Language
Online Access: https://arxiv.org/abs/2309.09552
contents The recognition of rare named entities, such as personal names and technical terms, is challenging for automatic speech recognition (ASR) systems, especially when these entities are rarely observed in the training data. In this paper, we introduce keyword spotting enhanced Whisper (KWS-Whisper), a novel ASR system that leverages the Whisper model and performs open-vocabulary keyword spotting (OV-KWS) on the hidden states of the Whisper encoder to recognize user-defined named entities. These entities serve as prompts for the Whisper decoder. To optimize the model, we propose a multitask training approach that jointly learns the OV-KWS and contextual-ASR tasks. We evaluate our approach on the Chinese Aishell hot word subsets and two internal code-switching test sets and show that it significantly improves entity recall compared to the original Whisper model. Moreover, we demonstrate that OV-KWS can serve as a plug-and-play module to enhance ASR error correction methods and frozen Whisper models.
institution arXiv
title A Multitask Training Approach to Enhance Whisper with Contextual Biasing and Open-Vocabulary Keyword Spotting
topic Artificial Intelligence
Computation and Language