Table of Contents: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Pandey, Laxmi, Li, Ke, Guo, Jinxi, Paul, Debjyoti, Guo, Arthur, Mahadeokar, Jay, Zhang, Xuedong
Format:	Preprint
Published:	2024
Subjects:	Computation and Language Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2407.16664
Tags:	Add Tag No Tags, Be the first to tag this record!

Table of Contents:

Multilingual pretraining for transfer learning significantly boosts the robustness of low-resource monolingual ASR models. This study systematically investigates three main aspects: (a) the impact of transfer learning on model performance during initial training or fine-tuning, (b) the influence of transfer learning across dataset domains and languages, and (c) the effect on rare-word recognition compared to non-rare words. Our finding suggests that RNNT-loss pretraining, followed by monolingual fine-tuning with Minimum Word Error Rate (MinWER) loss, consistently reduces Word Error Rates (WER) across languages like Italian and French. WER Reductions (WERR) reach 36.2% and 42.8% compared to monolingual baselines for MLS and in-house datasets. Out-of-domain pretraining leads to 28% higher WERR than in-domain pretraining. Both rare and non-rare words benefit, with rare words showing greater improvements with out-of-domain pretraining, and non-rare words with in-domain pretraining.

Similar Items