Staff View: Continual Adaptation for Pacific Indigenous Speech Recognition :: Library Catalog

Bibliographic Details
Main Authors:	Xiao, Yang; Mahmudi, Aso; Thieberger, Nick; Ambikairajah, Eliathamby; Holden, Eun-Jung; Dang, Ting
Format:	Preprint
Published:	2026
Subjects:	Audio and Speech Processing; Computation and Language; Sound
Online Access:	https://arxiv.org/abs/2603.06310

_version_	1866910043702558720
author	Xiao, Yang; Mahmudi, Aso; Thieberger, Nick; Ambikairajah, Eliathamby; Holden, Eun-Jung; Dang, Ting
author_facet	Xiao, Yang; Mahmudi, Aso; Thieberger, Nick; Ambikairajah, Eliathamby; Holden, Eun-Jung; Dang, Ting
contents	Speech foundation models struggle with low-resource Pacific Indigenous languages because of severe data scarcity. Furthermore, full fine-tuning risks catastrophic forgetting. To address this gap, we present an empirical study adapting models to real-world Pacific datasets. We investigate how data volume and linguistic features affect adaptation success. Specifically, we evaluate strategies including Full Fine-Tuning and Low-Rank Adaptation (LoRA). Additionally, we analyze a continual learning framework for sequentially acquiring multiple languages. We demonstrate that adapting to these distant languages causes severe internal representational drift. Consequently, these models face a strict plasticity and stability dilemma. While LoRA adapts well initially, it suffers from catastrophic forgetting during sequential learning. Ultimately, this study highlights the urgent need for robust adaptation strategies tailored to underrepresented languages.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_06310
institution	arXiv
publishDate	2026
record_format	arxiv
title	Continual Adaptation for Pacific Indigenous Speech Recognition
topic	Audio and Speech Processing; Computation and Language; Sound
url	https://arxiv.org/abs/2603.06310

Similar Items