Staff View: Improving Code Switching with Supervised Fine Tuning and GELU Adapters :: Library Catalog

Bibliographic Details
Main Author:	Pham, Linh
Format:	Preprint
Published:	2025
Subjects:	Sound Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2506.00291

_version_	1866908474568343552
author	Pham, Linh
author_facet	Pham, Linh
contents	There are few code switching datasets, labeled or unlabeled, that exist today. As a result, ASR requires new methods to utilize the vast monolingual data and models that exist. This paper uses OpenAI's open-source ASR model, Whisper, which has been pre-trained on 680K hours of audio to perform monolingual ASR tasks. In Part 1, this paper examines how exploiting Whisper's monolingual ability to individually tokenize training text, called the "Switching Tokenizers Method", improves transcription accuracy. In Part 2, we combine the Switching Tokenizers Method from Part 1 and train a GELU-based adapter on the encoder. These two methods reduced Total Mixed Error Rate (MER) to 9.4% for the ASCEND dataset, 6% for SEAME devman and 9.7% for SEAME devsge, outperforming current SoTA methods.
format	Preprint
id	arxiv_https___arxiv_org_abs_2506_00291
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	Improving Code Switching with Supervised Fine Tuning and GELU Adapters Pham, Linh Sound Audio and Speech Processing There are few code switching datasets, labeled or unlabeled, that exist today. As a result, ASR requires new methods to utilize the vast monolingual data and models that exist. This paper uses OpenAI's open-source ASR model, Whisper, which has been pre-trained on 680K hours of audio to perform monolingual ASR tasks. In Part 1, this paper examines how exploiting Whisper's monolingual ability to individually tokenize training text, called the "Switching Tokenizers Method", improves transcription accuracy. In Part 2, we combine the Switching Tokenizers Method from Part 1 and train a GELU-based adapter on the encoder. These two methods reduced Total Mixed Error Rate (MER) to 9.4% for the ASCEND dataset, 6% for SEAME devman and 9.7% for SEAME devsge, outperforming current SoTA methods.
title	Improving Code Switching with Supervised Fine Tuning and GELU Adapters
topic	Sound Audio and Speech Processing
url	https://arxiv.org/abs/2506.00291

Similar Items