Bibliographic Details
Main Author: Pham, Linh
Format: Preprint
Published: 2025
Subjects: Sound; Audio and Speech Processing
Online Access: https://arxiv.org/abs/2506.00291
_version_ 1866908474568343552
author Pham, Linh
author_facet Pham, Linh
contents Few code-switching datasets, labeled or unlabeled, exist today. As a result, ASR requires new methods that make use of the vast monolingual data and models already available. This paper uses OpenAI's open-source ASR model, Whisper, which has been pre-trained on 680K hours of audio to perform monolingual ASR tasks. In Part 1, this paper examines how exploiting Whisper's monolingual ability to tokenize training text individually, called the "Switching Tokenizers Method", improves transcription accuracy. In Part 2, we combine the Switching Tokenizers Method from Part 1 with a GELU-based adapter trained on the encoder. These two methods reduced Total Mixed Error Rate (MER) to 9.4% for the ASCEND dataset, 6% for SEAME devman and 9.7% for SEAME devsge, outperforming current SoTA methods.
format Preprint
id arxiv_https___arxiv_org_abs_2506_00291
institution arXiv
publishDate 2025
record_format arxiv
spellingShingle Improving Code Switching with Supervised Fine Tuning and GELU Adapters
Pham, Linh
Sound
Audio and Speech Processing
title Improving Code Switching with Supervised Fine Tuning and GELU Adapters
topic Sound
Audio and Speech Processing
url https://arxiv.org/abs/2506.00291
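The abstract reports results as a Mixed Error Rate (MER). This record does not describe the paper's scoring script, so the following is only a minimal sketch of one common convention for Mandarin-English code-switched speech: each Chinese character counts as one token, each whitespace-delimited non-Chinese word counts as one token, and the error rate is the token-level edit distance divided by the reference length. The function names, the lowercasing step, and the CJK character range are assumptions made for illustration.

```python
import re

# Assumption: one token per CJK Unified Ideograph, one token per run of
# other non-space characters. Punctuation is not stripped in this sketch.
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[^\s\u4e00-\u9fff]+")


def mixed_tokens(text):
    """Split text into single Chinese characters and non-Chinese words."""
    return _TOKEN.findall(text.lower())


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a reference token
                curr[j - 1] + 1,         # insert a hypothesis token
                prev[j - 1] + (r != h),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def mixed_error_rate(ref_text, hyp_text):
    """Edit distance over mixed tokens, normalized by reference length."""
    ref = mixed_tokens(ref_text)
    hyp = mixed_tokens(hyp_text)
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)
```

For example, `mixed_error_rate("我想 play 游戏", "我想 pay 游戏")` scores one substitution against five reference tokens. Published evaluations usually add text normalization (punctuation removal, number formatting) before scoring, which this sketch omits.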