| Main Authors: | Xu, Jing; Wu, Minglin; Wu, Xixin; Meng, Helen |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Computation and Language; Audio and Speech Processing |
| Online Access: | https://arxiv.org/abs/2406.14092 |
| _version_ | 1866914000117170176 |
|---|---|
| author | Xu, Jing; Wu, Minglin; Wu, Xixin; Meng, Helen |
| author_facet | Xu, Jing; Wu, Minglin; Wu, Xixin; Meng, Helen |
| contents | Self-supervised learning (SSL) models have shown strong performance on various downstream tasks. However, they are typically developed for a limited set of languages and may encounter new languages in the real world. Developing an SSL model for each new language is costly. It is therefore vital to find ways to efficiently adapt existing SSL models to a new language without impairing their original abilities. We propose adaptation methods that integrate LoRA into existing SSL models to extend them to a new language. We also develop preservation strategies, including data combination and re-clustering, to retain abilities on existing languages. Applying these to mHuBERT, we investigate their effectiveness on the speech re-synthesis task. Experiments show that our adaptation methods enable mHuBERT to be applied to a new language (Mandarin), with the MOS increased by about 1.6 and the WER reduced by up to 61.72% in relative terms. Our preservation strategies also ensure that performance on both existing and new languages remains intact. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2406_14092 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | Seamless Language Expansion: Enhancing Multilingual Mastery in Self-Supervised Models |
| topic | Computation and Language; Audio and Speech Processing |
| url | https://arxiv.org/abs/2406.14092 |
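
The abstract mentions integrating LoRA into an existing SSL model. As a rough illustration of the general LoRA idea only (not the authors' implementation, whose layer choices and hyperparameters are not given in this record), the sketch below keeps a pretrained weight matrix frozen and adds a trainable low-rank update `B @ A` scaled by `alpha / rank`. The class name `LoRALinear`, the toy dimensions, and the initialisation values are assumptions made for the example.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Toy linear layer with a low-rank adapter (hypothetical helper, pure Python)."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight            # pretrained weight, kept frozen
        self.scale = alpha / rank       # standard LoRA scaling factor
        # A starts with small random values, B starts at zero, so the
        # adapted layer initially behaves exactly like the frozen one.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]

    def effective_weight(self):
        """Return W + (alpha / rank) * B @ A."""
        delta = matmul(self.B, self.A)
        return [
            [w + self.scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(self.weight, delta)
        ]

    def forward(self, x):
        """Apply the adapted layer to one input vector."""
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.effective_weight()]

    def num_trainable(self):
        """Only A and B would be updated when adapting to a new language."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Assumed toy example: a 3x4 "pretrained" weight and a rank-2 adapter.
pretrained = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
layer = LoRALinear(pretrained, rank=2, alpha=2.0)
print(layer.forward([0.5, -1.0, 2.0, 3.0]), layer.num_trainable())
```

Because only `A` and `B` are trained, the number of updated parameters scales with `rank * (in_dim + out_dim)` instead of `in_dim * out_dim`, which is why this kind of adapter is attractive when extending a large model to one more language.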