MARC21: :: Library Catalog

Bibliographic Details
Main Authors:	Huang, Jiawen; Benetos, Emmanouil
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing; Computation and Language; Sound
Online Access:	https://arxiv.org/abs/2406.17618

_version_	1866911932315860992
author	Huang, Jiawen; Benetos, Emmanouil
author_facet	Huang, Jiawen; Benetos, Emmanouil
contents	Multilingual automatic lyrics transcription (ALT) is a challenging task due to the limited availability of labelled data and the challenges introduced by singing, compared to multilingual automatic speech recognition. Although some multilingual singing datasets have been released recently, English continues to dominate these collections. Multilingual ALT remains underexplored due to the scale of data and annotation quality. In this paper, we aim to create a multilingual ALT system with available datasets. Inspired by architectures that have been proven effective for English ALT, we adapt these techniques to the multilingual scenario by expanding the target vocabulary set. We then evaluate the performance of the multilingual model in comparison to its monolingual counterparts. Additionally, we explore various conditioning methods to incorporate language information into the model. We apply analysis by language and combine it with the language classification performance. Our findings reveal that the multilingual model performs consistently better than the monolingual models trained on the language subsets. Furthermore, we demonstrate that incorporating language information significantly enhances performance.
format	Preprint
id	arxiv_https___arxiv_org_abs_2406_17618
institution	arXiv
publishDate	2024
record_format	arxiv
title	Towards Building an End-to-End Multilingual Automatic Lyrics Transcription Model
topic	Audio and Speech Processing; Computation and Language; Sound
url	https://arxiv.org/abs/2406.17618

Similar Items