Staff View: Leveraging Cross-Lingual Transfer Learning in Spoken Named Entity Recognition Systems :: Library Catalog

Bibliographic Details
Main Authors:	Benaicha, Moncef; Thulke, David; Turan, M. A. Tuğtekin
Format:	Preprint
Published:	2023
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2307.01310

_version_	1866929496712544256
author	Benaicha, Moncef; Thulke, David; Turan, M. A. Tuğtekin
contents	Recent Named Entity Recognition (NER) advancements have significantly enhanced text classification capabilities. This paper focuses on spoken NER, aimed explicitly at spoken document retrieval, an area not widely studied due to the lack of comprehensive datasets for spoken contexts. Additionally, the potential for cross-lingual transfer learning in low-resource situations deserves further investigation. In our study, we applied transfer learning techniques across Dutch, English, and German using both pipeline and End-to-End (E2E) approaches. We employed Wav2Vec2 XLS-R models on custom pseudo-annotated datasets to evaluate the adaptability of cross-lingual systems. Our exploration of different architectural configurations assessed the robustness of these systems in spoken NER. Results showed that the E2E model was superior to the pipeline model, particularly with limited annotation resources. Furthermore, transfer learning from German to Dutch improved performance by 7% over the standalone Dutch E2E system and 4% over the Dutch pipeline model. Our findings highlight the effectiveness of cross-lingual transfer in spoken NER and emphasize the need for additional data collection to improve these systems.
format	Preprint
id	arxiv_https___arxiv_org_abs_2307_01310
institution	arXiv
publishDate	2023
record_format	arxiv
title	Leveraging Cross-Lingual Transfer Learning in Spoken Named Entity Recognition Systems
topic	Computation and Language
url	https://arxiv.org/abs/2307.01310