Internal Format :: Library Catalog

Bibliographic Details
Main Authors:	Furtado, Anna Beatriz Dimas; Ranasinghe, Tharindu; Blain, Frédéric; Mitkov, Ruslan
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2403.18018

_version_	1866913287725121536
author	Furtado, Anna Beatriz Dimas; Ranasinghe, Tharindu; Blain, Frédéric; Mitkov, Ruslan
author_facet	Furtado, Anna Beatriz Dimas; Ranasinghe, Tharindu; Blain, Frédéric; Mitkov, Ruslan
contents	Definition modelling (DM) is the task of automatically generating a dictionary definition for a specific word. Computational systems that are capable of DM can have numerous applications benefiting a wide range of audiences. As DM is considered a supervised natural language generation problem, these systems require large annotated datasets to train the machine learning (ML) models. Several DM datasets have been released for English and other high-resource languages. While Portuguese is considered a mid/high-resource language in most natural language processing tasks and is spoken by more than 200 million native speakers, there is no DM dataset available for Portuguese. In this research, we fill this gap by introducing DORE, the first dataset for Definition MOdelling for PoRtuguEse, containing more than 100,000 definitions. We also evaluate several deep-learning-based DM models on DORE and report the results. The dataset and the findings of this paper will facilitate research and study of Portuguese in wider contexts.
format	Preprint
id	arxiv_https___arxiv_org_abs_2403_18018
institution	arXiv
publishDate	2024
record_format	arxiv
title	DORE: A Dataset For Portuguese Definition Generation
topic	Computation and Language; Machine Learning
url	https://arxiv.org/abs/2403.18018
