Bibliographic Details
Main Authors: Taji, Dima, Zeman, Daniel
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2503.09417
author Taji, Dima
Zeman, Daniel
author_facet Taji, Dima
Zeman, Daniel
contents Training models that perform well on various NLP tasks requires large amounts of data, and this becomes more apparent with nuanced tasks such as anaphora and coreference resolution. To combat the prohibitive cost of manually creating gold annotated data, this paper explores two methods for automatically creating datasets with coreferential annotations: direct conversion from existing datasets, and parsing with multilingual models capable of handling new and unseen languages. The paper details the current progress on those two fronts, as well as the challenges the efforts currently face and our approach to overcoming them.
format Preprint
id arxiv_https___arxiv_org_abs_2503_09417
institution arXiv
publishDate 2025
record_format arxiv
title Towards Generating Automatic Anaphora Annotations
topic Computation and Language
url https://arxiv.org/abs/2503.09417