Bibliographic Details
Main Authors: DK, Thennal; Biemann, Chris; Hatzel, Hans Ole
Format: Preprint
Published: 2026
Subjects: Computation and Language; Artificial Intelligence
Online Access: https://arxiv.org/abs/2603.12021
author DK, Thennal
Biemann, Chris
Hatzel, Hans Ole
contents Label projection is an effective technique for cross-lingual transfer, extending span-annotated datasets from a high-resource language to low-resource ones. Most approaches perform label projection as a separate step after machine translation, and prior work that combines the two reports degraded translation quality. We re-evaluate this claim with LabelPigeon, a novel framework that jointly performs translation and label projection via XML tags. We design a direct evaluation scheme for label projection, and find that LabelPigeon outperforms baselines and actively improves translation quality in 11 languages. We further assess translation quality across 203 languages and varying annotation complexity, finding consistent improvement attributed to additional fine-tuning. Finally, across 27 languages and three downstream tasks, we report substantial gains in cross-lingual transfer over comparable work, up to +40.2 F1 on NER. Overall, our results demonstrate that XML-tagged label projection provides effective and efficient label transfer without compromising translation quality.
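The abstract describes label projection carried through translation by XML tags. The following is only a minimal sketch of that general idea, not the LabelPigeon implementation: the helper names `insert_tags` and `extract_spans`, the tag names `PER` and `LOC`, and the hand-written German sentence standing in for a model's output are all assumptions made for illustration. It assumes non-overlapping, non-nested spans and text without literal angle brackets.

```python
import re
from typing import List, Tuple

# (start, end, label) with end exclusive; a hypothetical span format.
Span = Tuple[int, int, str]

def insert_tags(text: str, spans: List[Span]) -> str:
    """Wrap each non-overlapping labeled span in an XML-style tag."""
    out, cursor = [], 0
    for start, end, label in sorted(spans):
        out.append(text[cursor:start])
        out.append(f"<{label}>{text[start:end]}</{label}>")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)

# Matches <LABEL>...</LABEL>; nesting is deliberately not handled.
TAG_RE = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)

def extract_spans(tagged: str) -> Tuple[str, List[Span]]:
    """Strip tags and return the plain text plus character-offset spans."""
    plain, spans, cursor, length = [], [], 0, 0
    for match in TAG_RE.finditer(tagged):
        before = tagged[cursor:match.start()]
        inner = match.group(2)
        plain.extend([before, inner])
        length += len(before)
        spans.append((length, length + len(inner), match.group(1)))
        length += len(inner)
        cursor = match.end()
    plain.append(tagged[cursor:])
    return "".join(plain), spans

source = "Angela Merkel visited Paris."
source_spans = [(0, 13, "PER"), (22, 27, "LOC")]
tagged_source = insert_tags(source, source_spans)

# Stand-in for the output of a translation model that preserves the tags.
tagged_target = "<PER>Angela Merkel</PER> besuchte <LOC>Paris</LOC>."
target_text, target_spans = extract_spans(tagged_target)
```

In this sketch the target-side spans come directly from the tags that survive translation, so no separate alignment step is needed after the translation call.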
format Preprint
id arxiv_https___arxiv_org_abs_2603_12021
institution arXiv
publishDate 2026
record_format arxiv
title Just Use XML: Revisiting Joint Translation and Label Projection
topic Computation and Language
Artificial Intelligence
url https://arxiv.org/abs/2603.12021