Bibliographic Details
Main Authors: Osmelak, Doreen; Chowdhury, Koel Dutta; Sentsova, Uliana; España-Bonet, Cristina; van Genabith, Josef
Format: Preprint
Published: 2025
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2511.02721
_version_ 1866916041895968768
author Osmelak, Doreen
Chowdhury, Koel Dutta
Sentsova, Uliana
España-Bonet, Cristina
van Genabith, Josef
contents Translators often enrich texts with background details that make implicit cultural meanings explicit for new audiences. This phenomenon, known as pragmatic explicitation, has been widely discussed in translation theory but rarely modeled computationally. We introduce PragExTra, the first multilingual corpus and detection framework for pragmatic explicitation. The corpus covers eight language pairs from TED-Multi and Europarl and includes additions such as entity descriptions, measurement conversions, and translator remarks. We identify candidate explicitation cases through null alignments and refine them using active learning with human annotation. Our results show that entity and system-level explicitations are most frequent, and that active learning improves classifier accuracy by 7-8 percentage points, achieving up to 0.88 accuracy and 0.82 F1 across languages. PragExTra establishes pragmatic explicitation as a measurable, cross-linguistic phenomenon and takes a step towards building culturally aware machine translation. Keywords: translation, multilingualism, explicitation
format Preprint
id arxiv_https___arxiv_org_abs_2511_02721
institution arXiv
publishDate 2025
record_format arxiv
title PETra: A Multilingual Corpus of Pragmatic Explicitation in Translation
topic Computation and Language
url https://arxiv.org/abs/2511.02721