:: Library Catalog

Bibliographic Details
Main Authors:	Proietti, Lorenzo; Grundkiewicz, Roman; Post, Matt
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2601.18006

Similar Items

PyMarian: Fast Neural Machine Translation and Evaluation in Python
by: Gowda, Thamme, et al.
Published: (2024)

On Instruction-Finetuning Neural Machine Translation Models
by: Raunak, Vikas, et al.
Published: (2024)

Has Machine Translation Evaluation Achieved Human Parity? The Human Reference and the Limits of Progress
by: Proietti, Lorenzo, et al.
Published: (2025)

Error Span Annotation: A Balanced Approach for Human Evaluation of Machine Translation
by: Kocmi, Tom, et al.
Published: (2024)

SLIDE: Reference-free Evaluation for Machine Translation using a Sliding Document Window
by: Raunak, Vikas, et al.
Published: (2023)

Guardians of the Machine Translation Meta-Evaluation: Sentinel Metrics Fall In!
by: Perrella, Stefano, et al.
Published: (2024)

Estimating Machine Translation Difficulty
by: Proietti, Lorenzo, et al.
Published: (2025)

Beyond Correlation: Interpretable Evaluation of Machine Translation Metrics
by: Perrella, Stefano, et al.
Published: (2024)

Navigating the Metrics Maze: Reconciling Score Magnitudes and Accuracies
by: Kocmi, Tom, et al.
Published: (2024)

Preliminary Ranking of WMT25 General Machine Translation Systems
by: Kocmi, Tom, et al.
Published: (2025)

Evaluating Automatic Metrics with Incremental Machine Translation Systems
by: Wu, Guojun, et al.
Published: (2024)

Escaping the sentence-level paradigm in machine translation
by: Post, Matt, et al.
Published: (2023)

AskQE: Question Answering as Automatic Evaluation for Machine Translation
by: Ki, Dayeon, et al.
Published: (2025)

Extending Automatic Machine Translation Evaluation to Book-Length Documents
by: Wang, Kuang-Da, et al.
Published: (2025)

Recovering document annotations for sentence-level bitext
by: Wicks, Rachel, et al.
Published: (2024)

Translation or Recitation? Calibrating Evaluation Scores for Machine Translation of Extremely Low-Resource Languages
by: Chen, Danlu, et al.
Published: (2026)

Confidence and Stability of Global and Pairwise Scores in NLP Evaluation
by: Levtsov, Georgii, et al.
Published: (2025)

Direct-Scoring NLG Evaluators Can Use Pairwise Comparisons Too
by: Lawrence, Logan, et al.
Published: (2025)

Quality and Quantity of Machine Translation References for Automatic Metrics
by: Zouhar, Vilém, et al.
Published: (2024)

Improving Statistical Significance in Human Evaluation of Automatic Metrics via Soft Pairwise Accuracy
by: Thompson, Brian, et al.
Published: (2024)

BiVert: Bidirectional Vocabulary Evaluation using Relations for Machine Translation
by: Cherf, Carinne, et al.
Published: (2024)

JP-TL-Bench: Anchored Pairwise LLM Evaluation for Bidirectional Japanese-English Translation
by: Lin, Leonard, et al.
Published: (2026)

Non-Linear Scoring Model for Translation Quality Evaluation
by: Gladkoff, Serge, et al.
Published: (2025)

Automatic Machine Translation Detection Using a Surrogate Multilingual Translation Model
by: García-Romero, Cristian, et al.
Published: (2025)

Pair2Score: Pairwise-to-Absolute Transfer for LLM-Based Essay Scoring
by: Hallaç, İbrahim Rıza, et al.
Published: (2026)

The Comparison of Translationese in Machine Translation and Human Transation in terms of Translation Relations
by: Zhou, Fan
Published: (2024)

Gained in Translation: Privileged Pairwise Judges Enhance Multilingual Reasoning
by: Sutawika, Lintang, et al.
Published: (2026)

A Critical Study of Automatic Evaluation in Sign Language Translation
by: Yazdani, Shakib, et al.
Published: (2025)

The quasi-semantic competence of LLMs: a case study on the part-whole relation
by: Proietti, Mattia, et al.
Published: (2025)

Token-level Ensembling of Models with Different Vocabularies
by: Wicks, Rachel, et al.
Published: (2025)

Convergences and Divergences between Automatic Assessment and Human Evaluation: Insights from Comparing ChatGPT-Generated Translation and Neural Machine Translation
by: Jiang, Zhaokun, et al.
Published: (2024)

Automatically Generating Chinese Homophone Words to Probe Machine Translation Estimation Systems
by: Qian, Shenbin, et al.
Published: (2025)

GRRM: Group Relative Reward Modeling for Machine Translation
by: Yang, Sen, et al.
Published: (2026)

Lexicography Saves Lives (LSL): Automatically Translating Suicide-Related Language
by: Schoene, Annika Marie, et al.
Published: (2024)

Machine Translation Meta Evaluation through Translation Accuracy Challenge Sets
by: Moghe, Nikita, et al.
Published: (2024)

Concept-Guided Chain-of-Thought Prompting for Pairwise Comparison Scoring of Texts with Large Language Models
by: Wu, Patrick Y., et al.
Published: (2023)

Beyond Scalar Scores: Reinforcement Learning for Error-Aware Quality Estimation of Machine Translation
by: Sindhujan, Archchana, et al.
Published: (2026)

RDBE: Reasoning Distillation-Based Evaluation Enhances Automatic Essay Scoring
by: Mohammadkhani, Ali Ghiasvand
Published: (2024)

CTC-GMM: CTC guided modality matching for fast and accurate streaming speech translation
by: Zhao, Rui, et al.
Published: (2024)

Uncertainty Quantification for Evaluating Machine Translation Bias
by: Staliūnaitė, Ieva Raminta, et al.
Published: (2025)