| Main Authors: | Proietti, Lorenzo; Grundkiewicz, Roman; Post, Matt |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.18006 |
Similar Items
PyMarian: Fast Neural Machine Translation and Evaluation in Python
by: Gowda, Thamme, et al.
Published: (2024)
On Instruction-Finetuning Neural Machine Translation Models
by: Raunak, Vikas, et al.
Published: (2024)
Has Machine Translation Evaluation Achieved Human Parity? The Human Reference and the Limits of Progress
by: Proietti, Lorenzo, et al.
Published: (2025)
Error Span Annotation: A Balanced Approach for Human Evaluation of Machine Translation
by: Kocmi, Tom, et al.
Published: (2024)
SLIDE: Reference-free Evaluation for Machine Translation using a Sliding Document Window
by: Raunak, Vikas, et al.
Published: (2023)
Guardians of the Machine Translation Meta-Evaluation: Sentinel Metrics Fall In!
by: Perrella, Stefano, et al.
Published: (2024)
Estimating Machine Translation Difficulty
by: Proietti, Lorenzo, et al.
Published: (2025)
Beyond Correlation: Interpretable Evaluation of Machine Translation Metrics
by: Perrella, Stefano, et al.
Published: (2024)
Navigating the Metrics Maze: Reconciling Score Magnitudes and Accuracies
by: Kocmi, Tom, et al.
Published: (2024)
Preliminary Ranking of WMT25 General Machine Translation Systems
by: Kocmi, Tom, et al.
Published: (2025)
Evaluating Automatic Metrics with Incremental Machine Translation Systems
by: Wu, Guojun, et al.
Published: (2024)
Escaping the sentence-level paradigm in machine translation
by: Post, Matt, et al.
Published: (2023)
AskQE: Question Answering as Automatic Evaluation for Machine Translation
by: Ki, Dayeon, et al.
Published: (2025)
Extending Automatic Machine Translation Evaluation to Book-Length Documents
by: Wang, Kuang-Da, et al.
Published: (2025)
Recovering document annotations for sentence-level bitext
by: Wicks, Rachel, et al.
Published: (2024)
Translation or Recitation? Calibrating Evaluation Scores for Machine Translation of Extremely Low-Resource Languages
by: Chen, Danlu, et al.
Published: (2026)
Confidence and Stability of Global and Pairwise Scores in NLP Evaluation
by: Levtsov, Georgii, et al.
Published: (2025)
Direct-Scoring NLG Evaluators Can Use Pairwise Comparisons Too
by: Lawrence, Logan, et al.
Published: (2025)
Quality and Quantity of Machine Translation References for Automatic Metrics
by: Zouhar, Vilém, et al.
Published: (2024)
Improving Statistical Significance in Human Evaluation of Automatic Metrics via Soft Pairwise Accuracy
by: Thompson, Brian, et al.
Published: (2024)
BiVert: Bidirectional Vocabulary Evaluation using Relations for Machine Translation
by: Cherf, Carinne, et al.
Published: (2024)
JP-TL-Bench: Anchored Pairwise LLM Evaluation for Bidirectional Japanese-English Translation
by: Lin, Leonard, et al.
Published: (2026)
Non-Linear Scoring Model for Translation Quality Evaluation
by: Gladkoff, Serge, et al.
Published: (2025)
Automatic Machine Translation Detection Using a Surrogate Multilingual Translation Model
by: García-Romero, Cristian, et al.
Published: (2025)
Pair2Score: Pairwise-to-Absolute Transfer for LLM-Based Essay Scoring
by: Hallaç, İbrahim Rıza, et al.
Published: (2026)
The Comparison of Translationese in Machine Translation and Human Transation in terms of Translation Relations
by: Zhou, Fan
Published: (2024)
Gained in Translation: Privileged Pairwise Judges Enhance Multilingual Reasoning
by: Sutawika, Lintang, et al.
Published: (2026)
A Critical Study of Automatic Evaluation in Sign Language Translation
by: Yazdani, Shakib, et al.
Published: (2025)
The quasi-semantic competence of LLMs: a case study on the part-whole relation
by: Proietti, Mattia, et al.
Published: (2025)
Token-level Ensembling of Models with Different Vocabularies
by: Wicks, Rachel, et al.
Published: (2025)
Convergences and Divergences between Automatic Assessment and Human Evaluation: Insights from Comparing ChatGPT-Generated Translation and Neural Machine Translation
by: Jiang, Zhaokun, et al.
Published: (2024)
Automatically Generating Chinese Homophone Words to Probe Machine Translation Estimation Systems
by: Qian, Shenbin, et al.
Published: (2025)
GRRM: Group Relative Reward Modeling for Machine Translation
by: Yang, Sen, et al.
Published: (2026)
Lexicography Saves Lives (LSL): Automatically Translating Suicide-Related Language
by: Schoene, Annika Marie, et al.
Published: (2024)
Machine Translation Meta Evaluation through Translation Accuracy Challenge Sets
by: Moghe, Nikita, et al.
Published: (2024)
Concept-Guided Chain-of-Thought Prompting for Pairwise Comparison Scoring of Texts with Large Language Models
by: Wu, Patrick Y., et al.
Published: (2023)
Beyond Scalar Scores: Reinforcement Learning for Error-Aware Quality Estimation of Machine Translation
by: Sindhujan, Archchana, et al.
Published: (2026)
RDBE: Reasoning Distillation-Based Evaluation Enhances Automatic Essay Scoring
by: Mohammadkhani, Ali Ghiasvand
Published: (2024)
CTC-GMM: CTC guided modality matching for fast and accurate streaming speech translation
by: Zhao, Rui, et al.
Published: (2024)
Uncertainty Quantification for Evaluating Machine Translation Bias
by: Staliūnaitė, Ieva Raminta, et al.
Published: (2025)