Library Catalog

Bibliographic Details
Main Authors:	Perrella, Stefano; Agostinho, Eric Morales; Zaragoza, Hugo
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2603.19921

Similar Items

Guardians of the Machine Translation Meta-Evaluation: Sentinel Metrics Fall In!
by: Perrella, Stefano, et al.
Published: (2024)

Has Machine Translation Evaluation Achieved Human Parity? The Human Reference and the Limits of Progress
by: Proietti, Lorenzo, et al.
Published: (2025)

Beyond Correlation: Interpretable Evaluation of Machine Translation Metrics
by: Perrella, Stefano, et al.
Published: (2024)

Minimum Bayes Risk Decoding for Error Span Detection in Reference-Free Automatic Machine Translation Evaluation
by: Lyu, Boxuan, et al.
Published: (2025)

Feeding Two Birds or Favoring One? Adequacy-Fluency Tradeoffs in Evaluation and Meta-Evaluation of Machine Translation
by: Shayegh, Behzad, et al.
Published: (2025)

Fine-Grained and Multi-Dimensional Metrics for Document-Level Machine Translation
by: Sun, Yirong, et al.
Published: (2024)

Evaluation of Machine Translation Based on Semantic Dependencies and Keywords
by: Yuan, Kewei, et al.
Published: (2024)

On the Evaluation Practices in Multilingual NLP: Can Machine Translation Offer an Alternative to Human Translations?
by: Choenni, Rochelle, et al.
Published: (2024)

Should I Share this Translation? Evaluating Quality Feedback for User Reliance on Machine Translation
by: Ki, Dayeon, et al.
Published: (2025)

Estimating Machine Translation Difficulty
by: Proietti, Lorenzo, et al.
Published: (2025)

Déjà Vu: Multilingual LLM Evaluation through the Lens of Machine Translation Evaluation
by: Kreutzer, Julia, et al.
Published: (2025)

MetaMetrics-MT: Tuning Meta-Metrics for Machine Translation via Human Preference Calibration
by: Anugraha, David, et al.
Published: (2024)

FairTranslate: An English-French Dataset for Gender Bias Evaluation in Machine Translation by Overcoming Gender Binarity
by: Jourdan, Fanny, et al.
Published: (2025)

Align-then-Slide: A complete evaluation framework for Ultra-Long Document-Level Machine Translation
by: Guo, Jiaxin, et al.
Published: (2025)

POMP: Probability-driven Meta-graph Prompter for LLMs in Low-resource Unsupervised Neural Machine Translation
by: Pan, Shilong, et al.
Published: (2024)

Better Late Than Never: Meta-Evaluation of Latency Metrics for Simultaneous Speech-to-Text Translation
by: Polák, Peter, et al.
Published: (2025)

Meta-Judging with Large Language Models: Concepts, Methods, and Challenges
by: Silva, Hugo, et al.
Published: (2026)

Evaluating the Meta- and Object-Level Reasoning of Large Language Models for Question Answering
by: Ferguson, Nick, et al.
Published: (2025)

Enhancing Document-Level Machine Translation via Filtered Synthetic Corpora and Two-Stage LLM Adaptation
by: Kim, Ireh, et al.
Published: (2026)

M-MAD: Multidimensional Multi-Agent Debate for Advanced Machine Translation Evaluation
by: Feng, Zhaopeng, et al.
Published: (2024)

Sentiment Analysis Across Languages: Evaluation Before and After Machine Translation to English
by: Kathunia, Aekansh, et al.
Published: (2024)

Sociotechnical Effects of Machine Translation
by: Moorkens, Joss, et al.
Published: (2025)

SLIDE: Reference-free Evaluation for Machine Translation using a Sliding Document Window
by: Raunak, Vikas, et al.
Published: (2023)

Lost in the Source Language: How Large Language Models Evaluate the Quality of Machine Translation
by: Huang, Xu, et al.
Published: (2024)

Responsible AI in NLP: GUS-Net Span-Level Bias Detection Dataset and Benchmark for Generalizations, Unfairness, and Stereotypes
by: Powers, Maximus, et al.
Published: (2024)

Scaling Bidirectional Spans and Span Violations in Attention Mechanism
by: Kim, Jongwook, et al.
Published: (2025)

Convergences and Divergences between Automatic Assessment and Human Evaluation: Insights from Comparing ChatGPT-Generated Translation and Neural Machine Translation
by: Jiang, Zhaokun, et al.
Published: (2024)

Translation of Multifaceted Data without Re-Training of Machine Translation Systems
by: Moon, Hyeonseok, et al.
Published: (2024)

Transformer-Encoder Trees for Efficient Multilingual Machine Translation and Speech Translation
by: Guan, Yiwen, et al.
Published: (2025)

Generating Gender Alternatives in Machine Translation
by: Garg, Sarthak, et al.
Published: (2024)

Word Alignment as Preference for Machine Translation
by: Wu, Qiyu, et al.
Published: (2024)

Interplay of Machine Translation, Diacritics, and Diacritization
by: Chen, Wei-Rui, et al.
Published: (2024)

Glancing Future for Simultaneous Machine Translation
by: Guo, Shoutao, et al.
Published: (2023)

Trainable Reference-Based Evaluation Metric for Identifying Quality of English-Gujarati Machine Translation System
by: Joshi, Nisheeth, et al.
Published: (2025)

Benchmarking GPT-4 against Human Translators: A Comprehensive Evaluation Across Languages, Domains, and Expertise Levels
by: Yan, Jianhao, et al.
Published: (2024)

Magis-Bench: Evaluating LLMs on Magistrate-Level Legal Tasks
by: Pires, Ramon, et al.
Published: (2026)

DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory
by: Wang, Yutong, et al.
Published: (2024)

Memory Reviving, Continuing Learning and Beyond: Evaluation of Pre-trained Encoders and Decoders for Multimodal Machine Translation
by: Yu, Zhuang, et al.
Published: (2025)

Overestimation in LLM Evaluation: A Controlled Large-Scale Study on Data Contamination's Impact on Machine Translation
by: Kocyigit, Muhammed Yusuf, et al.
Published: (2025)

Retrieval-Augmented Machine Translation with Unstructured Knowledge
by: Wang, Jiaan, et al.
Published: (2024)