Staff View: A comparison of translation performance between DeepL and Supertext :: Library Catalog

Bibliographic Details
Main Authors:	Flückiger, Alex; Amrhein, Chantal; Graf, Tim; Odermatt, Frédéric; Pömsl, Martin; Schläpfer, Philippe; Schottmann, Florian; Läubli, Samuel
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2502.02577

_version_	1866915294486724608
author	Flückiger, Alex; Amrhein, Chantal; Graf, Tim; Odermatt, Frédéric; Pömsl, Martin; Schläpfer, Philippe; Schottmann, Florian; Läubli, Samuel
author_facet	Flückiger, Alex; Amrhein, Chantal; Graf, Tim; Odermatt, Frédéric; Pömsl, Martin; Schläpfer, Philippe; Schottmann, Florian; Läubli, Samuel
contents	As strong machine translation (MT) systems are increasingly based on large language models (LLMs), reliable quality benchmarking requires methods that capture their ability to leverage extended context. This study compares two commercial MT systems -- DeepL and Supertext -- by assessing their performance on unsegmented texts. We evaluate translation quality across four language directions with professional translators assessing segments with full document-level context. While segment-level assessments indicate no strong preference between the systems in most cases, document-level analysis reveals a preference for Supertext in three out of four language directions, suggesting superior consistency across longer texts. We advocate for more context-sensitive evaluation methodologies to ensure that MT quality assessments reflect real-world usability. We release all evaluation data and scripts for further analysis and reproduction at https://github.com/supertext/evaluation_deepl_supertext.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_02577
institution	arXiv
publishDate	2025
record_format	arxiv
title	A comparison of translation performance between DeepL and Supertext
topic	Computation and Language
url	https://arxiv.org/abs/2502.02577
