Library Catalog

Bibliographic Details
Main Authors:	Kocmi, Tom; Zouhar, Vilém; Federmann, Christian; Post, Matt
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2401.06760

Similar Items

Pearmut: Human Evaluation of Translation Made Trivial
by: Zouhar, Vilém, et al.
Published: (2026)

AI-Assisted Human Evaluation of Machine Translation
by: Zouhar, Vilém, et al.
Published: (2024)

Estimating Machine Translation Difficulty
by: Proietti, Lorenzo, et al.
Published: (2025)

SLIDE: Reference-free Evaluation for Machine Translation using a Sliding Document Window
by: Raunak, Vikas, et al.
Published: (2023)

Quality and Quantity of Machine Translation References for Automatic Metrics
by: Zouhar, Vilém, et al.
Published: (2024)

Stolen Subwords: Importance of Vocabularies for Machine Translation Model Stealing
by: Zouhar, Vilém
Published: (2024)

Error Span Annotation: A Balanced Approach for Human Evaluation of Machine Translation
by: Kocmi, Tom, et al.
Published: (2024)

Distributional Properties of Subword Regularization
by: Cognetta, Marco, et al.
Published: (2024)

How to Select Datapoints for Efficient Human Evaluation of NLG Models?
by: Zouhar, Vilém, et al.
Published: (2025)

Multimodal Shannon Game with Images
by: Zouhar, Vilém, et al.
Published: (2023)

AutoTutor meets Large Language Models: A Language Model Tutor with Rich Pedagogy and Guardrails
by: Chowdhury, Sankalan Pal, et al.
Published: (2024)

Fine-Tuned Machine Translation Metrics Struggle in Unseen Domains
by: Zouhar, Vilém, et al.
Published: (2024)

A Bayesian Optimization Approach to Machine Translation Reranking
by: Cheng, Julius, et al.
Published: (2024)

Two Counterexamples to Tokenization and the Noiseless Channel
by: Cognetta, Marco, et al.
Published: (2024)

Biased Tales: Cultural and Topic Bias in Generating Children's Stories
by: Rooein, Donya, et al.
Published: (2025)

Evaluating Optimal Reference Translations
by: Zouhar, Vilém, et al.
Published: (2023)

Preliminary WMT24 Ranking of General MT Systems and LLMs
by: Kocmi, Tom, et al.
Published: (2024)

How to Engage Your Readers? Generating Guiding Questions to Promote Active Reading
by: Cui, Peng, et al.
Published: (2024)

Unsupervised Word-level Quality Estimation for Machine Translation Through the Lens of Annotators (Dis)agreement
by: Sarti, Gabriele, et al.
Published: (2025)

COMET-poly: Machine Translation Metric Grounded in Other Candidates
by: Züfle, Maike, et al.
Published: (2025)

PEAR: Pairwise Evaluation for Automatic Relative Scoring in Machine Translation
by: Proietti, Lorenzo, et al.
Published: (2026)

When LLMs Benchmark Themselves: Deconstructing Self-Bias in Automated Evaluation
by: Xu, Wenda, et al.
Published: (2025)

Pitfalls and Outlooks in Using COMET
by: Zouhar, Vilém, et al.
Published: (2024)

Early-Exit and Instant Confidence Translation Quality Estimation
by: Zouhar, Vilém, et al.
Published: (2025)

Multilingual Performance Biases of Large Language Models in Education
by: Gupta, Vansh, et al.
Published: (2025)

Machine Translation Meta Evaluation through Translation Accuracy Challenge Sets
by: Moghe, Nikita, et al.
Published: (2024)

RELIC: Investigating Large Language Model Responses using Self-Consistency
by: Cheng, Furui, et al.
Published: (2023)

QE4PE: Word-level Quality Estimation for Human Post-Editing
by: Sarti, Gabriele, et al.
Published: (2025)

TASER: Translation Assessment via Systematic Evaluation and Reasoning
by: Maheswaran, Monishwaran, et al.
Published: (2025)

How Important is 'Perfect' English for Machine Translation Prompts?
by: Schmidtová, Patrícia, et al.
Published: (2025)

Generating Difficult-to-Translate Texts
by: Zouhar, Vilém, et al.
Published: (2025)

PWESuite: Phonetic Word Embeddings and Tasks They Facilitate
by: Zouhar, Vilém, et al.
Published: (2023)

Not All Metrics Are Guilty: Improving NLG Evaluation by Diversifying References
by: Tang, Tianyi, et al.
Published: (2023)

Escaping the sentence-level paradigm in machine translation
by: Post, Matt, et al.
Published: (2023)

A Formal Perspective on Byte-Pair Encoding
by: Zouhar, Vilém, et al.
Published: (2023)

Déjà Vu: Multilingual LLM Evaluation through the Lens of Machine Translation Evaluation
by: Kreutzer, Julia, et al.
Published: (2025)

Recovering document annotations for sentence-level bitext
by: Wicks, Rachel, et al.
Published: (2024)

Error Analysis Prompting Enables Human-Like Translation Evaluation in Large Language Models
by: Lu, Qingyu, et al.
Published: (2023)

The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs
by: Li, Hong, et al.
Published: (2024)

Can Reasoning Help Large Language Models Capture Human Annotator Disagreement?
by: Ni, Jingwei, et al.
Published: (2025)