| Main Authors: | Kocmi, Tom, Zouhar, Vilém, Federmann, Christian, Post, Matt |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2401.06760 |
Similar Items
Pearmut: Human Evaluation of Translation Made Trivial
by: Zouhar, Vilém, et al.
Published: (2026)
AI-Assisted Human Evaluation of Machine Translation
by: Zouhar, Vilém, et al.
Published: (2024)
Estimating Machine Translation Difficulty
by: Proietti, Lorenzo, et al.
Published: (2025)
SLIDE: Reference-free Evaluation for Machine Translation using a Sliding Document Window
by: Raunak, Vikas, et al.
Published: (2023)
Quality and Quantity of Machine Translation References for Automatic Metrics
by: Zouhar, Vilém, et al.
Published: (2024)
Stolen Subwords: Importance of Vocabularies for Machine Translation Model Stealing
by: Zouhar, Vilém
Published: (2024)
Error Span Annotation: A Balanced Approach for Human Evaluation of Machine Translation
by: Kocmi, Tom, et al.
Published: (2024)
Distributional Properties of Subword Regularization
by: Cognetta, Marco, et al.
Published: (2024)
How to Select Datapoints for Efficient Human Evaluation of NLG Models?
by: Zouhar, Vilém, et al.
Published: (2025)
Multimodal Shannon Game with Images
by: Zouhar, Vilém, et al.
Published: (2023)
AutoTutor meets Large Language Models: A Language Model Tutor with Rich Pedagogy and Guardrails
by: Chowdhury, Sankalan Pal, et al.
Published: (2024)
Fine-Tuned Machine Translation Metrics Struggle in Unseen Domains
by: Zouhar, Vilém, et al.
Published: (2024)
A Bayesian Optimization Approach to Machine Translation Reranking
by: Cheng, Julius, et al.
Published: (2024)
Two Counterexamples to Tokenization and the Noiseless Channel
by: Cognetta, Marco, et al.
Published: (2024)
Biased Tales: Cultural and Topic Bias in Generating Children's Stories
by: Rooein, Donya, et al.
Published: (2025)
Evaluating Optimal Reference Translations
by: Zouhar, Vilém, et al.
Published: (2023)
Preliminary WMT24 Ranking of General MT Systems and LLMs
by: Kocmi, Tom, et al.
Published: (2024)
How to Engage Your Readers? Generating Guiding Questions to Promote Active Reading
by: Cui, Peng, et al.
Published: (2024)
Unsupervised Word-level Quality Estimation for Machine Translation Through the Lens of Annotators (Dis)agreement
by: Sarti, Gabriele, et al.
Published: (2025)
COMET-poly: Machine Translation Metric Grounded in Other Candidates
by: Züfle, Maike, et al.
Published: (2025)
PEAR: Pairwise Evaluation for Automatic Relative Scoring in Machine Translation
by: Proietti, Lorenzo, et al.
Published: (2026)
When LLMs Benchmark Themselves: Deconstructing Self-Bias in Automated Evaluation
by: Xu, Wenda, et al.
Published: (2025)
Pitfalls and Outlooks in Using COMET
by: Zouhar, Vilém, et al.
Published: (2024)
Early-Exit and Instant Confidence Translation Quality Estimation
by: Zouhar, Vilém, et al.
Published: (2025)
Multilingual Performance Biases of Large Language Models in Education
by: Gupta, Vansh, et al.
Published: (2025)
Machine Translation Meta Evaluation through Translation Accuracy Challenge Sets
by: Moghe, Nikita, et al.
Published: (2024)
RELIC: Investigating Large Language Model Responses using Self-Consistency
by: Cheng, Furui, et al.
Published: (2023)
QE4PE: Word-level Quality Estimation for Human Post-Editing
by: Sarti, Gabriele, et al.
Published: (2025)
TASER: Translation Assessment via Systematic Evaluation and Reasoning
by: Maheswaran, Monishwaran, et al.
Published: (2025)
How Important is 'Perfect' English for Machine Translation Prompts?
by: Schmidtová, Patrícia, et al.
Published: (2025)
Generating Difficult-to-Translate Texts
by: Zouhar, Vilém, et al.
Published: (2025)
PWESuite: Phonetic Word Embeddings and Tasks They Facilitate
by: Zouhar, Vilém, et al.
Published: (2023)
Not All Metrics Are Guilty: Improving NLG Evaluation by Diversifying References
by: Tang, Tianyi, et al.
Published: (2023)
Escaping the sentence-level paradigm in machine translation
by: Post, Matt, et al.
Published: (2023)
A Formal Perspective on Byte-Pair Encoding
by: Zouhar, Vilém, et al.
Published: (2023)
Déjà Vu: Multilingual LLM Evaluation through the Lens of Machine Translation Evaluation
by: Kreutzer, Julia, et al.
Published: (2025)
Recovering document annotations for sentence-level bitext
by: Wicks, Rachel, et al.
Published: (2024)
Error Analysis Prompting Enables Human-Like Translation Evaluation in Large Language Models
by: Lu, Qingyu, et al.
Published: (2023)
The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs
by: Li, Hong, et al.
Published: (2024)
Can Reasoning Help Large Language Models Capture Human Annotator Disagreement?
by: Ni, Jingwei, et al.
Published: (2025)