:: Library Catalog

Bibliographic Details
Main Authors:	Domhan, Tobias; Zhu, Dawei
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2505.01761

Similar Items

Automated evaluation of LLMs for effective machine translation of Mandarin Chinese to English
by: Zhang, Yue, et al.
Published: (2026)

Less is more: Not all samples are effective for evaluation
by: Song, Wentang, et al.
Published: (2025)

MT-Ranker: Reference-free machine translation evaluation by inter-system ranking
by: Moosa, Ibraheem Muhammad, et al.
Published: (2024)

IsoChronoMeter: A simple and effective isochronic translation evaluation metric
by: Rozanov, Nikolai, et al.
Published: (2024)

Mind the Gap... or Not? How Translation Errors and Evaluation Details Skew Multilingual Results
by: Peter, Jan-Thorsten, et al.
Published: (2025)

Large-scale cloze evaluation reveals that token prediction tasks are neither lexically nor semantically aligned
by: Jacobs, Cassandra L., et al.
Published: (2024)

Better & Faster Large Language Models via Multi-token Prediction
by: Gloeckle, Fabian, et al.
Published: (2024)

LBPE: Long-token-first Tokenization to Improve Large Language Models
by: Lian, Haoran, et al.
Published: (2024)

A Shocking Amount of the Web is Machine Translated: Insights from Multi-Way Parallelism
by: Thompson, Brian, et al.
Published: (2024)

Recurrent babbling: evaluating the acquisition of grammar from limited input data
by: Pannitto, Ludovica, et al.
Published: (2020)

Contextual effects of sentiment deployment in human and machine translation
by: Comstock, Lindy, et al.
Published: (2025)

Source framing triggers systematic evaluation bias in Large Language Models
by: Germani, Federico, et al.
Published: (2025)

Re-evaluating Open-ended Evaluation of Large Language Models
by: Liu, Siqi, et al.
Published: (2025)

Batayan: A Filipino NLP benchmark for evaluating Large Language Models
by: Montalan, Jann Railey, et al.
Published: (2025)

Assessing "Implicit" Retrieval Robustness of Large Language Models
by: Shen, Xiaoyu, et al.
Published: (2024)

MetricX-25 and GemSpanEval: Google Translate Submissions to the WMT25 Evaluation Shared Task
by: Juraska, Juraj, et al.
Published: (2025)

Optimizing example selection for retrieval-augmented machine translation with translation memories
by: Bouthors, Maxime, et al.
Published: (2024)

Mining experimental data from Materials Science literature with Large Language Models: an evaluation study
by: Foppiano, Luca, et al.
Published: (2024)

To Preserve or To Compress: An In-Depth Study of Connector Selection in Multimodal Large Language Models
by: Lin, Junyan, et al.
Published: (2024)

DLLMQuant: Quantizing Diffusion-based Large Language Models
by: Xu, Chen, et al.
Published: (2025)

Understanding the effects of word-level linguistic annotations in under-resourced neural machine translation
by: Sánchez-Cartagena, Víctor M., et al.
Published: (2024)

CFBenchmark-MM: Chinese Financial Assistant Benchmark for Multimodal Large Language Model
by: Li, Jiangtong, et al.
Published: (2025)

Is Safety Standard Same for Everyone? User-Specific Safety Evaluation of Large Language Models
by: In, Yeonjun, et al.
Published: (2025)

COGNET-MD, an evaluation framework and dataset for Large Language Model benchmarks in the medical domain
by: Panagoulias, Dimitrios P., et al.
Published: (2024)

Escaping the sentence-level paradigm in machine translation
by: Post, Matt, et al.
Published: (2023)

Multilingual Language Model Pretraining using Machine-translated Data
by: Wang, Jiayi, et al.
Published: (2025)

A Preference-driven Paradigm for Enhanced Translation with Large Language Models
by: Zhu, Dawei, et al.
Published: (2024)

An evaluation of LLMs and Google Translate for translation of selected Indian languages via sentiment and semantic analyses
by: Chandra, Rohitash, et al.
Published: (2025)

Reasoning-to-Defend: Safety-Aware Reasoning Can Defend Large Language Models from Jailbreaking
by: Zhu, Junda, et al.
Published: (2025)

Fine-Tuning Large Language Models to Translate: Will a Touch of Noisy Data in Misaligned Languages Suffice?
by: Zhu, Dawei, et al.
Published: (2024)

Beyond the Covariance Trap: Unlocking Generalization in Same-Subject Knowledge Editing for Large Language Models
by: Liu, Xiyu, et al.
Published: (2026)

Byte-token Enhanced Language Models for Temporal Point Processes Analysis
by: Kong, Quyu, et al.
Published: (2025)

Feeding Two Birds or Favoring One? Adequacy-Fluency Tradeoffs in Evaluation and Meta-Evaluation of Machine Translation
by: Shayegh, Behzad, et al.
Published: (2025)

A unified foundational framework for knowledge injection and evaluation of Large Language Models in Combustion Science
by: Yang, Zonglin, et al.
Published: (2026)

Large Language Models, scientific knowledge and factuality: A framework to streamline human expert evaluation
by: Wysocka, Magdalena, et al.
Published: (2023)

The first open machine translation system for the Chechen language
by: Umishov, Abu-Viskhan A., et al.
Published: (2025)

More Women, Same Stereotypes: Unpacking the Gender Bias Paradox in Large Language Models
by: Chen, Evan, et al.
Published: (2025)

Investigating the translation capabilities of Large Language Models trained on parallel data only
by: Gilabert, Javier García, et al.
Published: (2024)

A Report on the llms evaluating the high school questions
by: Jiawei, Zhu, et al.
Published: (2025)

An evaluation of DeepSeek Models in Biomedical Natural Language Processing
by: Zhan, Zaifu, et al.
Published: (2025)