Bibliographic Details
Main Authors: Wang, Zhengxiang, Makarova, Veronika, Li, Zhi, Kodner, Jordan, Rambow, Owen
Format: Preprint
Published: 2025
Subjects: Computation and Language; Artificial Intelligence
Online Access: https://arxiv.org/abs/2502.11368
author Wang, Zhengxiang
Makarova, Veronika
Li, Zhi
Kodner, Jordan
Rambow, Owen
contents The paper explores the performance of LLMs in the context of multi-dimensional analytic writing assessments, i.e., their ability to provide both scores and comments based on multiple assessment criteria. Using a corpus of literature reviews written by L2 graduate students and assessed by human experts against 9 analytic criteria, we prompt several popular LLMs to perform the same task under various conditions. To evaluate the quality of feedback comments, we apply a novel feedback comment quality evaluation framework. Compared with existing methods that rely on manual judgments, this framework is interpretable, cost-efficient, scalable, and reproducible. We find that LLMs can generate reasonably good and generally reliable multi-dimensional analytic assessments. We release our corpus and code for reproducibility.
format Preprint
id arxiv_https___arxiv_org_abs_2502_11368
institution arXiv
publishDate 2025
record_format arxiv
title LLMs can Perform Multi-Dimensional Analytic Writing Assessments: A Case Study of L2 Graduate-Level Academic English Writing
topic Computation and Language
Artificial Intelligence
url https://arxiv.org/abs/2502.11368