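Staff View: LLMs can Perform Multi-Dimensional Analytic Writing Assessments: A Case Study of L2 Graduate-Level Academic English Writing :: Library Catalog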

Bibliographic Details
Main Authors:	Wang, Zhengxiang; Makarova, Veronika; Li, Zhi; Kodner, Jordan; Rambow, Owen
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2502.11368

_version_	1866913869296828416
author	Wang, Zhengxiang; Makarova, Veronika; Li, Zhi; Kodner, Jordan; Rambow, Owen
author_facet	Wang, Zhengxiang; Makarova, Veronika; Li, Zhi; Kodner, Jordan; Rambow, Owen
contents	The paper explores the performance of LLMs in the context of multi-dimensional analytic writing assessments, i.e. their ability to provide both scores and comments based on multiple assessment criteria. Using a corpus of literature reviews written by L2 graduate students and assessed by human experts against 9 analytic criteria, we prompt several popular LLMs to perform the same task under various conditions. To evaluate the quality of feedback comments, we apply a novel feedback comment quality evaluation framework. This framework is interpretable, cost-efficient, scalable, and reproducible, compared to existing methods that rely on manual judgments. We find that LLMs can generate reasonably good and generally reliable multi-dimensional analytic assessments. We release our corpus and code for reproducibility.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_11368
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	LLMs can Perform Multi-Dimensional Analytic Writing Assessments: A Case Study of L2 Graduate-Level Academic English Writing Wang, Zhengxiang Makarova, Veronika Li, Zhi Kodner, Jordan Rambow, Owen Computation and Language Artificial Intelligence The paper explores the performance of LLMs in the context of multi-dimensional analytic writing assessments, i.e. their ability to provide both scores and comments based on multiple assessment criteria. Using a corpus of literature reviews written by L2 graduate students and assessed by human experts against 9 analytic criteria, we prompt several popular LLMs to perform the same task under various conditions. To evaluate the quality of feedback comments, we apply a novel feedback comment quality evaluation framework. This framework is interpretable, cost-efficient, scalable, and reproducible, compared to existing methods that rely on manual judgments. We find that LLMs can generate reasonably good and generally reliable multi-dimensional analytic assessments. We release our corpus and code for reproducibility.
title	LLMs can Perform Multi-Dimensional Analytic Writing Assessments: A Case Study of L2 Graduate-Level Academic English Writing
topic	Computation and Language; Artificial Intelligence
url	https://arxiv.org/abs/2502.11368

Similar Items