Bibliographic Details
Main Authors: Moore, Steven; Costello, Eamon; Nguyen, Huy A.; Stamper, John
Format: Preprint
Published: 2024
Subjects: Artificial Intelligence; Computation and Language
Online Access: https://arxiv.org/abs/2405.20529
author Moore, Steven
Costello, Eamon
Nguyen, Huy A.
Stamper, John
contents Evaluating multiple-choice questions (MCQs) involves either labor-intensive human assessments or automated methods that prioritize readability, often overlooking deeper question design flaws. To address this issue, we introduce the Scalable Automatic Question Usability Evaluation Toolkit (SAQUET), an open-source tool that leverages the Item-Writing Flaws (IWF) rubric for a comprehensive and automated quality evaluation of MCQs. By harnessing the latest in large language models such as GPT-4, advanced word embeddings, and Transformers designed to analyze textual complexity, SAQUET effectively pinpoints and assesses a wide array of flaws in MCQs. We first demonstrate the discrepancy between commonly used automated evaluation metrics and the human assessment of MCQ quality. Then we evaluate SAQUET on a diverse dataset of MCQs across the five domains of Chemistry, Statistics, Computer Science, Humanities, and Healthcare, showing how it effectively distinguishes between flawed and flawless questions, providing a level of analysis beyond what is achievable with traditional metrics. With an accuracy rate of over 94% in detecting the presence of flaws identified by human evaluators, our findings emphasize the limitations of existing evaluation methods and showcase potential in improving the quality of educational assessments.
format Preprint
id arxiv_https___arxiv_org_abs_2405_20529
institution arXiv
publishDate 2024
record_format arxiv
title An Automatic Question Usability Evaluation Toolkit
topic Artificial Intelligence
Computation and Language
url https://arxiv.org/abs/2405.20529