Staff View :: Library Catalog


Bibliographic Details
Main Authors:	de Chillaz, Aymeric; Sotnikova, Anna; Jermann, Patrick; Bosselut, Antoine
Format:	Preprint
Published:	2025
Subjects:	Computers and Society; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2507.03013

_version_	1866915371495194624
author	de Chillaz, Aymeric; Sotnikova, Anna; Jermann, Patrick; Bosselut, Antoine
author_facet	de Chillaz, Aymeric; Sotnikova, Anna; Jermann, Patrick; Bosselut, Antoine
contents	Generative AI systems have rapidly advanced, with multimodal input capabilities enabling reasoning beyond text-based tasks. In education, these advancements could influence assessment design and question answering, presenting both opportunities and challenges. To investigate these effects, we introduce a high-quality dataset of 201 university-level STEM questions, manually annotated with features such as image type, role, problem complexity, and question format. Our study analyzes how these features affect generative AI performance compared to students. We evaluate four model families with five prompting strategies, comparing results to the average of 546 student responses per question. Although the best model correctly answers on average 58.5 % of the questions using majority vote aggregation, human participants consistently outperform AI on questions involving visual components. Interestingly, human performance remains stable across question features but varies by subject, whereas AI performance is susceptible to both subject matter and question features. Finally, we provide actionable insights for educators, demonstrating how question design can enhance academic integrity by leveraging features that challenge current AI systems without increasing the cognitive burden for students.
format	Preprint
id	arxiv_https___arxiv_org_abs_2507_03013
institution	arXiv
publishDate	2025
record_format	arxiv
title	Challenges for AI in Multimodal STEM Assessments: a Human-AI Comparison
topic	Computers and Society; Artificial Intelligence
url	https://arxiv.org/abs/2507.03013

Similar Items