Bibliographic Details
Main Authors: Qiu, Yiding, Azimi, Seyed Mahdi, Lensky, Artem
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2601.10093
Abstract:
  • Large programming courses struggle to provide timely, detailed feedback on student code. We developed Mark My Works, a local autograding system that combines traditional unit testing with LLM-generated explanations. The system uses role-based prompts to analyze submissions, critique code quality, and generate pedagogical feedback while keeping its reasoning process transparent. We piloted the system in a 191-student engineering course, comparing AI-generated assessments with human grading on 79 submissions. Although AI scores showed no statistically significant linear correlation with human scores (r = -0.177, p = 0.124), both sets of scores exhibited similar left-skewed distributions, suggesting that the two graders recognize comparable quality hierarchies despite different scoring philosophies. The AI system scored more conservatively (mean 59.95 vs. 80.53 for human graders) but generated significantly more detailed technical feedback.
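
The comparison described in the abstract rests on two standard statistics: the Pearson correlation between paired AI and human scores, and the skewness of each score distribution. The following is a minimal sketch of how such a comparison could be computed, not the authors' analysis code. The function names `pearson_r` and `skewness` and the two score lists are hypothetical illustrations; the scores are made-up toy values, not data from the study.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two paired samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


def skewness(xs):
    """Population (moment-based) skewness; negative means left-skewed."""
    m = mean(xs)
    n = len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in xs) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for a constant sample")
    return m3 / m2 ** 1.5


# Hypothetical toy scores for six submissions (NOT data from the paper).
human_scores = [85, 90, 78, 88, 40, 92]
ai_scores = [60, 55, 70, 65, 20, 62]

r = pearson_r(ai_scores, human_scores)
print(f"Pearson r = {r:.3f}")
print(f"human skewness = {skewness(human_scores):.3f}")
print(f"AI skewness    = {skewness(ai_scores):.3f}")
```

A low absolute value of r alongside negative skewness in both lists would correspond to the pattern the abstract reports: the two graders disagree on individual submissions even though both distributions have a tail toward low scores.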