Bibliographic Details
Main Authors:	Chen, Eason; Judicke, Sophia; Beigh, Kayla; Tang, Xinyi; Wang, Isabel; Yuan, Nina; Xiao, Zimo; Li, Chuangji; Li, Shizhuo; Luttmer, Reed; Singh, Shreya; Yampolsky, Maria; Parikh, Naman; Zhao, Yvonne; Chen, Meiyi; Huang, Scarlett; Mohanty, Anishka; Johnson, Gregory; Mackey, John; Lin, Jionghao; Koedinger, Ken
Format:	Preprint
Published:	2026
Subjects:	Human-Computer Interaction; Artificial Intelligence; Computers and Society
Online Access:	https://arxiv.org/abs/2602.18807

author	Chen, Eason; Judicke, Sophia; Beigh, Kayla; Tang, Xinyi; Wang, Isabel; Yuan, Nina; Xiao, Zimo; Li, Chuangji; Li, Shizhuo; Luttmer, Reed; Singh, Shreya; Yampolsky, Maria; Parikh, Naman; Zhao, Yvonne; Chen, Meiyi; Huang, Scarlett; Mohanty, Anishka; Johnson, Gregory; Mackey, John; Lin, Jionghao; Koedinger, Ken
contents	We evaluate GPTutor, an LLM-powered tutoring system for an undergraduate discrete mathematics course. It integrates two LLM-supported tools: a structured proof-review tool that provides embedded feedback on students' written proof attempts, and a chatbot for math questions. In a staggered-access study with 148 students, earlier access was associated with higher homework performance during the interval when only the experimental group could use the system, but we did not observe this gain transfer to exam scores. Usage logs show that students with lower self-efficacy and lower prior exam performance used both components more frequently. Session-level behavioral labels, produced by human coding and scaled using an automated classifier, characterize how students engaged with the chatbot (e.g., answer-seeking or help-seeking). In models controlling for prior performance and self-efficacy, higher chatbot usage and answer-seeking behavior were negatively associated with subsequent midterm performance, whereas proof-review usage showed no detectable independent association. Together, the findings suggest that chatbot-based support alone may not reliably produce proof-learning gains that transfer to independent assessments, whereas work-anchored, structured feedback appears less associated with reduced learning.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_18807
institution	arXiv
publishDate	2026
record_format	arxiv
title	Chat-Based Support Alone May Not Be Enough: Comparing Conversational and Embedded LLM Feedback for Mathematical Proof Learning
topic	Human-Computer Interaction; Artificial Intelligence; Computers and Society
url	https://arxiv.org/abs/2602.18807

Similar Items