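| Main Authors: | Chen, Eason; Judicke, Sophia; Beigh, Kayla; Tang, Xinyi; Wang, Isabel; Yuan, Nina; Xiao, Zimo; Li, Chuangji; Li, Shizhuo; Luttmer, Reed; Singh, Shreya; Yampolsky, Maria; Parikh, Naman; Zhao, Yvonne; Chen, Meiyi; Huang, Scarlett; Mohanty, Anishka; Johnson, Gregory; Mackey, John; Lin, Jionghao; Koedinger, Ken |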
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Human-Computer Interaction; Artificial Intelligence; Computers and Society |
| Online Access: | https://arxiv.org/abs/2602.18807 |
| _version_ | 1866914435702980608 |
|---|---|
| author | Chen, Eason; Judicke, Sophia; Beigh, Kayla; Tang, Xinyi; Wang, Isabel; Yuan, Nina; Xiao, Zimo; Li, Chuangji; Li, Shizhuo; Luttmer, Reed; Singh, Shreya; Yampolsky, Maria; Parikh, Naman; Zhao, Yvonne; Chen, Meiyi; Huang, Scarlett; Mohanty, Anishka; Johnson, Gregory; Mackey, John; Lin, Jionghao; Koedinger, Ken |
| contents | We evaluate GPTutor, an LLM-powered tutoring system for an undergraduate discrete mathematics course. It integrates two LLM-supported tools: a structured proof-review tool that provides embedded feedback on students' written proof attempts, and a chatbot for math questions. In a staggered-access study with 148 students, earlier access was associated with higher homework performance during the interval when only the experimental group could use the system, but we did not observe this gain transferring to exam scores. Usage logs show that students with lower self-efficacy and lower prior exam performance used both components more frequently. Session-level behavioral labels, produced by human coding and scaled using an automated classifier, characterize how students engaged with the chatbot (e.g., answer-seeking or help-seeking). In models controlling for prior performance and self-efficacy, higher chatbot usage and answer-seeking behavior were negatively associated with subsequent midterm performance, whereas proof-review usage showed no detectable independent association. Together, the findings suggest that chatbot-based support alone may not reliably support transfer to independent assessment of math proof-learning outcomes, whereas work-anchored, structured feedback appears less associated with reduced learning. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2602_18807 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | Chat-Based Support Alone May Not Be Enough: Comparing Conversational and Embedded LLM Feedback for Mathematical Proof Learning |
| topic | Human-Computer Interaction; Artificial Intelligence; Computers and Society |
| url | https://arxiv.org/abs/2602.18807 |