Bibliographic Details
Main Authors: Pope, Nicolas; Tedre, Matti
Format: Preprint
Published: 2026
Subjects: Computers and Society
Online Access: https://arxiv.org/abs/2601.21631
_version_ 1866917232155557888
author Pope, Nicolas
Tedre, Matti
author_facet Pope, Nicolas
Tedre, Matti
contents Most classroom engagements with generative AI focus on prompting pre-trained models, leaving the role of training data and model mechanics opaque. We developed a browser-based tool that allows students to train a small transformer language model entirely on their own device, making the training process visible. In a CS1 course, 162 students completed pre- and post-test explanations of why language models sometimes produce incorrect or strange output. After a brief hands-on training activity, students' explanations shifted significantly from anthropomorphic and misconceived accounts toward data- and model-based reasoning. The results suggest that enabling learners to directly observe training can support conceptual understanding of the data-driven nature of language models and model training, even within a short intervention. For K-12 AI literacy and AI education research, the study findings suggest that enabling students to train - and not only prompt - language models can shift how they think about AI.
format Preprint
id arxiv_https___arxiv_org_abs_2601_21631
institution arXiv
publishDate 2026
record_format arxiv
title Turning Language Model Training from Black Box into a Sandbox
topic Computers and Society
url https://arxiv.org/abs/2601.21631