Bibliographic Details
Main Author: Sun, Simeng
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2509.25073
Table of Contents:
  • We study Transformers on the task of program trace generation (PTG), where models produce step-by-step execution traces for synthetic programs. Unlike existing algorithmic problems, PTG externalizes reasoning through long traces in which each step is trivial. We train small Transformers with diverse modifications, including alternative position encodings, softmax replacements, hybrid models, and short convolutions. While these models achieve strong in-distribution accuracy, they fail systematically when asked to generalize along various factors (e.g., program length, number of trace steps), though some designs significantly improve generalization.
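To make the task concrete, the sketch below shows what a step-by-step execution trace could look like for a tiny straight-line program. It is only an illustration: the toy assignment language, the trace format, and the function name `trace_program` are assumptions introduced here, not the program or trace format used in the preprint.

```python
# Hypothetical toy language (an assumption for illustration, not the paper's format):
# each statement is "var = operand" or "var = operand op operand", tokens separated by spaces.

def trace_program(lines):
    """Execute assignments in order and record the variable state after each step."""
    env = {}
    trace = []
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
    }

    def value(token):
        # A token is either a previously assigned variable or an integer literal.
        return env[token] if token in env else int(token)

    for step, line in enumerate(lines, start=1):
        target, expr = [part.strip() for part in line.split("=")]
        tokens = expr.split()
        if len(tokens) == 1:
            env[target] = value(tokens[0])
        else:
            left, op, right = tokens
            env[target] = ops[op](value(left), value(right))
        # Each step is trivial on its own: one statement, one updated state.
        state = ", ".join(f"{name}={val}" for name, val in env.items())
        trace.append(f"step {step}: {line} -> {state}")
    return trace


program = ["x = 3", "y = x + 4", "x = x * y"]
for entry in trace_program(program):
    print(entry)
```

In this sketch the trace grows linearly with the number of statements, which mirrors the abstract's point that the difficulty lies in producing long traces rather than in any single step.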