Bibliographic Details
Main Authors: Gonon, Antoine, Cordonnier, Alexandre, Boumal, Nicolas
Format: Preprint
Published: 2026
Subjects: Machine Learning, Artificial Intelligence
Online Access: https://arxiv.org/abs/2602.07562
author Gonon, Antoine
Cordonnier, Alexandre
Boumal, Nicolas
contents Match-and-copy is a core retrieval primitive used at inference time by large language models to retrieve a matching token from the context then copy its successor. Yet, understanding how this behavior emerges on natural data is challenging because retrieval and memorization are entangled. To disentangle the two, we introduce Gaussian Match-and-Copy (GMC), a minimalist benchmark that isolates long-range retrieval through pure second-order correlation signals. Numerical investigations show that this task retains key qualitative aspects of how Transformers develop match-and-copy circuits in practice, and separates architectures by their retrieval capabilities. We also analyze the optimization dynamics in a simplified attention setting. Although many solutions are a priori possible under a regression objective, including ones that do not implement retrieval, we identify an implicit-bias regime in which gradient descent drives the parameters to diverge while their direction aligns with the max-margin separator, yielding hard match selection. We prove this max-margin alignment for GD trajectories that reach vanishing empirical loss under explicit technical conditions.
format Preprint
id arxiv_https___arxiv_org_abs_2602_07562
institution arXiv
publishDate 2026
record_format arxiv
title Gaussian Match-and-Copy: A Minimalist Benchmark for Studying Transformer Induction
topic Machine Learning
Artificial Intelligence
url https://arxiv.org/abs/2602.07562