Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Musat, Tiberiu; Pimentel, Tiago; Noci, Lorenzo; Stolfo, Alessandro; Sachan, Mrinmaya; Hofmann, Thomas
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2511.01033

_version_	1866918278555762688
author	Musat, Tiberiu; Pimentel, Tiago; Noci, Lorenzo; Stolfo, Alessandro; Sachan, Mrinmaya; Hofmann, Thomas
author_facet	Musat, Tiberiu; Pimentel, Tiago; Noci, Lorenzo; Stolfo, Alessandro; Sachan, Mrinmaya; Hofmann, Thomas
contents	Transformers have become the dominant architecture for natural language processing. Part of their success is owed to a remarkable capability known as in-context learning (ICL): they can acquire and apply novel associations solely from their input context, without any updates to their weights. In this work, we study the emergence of induction heads, a previously identified mechanism in two-layer transformers that is particularly important for in-context learning. We uncover a relatively simple and interpretable structure of the weight matrices implementing the induction head. We theoretically explain the origin of this structure using a minimal ICL task formulation and a modified transformer architecture. We give a formal proof that the training dynamics remain constrained to a 19-dimensional subspace of the parameter space. Empirically, we validate this constraint while observing that only 3 dimensions account for the emergence of an induction head. By further studying the training dynamics inside this 3-dimensional subspace, we find that the time until the emergence of an induction head follows a tight asymptotic bound that is quadratic in the input context length.
format	Preprint
id	arxiv_https___arxiv_org_abs_2511_01033
institution	arXiv
publishDate	2025
record_format	arxiv
title	On the Emergence of Induction Heads for In-Context Learning
topic	Artificial Intelligence; Computation and Language
url	https://arxiv.org/abs/2511.01033
