Vista Equipo: :: Library Catalog

Guardado en:

Detalles Bibliográficos
Autores principales:	Mehta, Sushant, Gupta, Ishan
Formato:	Preprint
Publicado:	2025
Materias:	Machine Learning Artificial Intelligence
Acceso en línea:	https://arxiv.org/abs/2511.06232
Etiquetas:	Agregar Etiqueta Sin Etiquetas, Sea el primero en etiquetar este registro!

_version_	1866912696061919232
author	Mehta, Sushant Gupta, Ishan
author_facet	Mehta, Sushant Gupta, Ishan
contents	In-context learning (ICL) enables large language models to adapt to new tasks from demonstrations without parameter updates. Despite extensive empirical studies, a principled understanding of ICL emergence at scale remains more elusive. We present a unified theoretical framework connecting scaling laws to ICL emergence in transformers. Our analysis establishes that ICL performance follows power-law relationships with model depth $L$, width $d$, context length $k$, and training data $D$, with exponents determined by task structure. We show that under specific conditions, transformers implement gradient-based metalearning in their forward pass, with an effective learning rate $η_{\text{eff}} = Θ(1/\sqrt{Ld})$. We demonstrate sharp phase transitions at critical scales and derive optimal depth-width allocations favoring $L^* \propto N^{2/3}$, $d^* \propto N^{1/3}$ for the fixed parameter budget $N = Ld$. Systematic experiments on synthetic tasks validate our predictions, with measured scaling exponents closely matching theory. This work provides both necessary and sufficient conditions for the emergence of ICLs and establishes fundamental computational limits on what transformers can learn in-context.
format	Preprint
id	arxiv_https___arxiv_org_abs_2511_06232
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	Scaling Laws and In-Context Learning: A Unified Theoretical Framework Mehta, Sushant Gupta, Ishan Machine Learning Artificial Intelligence In-context learning (ICL) enables large language models to adapt to new tasks from demonstrations without parameter updates. Despite extensive empirical studies, a principled understanding of ICL emergence at scale remains more elusive. We present a unified theoretical framework connecting scaling laws to ICL emergence in transformers. Our analysis establishes that ICL performance follows power-law relationships with model depth $L$, width $d$, context length $k$, and training data $D$, with exponents determined by task structure. We show that under specific conditions, transformers implement gradient-based metalearning in their forward pass, with an effective learning rate $η_{\text{eff}} = Θ(1/\sqrt{Ld})$. We demonstrate sharp phase transitions at critical scales and derive optimal depth-width allocations favoring $L^* \propto N^{2/3}$, $d^* \propto N^{1/3}$ for the fixed parameter budget $N = Ld$. Systematic experiments on synthetic tasks validate our predictions, with measured scaling exponents closely matching theory. This work provides both necessary and sufficient conditions for the emergence of ICLs and establishes fundamental computational limits on what transformers can learn in-context.
title	Scaling Laws and In-Context Learning: A Unified Theoretical Framework
topic	Machine Learning Artificial Intelligence
url	https://arxiv.org/abs/2511.06232

Ejemplares similares