Staff View: Vocabulary In-Context Learning in Transformers: Benefits of Positional Encoding :: Library Catalog

Bibliographic Details
Main Authors:	Ma, Qian; Xu, Ruoxiang; Cai, Yongqiang
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2511.06376

_version_	1866914144199901184
author	Ma, Qian; Xu, Ruoxiang; Cai, Yongqiang
author_facet	Ma, Qian; Xu, Ruoxiang; Cai, Yongqiang
contents	Numerous studies have demonstrated that the Transformer architecture possesses the capability for in-context learning (ICL). In scenarios involving function approximation, context can serve as a control parameter for the model, endowing it with the universal approximation property (UAP). In practice, context is represented by tokens from a finite set, referred to as a vocabulary, which is the case considered in this paper, i.e., vocabulary in-context learning (VICL). We demonstrate that VICL in single-layer Transformers, without positional encoding, does not possess the UAP; however, it is possible to achieve the UAP when positional encoding is included. Several sufficient conditions for the positional encoding are provided. Our findings reveal the benefits of positional encoding from an approximation theory perspective in the context of ICL.
format	Preprint
id	arxiv_https___arxiv_org_abs_2511_06376
institution	arXiv
publishDate	2025
record_format	arxiv
title	Vocabulary In-Context Learning in Transformers: Benefits of Positional Encoding
topic	Machine Learning
url	https://arxiv.org/abs/2511.06376
