Bibliographic Details
Main Authors: Ma, Qian; Xu, Ruoxiang; Cai, Yongqiang
Format: Preprint
Published: 2025
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2511.06376
_version_ 1866914144199901184
author Ma, Qian
Xu, Ruoxiang
Cai, Yongqiang
contents Numerous studies have demonstrated that the Transformer architecture possesses the capability for in-context learning (ICL). In scenarios involving function approximation, context can serve as a control parameter for the model, endowing it with the universal approximation property (UAP). In practice, context is represented by tokens from a finite set, referred to as a vocabulary, which is the case considered in this paper, i.e., vocabulary in-context learning (VICL). We demonstrate that VICL in single-layer Transformers, without positional encoding, does not possess the UAP; however, it is possible to achieve the UAP when positional encoding is included. Several sufficient conditions for the positional encoding are provided. Our findings reveal the benefits of positional encoding from an approximation theory perspective in the context of ICL.
format Preprint
id arxiv_https___arxiv_org_abs_2511_06376
institution arXiv
publishDate 2025
record_format arxiv
title Vocabulary In-Context Learning in Transformers: Benefits of Positional Encoding
topic Machine Learning
url https://arxiv.org/abs/2511.06376