Staff View: Continuity and Isolation Lead to Doubts or Dilemmas in Large Language Models :: Library Catalog

Bibliographic Details
Main Authors:	Pasten, Hector; Urrutia, Felipe; Jimenez, Hector; Calderon, Cristian B.; Rojas, Cristóbal; Kozachinskiy, Alexander
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2505.10606

_version_	1866912378963099648
author	Pasten, Hector; Urrutia, Felipe; Jimenez, Hector; Calderon, Cristian B.; Rojas, Cristóbal; Kozachinskiy, Alexander
author_facet	Pasten, Hector; Urrutia, Felipe; Jimenez, Hector; Calderon, Cristian B.; Rojas, Cristóbal; Kozachinskiy, Alexander
contents	Understanding how Transformers work and how they process information is key to the theoretical and empirical advancement of these machines. In this work, we demonstrate the existence of two phenomena in Transformers, namely isolation and continuity. Both of these phenomena hinder Transformers from learning even simple pattern sequences. Isolation expresses that any learnable sequence must be isolated from any other learnable sequence, and hence some sequences cannot be learned by a single Transformer at the same time. Continuity entails that an attractor basin forms around a learned sequence, such that any sequence falling in that basin will collapse towards the learned sequence. Here, we mathematically prove that these phenomena emerge in all Transformers that use compact positional encoding, and we design rigorous experiments demonstrating that the theoretical limitations we shed light on occur at practical scale.
format	Preprint
id	arxiv_https___arxiv_org_abs_2505_10606
institution	arXiv
publishDate	2025
record_format	arxiv
title	Continuity and Isolation Lead to Doubts or Dilemmas in Large Language Models
topic	Machine Learning; Artificial Intelligence
url	https://arxiv.org/abs/2505.10606

Similar Items