Bibliographic Details
Main Authors: Pasten, Hector, Urrutia, Felipe, Jimenez, Hector, Calderon, Cristian B., Rojas, Cristóbal, Kozachinskiy, Alexander
Format: Preprint
Published: 2025
Subjects: Machine Learning, Artificial Intelligence
Online Access: https://arxiv.org/abs/2505.10606
_version_ 1866912378963099648
author Pasten, Hector
Urrutia, Felipe
Jimenez, Hector
Calderon, Cristian B.
Rojas, Cristóbal
Kozachinskiy, Alexander
contents Understanding how Transformers work and how they process information is key to the theoretical and empirical advancement of these machines. In this work, we demonstrate the existence of two phenomena in Transformers, namely isolation and continuity. Both of these phenomena hinder Transformers from learning even simple pattern sequences. Isolation means that any learnable sequence must be isolated from another learnable sequence, and hence some sequences cannot be learned by a single Transformer at the same time. Continuity entails that an attractor basin forms around a learned sequence, such that any sequence falling in that basin will collapse towards the learned sequence. Here, we mathematically prove that these phenomena emerge in all Transformers that use compact positional encoding, and we design rigorous experiments demonstrating that the theoretical limitations we identify occur at practical scale.
format Preprint
id arxiv_https___arxiv_org_abs_2505_10606
institution arXiv
publishDate 2025
record_format arxiv
title Continuity and Isolation Lead to Doubts or Dilemmas in Large Language Models
topic Machine Learning
Artificial Intelligence
url https://arxiv.org/abs/2505.10606