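| Main Authors: | Pasten, Hector; Urrutia, Felipe; Jimenez, Hector; Calderon, Cristian B.; Rojas, Cristóbal; Kozachinskiy, Alexander |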
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Machine Learning; Artificial Intelligence |
| Online Access: | https://arxiv.org/abs/2505.10606 |

| _version_ | 1866912378963099648 |
|---|---|
| author | Pasten, Hector; Urrutia, Felipe; Jimenez, Hector; Calderon, Cristian B.; Rojas, Cristóbal; Kozachinskiy, Alexander |
| author_facet | Pasten, Hector; Urrutia, Felipe; Jimenez, Hector; Calderon, Cristian B.; Rojas, Cristóbal; Kozachinskiy, Alexander |
| contents | Understanding how Transformers work and how they process information is key to the theoretical and empirical advancement of these machines. In this work, we demonstrate the existence of two phenomena in Transformers, namely isolation and continuity. Both of these phenomena hinder Transformers from learning even simple pattern sequences. Isolation means that any learnable sequence must be isolated from every other learnable sequence, and hence some sequences cannot be learned by a single Transformer at the same time. Continuity entails that an attractor basin forms around a learned sequence, such that any sequence falling in that basin will collapse towards the learned sequence. Here, we mathematically prove that these phenomena emerge in all Transformers that use compact positional encoding, and we design rigorous experiments demonstrating that the theoretical limitations we shed light on occur at a practical scale. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2505_10606 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | Continuity and Isolation Lead to Doubts or Dilemmas in Large Language Models |
| topic | Machine Learning; Artificial Intelligence |
| url | https://arxiv.org/abs/2505.10606 |