Staff View: What are you sinking? A geometric approach on attention sink :: Library Catalog

Bibliographic Details
Main Authors:	Ruscio, Valeria; Nanni, Umberto; Silvestri, Fabrizio
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2508.02546

_version_	1866908477975166976
author	Ruscio, Valeria; Nanni, Umberto; Silvestri, Fabrizio
author_facet	Ruscio, Valeria; Nanni, Umberto; Silvestri, Fabrizio
contents	Attention sink (AS) is a consistent pattern in transformer attention maps where certain tokens (often special tokens or positional anchors) disproportionately attract attention from other tokens. We show that in transformers, AS is not an architectural artifact, but it is the manifestation of a fundamental geometric principle: the establishment of reference frames that anchor representational spaces. We analyze several architectures and identify three distinct reference frame types, centralized, distributed, and bidirectional, that correlate with the attention sink phenomenon. We show that they emerge during the earliest stages of training as optimal solutions to the problem of establishing stable coordinate systems in high-dimensional spaces. We show the influence of architecture components, particularly position encoding implementations, on the specific type of reference frame. This perspective transforms our understanding of transformer attention mechanisms and provides insights for both architecture design and the relationship with AS.
format	Preprint
id	arxiv_https___arxiv_org_abs_2508_02546
institution	arXiv
publishDate	2025
record_format	arxiv
title	What are you sinking? A geometric approach on attention sink
topic	Machine Learning; Artificial Intelligence; Computation and Language
url	https://arxiv.org/abs/2508.02546
