Bibliographic details
Main author:	Zixi, Li
Format:	Digital resource
Language:
Published:	Zenodo 2025
Online access:	https://doi.org/10.57967/hf/7066

Table of contents:

We present LeftAndRight, a diagnostic framework that uses four algorithmic primitives (>>, <<, 1, 0) to reveal a fundamental property of transformer representations: they geometrically collapse backward operations, regardless of attention architecture. The counterintuitive discovery: we initially hypothesized that causal attention masks cause this collapse. Through systematic validation across three levels (attention patterns, token embeddings, and sentence embeddings), we discovered that even bidirectional models collapse backward operations. DistilBERT, which can attend to future tokens (36.2% future attention), shows zero backward primitives (<< = 0%) at both token and sentence levels. This reveals that the collapse is caused not by attention masks but by representation geometry itself. Our experiments on 25 boundary problems (OpenXOR, TSP, SAT) and three model architectures (MiniLM, Pythia, DistilBERT) show universal collapse (A = 1.000 across all tests). We demonstrate that learned representations encode inherent temporal directionality (possibly from positional encodings, training data ordering, or fundamental properties of sequential modeling) that prevents encoding of backward operations even when attention is bidirectional. This is not about causal attention; it is about how representations form. The four atoms revealed a deeper geometric truth than expected: transformers fail at backtracking not because of attention architecture, but because their representation space is geometrically unidirectional.