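Staff View: Identity as Attractor: Geometric Evidence for Persistent Agent Architecture in LLM Activation Space :: Library Catalog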

Bibliographic Details
Main Author:	Vasilenko, Vladimir
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence; Machine Learning; I.2.7
Online Access:	https://arxiv.org/abs/2604.12016

_version_	1866915935329189888
author	Vasilenko, Vladimir
author_facet	Vasilenko, Vladimir
contents	Large language models map semantically related prompts to similar internal representations -- a phenomenon interpretable as attractor-like dynamics. We ask whether the identity document of a persistent cognitive agent (its cognitive_core) exhibits analogous attractor-like behavior. We present a controlled experiment on Llama 3.1 8B Instruct, comparing hidden states of an original cognitive_core (Condition A), seven paraphrases (Condition B), and seven structurally matched controls (Condition C). Mean-pooled states at layers 8, 16, and 24 show that paraphrases converge to a tighter cluster than controls (Cohen's d > 1.88, p < 10^{-27}, Bonferroni-corrected). Replication on Gemma 2 9B confirms cross-architecture generalizability. Ablations suggest the effect is primarily semantic rather than structural, and that structural completeness appears necessary to reach the attractor region. An exploratory experiment shows that reading a scientific description of the agent shifts internal state toward the attractor -- closer than a sham preprint -- distinguishing knowing about an identity from operating as that identity. These results provide representational evidence that agent identity documents induce attractor-like geometry in LLM activation space.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_12016
institution	arXiv
publishDate	2026
record_format	arxiv
title	Identity as Attractor: Geometric Evidence for Persistent Agent Architecture in LLM Activation Space
topic	Artificial Intelligence; Machine Learning; I.2.7
url	https://arxiv.org/abs/2604.12016
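
Illustrative note on the abstract's method: the comparison described in the contents field (mean-pooled hidden states, paraphrases forming a tighter cluster than controls, effect size reported as Cohen's d) can be sketched roughly as below. This is a minimal sketch, not the author's code. The abstract does not state the distance metric or how cluster tightness is measured, so the within-group pairwise cosine distance used here is an assumption, as are the synthetic vectors that stand in for real Llama 3.1 8B Instruct hidden states and every function name.

```python
import itertools
import math
import random
import statistics


def mean_pool(token_states):
    """Average per-token hidden vectors into one vector per document."""
    n = len(token_states)
    return [sum(col) / n for col in zip(*token_states)]


def cosine_distance(u, v):
    """1 - cosine similarity (assumed metric; the abstract names none)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def pairwise_distances(vectors):
    """All within-group pairwise distances (assumed tightness measure)."""
    return [cosine_distance(u, v) for u, v in itertools.combinations(vectors, 2)]


def cohens_d(xs, ys):
    """Cohen's d using the pooled sample standard deviation."""
    nx, ny = len(xs), len(ys)
    pooled_var = (
        (nx - 1) * statistics.variance(xs) + (ny - 1) * statistics.variance(ys)
    ) / (nx + ny - 2)
    return (statistics.mean(xs) - statistics.mean(ys)) / math.sqrt(pooled_var)


# Synthetic stand-ins for hidden states: 5 "tokens" of 16 dimensions each.
rng = random.Random(0)
DIM, TOKENS = 16, 5


def random_states(scale=1.0):
    return [[rng.gauss(0.0, scale) for _ in range(DIM)] for _ in range(TOKENS)]


# Hypothetical Condition A: the original document's token states.
original = random_states()

# Hypothetical Condition B: seven "paraphrases", modelled as small perturbations.
paraphrases = [
    mean_pool([[x + rng.gauss(0.0, 0.1) for x in tok] for tok in original])
    for _ in range(7)
]

# Hypothetical Condition C: seven "controls", modelled as unrelated states.
controls = [mean_pool(random_states()) for _ in range(7)]

paraphrase_dists = pairwise_distances(paraphrases)
control_dists = pairwise_distances(controls)

# Positive d means controls are more spread out than paraphrases.
effect_size = cohens_d(control_dists, paraphrase_dists)
print(f"mean paraphrase distance: {statistics.mean(paraphrase_dists):.4f}")
print(f"mean control distance:    {statistics.mean(control_dists):.4f}")
print(f"Cohen's d:                {effect_size:.2f}")
```

With real data, the synthetic vectors would be replaced by hidden states taken from a chosen layer (the abstract mentions layers 8, 16, and 24), and a significance test with Bonferroni correction would follow; both steps are omitted from this sketch.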

Similar Items