Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Mao, Xutao, Zhao, Jinman, Penn, Gerald, Wang, Cong
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2605.03354
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866915985941856256
author	Mao, Xutao Zhao, Jinman Penn, Gerald Wang, Cong
author_facet	Mao, Xutao Zhao, Jinman Penn, Gerald Wang, Cong
contents	Agent memory failures are silent: an LLM-based agent can produce a fluent response even when it fails to extract, retain, or retrieve the information needed across sessions. The write-manage-read loop describes the external pipeline of these systems but leaves open which internal computations implement each stage. Tracing feature circuits across the Qwen-3 family (0.6B--14B) and two memory frameworks (mem0 and A-MEM), we report two mechanistic findings and one deliverable. First, control is detectable before content: routing circuitry is causally active at 0.6B, while content circuitry produces no detectable signal until 4B, exposing a deployment regime where small models route memory decisions before they can reliably extract or ground the underlying facts. Second, the shared hub is recruited, not created: Write and Read converge on a late-layer hub that already exists in the base model as a context-grounding substrate, and memory framing recruits a memory-specific functional direction on this substrate rather than building one of its own. Both findings transfer across mem0 and A-MEM, indicating that the underlying computations are properties of the base model rather than of any particular interface. Building on this circuit structure, we develop an unsupervised stage-level diagnostic that localizes silent failures to the responsible operation up to 76.2% accuracy, outperforming the strongest supervised baseline by 13 points. Together, these results point to circuit-level signatures as a practical handle for monitoring and structurally-guided design of agent memory.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_03354
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	What Happens Inside Agent Memory? Circuit Analysis from Emergence to Diagnosis Mao, Xutao Zhao, Jinman Penn, Gerald Wang, Cong Artificial Intelligence Agent memory failures are silent: an LLM-based agent can produce a fluent response even when it fails to extract, retain, or retrieve the information needed across sessions. The write-manage-read loop describes the external pipeline of these systems but leaves open which internal computations implement each stage. Tracing feature circuits across the Qwen-3 family (0.6B--14B) and two memory frameworks (mem0 and A-MEM), we report two mechanistic findings and one deliverable. First, control is detectable before content: routing circuitry is causally active at 0.6B, while content circuitry produces no detectable signal until 4B, exposing a deployment regime where small models route memory decisions before they can reliably extract or ground the underlying facts. Second, the shared hub is recruited, not created: Write and Read converge on a late-layer hub that already exists in the base model as a context-grounding substrate, and memory framing recruits a memory-specific functional direction on this substrate rather than building one of its own. Both findings transfer across mem0 and A-MEM, indicating that the underlying computations are properties of the base model rather than of any particular interface. Building on this circuit structure, we develop an unsupervised stage-level diagnostic that localizes silent failures to the responsible operation up to 76.2% accuracy, outperforming the strongest supervised baseline by 13 points. Together, these results point to circuit-level signatures as a practical handle for monitoring and structurally-guided design of agent memory.
title	What Happens Inside Agent Memory? Circuit Analysis from Emergence to Diagnosis
topic	Artificial Intelligence
url	https://arxiv.org/abs/2605.03354

Similar Items