Bibliographic Details
Main Authors: Mao, Xutao; Zhao, Jinman; Penn, Gerald; Wang, Cong
Format: Preprint
Published: 2026
Subjects: Artificial Intelligence
Online Access: https://arxiv.org/abs/2605.03354
author Mao, Xutao
Zhao, Jinman
Penn, Gerald
Wang, Cong
contents Agent memory failures are silent: an LLM-based agent can produce a fluent response even when it fails to extract, retain, or retrieve the information it needs across sessions. The write-manage-read loop describes the external pipeline of these systems but leaves open which internal computations implement each stage. Tracing feature circuits across the Qwen-3 family (0.6B–14B) and two memory frameworks (mem0 and A-MEM), we report two mechanistic findings and one deliverable. First, control is detectable before content: routing circuitry is causally active at 0.6B, while content circuitry produces no detectable signal until 4B, exposing a deployment regime where small models route memory decisions before they can reliably extract or ground the underlying facts. Second, the shared hub is recruited, not created: Write and Read converge on a late-layer hub that already exists in the base model as a context-grounding substrate, and memory framing recruits a memory-specific functional direction on this substrate rather than building one of its own. Both findings transfer across mem0 and A-MEM, indicating that the underlying computations are properties of the base model rather than of any particular interface. Building on this circuit structure, we develop an unsupervised stage-level diagnostic that localizes silent failures to the responsible operation with up to 76.2% accuracy, outperforming the strongest supervised baseline by 13 points. Together, these results point to circuit-level signatures as a practical handle for monitoring agent memory and for its structurally guided design.
id arxiv_https___arxiv_org_abs_2605_03354
institution arXiv
record_format arxiv
title What Happens Inside Agent Memory? Circuit Analysis from Emergence to Diagnosis
topic Artificial Intelligence