MARC21: :: Library Catalog

Salvato in:

Dettagli Bibliografici
Autori principali:	Yuan, Boqin, Su, Yue, Yao, Kun
Natura:	Preprint
Pubblicazione:	2026
Soggetti:	Artificial Intelligence
Accesso online:	https://arxiv.org/abs/2603.02473
Tags:	Aggiungi Tag Nessun Tag, puoi essere il primo ad aggiungerne!!

_version_	1866911586410561536
author	Yuan, Boqin Su, Yue Yao, Kun
author_facet	Yuan, Boqin Su, Yue Yao, Kun
contents	Memory-augmented LLM agents store and retrieve information from prior interactions, yet the relative importance of how memories are written versus how they are retrieved remains unclear. We introduce a diagnostic framework that analyzes how performance differences manifest across write strategies, retrieval methods, and memory utilization behavior, and apply it to a 3x3 study crossing three write strategies (raw chunks, Mem0-style fact extraction, MemGPT-style summarization) with three retrieval methods (cosine, BM25, hybrid reranking). On LoCoMo, retrieval method is the dominant factor: average accuracy spans 20 points across retrieval methods (57.1% to 77.2%) but only 3-8 points across write strategies. Raw chunked storage, which requires zero LLM calls, matches or outperforms expensive lossy alternatives, suggesting that current memory pipelines may discard useful context that downstream retrieval mechanisms fail to compensate for. Failure analysis shows that performance breakdowns most often manifest at the retrieval stage rather than at utilization. We argue that, under current retrieval practices, improving retrieval quality yields larger gains than increasing write-time sophistication. Code is publicly available at https://github.com/boqiny/memory-probe.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_02473
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	Diagnosing Retrieval vs. Utilization Bottlenecks in LLM Agent Memory Yuan, Boqin Su, Yue Yao, Kun Artificial Intelligence Memory-augmented LLM agents store and retrieve information from prior interactions, yet the relative importance of how memories are written versus how they are retrieved remains unclear. We introduce a diagnostic framework that analyzes how performance differences manifest across write strategies, retrieval methods, and memory utilization behavior, and apply it to a 3x3 study crossing three write strategies (raw chunks, Mem0-style fact extraction, MemGPT-style summarization) with three retrieval methods (cosine, BM25, hybrid reranking). On LoCoMo, retrieval method is the dominant factor: average accuracy spans 20 points across retrieval methods (57.1% to 77.2%) but only 3-8 points across write strategies. Raw chunked storage, which requires zero LLM calls, matches or outperforms expensive lossy alternatives, suggesting that current memory pipelines may discard useful context that downstream retrieval mechanisms fail to compensate for. Failure analysis shows that performance breakdowns most often manifest at the retrieval stage rather than at utilization. We argue that, under current retrieval practices, improving retrieval quality yields larger gains than increasing write-time sophistication. Code is publicly available at https://github.com/boqiny/memory-probe.
title	Diagnosing Retrieval vs. Utilization Bottlenecks in LLM Agent Memory
topic	Artificial Intelligence
url	https://arxiv.org/abs/2603.02473

Documenti analoghi