Staff View: SideQuest: Model-Driven KV Cache Management for Long-Horizon Agentic Reasoning :: Library Catalog

Bibliographic Details
Main Authors:	Kariyappa, Sanjay; Suh, G. Edward
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2602.22603

_version_	1866918360747343872
author	Kariyappa, Sanjay; Suh, G. Edward
author_facet	Kariyappa, Sanjay; Suh, G. Edward
contents	Long-running agentic tasks, such as deep research, require multi-hop reasoning over information distributed across multiple webpages and documents. In such tasks, the LLM context is dominated by tokens from external retrieval, causing memory usage to grow rapidly and limiting decode performance. While several KV cache compression techniques exist for long-context inputs, we find that existing heuristics fail to support multi-step reasoning models effectively. We address this challenge with SideQuest -- a novel approach that leverages the Large Reasoning Model (LRM) itself to perform KV cache compression by reasoning about the usefulness of tokens in its context. To prevent the tokens associated with this management process from polluting the model's memory, we frame KV cache compression as an auxiliary task executed in parallel to the main reasoning task. Our evaluations, using a model trained with just 215 samples, show that SideQuest reduces peak token usage by up to 65% on agentic tasks with minimal degradation in accuracy, outperforming heuristic-based KV cache compression techniques.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_22603
institution	arXiv
publishDate	2026
record_format	arxiv
title	SideQuest: Model-Driven KV Cache Management for Long-Horizon Agentic Reasoning
topic	Artificial Intelligence; Machine Learning
url	https://arxiv.org/abs/2602.22603

Similar Items