Bibliographic Details
Main Authors: Belitsky, Max, Kopiczko, Dawid J., Dorkenwald, Michael, Mirza, M. Jehanzeb, Glass, James R., Snoek, Cees G. M., Asano, Yuki M.
Format: Preprint
Published: 2025
Subjects: Computation and Language, Artificial Intelligence
Online Access: https://arxiv.org/abs/2507.08799
author Belitsky, Max
Kopiczko, Dawid J.
Dorkenwald, Michael
Mirza, M. Jehanzeb
Glass, James R.
Snoek, Cees G. M.
Asano, Yuki M.
contents We propose cache steering, a lightweight method for implicit steering of language models via a one-shot intervention applied directly to the key-value cache. To validate its effectiveness, we apply cache steering to induce chain-of-thought reasoning in small language models. Our approach constructs steering vectors from reasoning traces, obtained either from teacher models (e.g., GPT-4o) or existing human annotations, that shift model behavior toward more explicit, multi-step reasoning without fine-tuning or prompt modifications. Experimental evaluations on diverse reasoning benchmarks demonstrate that cache steering improves both the qualitative structure of model reasoning and quantitative task performance. Additional experiments show that the method also scales to larger models and yields further gains on challenging datasets such as GPQA and MATH. Compared to prior activation steering techniques that require continuous interventions, our one-shot cache steering offers substantial advantages in terms of inference latency, hyperparameter stability, and ease of integration with existing inference APIs. Beyond mere reasoning induction, we show that cache steering enables controllable transfer of reasoning styles (e.g., stepwise, causal, analogical), making it a practical tool for behavior-level guidance of language models.
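The one-shot intervention described in the abstract can be illustrated with a toy sketch. The abstract says only that steering vectors are built from reasoning traces and applied once to the key-value cache; everything else here is an assumption made for illustration: the mean-difference construction (vectors cached for prompts with a reasoning trace minus vectors cached for prompts without one), the choice of cache position, the scaling coefficients, and all function and parameter names. It uses plain Python lists for a single layer and head rather than any real inference library.

```python
from typing import Dict, List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def steering_vector(with_trace: List[Vector], without_trace: List[Vector]) -> Vector:
    """Assumed construction: mean difference between vectors cached for
    prompts that include a reasoning trace and prompts that do not."""
    positive = mean_vector(with_trace)
    negative = mean_vector(without_trace)
    return [p - n for p, n in zip(positive, negative)]


def steer_cache(
    cache: Dict[str, List[Vector]],
    position: int,
    key_delta: Vector,
    value_delta: Vector,
    key_coeff: float = 1.0,    # hypothetical scaling hyperparameter
    value_coeff: float = 1.0,  # hypothetical scaling hyperparameter
) -> Dict[str, List[Vector]]:
    """Return a copy of the cache with one position shifted by the steering
    vectors. This is applied once, before decoding; no further intervention
    happens during generation."""
    steered = {
        "keys": [list(k) for k in cache["keys"]],
        "values": [list(v) for v in cache["values"]],
    }
    steered["keys"][position] = [
        k + key_coeff * d for k, d in zip(cache["keys"][position], key_delta)
    ]
    steered["values"][position] = [
        v + value_coeff * d for v, d in zip(cache["values"][position], value_delta)
    ]
    return steered


# Toy usage: a two-token cache with two-dimensional keys and values.
key_delta = steering_vector([[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [2.0, 1.0]])
value_delta = steering_vector([[2.0, 2.0]], [[1.0, 0.0]])
prompt_cache = {"keys": [[0.0, 0.0], [1.0, 1.0]], "values": [[0.0, 0.0], [1.0, 1.0]]}
steered_cache = steer_cache(prompt_cache, -1, key_delta, value_delta, key_coeff=0.5)
```

Because the cache is modified once and then reused unchanged, decoding proceeds exactly as it would for an unsteered prompt, which is consistent with the latency advantage the abstract claims over continuous activation steering.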
format Preprint
id arxiv_https___arxiv_org_abs_2507_08799
institution arXiv
publishDate 2025
record_format arxiv
title KV Cache Steering for Controlling Frozen LLMs
topic Computation and Language
Artificial Intelligence
url https://arxiv.org/abs/2507.08799