Staff View: KV Cache Steering for Controlling Frozen LLMs :: Library Catalog

Bibliographic Details
Main Authors:	Belitsky, Max; Kopiczko, Dawid J.; Dorkenwald, Michael; Mirza, M. Jehanzeb; Glass, James R.; Snoek, Cees G. M.; Asano, Yuki M.
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2507.08799

_version_	1866918148578476032
author	Belitsky, Max; Kopiczko, Dawid J.; Dorkenwald, Michael; Mirza, M. Jehanzeb; Glass, James R.; Snoek, Cees G. M.; Asano, Yuki M.
author_facet	Belitsky, Max; Kopiczko, Dawid J.; Dorkenwald, Michael; Mirza, M. Jehanzeb; Glass, James R.; Snoek, Cees G. M.; Asano, Yuki M.
contents	We propose cache steering, a lightweight method for implicit steering of language models via a one-shot intervention applied directly to the key-value cache. To validate its effectiveness, we apply cache steering to induce chain-of-thought reasoning in small language models. Our approach constructs steering vectors from reasoning traces, obtained either from teacher models (e.g., GPT-4o) or existing human annotations, that shift model behavior toward more explicit, multi-step reasoning without fine-tuning or prompt modifications. Experimental evaluations on diverse reasoning benchmarks demonstrate that cache steering improves both the qualitative structure of model reasoning and quantitative task performance. Additional experiments show that the method also scales to larger models and yields further gains on challenging datasets such as GPQA and MATH. Compared to prior activation steering techniques that require continuous interventions, our one-shot cache steering offers substantial advantages in terms of inference latency, hyperparameter stability, and ease of integration with existing inference APIs. Beyond mere reasoning induction, we show that cache steering enables controllable transfer of reasoning styles (e.g., stepwise, causal, analogical), making it a practical tool for behavior-level guidance of language models.
format	Preprint
id	arxiv_https___arxiv_org_abs_2507_08799
institution	arXiv
publishDate	2025
record_format	arxiv
title	KV Cache Steering for Controlling Frozen LLMs
topic	Computation and Language; Artificial Intelligence
url	https://arxiv.org/abs/2507.08799

Similar Items