Staff View: Manifold-Guided Attention Steering :: Library Catalog

Bibliographic Details
Main Authors:	Li, Ian; Guruprasad, Kapilesh; Sengupta, Raunak; Satish, Ninad; D'Antoni, Loris; Yu, Rose
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2605.21770

_version_	1866910242847064064
author	Li, Ian; Guruprasad, Kapilesh; Sengupta, Raunak; Satish, Ninad; D'Antoni, Loris; Yu, Rose
author_facet	Li, Ian; Guruprasad, Kapilesh; Sengupta, Raunak; Satish, Ninad; D'Antoni, Loris; Yu, Rose
contents	Large language models frequently produce errors in reasoning tasks despite possessing the underlying knowledge required for correct reasoning. One possible approach to improve reasoning consistency is through activation steering. However, existing activation steering approaches apply fixed, pre-computed correction vectors, ignoring where the model currently sits along its generation trajectory; the result is indiscriminate perturbation that disrupts already-correct steps as freely as erroneous ones. We propose Manifold-Guided Attention Steering (MAGS), a trajectory-aware inference-time intervention grounded in a geometric observation: the output activations of specific attention heads diverge from a low-dimensional correctness manifold at the point of error, and this deviation compounds through subsequent steps. For each identified attention head, we learn a low-dimensional subspace from contrastive pairs of correct and incorrect traces that capture the directions along which error behavior deviates from correct behavior. During inference, we monitor each head's proximity to this manifold and apply a targeted projection correction when deviation exceeds a learned threshold, steering the attention output back toward the correct subspace before the error propagates. MAGS consistently outperforms both unsteered baselines and static steering approaches across benchmarks spanning mathematical reasoning (MATH-500, GSM8K), code generation (HumanEval, MBPP), and molecular generation (SMILES), suggesting that correctness manifolds are a general feature of LLM attention geometry.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_21770
institution	arXiv
publishDate	2026
record_format	arxiv
title	Manifold-Guided Attention Steering
topic	Machine Learning
url	https://arxiv.org/abs/2605.21770
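
Note on the contents field: the monitor-then-project step it describes can be sketched for a single attention head as below. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation. The record does not say how the subspace or threshold is learned, so the sketch assumes the subspace is spanned by the top singular directions of (incorrect - correct) activation differences, measures deviation relative to the mean correct activation, and takes the threshold as a given number. All function and parameter names (fit_error_subspace, deviation, steer, rank, threshold) are hypothetical.

```python
import numpy as np


def fit_error_subspace(correct, incorrect, rank):
    """Estimate deviation directions for one head (assumed procedure).

    correct, incorrect: arrays of shape (n_pairs, d) holding the head's
    output on paired correct and incorrect traces.
    Returns an orthonormal basis of shape (d, rank) and the mean correct
    activation of shape (d,).
    """
    diffs = incorrect - correct
    # Rows of vt are orthonormal; the first `rank` rows are the dominant
    # directions along which incorrect outputs differ from correct ones.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    basis = vt[:rank].T
    center = correct.mean(axis=0)
    return basis, center


def deviation(h, basis, center):
    """Size of the component of (h - center) inside the error subspace."""
    return float(np.linalg.norm(basis.T @ (h - center)))


def steer(h, basis, center, threshold):
    """Remove the error-subspace component only when it exceeds threshold."""
    coords = basis.T @ (h - center)
    if np.linalg.norm(coords) <= threshold:
        return h  # step looks correct: leave it untouched
    return h - basis @ coords  # project the deviation out


# Synthetic data: incorrect traces are shifted along the first axis.
rng = np.random.default_rng(0)
d = 8
shift = np.zeros(d)
shift[0] = 3.0
correct = 0.1 * rng.normal(size=(64, d))
incorrect = correct + shift + 0.01 * rng.normal(size=(64, d))

basis, center = fit_error_subspace(correct, incorrect, rank=1)
kept = steer(correct[0], basis, center, threshold=1.0)
fixed = steer(incorrect[0], basis, center, threshold=1.0)
```

Gating on the threshold is what separates this from static steering in the sketch: an activation whose deviation is small is returned unchanged, while one that has drifted has only its error-subspace component removed.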
