Bibliographic Details
Main Authors: Feng, Zijian, Li, Tianjiao, Zhu, Zixiao, Zhou, Hanzhang, Qian, Junlang, Zhang, Li, Chua, Jia Jim Deryl, Mak, Lee Onn, Ng, Gee Wah, Mao, Kezhi
Format: Preprint
Published: 2026
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2602.04428
author Feng, Zijian
Li, Tianjiao
Zhu, Zixiao
Zhou, Hanzhang
Qian, Junlang
Zhang, Li
Chua, Jia Jim Deryl
Mak, Lee Onn
Ng, Gee Wah
Mao, Kezhi
contents Activation steering has emerged as a cost-effective paradigm for modifying the behavior of large language models (LLMs). Existing methods typically intervene at the block level, steering the bundled activations of selected attention heads, feedforward networks, or residual streams. However, we reveal that block-level activations are inherently heterogeneous, entangling beneficial, irrelevant, and harmful features, which makes block-level steering coarse, inefficient, and intrusive. To investigate the root cause, we decompose block activations into fine-grained atomic unit (AU)-level activations, where each AU-level activation corresponds to a single dimension of the block activation, and each AU denotes a slice of the block weight matrix. Steering an AU-level activation is thus equivalent to steering its associated AU. Our theoretical and empirical analyses show that this heterogeneity arises because different AUs (dimensions) control distinct token distributions in LLM outputs. Hence, block-level steering inevitably moves helpful and harmful token directions together, which reduces efficiency, whereas restricting intervention to beneficial AUs yields more precise and effective steering. Building on this insight, we propose AUSteer, a simple and efficient method that operates at the finer granularity of individual AUs. AUSteer first identifies discriminative AUs globally by computing activation momenta on contrastive samples. It then assigns adaptive steering strengths tailored to each input and each selected AU activation. Comprehensive experiments on multiple LLMs and tasks show that AUSteer consistently surpasses advanced baselines while steering considerably fewer activations, demonstrating that steering less achieves more.
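The abstract's claim that steering one AU-level activation is equivalent to steering its associated AU can be illustrated with a minimal sketch. It assumes (this is not stated in the record) that the block's output is a linear map o = W h of the block activation h, so the weight "slice" tied to dimension j is column j of W. Selection of discriminative AUs is approximated by the absolute difference between mean activations on positive and negative contrastive samples. This is a hypothetical stand-in for AUSteer's activation momenta, and a single fixed strength alpha replaces the adaptive strengths. All function names here are illustrative, not the authors' code.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: W[i][j]


def matvec(W: Matrix, h: Vector) -> Vector:
    """Block output o = W h (assumed linear output map of the block)."""
    return [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) for row in W]


def steer_dimension(h: Vector, j: int, delta: float) -> Vector:
    """AU-level steering: shift only dimension j of the block activation."""
    steered = list(h)
    steered[j] += delta
    return steered


def steer_au(W: Matrix, h: Vector, j: int, delta: float) -> Vector:
    """Same intervention seen from the weights: add delta * W[:, j] to the output."""
    return [o_i + delta * row[j] for o_i, row in zip(matvec(W, h), W)]


def dim_mean(samples: List[Vector], j: int) -> float:
    """Mean of dimension j over a set of activation vectors."""
    return sum(x[j] for x in samples) / len(samples)


def select_aus(pos: List[Vector], neg: List[Vector], k: int) -> List[int]:
    """Rank dimensions by |mean(pos) - mean(neg)| and keep the top k.

    Hypothetical stand-in for the paper's activation-momentum score.
    """
    dims = range(len(pos[0]))
    scores = [abs(dim_mean(pos, j) - dim_mean(neg, j)) for j in dims]
    return sorted(dims, key=lambda j: scores[j], reverse=True)[:k]


def steer_selected(h: Vector, pos: List[Vector], neg: List[Vector],
                   aus: List[int], alpha: float) -> Vector:
    """Shift only the selected dimensions by alpha * (mean(pos) - mean(neg)).

    Uses one fixed alpha; the paper's adaptive strengths are not modeled.
    """
    steered = list(h)
    for j in aus:
        steered[j] += alpha * (dim_mean(pos, j) - dim_mean(neg, j))
    return steered
```

For example, with W = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]] and h = [1.0, 1.0, 1.0], adding 0.5 to dimension 1 of h and adding 0.5 times column 1 of W to the output both give [4.0, 4.5]. In steer_selected, activations outside the selected set are left unchanged, which is the intuition behind "steering less".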
format Preprint
id arxiv_https___arxiv_org_abs_2602_04428
institution arXiv
publishDate 2026
record_format arxiv
title Fine-Grained Activation Steering: Steering Less, Achieving More
topic Computation and Language
url https://arxiv.org/abs/2602.04428