Bibliographic Details
Main Authors: Fang, Yi, Wang, Wenjie, Xue, Mingfeng, Deng, Boyi, Xu, Fengli, Liu, Dayiheng, Feng, Fuli
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2601.03595
_version_ 1866918275315662848
author Fang, Yi
Wang, Wenjie
Xue, Mingfeng
Deng, Boyi
Xu, Fengli
Liu, Dayiheng
Feng, Fuli
contents Large Reasoning Models (LRMs) exhibit human-like cognitive reasoning strategies (e.g., backtracking, cross-verification) during the reasoning process, which improves their performance on complex tasks. Currently, LRMs select their reasoning strategies autonomously. However, such autonomous selection often produces inefficient or even erroneous reasoning paths. To make reasoning more reliable and flexible, it is important to develop methods for controlling reasoning strategies. Existing methods struggle to control fine-grained reasoning strategies because concepts are entangled in LRMs' hidden states. To address this, we leverage Sparse Autoencoders (SAEs) to decompose strategy-entangled hidden states into a disentangled feature space. To identify the few strategy-specific features among the vast pool of SAE features, we propose SAE-Steering, an efficient two-stage feature identification pipeline. SAE-Steering first recalls features that amplify the logits of strategy-specific keywords, filtering out over 99% of features, and then ranks the remaining features by their control effectiveness. Using the identified strategy-specific features as control vectors, SAE-Steering outperforms existing methods by over 15% in control effectiveness. Furthermore, controlling reasoning strategies can redirect LRMs from erroneous paths to correct ones, achieving a 7% absolute accuracy improvement.
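The recall stage described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that a feature's effect on output logits can be approximated by projecting its SAE decoder direction through the model's unembedding matrix, and the names `recall_features`, `steer`, `decoder`, `unembed`, `keyword_ids`, and `keep_fraction` are hypothetical. The second stage (ranking by control effectiveness) needs actual model rollouts and is not shown.

```python
import numpy as np


def recall_features(decoder, unembed, keyword_ids, keep_fraction=0.01):
    """Keep the features whose decoder direction most raises keyword logits.

    decoder:     (n_features, d_model) SAE decoder directions (assumed layout).
    unembed:     (d_model, vocab_size) output projection of the model (assumed).
    keyword_ids: token ids of strategy-specific keywords.
    Returns the indices of the kept features and the score of every feature.
    """
    # Approximate per-feature change in each vocabulary logit.
    logit_effect = decoder @ unembed
    # Centre each row so a score reflects a boost relative to the average token.
    logit_effect = logit_effect - logit_effect.mean(axis=1, keepdims=True)
    # Average the boost over the strategy keywords.
    scores = logit_effect[:, keyword_ids].mean(axis=1)
    n_keep = max(1, int(len(scores) * keep_fraction))
    kept = np.argsort(scores)[::-1][:n_keep]
    return kept, scores


def steer(hidden, direction, strength):
    """Add a scaled, unit-normalised feature direction to a hidden state."""
    unit = direction / np.linalg.norm(direction)
    return hidden + strength * unit
```

With `keep_fraction=0.01` the sketch discards 99% of features, in line with the filtering rate the abstract reports; the scoring rule itself is only one plausible choice.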
format Preprint
id arxiv_https___arxiv_org_abs_2601_03595
institution arXiv
publishDate 2026
record_format arxiv
title Controllable LLM Reasoning via Sparse Autoencoder-Based Steering
topic Artificial Intelligence
Computation and Language
url https://arxiv.org/abs/2601.03595