Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Fang, Yi; Wang, Wenjie; Xue, Mingfeng; Deng, Boyi; Xu, Fengli; Liu, Dayiheng; Feng, Fuli
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2601.03595

_version_	1866918275315662848
author	Fang, Yi; Wang, Wenjie; Xue, Mingfeng; Deng, Boyi; Xu, Fengli; Liu, Dayiheng; Feng, Fuli
contents	Large Reasoning Models (LRMs) exhibit human-like cognitive reasoning strategies (e.g., backtracking and cross-verification) during the reasoning process, which improves their performance on complex tasks. Currently, reasoning strategies are selected autonomously by the LRMs themselves. However, such autonomous selection often produces inefficient or even erroneous reasoning paths. To make reasoning more reliable and flexible, it is important to develop methods for controlling reasoning strategies. Existing methods struggle to control fine-grained reasoning strategies due to conceptual entanglement in LRMs' hidden states. To address this, we leverage Sparse Autoencoders (SAEs) to decompose strategy-entangled hidden states into a disentangled feature space. To identify the few strategy-specific features among the vast pool of SAE features, we propose SAE-Steering, an efficient two-stage feature identification pipeline. SAE-Steering first recalls features that amplify the logits of strategy-specific keywords, filtering out over 99% of features, and then ranks the remaining features by their control effectiveness. Using the identified strategy-specific features as control vectors, SAE-Steering outperforms existing methods by over 15% in control effectiveness. Furthermore, controlling reasoning strategies can redirect LRMs from erroneous paths to correct ones, achieving a 7% absolute accuracy improvement.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_03595
institution	arXiv
publishDate	2026
record_format	arxiv
title	Controllable LLM Reasoning via Sparse Autoencoder-Based Steering
topic	Artificial Intelligence; Computation and Language
url	https://arxiv.org/abs/2601.03595
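The two-stage feature identification pipeline summarized in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' code. The function names (recall_features, steer, keyword_mass, rank_features), the top_k recall criterion, the logit-lens readout h @ W_U that stands in for the rest of the model's forward pass, and the use of keyword probability gain as a proxy for "control effectiveness" are all hypothetical choices made for illustration. Stage 1 projects each SAE decoder direction through the unembedding and keeps features that push a strategy keyword into their top-k logits. Stage 2 adds each surviving direction to hidden states as a steering vector and ranks the features by the resulting effect.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)


def recall_features(W_dec, W_U, keyword_ids, top_k=2):
    """Stage 1 (hypothetical sketch): keep SAE features whose decoder direction,
    read out through the unembedding W_U, ranks a keyword among its top_k logits."""
    logits = W_dec @ W_U  # (n_features, vocab): direct logit effect of each feature
    # Stable sort so that tied logits are broken by token index, deterministically.
    top = np.argsort(-logits, axis=1, kind="stable")[:, :top_k]
    return np.flatnonzero(np.isin(top, keyword_ids).any(axis=1))


def steer(h, W_dec, feature, alpha):
    """Add alpha times the unit-normalized decoder direction of one feature to h."""
    d = W_dec[feature]
    return h + alpha * d / np.linalg.norm(d)


def keyword_mass(h, W_U, keyword_ids):
    """Probability mass on the keyword tokens under a logit-lens readout h @ W_U
    (assumption: stands in for running the remaining layers of a real model)."""
    return softmax(h @ W_U)[..., keyword_ids].sum(axis=-1)


def rank_features(candidates, W_dec, W_U, hidden_states, keyword_ids, alpha=4.0):
    """Stage 2 (hypothetical sketch): order candidates by a proxy effectiveness
    score, the mean gain in keyword probability mass after steering."""
    base = keyword_mass(hidden_states, W_U, keyword_ids)
    scores = {
        int(f): float(np.mean(
            keyword_mass(steer(hidden_states, W_dec, f, alpha), W_U, keyword_ids) - base))
        for f in candidates
    }
    return sorted(scores, key=scores.get, reverse=True), scores


# Toy setup: 4-dim hidden state, 6-token vocabulary, token 2 plays the "keyword".
W_U = np.eye(4, 6)                        # tokens 0-3 read out e0..e3; tokens 4, 5 get zero logits
W_dec = np.array([[1.0, 0.0, 0.0, 0.0],   # feature 0: points at token 0
                  [0.0, 0.0, 1.0, 0.0],   # feature 1: points at the keyword
                  [0.0, 1.0, 0.5, 0.0]])  # feature 2: mostly token 1, partly the keyword
keyword_ids = [2]

candidates = recall_features(W_dec, W_U, keyword_ids)
ranked, scores = rank_features(candidates, W_dec, W_U, np.zeros((3, 4)), keyword_ids)
print(candidates, ranked)  # → [1 2] [1, 2]
```

Feature 0 is filtered out in stage 1, and feature 2 survives recall but scores below feature 1 because steering along it mostly boosts token 1. In a real LRM, the steered hidden state would continue through the remaining layers, and whatever control-effectiveness metric the paper uses would replace the keyword-mass proxy; the toy matrices only show the mechanics.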

Similar Items