Bibliographic Details
Main Authors: Du, Yanrui, Zhao, Sendong, Gao, Yibo, Zhao, Danyang, Lin, Qika, Ma, Ming, Li, Jiayun, Jiang, Yi, He, Kai, Xu, Qianyi, Qin, Bing, Feng, Mengling
Format: Preprint
Published: 2026
Subjects:
Online Access: https://arxiv.org/abs/2602.01982
_version_ 1866910008730451968
author Du, Yanrui
Zhao, Sendong
Gao, Yibo
Zhao, Danyang
Lin, Qika
Ma, Ming
Li, Jiayun
Jiang, Yi
He, Kai
Xu, Qianyi
Qin, Bing
Feng, Mengling
contents Large language models (LLMs) equipped with chain-of-thought (CoT) achieve strong performance and offer a window into LLM behavior. However, recent evidence suggests that improvements in CoT capabilities often come with redundant reasoning processes, motivating a key question: Can LLMs acquire a fast-thinking mode analogous to human System 1 reasoning? To explore this, our study presents a self-sampling framework based on activation steering for efficient CoT learning. Our method can induce style-aligned and variable-length reasoning traces from target LLMs themselves without any teacher guidance, thereby alleviating a central bottleneck of SFT-based methods: the scarcity of high-quality supervision data. Using data filtered by gold answers, we perform SFT for efficient CoT learning with (i) a human-like dual-cognitive system, and (ii) a progressive compression curriculum. Furthermore, we explore a self-evolution regime in which SFT is driven solely by prediction-consistent data of variable-length variants, eliminating the need for gold answers. Extensive experiments on math benchmarks, together with cross-domain generalization tests in medicine, show that our method yields stable improvements for both general and R1-style LLMs. Our data and model checkpoints can be found at https://github.com/DYR1/S3-CoT.
format Preprint
id arxiv_https___arxiv_org_abs_2602_01982
institution arXiv
publishDate 2026
record_format arxiv
title S3-CoT: Self-Sampled Succinct Reasoning Enables Efficient Chain-of-Thought LLMs
topic Computation and Language
url https://arxiv.org/abs/2602.01982