MARC View :: Library Catalog


Bibliographic Details
Main Authors:	Du, Yanrui; Zhao, Sendong; Gao, Yibo; Zhao, Danyang; Lin, Qika; Ma, Ming; Li, Jiayun; Jiang, Yi; He, Kai; Xu, Qianyi; Qin, Bing; Feng, Mengling
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2602.01982

author	Du, Yanrui; Zhao, Sendong; Gao, Yibo; Zhao, Danyang; Lin, Qika; Ma, Ming; Li, Jiayun; Jiang, Yi; He, Kai; Xu, Qianyi; Qin, Bing; Feng, Mengling
contents	Large language models (LLMs) equipped with chain-of-thought (CoT) achieve strong performance and offer a window into LLM behavior. However, recent evidence suggests that improvements in CoT capabilities often come with redundant reasoning processes, motivating a key question: Can LLMs acquire a fast-thinking mode analogous to human System 1 reasoning? To explore this, our study presents a self-sampling framework based on activation steering for efficient CoT learning. Our method can induce style-aligned, variable-length reasoning traces from the target LLMs themselves without any teacher guidance, thereby alleviating a central bottleneck of supervised fine-tuning (SFT) methods: the scarcity of high-quality supervision data. Using data filtered by gold answers, we perform SFT for efficient CoT learning with (i) a human-like dual-cognitive system and (ii) a progressive compression curriculum. Furthermore, we explore a self-evolution regime in which SFT is driven solely by prediction-consistent data from variable-length variants, eliminating the need for gold answers. Extensive experiments on math benchmarks, together with cross-domain generalization tests in medicine, show that our method yields stable improvements for both general and R1-style LLMs. Our data and model checkpoints are available at https://github.com/DYR1/S3-CoT.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_01982
institution	arXiv
publishDate	2026
record_format	arxiv
title	S3-CoT: Self-Sampled Succinct Reasoning Enables Efficient Chain-of-Thought LLMs
topic	Computation and Language
url	https://arxiv.org/abs/2602.01982
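The abstract describes two ways of selecting self-sampled traces for SFT: keeping traces whose answer matches a gold answer, and (in the self-evolution regime) keeping variable-length variants whose predictions agree with one another. The following is a minimal sketch of those two filters under stated assumptions, not the authors' code: the `Sample` record, the function names, and the rule that "prediction-consistent" means all variants of a question (at least two) give the same final answer are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One self-sampled reasoning trace (hypothetical record layout)."""
    question_id: str
    trace: str
    answer: str  # final answer already extracted from the trace


def filter_by_gold(samples: list[Sample], gold: dict[str, str]) -> list[Sample]:
    """Keep only traces whose extracted answer equals the gold answer."""
    return [s for s in samples if gold.get(s.question_id) == s.answer]


def filter_by_consistency(samples: list[Sample], min_variants: int = 2) -> list[Sample]:
    """Keep all variants of a question only if they agree on one answer.

    Assumption: agreement means every variant predicts the same answer;
    no gold answer is consulted.
    """
    groups: dict[str, list[Sample]] = {}
    for s in samples:
        groups.setdefault(s.question_id, []).append(s)
    kept: list[Sample] = []
    for variants in groups.values():
        if len(variants) >= min_variants and len({v.answer for v in variants}) == 1:
            kept.extend(variants)
    return kept


# Toy data: a long and a short variant for each of two questions.
samples = [
    Sample("q1", "long reasoning ...", "42"),
    Sample("q1", "short reasoning", "42"),
    Sample("q2", "long reasoning ...", "7"),
    Sample("q2", "short reasoning", "8"),
]
gold = {"q1": "42", "q2": "8"}

gold_kept = filter_by_gold(samples, gold)        # both q1 variants, short q2 variant
consistent_kept = filter_by_consistency(samples)  # both q1 variants only
```

The gold filter judges each trace independently, while the consistency filter judges a question's variants as a group, which is why it can run without labels.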

Documents similaires