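| Main Authors: | Du, Yanrui; Zhao, Sendong; Gao, Yibo; Zhao, Danyang; Lin, Qika; Ma, Ming; Li, Jiayun; Jiang, Yi; He, Kai; Xu, Qianyi; Qin, Bing; Feng, Mengling |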
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Computation and Language |
| Online Access: | https://arxiv.org/abs/2602.01982 |
| _version_ | 1866910008730451968 |
|---|---|
| author | Du, Yanrui; Zhao, Sendong; Gao, Yibo; Zhao, Danyang; Lin, Qika; Ma, Ming; Li, Jiayun; Jiang, Yi; He, Kai; Xu, Qianyi; Qin, Bing; Feng, Mengling |
| author_facet | Du, Yanrui; Zhao, Sendong; Gao, Yibo; Zhao, Danyang; Lin, Qika; Ma, Ming; Li, Jiayun; Jiang, Yi; He, Kai; Xu, Qianyi; Qin, Bing; Feng, Mengling |
| contents | Large language models (LLMs) equipped with chain-of-thought (CoT) achieve strong performance and offer a window into LLM behavior. However, recent evidence suggests that improvements in CoT capabilities often come with redundant reasoning processes, motivating a key question: Can LLMs acquire a fast-thinking mode analogous to human System 1 reasoning? To explore this, our study presents a self-sampling framework based on activation steering for efficient CoT learning. Our method can induce style-aligned and variable-length reasoning traces from target LLMs themselves without any teacher guidance, thereby alleviating a central bottleneck of SFT-based methods: the scarcity of high-quality supervision data. Using data filtered by gold answers, we perform SFT for efficient CoT learning with (i) a human-like dual-cognitive system, and (ii) a progressive compression curriculum. Furthermore, we explore a self-evolution regime in which SFT is driven solely by prediction-consistent data of variable-length variants, eliminating the need for gold answers. Extensive experiments on math benchmarks, together with cross-domain generalization tests in medicine, show that our method yields stable improvements for both general and R1-style LLMs. Our data and model checkpoints can be found at https://github.com/DYR1/S3-CoT. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2602_01982 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | S3-CoT: Self-Sampled Succinct Reasoning Enables Efficient Chain-of-Thought LLMs |
| topic | Computation and Language |
| url | https://arxiv.org/abs/2602.01982 |