Bibliographic Details
Main Authors: Ji, Ke; Xu, Jiahao; Liang, Tian; Liu, Qiuzhi; He, Zhiwei; Chen, Xingyu; Liu, Xiaoyuan; Wang, Zhijie; Chen, Junying; Wang, Benyou; Tu, Zhaopeng; Mi, Haitao; Yu, Dong
Format: Preprint
Published: 2025
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2503.02875
author Ji, Ke
Xu, Jiahao
Liang, Tian
Liu, Qiuzhi
He, Zhiwei
Chen, Xingyu
Liu, Xiaoyuan
Wang, Zhijie
Chen, Junying
Wang, Benyou
Tu, Zhaopeng
Mi, Haitao
Yu, Dong
contents Improving the reasoning capabilities of large language models (LLMs) typically requires supervised fine-tuning with labeled data or computationally expensive sampling. We introduce Unsupervised Prefix Fine-Tuning (UPFT), which leverages the observation of Prefix Self-Consistency -- the shared initial reasoning steps across diverse solution trajectories -- to enhance LLM reasoning efficiency. By training exclusively on the initial prefix substrings (as few as 8 tokens), UPFT removes the need for labeled data or exhaustive sampling. Experiments on reasoning benchmarks show that UPFT matches the performance of supervised methods such as Rejection Sampling Fine-Tuning, while reducing training time by 75% and sampling cost by 99%. Further analysis reveals that errors tend to appear in later stages of the reasoning process and that prefix-based training preserves the model's structural knowledge. This work demonstrates how minimal unsupervised fine-tuning can unlock substantial reasoning gains in LLMs, offering a scalable and resource-efficient alternative to conventional approaches.
format Preprint
id arxiv_https___arxiv_org_abs_2503_02875
institution arXiv
publishDate 2025
record_format arxiv
title The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models
topic Computation and Language
url https://arxiv.org/abs/2503.02875
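
The abstract describes training only on the initial prefix of a model-generated solution (as few as 8 tokens), with no labels. The data-preparation step implied by that description might be sketched as below. This is a minimal illustration, not the authors' implementation: the function name `build_prefix_examples`, the one-response-per-question setup, the `prompt`/`target` record layout, and the use of whitespace splitting in place of a real tokenizer are all assumptions made for the example.

```python
from typing import Dict, List


def build_prefix_examples(
    questions: List[str],
    sampled_responses: List[str],
    prefix_len: int = 8,
) -> List[Dict[str, str]]:
    """Pair each question with the first `prefix_len` tokens of one sampled response.

    Assumption: whitespace splitting stands in for a real tokenizer.
    No reference answer or correctness filter is consulted, so the
    resulting training data is unlabeled.
    """
    if len(questions) != len(sampled_responses):
        raise ValueError("need exactly one sampled response per question")

    examples = []
    for question, response in zip(questions, sampled_responses):
        tokens = response.split()
        # Keep only the leading tokens; shorter responses are kept whole.
        prefix = " ".join(tokens[:prefix_len])
        examples.append({"prompt": question, "target": prefix})
    return examples


if __name__ == "__main__":
    demo = build_prefix_examples(
        ["What is 2 + 3?"],
        ["First , add the two numbers together to get five ."],
    )
    print(demo[0]["target"])
```

The records produced this way would then feed an ordinary fine-tuning loop; because only a short prefix of each response is kept, both the generation and the training sequences stay short.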