Bibliographic Details
Main Authors:	Lin, Yen-Ting; Jin, Di; Xu, Tengyu; Wu, Tianhao; Sukhbaatar, Sainbayar; Zhu, Chen; He, Yun; Chen, Yun-Nung; Weston, Jason; Tian, Yuandong; Rahnama, Arash; Wang, Sinong; Ma, Hao; Fang, Han
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2501.10799

Abstract:

Large language models (LLMs) have recently demonstrated remarkable success in mathematical reasoning. Despite progress in methods like chain-of-thought prompting and self-consistency sampling, these advances often focus on final correctness without ensuring that the underlying reasoning process is coherent and reliable. This paper introduces Step-KTO, a training framework that combines process-level and outcome-level binary feedback to guide LLMs toward more trustworthy reasoning trajectories. By providing binary evaluations for both the intermediate reasoning steps and the final answer, Step-KTO encourages the model to adhere to logical progressions rather than relying on superficial shortcuts. Our experiments on challenging mathematical benchmarks show that Step-KTO significantly improves both final answer accuracy and the quality of intermediate reasoning steps. For example, on the MATH-500 dataset, Step-KTO achieves a notable improvement in Pass@1 accuracy over strong baselines. These results highlight the promise of integrating stepwise process feedback into LLM training, paving the way toward more interpretable and dependable reasoning capabilities.

Similar Items