Staff View: More Bang for the Buck: Process Reward Modeling with Entropy-Driven Uncertainty :: Library Catalog

Bibliographic Details
Main Authors:	Cao, Lang; Chen, Renhong; Zou, Yingtian; Peng, Chao; Xu, Huacong; Wang, Yuxian; Ning, Wu; Chen, Qian; Peng, Mofan; Chen, Zijie; Su, Peishuo; Li, Yitong
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2503.22233

_version_	1866914378946707456
author	Cao, Lang; Chen, Renhong; Zou, Yingtian; Peng, Chao; Xu, Huacong; Wang, Yuxian; Ning, Wu; Chen, Qian; Peng, Mofan; Chen, Zijie; Su, Peishuo; Li, Yitong
author_facet	Cao, Lang; Chen, Renhong; Zou, Yingtian; Peng, Chao; Xu, Huacong; Wang, Yuxian; Ning, Wu; Chen, Qian; Peng, Mofan; Chen, Zijie; Su, Peishuo; Li, Yitong
contents	We introduce the Entropy-Driven Uncertainty Process Reward Model (EDU-PRM), a novel entropy-driven training framework for process reward modeling that enables dynamic, uncertainty-aligned segmentation of complex reasoning steps, eliminating the need for costly manual step annotations. Unlike previous Process Reward Models (PRMs) that rely on static partitioning and human labeling, EDU-PRM automatically anchors step boundaries at tokens with high predictive entropy, effectively capturing intrinsic logical transitions and facilitating efficient exploration of diverse reasoning paths. On the ProcessBench benchmark, EDU-PRM outperforms strong public PRM baselines such as Math-Shepherd PRM and Omega PRM, and achieves results comparable to SOTA models while using only 1.5% of the training data. Furthermore, with our proposed EDU sampling strategy, accuracy on generative reasoning tasks rises from 64.7% to 67.3%, accompanied by a 32% reduction in token usage. These findings underscore the potential of EDU-PRM as a scalable and annotation-efficient paradigm for process supervision in mathematical reasoning, paving the way for more efficient and robust approaches to complex mathematical problem solving.
format	Preprint
id	arxiv_https___arxiv_org_abs_2503_22233
institution	arXiv
publishDate	2025
record_format	arxiv
title	More Bang for the Buck: Process Reward Modeling with Entropy-Driven Uncertainty
topic	Machine Learning; Artificial Intelligence; Computation and Language
url	https://arxiv.org/abs/2503.22233
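
The contents field above says EDU-PRM anchors step boundaries at tokens with high predictive entropy. The following is only a minimal illustrative sketch of that idea, not the authors' implementation: the function names, the fixed `threshold` value, and the toy probability distributions are all assumptions made for illustration, and the record does not state how the paper actually selects boundaries.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def segment_by_entropy(tokens, distributions, threshold=1.0):
    """Group tokens into steps, opening a new step at each high-entropy token.

    `threshold` is a hypothetical hyperparameter chosen for this sketch,
    not a value taken from the paper.
    """
    if len(tokens) != len(distributions):
        raise ValueError("tokens and distributions must have the same length")
    steps, current = [], []
    for token, probs in zip(tokens, distributions):
        # A high-entropy token marks an uncertain decision point, so the
        # current step is closed before this token (unless the step is empty).
        if current and token_entropy(probs) > threshold:
            steps.append(current)
            current = []
        current.append(token)
    if current:
        steps.append(current)
    return steps


# Toy data: the model is confident everywhere except at the token "so".
confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.25, 0.25, 0.25, 0.25]
tokens = ["x", "=", "2", "so", "y", "=", "4"]
distributions = [confident, confident, confident, uncertain,
                 confident, confident, confident]
steps = segment_by_entropy(tokens, distributions)
```

On this toy input the single uniform distribution exceeds the threshold, so the sequence is split once, immediately before "so". A real system would take the distributions from a language model's softmax output rather than from hand-written lists.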

Similar Items