Bibliographic Details
Main authors: Wang, Xiangfeng; Guo, Hangyu; Lai, Yanlin; Huang, Mitt; Zhao, Liang; Yao, Chengyuan; Zhang, Yinmin; Han, Qi; Ren, Xiaoxiao; Yuan, Chun; Xu, Tong; Ge, Zheng; Zhang, Xiangyu; Jiang, Daxin
Format: Preprint
Published: 2026
Subjects: Computation and Language
Online access: https://arxiv.org/abs/2602.11570
_version_ 1866914324466892800
author Wang, Xiangfeng
Guo, Hangyu
Lai, Yanlin
Huang, Mitt
Zhao, Liang
Yao, Chengyuan
Zhang, Yinmin
Han, Qi
Ren, Xiaoxiao
Yuan, Chun
Xu, Tong
Ge, Zheng
Zhang, Xiangyu
Jiang, Daxin
contents While model-based verifiers are essential for scaling Reinforcement Learning with Verifiable Rewards (RLVR), current outcome-centric verification paradigms focus primarily on the consistency between the final result and the ground truth, often neglecting potential errors in the derivation process. As a result, correct answers produced from incorrect derivations are assigned positive rewards. To bridge this gap, we introduce PRIME, a benchmark for evaluating verifiers on Process-Outcome Alignment verification in Mathematics and Engineering. Curated from a comprehensive collection of college-level STEM problems, PRIME comprises 2,530 high-difficulty samples selected through a consistency-based filtering pipeline. Through extensive evaluation, we find that current verifiers frequently fail to detect derivation flaws. Furthermore, we propose a process-aware RLVR training paradigm that uses verifiers selected via PRIME. This approach substantially outperforms the outcome-only verification baseline, achieving absolute performance gains of 8.29%, 9.12%, and 7.31% on AIME24, AIME25, and Beyond-AIME, respectively, for the Qwen3-14B-Base model. Finally, we demonstrate a strong linear correlation ($R^2 > 0.92$) between verifier accuracy on PRIME and RLVR training effectiveness, validating PRIME as a reliable predictor for verifier selection.
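
The contrast the abstract draws between outcome-only and process-aware verification can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Verdict` type, both reward functions, and the binary "both must hold" rule are hypothetical and are not taken from the paper, whose actual reward design may differ.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    """Hypothetical verifier output for a single model response."""
    answer_correct: bool  # final result matches the ground truth
    process_valid: bool   # derivation was judged free of errors


def outcome_only_reward(v: Verdict) -> float:
    # Rewards any response whose final answer matches the ground truth,
    # regardless of how the answer was derived.
    return 1.0 if v.answer_correct else 0.0


def process_aware_reward(v: Verdict) -> float:
    # Assumed rule: reward only when the answer is correct AND the
    # derivation is accepted by the verifier.
    return 1.0 if (v.answer_correct and v.process_valid) else 0.0


# A correct answer reached through a flawed derivation.
flawed_but_correct = Verdict(answer_correct=True, process_valid=False)
print(outcome_only_reward(flawed_but_correct))   # 1.0
print(process_aware_reward(flawed_but_correct))  # 0.0
```

Under these assumptions, the flawed-but-correct response is the one case where the two schemes disagree, which is the failure mode the abstract says outcome-centric verifiers miss.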
format Preprint
id arxiv_https___arxiv_org_abs_2602_11570
institution arXiv
publishDate 2026
record_format arxiv
title PRIME: A Process-Outcome Alignment Benchmark for Verifiable Reasoning in Mathematics and Engineering
topic Computation and Language
url https://arxiv.org/abs/2602.11570