| Main Authors: | Chi, Yinghui, Wang, Lucien |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.02395 |
Similar Items
Verifiable Process Rewards for Agentic Reasoning
by: Yuan, Huining, et al.
Published: (2026)
Beyond Outcome Verification: Verifiable Process Reward Models for Structured Reasoning
by: Pronesti, Massimiliano, et al.
Published: (2026)
Adversarial Training for Process Reward Models
by: Juneja, Gurusha, et al.
Published: (2025)
Process Reward Models That Think
by: Khalifa, Muhammad, et al.
Published: (2025)
Time Series Reasoning via Process-Verifiable Thinking Data Synthesis and Scheduling for Tailored LLM Reasoning
by: Zhou, Jiahui, et al.
Published: (2026)
Rewarding the Scientific Process: Process-Level Reward Modeling for Agentic Data Analysis
by: Qiu, Zhisong, et al.
Published: (2026)
Controllable and Verifiable Tool-Use Data Synthesis for Agentic Reinforcement Learning
by: Xu, Siyuan, et al.
Published: (2026)
Coarse-to-Fine Process Reward Modeling for Mathematical Reasoning
by: Hu, Yulan, et al.
Published: (2025)
Trade-R1: Bridging Verifiable Rewards to Stochastic Environments via Process-Level Reasoning Verification
by: Sun, Rui, et al.
Published: (2026)
GRPO is Secretly a Process Reward Model
by: Sullivan, Michael, et al.
Published: (2025)
Process Reward Model with Q-Value Rankings
by: Li, Wendi, et al.
Published: (2024)
Process-based Self-Rewarding Language Models
by: Zhang, Shimao, et al.
Published: (2025)
Advancing Reasoning in Diffusion Language Models with Denoising Process Rewards
by: Xie, Shaoan, et al.
Published: (2025)
The Implicit Curriculum: Learning Dynamics in RL with Verifiable Rewards
by: Huang, Yu, et al.
Published: (2026)
Correct Answers from Sound Reasoning: Verifiable Process Supervision for Language Models
by: Kim, Kyuyoung, et al.
Published: (2026)
Retrieval-Augmented Process Reward Model for Generalizable Mathematical Reasoning
by: Zhu, Jiachen, et al.
Published: (2025)
Rubric-Guided Process Reward for Stepwise Model Routing
by: Ye, Shenghao, et al.
Published: (2026)
MASPRM: Multi-Agent System Process Reward Model
by: Yazdani, Milad, et al.
Published: (2025)
Reinforcement Learning with Verifiable yet Noisy Rewards under Imperfect Verifiers
by: Cai, Xin-Qiang, et al.
Published: (2025)
Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models
by: Wang, Shuai, et al.
Published: (2025)
Reward Hacking Mitigation using Verifiable Composite Rewards
by: Tarek, Mirza Farhan Bin, et al.
Published: (2025)
Auditing Data Membership in Reinforcement Learning With Verifiable Rewards
by: Liu, Yule, et al.
Published: (2025)
A Survey of Process Reward Models: From Outcome Signals to Process Supervisions for Large Language Models
by: Zheng, Congmin, et al.
Published: (2025)
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
by: Peng, Hao, et al.
Published: (2025)
DreamPRM: Domain-Reweighted Process Reward Model for Multimodal Reasoning
by: Cao, Qi, et al.
Published: (2025)
Beyond the First Error: Process Reward Models for Reflective Mathematical Reasoning
by: Yang, Zhaohui, et al.
Published: (2025)
Process Reinforcement through Implicit Rewards
by: Cui, Ganqu, et al.
Published: (2025)
StructVRM: Aligning Multimodal Reasoning with Structured and Verifiable Reward Models
by: Zhang, Xiangxiang, et al.
Published: (2025)
Process Rewards with Learned Reliability
by: Li, Jinyuan, et al.
Published: (2026)
Trust Your Memory: Verifiable Control of Smart Homes through Reinforcement Learning with Multi-dimensional Rewards
by: Guo, Kai-Yuan, et al.
Published: (2026)
Burning RED: Unlocking Subtask-Driven Reinforcement Learning and Risk-Awareness in Average-Reward Markov Decision Processes
by: Rojas, Juan Sebastian, et al.
Published: (2024)
Unlocking Multimodal Mathematical Reasoning via Process Reward Model
by: Luo, Ruilin, et al.
Published: (2025)
Promoting Efficient Reasoning with Verifiable Stepwise Reward
by: Yue, Chuhuai, et al.
Published: (2025)
An Efficient and Precise Training Data Construction Framework for Process-supervised Reward Model in Mathematical Reasoning
by: Sun, Wei, et al.
Published: (2025)
Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards
by: Liu, Xiaoyuan, et al.
Published: (2025)
Efficient Process Reward Model Training via Active Learning
by: Duan, Keyu, et al.
Published: (2025)
Process Reward Models for LLM Agents: Practical Framework and Directions
by: Choudhury, Sanjiban
Published: (2025)
Process Reward Agents for Steering Knowledge-Intensive Reasoning
by: Sohn, Jiwoong, et al.
Published: (2026)
Rewarding Structural Conformance of Reasoning using Process Mining
by: Lee, Yongjae, et al.
Published: (2025)
GUI-PRA: Process Reward Agent for GUI Tasks
by: Xiong, Tao, et al.
Published: (2025)