| Main Authors: | Yang, Zhaohui, He, Chenghua, Shi, Xiaowen, Li, Linjing, Yin, Qiyue, Deng, Shihong, Jiang, Daxin |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.14391 |
Similar Items
Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning
by: Yang, Zhaohui, et al.
Published: (2025)
Unlocking Multimodal Mathematical Reasoning via Process Reward Model
by: Luo, Ruilin, et al.
Published: (2025)
GRPO and Reflection Reward for Mathematical Reasoning in Large Language Models
by: Wang, Zhijie
Published: (2026)
Coarse-to-Fine Process Reward Modeling for Mathematical Reasoning
by: Hu, Yulan, et al.
Published: (2025)
Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards
by: He, Haoran, et al.
Published: (2025)
Trade-R1: Bridging Verifiable Rewards to Stochastic Environments via Process-Level Reasoning Verification
by: Sun, Rui, et al.
Published: (2026)
The Lessons of Developing Process Reward Models in Mathematical Reasoning
by: Zhang, Zhenru, et al.
Published: (2025)
Retrieval-Augmented Process Reward Model for Generalizable Mathematical Reasoning
by: Zhu, Jiachen, et al.
Published: (2025)
MathSE: Improving Multimodal Mathematical Reasoning via Self-Evolving Iterative Reflection and Reward-Guided Fine-Tuning
by: Chen, Jinhao, et al.
Published: (2025)
ProcessBench: Identifying Process Errors in Mathematical Reasoning
by: Zheng, Chujie, et al.
Published: (2024)
CoLD: Counterfactually-Guided Length Debiasing for Process Reward Models in Mathematical Reasoning
by: Zheng, Congmin, et al.
Published: (2025)
Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision
by: Pala, Tej Deep, et al.
Published: (2025)
First Try Matters: Revisiting the Role of Reflection in Reasoning Models
by: Kang, Liwei, et al.
Published: (2025)
Beyond Outcome Verification: Verifiable Process Reward Models for Structured Reasoning
by: Pronesti, Massimiliano, et al.
Published: (2026)
On the Size Complexity and Decidability of First-Order Progression
by: Classen, Jens, et al.
Published: (2026)
Mathematical Computation and Reasoning Errors by Large Language Models
by: Zhang, Liang, et al.
Published: (2025)
Evaluating Robustness of Reward Models for Mathematical Reasoning
by: Kim, Sunghwan, et al.
Published: (2024)
An Efficient and Precise Training Data Construction Framework for Process-supervised Reward Model in Mathematical Reasoning
by: Sun, Wei, et al.
Published: (2025)
SpotAgent: Grounding Visual Geo-localization in Large Vision-Language Models through Agentic Reasoning
by: Jia, Furong, et al.
Published: (2026)
Socratic-PRMBench: Benchmarking Process Reward Models with Systematic Reasoning Patterns
by: Li, Xiang, et al.
Published: (2025)
Best-of-L: Cross-Lingual Reward Modeling for Mathematical Reasoning
by: Rajaee, Sara, et al.
Published: (2025)
Beyond Correctness: Confidence-Aware Reward Modeling for Enhancing Large Language Model Reasoning
by: He, Qianxi, et al.
Published: (2025)
Evaluating Mathematical Reasoning of Large Language Models: A Focus on Error Identification and Correction
by: Li, Xiaoyuan, et al.
Published: (2024)
DialogueReason: Rule-Based RL Sparks Dialogue Reasoning in LLMs
by: Shu, Yubo, et al.
Published: (2025)
Uncertainty-Based Methods for Automated Process Reward Data Construction and Output Aggregation in Mathematical Reasoning
by: Han, Jiuzhou, et al.
Published: (2025)
Verifiable Process Rewards for Agentic Reasoning
by: Yuan, Huining, et al.
Published: (2026)
What Are Step-Level Reward Models Rewarding? Counterintuitive Findings from MCTS-Boosted Mathematical Reasoning
by: Ma, Yiran, et al.
Published: (2024)
CAMEL: Confidence-Gated Reflection for Reward Modeling
by: Zhu, Zirui, et al.
Published: (2026)
Advancing Reasoning in Diffusion Language Models with Denoising Process Rewards
by: Xie, Shaoan, et al.
Published: (2025)
Self-Error-Instruct: Generalizing from Errors for LLMs Mathematical Reasoning
by: Yu, Erxin, et al.
Published: (2025)
Promoting Efficient Reasoning with Verifiable Stepwise Reward
by: Yue, Chuhuai, et al.
Published: (2025)
Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models
by: Wang, Shuai, et al.
Published: (2025)
Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models
by: Wang, Teng, et al.
Published: (2025)
FoE: Forest of Errors Makes the First Solution the Best in Large Reasoning Models
by: Jiang, Kehan, et al.
Published: (2026)
Beyond Accuracy: Evaluating Strategy Diversity in LLM Mathematical Reasoning
by: Yang, Xia, et al.
Published: (2026)
Beyond VLM-Based Rewards: Diffusion-Native Latent Reward Modeling
by: Liu, Gongye, et al.
Published: (2026)
Tougher Text, Smarter Models: Raising the Bar for Adversarial Defence Benchmarks
by: Wang, Yang, et al.
Published: (2025)
WebArbiter: A Principle-Guided Reasoning Process Reward Model for Web Agents
by: Zhang, Yao, et al.
Published: (2026)
Efficient Paths and Dense Rewards: Probabilistic Flow Reasoning for Large Language Models
by: Liu, Yan, et al.
Published: (2026)
GR-Ben: A General Reasoning Benchmark for Evaluating Process Reward Models
by: Sun, Zhouhao, et al.
Published: (2026)