Library Catalog

Bibliographic Details
Main Authors:	Yang, Zhaohui; He, Chenghua; Shi, Xiaowen; Li, Linjing; Yin, Qiyue; Deng, Shihong; Jiang, Daxin
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2505.14391

Similar Items

Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning
by: Yang, Zhaohui, et al.
Published: (2025)

Unlocking Multimodal Mathematical Reasoning via Process Reward Model
by: Luo, Ruilin, et al.
Published: (2025)

GRPO and Reflection Reward for Mathematical Reasoning in Large Language Models
by: Wang, Zhijie
Published: (2026)

Coarse-to-Fine Process Reward Modeling for Mathematical Reasoning
by: Hu, Yulan, et al.
Published: (2025)

Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards
by: He, Haoran, et al.
Published: (2025)

Trade-R1: Bridging Verifiable Rewards to Stochastic Environments via Process-Level Reasoning Verification
by: Sun, Rui, et al.
Published: (2026)

The Lessons of Developing Process Reward Models in Mathematical Reasoning
by: Zhang, Zhenru, et al.
Published: (2025)

Retrieval-Augmented Process Reward Model for Generalizable Mathematical Reasoning
by: Zhu, Jiachen, et al.
Published: (2025)

MathSE: Improving Multimodal Mathematical Reasoning via Self-Evolving Iterative Reflection and Reward-Guided Fine-Tuning
by: Chen, Jinhao, et al.
Published: (2025)

ProcessBench: Identifying Process Errors in Mathematical Reasoning
by: Zheng, Chujie, et al.
Published: (2024)

CoLD: Counterfactually-Guided Length Debiasing for Process Reward Models in Mathematical Reasoning
by: Zheng, Congmin, et al.
Published: (2025)

Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision
by: Pala, Tej Deep, et al.
Published: (2025)

First Try Matters: Revisiting the Role of Reflection in Reasoning Models
by: Kang, Liwei, et al.
Published: (2025)

Beyond Outcome Verification: Verifiable Process Reward Models for Structured Reasoning
by: Pronesti, Massimiliano, et al.
Published: (2026)

On the Size Complexity and Decidability of First-Order Progression
by: Classen, Jens, et al.
Published: (2026)

Mathematical Computation and Reasoning Errors by Large Language Models
by: Zhang, Liang, et al.
Published: (2025)

Evaluating Robustness of Reward Models for Mathematical Reasoning
by: Kim, Sunghwan, et al.
Published: (2024)

An Efficient and Precise Training Data Construction Framework for Process-supervised Reward Model in Mathematical Reasoning
by: Sun, Wei, et al.
Published: (2025)

SpotAgent: Grounding Visual Geo-localization in Large Vision-Language Models through Agentic Reasoning
by: Jia, Furong, et al.
Published: (2026)

Socratic-PRMBench: Benchmarking Process Reward Models with Systematic Reasoning Patterns
by: Li, Xiang, et al.
Published: (2025)

Best-of-L: Cross-Lingual Reward Modeling for Mathematical Reasoning
by: Rajaee, Sara, et al.
Published: (2025)

Beyond Correctness: Confidence-Aware Reward Modeling for Enhancing Large Language Model Reasoning
by: He, Qianxi, et al.
Published: (2025)

Evaluating Mathematical Reasoning of Large Language Models: A Focus on Error Identification and Correction
by: Li, Xiaoyuan, et al.
Published: (2024)

DialogueReason: Rule-Based RL Sparks Dialogue Reasoning in LLMs
by: Shu, Yubo, et al.
Published: (2025)

Uncertainty-Based Methods for Automated Process Reward Data Construction and Output Aggregation in Mathematical Reasoning
by: Han, Jiuzhou, et al.
Published: (2025)

Verifiable Process Rewards for Agentic Reasoning
by: Yuan, Huining, et al.
Published: (2026)

What Are Step-Level Reward Models Rewarding? Counterintuitive Findings from MCTS-Boosted Mathematical Reasoning
by: Ma, Yiran, et al.
Published: (2024)

CAMEL: Confidence-Gated Reflection for Reward Modeling
by: Zhu, Zirui, et al.
Published: (2026)

Advancing Reasoning in Diffusion Language Models with Denoising Process Rewards
by: Xie, Shaoan, et al.
Published: (2025)

Self-Error-Instruct: Generalizing from Errors for LLMs Mathematical Reasoning
by: Yu, Erxin, et al.
Published: (2025)

Promoting Efficient Reasoning with Verifiable Stepwise Reward
by: Yue, Chuhuai, et al.
Published: (2025)

Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models
by: Wang, Shuai, et al.
Published: (2025)

Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models
by: Wang, Teng, et al.
Published: (2025)

FoE: Forest of Errors Makes the First Solution the Best in Large Reasoning Models
by: Jiang, Kehan, et al.
Published: (2026)

Beyond Accuracy: Evaluating Strategy Diversity in LLM Mathematical Reasoning
by: Yang, Xia, et al.
Published: (2026)

Beyond VLM-Based Rewards: Diffusion-Native Latent Reward Modeling
by: Liu, Gongye, et al.
Published: (2026)

Tougher Text, Smarter Models: Raising the Bar for Adversarial Defence Benchmarks
by: Wang, Yang, et al.
Published: (2025)

WebArbiter: A Principle-Guided Reasoning Process Reward Model for Web Agents
by: Zhang, Yao, et al.
Published: (2026)

Efficient Paths and Dense Rewards: Probabilistic Flow Reasoning for Large Language Models
by: Liu, Yan, et al.
Published: (2026)

GR-Ben: A General Reasoning Benchmark for Evaluating Process Reward Models
by: Sun, Zhouhao, et al.
Published: (2026)