:: Library Catalog

Bibliographic Details
Main Author:	Wang, Zhijie
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2603.14041

Similar Items

Beyond the First Error: Process Reward Models for Reflective Mathematical Reasoning
by: Yang, Zhaohui, et al.
Published: (2025)

GRPO is Secretly a Process Reward Model
by: Sullivan, Michael, et al.
Published: (2025)

Large Language Models and Mathematical Reasoning Failures
by: Boye, Johan, et al.
Published: (2025)

Coarse-to-Fine Process Reward Modeling for Mathematical Reasoning
by: Hu, Yulan, et al.
Published: (2025)

Fast-Slow Thinking GRPO for Large Vision-Language Model Reasoning
by: Xiao, Wenyi, et al.
Published: (2025)

A Survey on Large Language Models for Mathematical Reasoning
by: Wang, Peng-Yuan, et al.
Published: (2025)

Mathematical Computation and Reasoning Errors by Large Language Models
by: Zhang, Liang, et al.
Published: (2025)

Retrieval-Augmented Process Reward Model for Generalizable Mathematical Reasoning
by: Zhu, Jiachen, et al.
Published: (2025)

Evaluating Robustness of Reward Models for Mathematical Reasoning
by: Kim, Sunghwan, et al.
Published: (2024)

Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training
by: Xu, Yuanda, et al.
Published: (2026)

GanitLLM: Difficulty-Aware Bengali Mathematical Reasoning through Curriculum-GRPO
by: Dipta, Shubhashis Roy, et al.
Published: (2026)

Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation
by: Dai, Yanqi, et al.
Published: (2026)

A Survey on Mathematical Reasoning and Optimization with Large Language Models
by: Forootani, Ali
Published: (2025)

Reasoning Relay: Evaluating Stability and Interchangeability of Large Language Models in Mathematical Reasoning
by: Lu, Leo, et al.
Published: (2025)

Unlocking Multimodal Mathematical Reasoning via Process Reward Model
by: Luo, Ruilin, et al.
Published: (2025)

M2A: Synergizing Mathematical and Agentic Reasoning in Large Language Models
by: Wang, Junjian, et al.
Published: (2026)

Shaping Explanations: Semantic Reward Modeling with Encoder-Only Transformers for GRPO
by: Pappone, Francesco, et al.
Published: (2025)

Best-of-L: Cross-Lingual Reward Modeling for Mathematical Reasoning
by: Rajaee, Sara, et al.
Published: (2025)

The Lessons of Developing Process Reward Models in Mathematical Reasoning
by: Zhang, Zhenru, et al.
Published: (2025)

AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward
by: Huang, Runhui, et al.
Published: (2026)

Teaching Large Reasoning Models Effective Reflection
by: Wang, Hanbin, et al.
Published: (2026)

Key-Point-Driven Mathematical Reasoning Distillation of Large Language Model
by: Zhu, Xunyu, et al.
Published: (2024)

Beyond Correctness: Confidence-Aware Reward Modeling for Enhancing Large Language Model Reasoning
by: He, Qianxi, et al.
Published: (2025)

GTPO and GRPO-S: Token and Sequence-Level Reward Shaping with Policy Entropy
by: Tan, Hongze, et al.
Published: (2025)

From Reasoning to Code: GRPO Optimization for Underrepresented Languages
by: Pennino, Federico, et al.
Published: (2025)

ShapE-GRPO: Shapley-Enhanced Reward Allocation for Multi-Candidate LLM Training
by: Ai, Rui, et al.
Published: (2026)

Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning
by: Zhao, Jun, et al.
Published: (2024)

MMR-GRPO: Accelerating GRPO-Style Training through Diversity-Aware Reward Reweighting
by: Wei, Kangda, et al.
Published: (2026)

Noise-corrected GRPO: From Noisy Rewards to Unbiased Gradients
by: Mansouri, Omar El, et al.
Published: (2025)

Efficient Paths and Dense Rewards: Probabilistic Flow Reasoning for Large Language Models
by: Liu, Yan, et al.
Published: (2026)

MathSE: Improving Multimodal Mathematical Reasoning via Self-Evolving Iterative Reflection and Reward-Guided Fine-Tuning
by: Chen, Jinhao, et al.
Published: (2025)

What Are Step-Level Reward Models Rewarding? Counterintuitive Findings from MCTS-Boosted Mathematical Reasoning
by: Ma, Yiran, et al.
Published: (2024)

Revisiting Anthropomorphic Reflection Markers in Large Language Model Reasoning
by: Yu, Yahan, et al.
Published: (2026)

Numerical Sensitivity and Robustness: Exploring the Flaws of Mathematical Reasoning in Large Language Models
by: Sun, Zhishen, et al.
Published: (2025)

Step-GRPO: Internalizing Dynamic Early Exit for Efficient Reasoning
by: Chen, Benteng, et al.
Published: (2026)

Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback
by: Zhang, Xiaoying, et al.
Published: (2025)

Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models
by: Wang, Teng, et al.
Published: (2025)

R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO
by: Yao, Huanjin, et al.
Published: (2025)

GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
by: Mirzadeh, Iman, et al.
Published: (2024)

CAMA: Enhancing Mathematical Reasoning in Large Language Models with Causal Knowledge
by: Zan, Lei, et al.
Published: (2025)