:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Zongsheng; Sun, Kaili; Wu, Bowen; Yu, Qun; Li, Ying; Wang, Baoxun
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2505.10218

Similar Items

Improving Generalization in Intent Detection: GRPO with Reward-Based Curriculum Sampling
by: Feng, Zihao, et al.
Published: (2025)

Interpersonal Memory Matters: A New Task for Proactive Dialogue Utilizing Conversational History
by: Wu, Bowen, et al.
Published: (2025)

ToolSample: Dual Dynamic Sampling Methods with Curriculum Learning for RL-based Tool Learning
by: Feng, Zihao, et al.
Published: (2025)

LaF-GRPO: In-Situ Navigation Instruction Generation for the Visually Impaired via GRPO with LLM-as-Follower Reward
by: Zhao, Yi, et al.
Published: (2025)

Towards the Holographic Characteristic of LLMs for Efficient Short-text Generation
by: Qian, Shun, et al.
Published: (2026)

F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization
by: Sun, Xiaohui, et al.
Published: (2025)

Lessons from Training Grounded LLMs with Verifiable Rewards
by: Sim, Shang Hong, et al.
Published: (2025)

GTPO and GRPO-S: Token and Sequence-Level Reward Shaping with Policy Entropy
by: Tan, Hongze, et al.
Published: (2025)

RM-R1: Reward Modeling as Reasoning
by: Chen, Xiusi, et al.
Published: (2025)

Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs
by: Wen, Xumeng, et al.
Published: (2025)

bi-GRPO: Bidirectional Optimization for Jailbreak Backdoor Injection on LLMs
by: Ji, Wence, et al.
Published: (2025)

Logic-Regularized Verifier Elicits Reasoning from LLMs
by: Wang, Xinyu, et al.
Published: (2026)

CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward
by: Liu, Shudong, et al.
Published: (2025)

Chart-RL: Generalized Chart Comprehension via Reinforcement Learning with Verifiable Rewards
by: Zhang, Xin, et al.
Published: (2026)

Generative Floor Plan Design with LLMs via Reinforcement Learning with Verifiable Rewards
by: Lara, Luis, et al.
Published: (2026)

Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains
by: Su, Yi, et al.
Published: (2025)

From Verifiable Dot to Reward Chain: Harnessing Verifiable Reference-based Rewards for Reinforcement Learning of Open-ended Generation
by: Jiang, Yuxin, et al.
Published: (2026)

Shop-R1: Rewarding LLMs to Simulate Human Behavior in Online Shopping via Reinforcement Learning
by: Zhang, Yimeng, et al.
Published: (2025)

Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards
by: Liu, Xiaoyuan, et al.
Published: (2025)

DRA-GRPO: Your GRPO Needs to Know Diverse Reasoning Paths for Mathematical Reasoning
by: Chen, Xiwen, et al.
Published: (2025)

LongR: Unleashing Long-Context Reasoning via Reinforcement Learning with Dense Utility Rewards
by: Ping, Bowen, et al.
Published: (2026)

R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO
by: Yao, Huanjin, et al.
Published: (2025)

PRISM: A Unified Framework for Post-Training LLMs Without Verifiable Rewards
by: Ghimire, Mukesh, et al.
Published: (2026)

Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance
by: Yan, Kai, et al.
Published: (2026)

ConfClip: Confidence-Weighted and Clipped Reward for Reinforcement Learning in LLMs
by: Zhang, Bonan, et al.
Published: (2025)

MMR-GRPO: Accelerating GRPO-Style Training through Diversity-Aware Reward Reweighting
by: Wei, Kangda, et al.
Published: (2026)

$λ$-GRPO: Unifying the GRPO Frameworks with Learnable Token Preferences
by: Wang, Yining, et al.
Published: (2025)

S-GRPO: Unified Post-Training for Large Vision-Language Models
by: Yan, Yuming, et al.
Published: (2026)

Incentivizing Parametric Knowledge via Reinforcement Learning with Verifiable Rewards for Cross-Cultural Entity Translation
by: Zhou, Jiang, et al.
Published: (2026)

Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
by: Bensal, Shelly, et al.
Published: (2025)

Improving Value-based Process Verifier via Structural Prior Injection
by: Sun, Zetian, et al.
Published: (2025)

Rectify Evaluation Preference: Improving LLMs' Critique on Math Reasoning via Perplexity-aware Reinforcement Learning
by: Tian, Changyuan, et al.
Published: (2025)

Learning from Failures: Correction-Oriented Policy Optimization with Verifiable Rewards
by: Ren, Mengjie, et al.
Published: (2026)

RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents
by: Wang, Peisong, et al.
Published: (2025)

IFDECORATOR: Wrapping Instruction Following Reinforcement Learning with Verifiable Rewards
by: Guo, Xu, et al.
Published: (2025)

Shaping Explanations: Semantic Reward Modeling with Encoder-Only Transformers for GRPO
by: Pappone, Francesco, et al.
Published: (2025)

Bridging the Semantic Gap: Contrastive Rewards for Multilingual Text-to-SQL with GRPO
by: Kattamuri, Ashish, et al.
Published: (2025)

Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
by: Peng, Hao, et al.
Published: (2025)

Instructing LLMs to Negotiate using Reinforcement Learning with Verifiable Rewards
by: Liu, Shuze Daniel, et al.
Published: (2026)

CPO: Addressing Reward Ambiguity in Role-playing Dialogue via Comparative Policy Optimization
by: Ye, Xinge, et al.
Published: (2025)