:: Library Catalog

Bibliographic Details
Main Author:	Kim, Youngeun
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2601.22582

Similar Items

GRAPH-GRPO-LEX: Contract Graph Modeling and Reinforcement Learning with Group Relative Policy Optimization
by: Dechtiar, Moriya, et al.
Published: (2025)

AceGRPO: Adaptive Curriculum Enhanced Group Relative Policy Optimization for Autonomous Machine Learning Engineering
by: Cai, Yuzhu, et al.
Published: (2026)

Scaf-GRPO: Scaffolded Group Relative Policy Optimization for Enhancing LLM Reasoning
by: Zhang, Xichen, et al.
Published: (2025)

GRPO-VPS: Enhancing Group Relative Policy Optimization with Verifiable Process Supervision for Effective Reasoning
by: Wang, Jingyi, et al.
Published: (2026)

EP-GRPO: Entropy-Progress Aligned Group Relative Policy Optimization with Implicit Process Guidance
by: Yu, Song, et al.
Published: (2026)

Train Less, Learn More: Adaptive Efficient Rollout Optimization for Group-Based Reinforcement Learning
by: Zhang, Zhi, et al.
Published: (2026)

Co-GRPO: Co-Optimized Group Relative Policy Optimization for Masked Diffusion Model
by: Zhou, Renping, et al.
Published: (2025)

IB-GRPO: Aligning LLM-based Learning Path Recommendation with Educational Objectives via Indicator-Based Group Relative Policy Optimization
by: Wang, Shuai, et al.
Published: (2026)

SPEC-RL: Accelerating On-Policy Reinforcement Learning with Speculative Rollouts
by: Liu, Bingshuai, et al.
Published: (2025)

WS-GRPO: Weakly-Supervised Group-Relative Policy Optimization for Rollout-Efficient Reasoning
by: Mundada, Gagan, et al.
Published: (2026)

Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning
by: Xu, Yixuan Even, et al.
Published: (2025)

SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization
by: Zheng, Zhi, et al.
Published: (2025)

How to Allocate, How to Learn? Dynamic Rollout Allocation and Advantage Modulation for Policy Optimization
by: Fang, Yangyi, et al.
Published: (2026)

EchoRL: Reinforcement Learning via Rollout Echoing
by: Bi, Jinhe, et al.
Published: (2026)

Contextual Rollout Bandits for Reinforcement Learning with Verifiable Rewards
by: Lu, Xiaodong, et al.
Published: (2026)

GEPO: Group Expectation Policy Optimization for Stable Heterogeneous Reinforcement Learning
by: Zhang, Han, et al.
Published: (2025)

Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends
by: Yao, Chaorui, et al.
Published: (2025)

NGRPO: Negative-enhanced Group Relative Policy Optimization
by: Nan, Gongrui, et al.
Published: (2025)

Guided Cooperation in Hierarchical Reinforcement Learning via Model-based Rollout
by: Wang, Haoran, et al.
Published: (2023)

S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models
by: Dai, Muzhi, et al.
Published: (2025)

Hybrid Group Relative Policy Optimization: A Multi-Sample Approach to Enhancing Policy Optimization
by: Sane, Soham
Published: (2025)

APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation
by: Zhou, Yuzhen, et al.
Published: (2025)

Stepwise Guided Policy Optimization: Coloring your Incorrect Reasoning in GRPO
by: Chen, Peter, et al.
Published: (2025)

F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare
by: Plyusov, Daniil, et al.
Published: (2026)

Adaptive Rollout Allocation for Online Reinforcement Learning with Verifiable Rewards
by: Nguyen, Hieu Trung, et al.
Published: (2026)

DARTS: Distribution-Aware Active Rollout Trajectory Shaping for Accelerating LLM Reinforcement Learning
by: Wang, Yujie, et al.
Published: (2026)

Nested-ReFT: Efficient Reinforcement Learning for Large Language Model Fine-Tuning via Off-Policy Rollouts
by: Heuillet, Maxime, et al.
Published: (2025)

A Unified Framework for Rethinking Policy Divergence Measures in GRPO
by: Wu, Qingyuan, et al.
Published: (2026)

Smaller Models are Natural Explorers for Policy-Level Diversity in GRPO
by: Ren, Yiming, et al.
Published: (2026)

Superior Computer Chess with Model Predictive Control, Reinforcement Learning, and Rollout
by: Gundawar, Atharva, et al.
Published: (2024)

BubbleSpec: Turning Long-Tail Bubbles into Speculative Rollout Drafts for Synchronous Reinforcement Learning
by: Xu, Yuhang, et al.
Published: (2026)

Diffusion World Model: Future Modeling Beyond Step-by-Step Rollout for Offline Reinforcement Learning
by: Ding, Zihan, et al.
Published: (2024)

Act Only When It Pays: Efficient Reinforcement Learning for LLM Reasoning via Selective Rollouts
by: Zheng, Haizhong, et al.
Published: (2025)

Unifying Group-Relative and Self-Distillation Policy Optimization via Sample Routing
by: Li, Gengsheng, et al.
Published: (2026)

Information-Consistent Language Model Recommendations through Group Relative Policy Optimization
by: Prabhune, Sonal, et al.
Published: (2025)

Group Orthogonalized Policy Optimization: Group Policy Optimization as Orthogonal Projection in Hilbert Space
by: Zixian, Wang
Published: (2026)

AMIR-GRPO: Inducing Implicit Preference Signals into GRPO
by: Yari, Amir Hossein, et al.
Published: (2026)

Personalized Group Relative Policy Optimization for Heterogenous Preference Alignment
by: Wang, Jialu, et al.
Published: (2026)

Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts
by: Pang, Jing-Cheng, et al.
Published: (2024)

CoPRIS: Efficient and Stable Reinforcement Learning via Concurrency-Controlled Partial Rollout with Importance Sampling
by: Qu, Zekai, et al.
Published: (2025)