| Main Author: | Kim, Youngeun |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.22582 |
Similar Items
GRAPH-GRPO-LEX: Contract Graph Modeling and Reinforcement Learning with Group Relative Policy Optimization
by: Dechtiar, Moriya, et al.
Published: (2025)
AceGRPO: Adaptive Curriculum Enhanced Group Relative Policy Optimization for Autonomous Machine Learning Engineering
by: Cai, Yuzhu, et al.
Published: (2026)
Scaf-GRPO: Scaffolded Group Relative Policy Optimization for Enhancing LLM Reasoning
by: Zhang, Xichen, et al.
Published: (2025)
GRPO-VPS: Enhancing Group Relative Policy Optimization with Verifiable Process Supervision for Effective Reasoning
by: Wang, Jingyi, et al.
Published: (2026)
EP-GRPO: Entropy-Progress Aligned Group Relative Policy Optimization with Implicit Process Guidance
by: Yu, Song, et al.
Published: (2026)
Train Less, Learn More: Adaptive Efficient Rollout Optimization for Group-Based Reinforcement Learning
by: Zhang, Zhi, et al.
Published: (2026)
Co-GRPO: Co-Optimized Group Relative Policy Optimization for Masked Diffusion Model
by: Zhou, Renping, et al.
Published: (2025)
IB-GRPO: Aligning LLM-based Learning Path Recommendation with Educational Objectives via Indicator-Based Group Relative Policy Optimization
by: Wang, Shuai, et al.
Published: (2026)
SPEC-RL: Accelerating On-Policy Reinforcement Learning with Speculative Rollouts
by: Liu, Bingshuai, et al.
Published: (2025)
WS-GRPO: Weakly-Supervised Group-Relative Policy Optimization for Rollout-Efficient Reasoning
by: Mundada, Gagan, et al.
Published: (2026)
Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning
by: Xu, Yixuan Even, et al.
Published: (2025)
SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization
by: Zheng, Zhi, et al.
Published: (2025)
How to Allocate, How to Learn? Dynamic Rollout Allocation and Advantage Modulation for Policy Optimization
by: Fang, Yangyi, et al.
Published: (2026)
EchoRL: Reinforcement Learning via Rollout Echoing
by: Bi, Jinhe, et al.
Published: (2026)
Contextual Rollout Bandits for Reinforcement Learning with Verifiable Rewards
by: Lu, Xiaodong, et al.
Published: (2026)
GEPO: Group Expectation Policy Optimization for Stable Heterogeneous Reinforcement Learning
by: Zhang, Han, et al.
Published: (2025)
Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends
by: Yao, Chaorui, et al.
Published: (2025)
NGRPO: Negative-enhanced Group Relative Policy Optimization
by: Nan, Gongrui, et al.
Published: (2025)
Guided Cooperation in Hierarchical Reinforcement Learning via Model-based Rollout
by: Wang, Haoran, et al.
Published: (2023)
S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models
by: Dai, Muzhi, et al.
Published: (2025)
Hybrid Group Relative Policy Optimization: A Multi-Sample Approach to Enhancing Policy Optimization
by: Sane, Soham
Published: (2025)
APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation
by: Zhou, Yuzhen, et al.
Published: (2025)
Stepwise Guided Policy Optimization: Coloring your Incorrect Reasoning in GRPO
by: Chen, Peter, et al.
Published: (2025)
F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare
by: Plyusov, Daniil, et al.
Published: (2026)
Adaptive Rollout Allocation for Online Reinforcement Learning with Verifiable Rewards
by: Nguyen, Hieu Trung, et al.
Published: (2026)
DARTS: Distribution-Aware Active Rollout Trajectory Shaping for Accelerating LLM Reinforcement Learning
by: Wang, Yujie, et al.
Published: (2026)
Nested-ReFT: Efficient Reinforcement Learning for Large Language Model Fine-Tuning via Off-Policy Rollouts
by: Heuillet, Maxime, et al.
Published: (2025)
A Unified Framework for Rethinking Policy Divergence Measures in GRPO
by: Wu, Qingyuan, et al.
Published: (2026)
Smaller Models are Natural Explorers for Policy-Level Diversity in GRPO
by: Ren, Yiming, et al.
Published: (2026)
Superior Computer Chess with Model Predictive Control, Reinforcement Learning, and Rollout
by: Gundawar, Atharva, et al.
Published: (2024)
BubbleSpec: Turning Long-Tail Bubbles into Speculative Rollout Drafts for Synchronous Reinforcement Learning
by: Xu, Yuhang, et al.
Published: (2026)
Diffusion World Model: Future Modeling Beyond Step-by-Step Rollout for Offline Reinforcement Learning
by: Ding, Zihan, et al.
Published: (2024)
Act Only When It Pays: Efficient Reinforcement Learning for LLM Reasoning via Selective Rollouts
by: Zheng, Haizhong, et al.
Published: (2025)
Unifying Group-Relative and Self-Distillation Policy Optimization via Sample Routing
by: Li, Gengsheng, et al.
Published: (2026)
Information-Consistent Language Model Recommendations through Group Relative Policy Optimization
by: Prabhune, Sonal, et al.
Published: (2025)
Group Orthogonalized Policy Optimization: Group Policy Optimization as Orthogonal Projection in Hilbert Space
by: Zixian, Wang
Published: (2026)
AMIR-GRPO: Inducing Implicit Preference Signals into GRPO
by: Yari, Amir Hossein, et al.
Published: (2026)
Personalized Group Relative Policy Optimization for Heterogenous Preference Alignment
by: Wang, Jialu, et al.
Published: (2026)
Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts
by: Pang, Jing-Cheng, et al.
Published: (2024)
CoPRIS: Efficient and Stable Reinforcement Learning via Concurrency-Controlled Partial Rollout with Importance Sampling
by: Qu, Zekai, et al.
Published: (2025)