:: Library Catalog

Bibliographic Details
Main Authors:	Hoy, William; Wang, Binxu; Pan, Xu
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2604.01499

Similar Items

LithoGRPO: Fast Inverse Lithography via GRPO Reinforced Flow Matching
by: Lai, Yao, et al.
Published: (2026)

S-GRPO: Unified Post-Training for Large Vision-Language Models
by: Yan, Yuming, et al.
Published: (2026)

TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models
by: Ding, Zheng, et al.
Published: (2025)

Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training
by: Xu, Yuanda, et al.
Published: (2026)

Circuit Mechanisms for Spatial Relation Generation in Diffusion Transformers
by: Wang, Binxu, et al.
Published: (2026)

Geometry Conflict: Explaining and Controlling Forgetting in LLM Continual Post-Training
by: Wang, Yuanyi, et al.
Published: (2026)

SFT-GRPO Data Overlap as a Post-Training Hyperparameter for Autoformalization
by: Su, Xiaole, et al.
Published: (2026)

Hard Examples Are All You Need: Maximizing GRPO Post-Training Under Annotation Budgets
by: Pikus, Benjamin, et al.
Published: (2025)

Graph-GRPO: Training Graph Flow Models with Reinforcement Learning
by: Zhu, Baoheng, et al.
Published: (2026)

MomentKV: Closing the Directional Gap in KV Cache Eviction for Long-Context Inference
by: Li, Yu, et al.
Published: (2026)

How Off-Policy Can GRPO Be? Mu-GRPO for Efficient LLM Reinforcement Learning
by: Tian, Minghao, et al.
Published: (2026)

Stepwise Credit Assignment for GRPO on Flow-Matching Models
by: Savani, Yash, et al.
Published: (2026)

STABLE: Gated Continual Learning for Large Language Models
by: Hoy, William, et al.
Published: (2025)

JigsawRL: Assembling RL Pipelines for Efficient LLM Post-Training
by: Hu, Zhengding, et al.
Published: (2026)

MMR-GRPO: Accelerating GRPO-Style Training through Diversity-Aware Reward Reweighting
by: Wei, Kangda, et al.
Published: (2026)

PostTrainBench: Can LLM Agents Automate LLM Post-Training?
by: Rank, Ben, et al.
Published: (2026)

GRPO-$λ$: Credit Assignment improves LLM Reasoning
by: Parthasarathi, Prasanna, et al.
Published: (2025)

Predictive Scaling Laws for Efficient GRPO Training of Large Reasoning Models
by: Nimmaturi, Datta, et al.
Published: (2025)

GRPO-Guard: Mitigating Implicit Over-Optimization in Flow Matching via Regulated Clipping
by: Wang, Jing, et al.
Published: (2025)

Dr. Post-Training: A Data Regularization Perspective on LLM Post-Training
by: Hu, Pingbang, et al.
Published: (2026)

Rethinking Local Learning: A Cheaper and Faster Recipe for LLM Post-Training
by: Shi, Hengyu, et al.
Published: (2026)

AMIR-GRPO: Inducing Implicit Preference Signals into GRPO
by: Yari, Amir Hossein, et al.
Published: (2026)

Accuracy vs. Accuracy: Computational Tradeoffs Between Classification Rates and Utility
by: Amit, Noga, et al.
Published: (2025)

Multi-Task GRPO: Reliable LLM Reasoning Across Tasks
by: Ramesh, Shyam Sundhar, et al.
Published: (2026)

Scalpel vs. Hammer: GRPO Amplifies Existing Capabilities, SFT Replaces Them
by: Rajani, Neel, et al.
Published: (2025)

DRA-GRPO: Your GRPO Needs to Know Diverse Reasoning Paths for Mathematical Reasoning
by: Chen, Xiwen, et al.
Published: (2025)

GRPO-RM: Fine-Tuning Representation Models via GRPO-Driven Reinforcement Learning
by: Xu, Yanchen, et al.
Published: (2025)

Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training
by: Du, Xianzhi, et al.
Published: (2024)

Prefix Grouper: Efficient GRPO Training through Shared-Prefix Forward
by: Liu, Zikang, et al.
Published: (2025)

Prompt Curriculum Learning for Efficient LLM Post-Training
by: Gao, Zhaolin, et al.
Published: (2025)

On the Evolution of Federated Post-Training Large Language Models: A Model Accessibility View
by: Guo, Tao, et al.
Published: (2025)

Consolidating Rewarded Perturbations for LLM Post-Training
by: Zhang, Zheyu, et al.
Published: (2026)

Automatic Configuration of LLM Post-Training Pipelines
by: Chwa, Channe, et al.
Published: (2026)

AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training
by: Han, Zhenyu, et al.
Published: (2025)

Reinforce Adjoint Matching: Scaling RL Post-Training of Diffusion and Flow-Matching Models
by: Bergmeister, Andreas, et al.
Published: (2026)

Divergence Minimization Preference Optimization for Diffusion Model Alignment
by: Li, Binxu, et al.
Published: (2025)

f-GRPO and Beyond: Divergence-Based Reinforcement Learning Algorithms for General LLM Alignment
by: Haldar, Rajdeep, et al.
Published: (2026)

An Analytical Theory of Spectral Bias in the Learning Dynamics of Diffusion Models
by: Wang, Binxu, et al.
Published: (2025)

CapTrack: Multifaceted Evaluation of Forgetting in LLM Post-Training
by: Thede, Lukas, et al.
Published: (2026)

Comparative Analysis and Parametric Tuning of PPO, GRPO, and DAPO for LLM Reasoning Enhancement
by: Lian, Yongsheng
Published: (2025)