Sullivan, M., & Koller, A. (2025). GRPO is Secretly a Process Reward Model.
Chicago Style (17th ed.) Citation: Sullivan, Michael, and Alexander Koller. GRPO Is Secretly a Process Reward Model. 2025.
MLA (9th ed.) Citation: Sullivan, Michael, and Alexander Koller. GRPO Is Secretly a Process Reward Model. 2025.