Zhong, H., Zhai, J., Song, L., Bian, J., Liu, Q., & Tan, T. (2026). RC-GRPO: Reward-Conditioned Group Relative Policy Optimization for Multi-Turn Tool Calling Agents.
Chicago Style (17th ed.) Citation: Zhong, Haitian, Jixiu Zhai, Lei Song, Jiang Bian, Qiang Liu, and Tieniu Tan. RC-GRPO: Reward-Conditioned Group Relative Policy Optimization for Multi-Turn Tool Calling Agents. 2026.
MLA (9th ed.) Citation: Zhong, Haitian, et al. RC-GRPO: Reward-Conditioned Group Relative Policy Optimization for Multi-Turn Tool Calling Agents. 2026.