Ai, R., Pan, Y., Simchi-Levi, D., & Wang, C. (2026). ShapE-GRPO: Shapley-Enhanced Reward Allocation for Multi-Candidate LLM Training.
Chicago Style (17th ed.) Citation: Ai, Rui, Yu Pan, David Simchi-Levi, and Chonghuan Wang. ShapE-GRPO: Shapley-Enhanced Reward Allocation for Multi-Candidate LLM Training. 2026.
MLA (9th ed.) Citation: Ai, Rui, et al. ShapE-GRPO: Shapley-Enhanced Reward Allocation for Multi-Candidate LLM Training. 2026.