Wan, X., Wang, Y., Huang, W., & Sun, M. (2026). Buffer Matters: Unleashing the Power of Off-Policy Reinforcement Learning in Large Language Model Reasoning.
Chicago Style (17th ed.) Citation: Wan, Xu, Yansheng Wang, Wenqi Huang, and Mingyang Sun. Buffer Matters: Unleashing the Power of Off-Policy Reinforcement Learning in Large Language Model Reasoning. 2026.
MLA (9th ed.) Citation: Wan, Xu, et al. Buffer Matters: Unleashing the Power of Off-Policy Reinforcement Learning in Large Language Model Reasoning. 2026.