APA (7th ed.) Citation

Wan, X., Wang, Y., Huang, W., & Sun, M. (2026). Buffer matters: Unleashing the power of off-policy reinforcement learning in large language model reasoning.

Chicago Style (17th ed.) Citation

Wan, Xu, Yansheng Wang, Wenqi Huang, and Mingyang Sun. Buffer Matters: Unleashing the Power of Off-Policy Reinforcement Learning in Large Language Model Reasoning. 2026.

MLA (9th ed.) Citation

Wan, Xu, et al. Buffer Matters: Unleashing the Power of Off-Policy Reinforcement Learning in Large Language Model Reasoning. 2026.

Warning: These citations may not always be 100% accurate.