APA (7th ed.) Citation

Le Roux, N., Bellemare, M. G., Lebensold, J., Bergeron, A., Greaves, J., Fréchette, A., . . . Work, S. (2025). Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs.

Chicago Style (17th ed.) Citation

Le Roux, Nicolas, et al. Tapered Off-Policy REINFORCE: Stable and Efficient Reinforcement Learning for LLMs. 2025.

MLA (9th ed.) Citation

Le Roux, Nicolas, et al. Tapered Off-Policy REINFORCE: Stable and Efficient Reinforcement Learning for LLMs. 2025.

Warning: These citations may not always be 100% accurate.