Zhou, R., Du, S. S., & Li, B. (2024). Reflect-RL: Two-Player Online RL Fine-Tuning for LMs.
Chicago Style (17th ed.) Citation:
Zhou, Runlong, Simon S. Du, and Beibin Li. Reflect-RL: Two-Player Online RL Fine-Tuning for LMs. 2024.
MLA (9th ed.) Citation:
Zhou, Runlong, et al. Reflect-RL: Two-Player Online RL Fine-Tuning for LMs. 2024.