Search Results - Work, Sam
Showing 1 result
Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs by Le Roux, Nicolas; Bellemare, Marc G.; Lebensold, Jonathan; Bergeron, Arnaud; Greaves, Joshua; Fréchette, Alex; Pelletier, Carolyne; Thibodeau-Laufer, Eric; Toth, Sándor; Work, Sam
Published 2025
Preprint