Author Search Results :: Library Catalog

Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs by Le Roux, Nicolas, Bellemare, Marc G., Lebensold, Jonathan, Bergeron, Arnaud, Greaves, Joshua, Fréchette, Alex, Pelletier, Carolyne, Thibodeau-Laufer, Eric, Toth, Sándor, Work, Sam

Published 2025

Preprint