Library Catalog

Bibliographic Details
Main Authors:	Le Roux, Nicolas; Bellemare, Marc G.; Lebensold, Jonathan; Bergeron, Arnaud; Greaves, Joshua; Fréchette, Alex; Pelletier, Carolyne; Thibodeau-Laufer, Eric; Toth, Sándor; Work, Sam
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2503.14286

Similar Items

Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends
by: Yao, Chaorui, et al.
Published: (2025)

Research Directions for Verifiable Crypto-Physically Secure TEEs
by: Bellemare, Sylvain
Published: (2024)

Marc Bellemare
by: Marc Bellemare
Published: (2024)

QuantFactor REINFORCE: Mining Steady Formulaic Alpha Factors with Variance-bounded REINFORCE
by: Zhao, Junjie, et al.
Published: (2024)

Taper-based scattering formulation of the Helmholtz equation to improve the training process of Physics-Informed Neural Networks
by: Dörfler, W., et al.
Published: (2024)

THE CONVENTION: THE SOLUTION TO REINFORCE ESDP?
by: MANUEL VÁZQUEZ MUÑOZ
Published: (2003)

REINFORCE++: Stabilizing Critic-Free Policy Optimization with Global Advantage Normalization
by: Hu, Jian, et al.
Published: (2025)

Non-Uniform Noise-to-Signal Ratio in the REINFORCE Policy-Gradient Estimator
by: Han, Haoyu, et al.
Published: (2026)

Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards
by: Arnal, Charles, et al.
Published: (2025)

Stable oxygen isotope analysis of water samples during helicopter/ice camp TRANSDRIFT-XX, Laptev Sea
by: Bauch, Dorothea, et al.
Published: (2020)

Stable Asynchrony: Variance-Controlled Off-Policy RL for LLMs
by: Huang, Luke J., et al.
Published: (2026)

Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
by: Ahmadian, Arash, et al.
Published: (2024)

On the Privacy of Selection Mechanisms with Gaussian Noise
by: Lebensold, Jonathan, et al.
Published: (2024)

PODCASTING: TOOL TO DEVELOP AND REINFORCE EFL LEARNING
by: Anali Carolina Rodríguez-Castro
Published: (2023)

The Nuts and Bolts of Open Source: A Taxonomy of Glue Work in OSS Projects
by: Glue, Work
Published: (2026)

A Rank Order of Accurate Use at the Syntax-Pragmatics Interface: Evidence from French and Spanish L2 Acquisition
by: Nicola Work
Published: (2015)

TH-1517-DFD: Autonomous Interstellar Reconnaissance Spacecraft - Complete System Architecture & Mond Self-Repair
by: Thibodeau, Pascal
Published: (2026)

L'Humanité a Choisi la Mauvaise Guerre depuis 5 500 ans : Analyse métrologique ThibEquation v6.1 du vecteur guerrier et de la réponse institutionnelle à 3I/ATLAS
by: Thibodeau, Pascal
Published: (2026)

Inclusive Science in Canadian Federal Science Laboratories: Exploring Accessibility Policies and Best Practices
by: Sacha Ghandeharian, et al.
Published: (2026)

REINFORCE-ING Chemical Language Models for Drug Discovery
by: Thomas, Morgan, et al.
Published: (2025)

EVENTOS EXTREMOS DE PRECIPITAÇÃO NO ESTADO DO PARANÁ
by: Carolyne B. MACHADO
Published: (2013)

Global agricultural value chains and food prices
by: Bernhard Dalheimer, et al.
Published: (2025)

Final Report of the Pissarides Review into the Future of Work and Wellbeing
by: Institute for the Future of Work
Published: (2025)

Assignment: Library; The Use of Non-Research Library Topics in Composition Courses.
by: Work, James C.
Published: (1975)

Anatomía y fisiología / Gary A. Thibodeau, Kevin T. Patton; traductor, Diorki Servicios Integrales de Edición
by: Thibodeau, Gary A
Published: (2007)

PROJETO DE EXTENSÃO ILUMINE: A ENTRADA DA FIGURA DO PALHAÇO NO AMBIENTE HOSPITALAR
by: Cely Carolyne Pontes MORCERF
Published: (2015)

Complexity, Features, and Comparisons in Forensic Handwriting Examination
by: Kylie Jones, et al.
Published: (2024)

REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
by: Jian, Hu
Published: (2025)

Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control
by: Rahn, Nate, et al.
Published: (2023)

Increased retention after harvest better maintains carabid abundance in boreal mixedwood forests under two climate change scenarios
by: Lauren Egli, et al.
Published: (2025)

Forest harvest causes rapid changes of maternal investment strategies in ground beetles
by: Lauren Egli, et al.
Published: (2024)

REINFORCE Adversarial Attacks on Large Language Models: An Adaptive, Distributional, and Semantic Objective
by: Geisler, Simon, et al.
Published: (2025)

Designing Instance-Level Sampling Schedules via REINFORCE with James-Stein Shrinkage
by: Yu, Peiyu, et al.
Published: (2025)

Comp-LTL: Temporal Logic Planning via Zero-Shot Policy Composition
by: Bergeron, Taylor, et al.
Published: (2024)

Non‐Original and Digitally Captured Handwriting: Considerations for Forensic Handwriting Examinations
by: Kylie Jones, et al.
Published: (2024)

Ultra-Wideband Tapered Transducers in Thin-Film Lithium Niobate on Silicon Carbide
by: Kramer, Jack, et al.
Published: (2024)

Galahs near Melbourne
by: Greaves, T.
Published: (1928)

The work environment for managers during the Covid-19 pandemic
by: Swedish Agency for Work Environment Expertise
Published: (2023)

Guidelines for Managing social health risks at work– victimization and bullying
by: Swedish Agency for Work Environment Expertise
Published: (2022)

The work environment of those who remained in their regular workplaces during the Covid-19 pandemic – retail, transport, and social care
by: Swedish Agency for Work Environment Expertise
Published: (2023)