Library Catalog

Bibliographic Details
Main Authors:	Vaartjes, Nathan; Francois-Lavet, Vincent
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2507.00011

Similar Items

Hadamax Encoding: Elevating Performance in Model-Free Atari
by: Kooi, Jacob E., et al.
Published: (2025)

Disentangled (Un)Controllable Features
by: Kooi, Jacob E., et al.
Published: (2022)

Short-Term-to-Long-Term Memory Transfer for Knowledge Graphs under Partial Observability
by: Kim, Taewoon, et al.
Published: (2026)

Temporal Knowledge-Graph Memory in a Partially Observable Environment
by: Kim, Taewoon, et al.
Published: (2024)

Hadamard Representation: Scaffolding Performance Across Model-free RL
by: Kooi, Jacob E., et al.
Published: (2024)

Shielded Controller Units for RL with Operational Constraints Applied to Remote Microgrids
by: Nekoei, Hadi, et al.
Published: (2025)

Sample-efficient and Scalable Exploration in Continuous-Time RL
by: Iten, Klemens, et al.
Published: (2025)

On Entropy Control in LLM-RL Algorithms
by: Shen, Han
Published: (2025)

Faster Synchronous On-Policy RL via Straggler-Aware Group Sizing
by: Khan, Azal Ahmad, et al.
Published: (2026)

TreeAdv: Tree-Structured Advantage Redistribution for Group-Based RL
by: Cao, Lang, et al.
Published: (2026)

A Machine With Human-Like Memory Systems
by: Kim, Taewoon, et al.
Published: (2022)

Entropy-guided sequence weighting for efficient exploration in RL-based LLM fine-tuning
by: Vanlioglu, Abdullah
Published: (2025)

A Machine with Short-Term, Episodic, and Semantic Memory Systems
by: Kim, Taewoon, et al.
Published: (2022)

Scilab-RL: A software framework for efficient reinforcement learning and cognitive modeling research
by: Dohmen, Jan, et al.
Published: (2024)

Deep RL With Information Constrained Policies: Generalization in Continuous Control
by: Malloy, Tailia, et al.
Published: (2020)

Stable Asynchrony: Variance-Controlled Off-Policy RL for LLMs
by: Huang, Luke J., et al.
Published: (2026)

Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models?
by: Qu, Yun, et al.
Published: (2025)

RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$
by: Bhatia, Abhinav, et al.
Published: (2023)

Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone
by: Mark, Max Sobol, et al.
Published: (2024)

Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models
by: Lin, Nianyi, et al.
Published: (2025)

GAC: Stabilizing Asynchronous RL Training for LLMs via Gradient Alignment Control
by: Xu, Haofeng, et al.
Published: (2026)

Learning Abstract World Models with a Group-Structured Latent Space
by: Delliaux, Thomas, et al.
Published: (2025)

An Empirical Study on the Effectiveness of Incorporating Offline RL As Online RL Subroutines
by: Su, Jianhai, et al.
Published: (2025)

FastDSAC: Unlocking the Potential of Maximum Entropy RL in High-Dimensional Humanoid Control
by: Xue, Jun, et al.
Published: (2026)

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
by: Liu, Shih-Yang, et al.
Published: (2026)

RL in Name Only? Analyzing the Structural Assumptions in RL post-training for LLMs
by: Samineni, Soumya Rani, et al.
Published: (2025)

Rethinking RL Evaluation: Can Benchmarks Truly Reveal Failures of RL Methods?
by: Chen, Zihan, et al.
Published: (2025)

Why Goal-Conditioned Reinforcement Learning Works: Relation to Dual Control
by: Lawrence, Nathan P., et al.
Published: (2025)

Leveraging weights signals -- Predicting and improving generalizability in reinforcement learning
by: Moulin, Olivier, et al.
Published: (2025)

Infra-Bayesian Reinforcement Learning Agents Outperform Classical RL For Worst-Case Robustness
by: Aryal, Manish, et al.
Published: (2026)

Combining LLM decision and RL action selection to improve RL policy for adaptive interventions
by: Karine, Karine, et al.
Published: (2025)

Towards Improving Reward Design in RL: A Reward Alignment Metric for RL Practitioners
by: Muslimani, Calarina, et al.
Published: (2025)

SortedRL: Accelerating RL Training for LLMs through Online Length-Aware Scheduling
by: Zhang, Yiqi, et al.
Published: (2026)

SpikeRL: A Scalable and Energy-efficient Framework for Deep Spiking Reinforcement Learning
by: Tahmid, Tokey, et al.
Published: (2025)

Debiased Model-based Representations for Sample-efficient Continuous Control
by: Lyu, Jiafei, et al.
Published: (2026)

A Review of Online Diffusion Policy RL Algorithms for Scalable Robotic Control
by: Choi, Wonhyeok, et al.
Published: (2026)

CtRL-Sim: Reactive and Controllable Driving Agents with Offline Reinforcement Learning
by: Rowe, Luke, et al.
Published: (2024)

Budgeting Counterfactual for Offline RL
by: Liu, Yao, et al.
Published: (2023)

Explaining RL Decisions with Trajectories
by: Deshmukh, Shripad Vilasrao, et al.
Published: (2023)

VAM: Verbalized Action Masking for Controllable Exploration in RL Post-Training -- A Chess Case Study
by: Zhang, Zhicheng, et al.
Published: (2026)