| Main Authors: | Vaartjes, Nathan; Francois-Lavet, Vincent |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.00011 |
Similar Items
Hadamax Encoding: Elevating Performance in Model-Free Atari
by: Kooi, Jacob E., et al.
Published: (2025)
Disentangled (Un)Controllable Features
by: Kooi, Jacob E., et al.
Published: (2022)
Short-Term-to-Long-Term Memory Transfer for Knowledge Graphs under Partial Observability
by: Kim, Taewoon, et al.
Published: (2026)
Temporal Knowledge-Graph Memory in a Partially Observable Environment
by: Kim, Taewoon, et al.
Published: (2024)
Hadamard Representation: Scaffolding Performance Across Model-free RL
by: Kooi, Jacob E., et al.
Published: (2024)
Shielded Controller Units for RL with Operational Constraints Applied to Remote Microgrids
by: Nekoei, Hadi, et al.
Published: (2025)
Sample-efficient and Scalable Exploration in Continuous-Time RL
by: Iten, Klemens, et al.
Published: (2025)
On Entropy Control in LLM-RL Algorithms
by: Shen, Han
Published: (2025)
Faster Synchronous On-Policy RL via Straggler-Aware Group Sizing
by: Khan, Azal Ahmad, et al.
Published: (2026)
TreeAdv: Tree-Structured Advantage Redistribution for Group-Based RL
by: Cao, Lang, et al.
Published: (2026)
A Machine With Human-Like Memory Systems
by: Kim, Taewoon, et al.
Published: (2022)
Entropy-guided sequence weighting for efficient exploration in RL-based LLM fine-tuning
by: Vanlioglu, Abdullah
Published: (2025)
A Machine with Short-Term, Episodic, and Semantic Memory Systems
by: Kim, Taewoon, et al.
Published: (2022)
Scilab-RL: A software framework for efficient reinforcement learning and cognitive modeling research
by: Dohmen, Jan, et al.
Published: (2024)
Deep RL With Information Constrained Policies: Generalization in Continuous Control
by: Malloy, Tailia, et al.
Published: (2020)
Stable Asynchrony: Variance-Controlled Off-Policy RL for LLMs
by: Huang, Luke J., et al.
Published: (2026)
Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models?
by: Qu, Yun, et al.
Published: (2025)
RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$
by: Bhatia, Abhinav, et al.
Published: (2023)
Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone
by: Mark, Max Sobol, et al.
Published: (2024)
Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models
by: Lin, Nianyi, et al.
Published: (2025)
GAC: Stabilizing Asynchronous RL Training for LLMs via Gradient Alignment Control
by: Xu, Haofeng, et al.
Published: (2026)
Learning Abstract World Models with a Group-Structured Latent Space
by: Delliaux, Thomas, et al.
Published: (2025)
An Empirical Study on the Effectiveness of Incorporating Offline RL As Online RL Subroutines
by: Su, Jianhai, et al.
Published: (2025)
FastDSAC: Unlocking the Potential of Maximum Entropy RL in High-Dimensional Humanoid Control
by: Xue, Jun, et al.
Published: (2026)
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
by: Liu, Shih-Yang, et al.
Published: (2026)
RL in Name Only? Analyzing the Structural Assumptions in RL post-training for LLMs
by: Samineni, Soumya Rani, et al.
Published: (2025)
Rethinking RL Evaluation: Can Benchmarks Truly Reveal Failures of RL Methods?
by: Chen, Zihan, et al.
Published: (2025)
Why Goal-Conditioned Reinforcement Learning Works: Relation to Dual Control
by: Lawrence, Nathan P., et al.
Published: (2025)
Leveraging weights signals -- Predicting and improving generalizability in reinforcement learning
by: Moulin, Olivier, et al.
Published: (2025)
Infra-Bayesian Reinforcement Learning Agents Outperform Classical RL For Worst-Case Robustness
by: Aryal, Manish, et al.
Published: (2026)
Combining LLM decision and RL action selection to improve RL policy for adaptive interventions
by: Karine, Karine, et al.
Published: (2025)
Towards Improving Reward Design in RL: A Reward Alignment Metric for RL Practitioners
by: Muslimani, Calarina, et al.
Published: (2025)
SortedRL: Accelerating RL Training for LLMs through Online Length-Aware Scheduling
by: Zhang, Yiqi, et al.
Published: (2026)
SpikeRL: A Scalable and Energy-efficient Framework for Deep Spiking Reinforcement Learning
by: Tahmid, Tokey, et al.
Published: (2025)
Debiased Model-based Representations for Sample-efficient Continuous Control
by: Lyu, Jiafei, et al.
Published: (2026)
A Review of Online Diffusion Policy RL Algorithms for Scalable Robotic Control
by: Choi, Wonhyeok, et al.
Published: (2026)
CtRL-Sim: Reactive and Controllable Driving Agents with Offline Reinforcement Learning
by: Rowe, Luke, et al.
Published: (2024)
Budgeting Counterfactual for Offline RL
by: Liu, Yao, et al.
Published: (2023)
Explaining RL Decisions with Trajectories
by: Deshmukh, Shripad Vilasrao, et al.
Published: (2023)
VAM: Verbalized Action Masking for Controllable Exploration in RL Post-Training -- A Chess Case Study
by: Zhang, Zhicheng, et al.
Published: (2026)