:: Library Catalog

Bibliographic Details
Main Author:	Perrin-Gilbert, Nicolas
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2404.16159

Similar Items

Solving Bayesian inverse problems with diffusion priors and off-policy RL
by: Scimeca, Luca, et al.
Published: (2025)

The Role of Deep Learning Regularizations on Actors in Offline RL
by: Tarasov, Denis, et al.
Published: (2024)

Guided Flow Policy: Learning from High-Value Actions in Offline Reinforcement Learning
by: Tiofack, Franki Nguimatsia, et al.
Published: (2025)

Combining LLM decision and RL action selection to improve RL policy for adaptive interventions
by: Karine, Karine, et al.
Published: (2025)

SAC-GLAM: Improving Online RL for LLM agents with Soft Actor-Critic and Hindsight Relabeling
by: Gaven, Loris, et al.
Published: (2024)

RL-GPT: Integrating Reinforcement Learning and Code-as-policy
by: Liu, Shaoteng, et al.
Published: (2024)

Relative Importance Sampling for off-Policy Actor-Critic in Deep Reinforcement Learning
by: Humayoo, Mahammad, et al.
Published: (2018)

Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL
by: Luo, Yu, et al.
Published: (2024)

Dissecting Deep RL with High Update Ratios: Combatting Value Divergence
by: Hussing, Marcel, et al.
Published: (2024)

Intelligent Switching for Reset-Free RL
by: Patil, Darshan, et al.
Published: (2024)

Improving Zero-Shot Offline RL via Behavioral Task Sampling
by: Bendib, Nazim, et al.
Published: (2026)

Forager: a lightweight testbed for continual learning with partial observability in RL
by: Tang, Steven, et al.
Published: (2026)

Counterfactual experience augmented off-policy reinforcement learning
by: Lee, Sunbowen, et al.
Published: (2025)

Actor-Free Continuous Control via Structurally Maximizable Q-Functions
by: Korkmaz, Yigit, et al.
Published: (2025)

ELMUR: External Layer Memory with Update/Rewrite for Long-Horizon RL Problems
by: Cherepanov, Egor, et al.
Published: (2025)

Cost Trade-offs in Matrix Inversion Updates for Streaming Outlier Detection
by: Grivet, Florian, et al.
Published: (2026)

Robust off-policy Reinforcement Learning via Soft Constrained Adversary
by: Nakanishi, Kosuke, et al.
Published: (2024)

Investigating Memory in Model-Free RL with POPGym Arcade
by: Wang, Zekang, et al.
Published: (2025)

Action-Free Offline-to-Online RL via Discretised State Policies
by: Neggatu, Natinael Solomon, et al.
Published: (2026)

Verifier-Free RL for LLMs via Intrinsic Gradient-Norm Reward
by: Wen, Xuexiang, et al.
Published: (2026)

RL for Reasoning by Adaptively Revealing Rationales
by: Amani, Mohammad Hossein, et al.
Published: (2025)

PROMA: Projected Microbatch Accumulation for Reference-Free Proximal Policy Updates
by: Abrahamsen, Nilin
Published: (2026)

Value Improved Actor Critic Algorithms
by: Oren, Yaniv, et al.
Published: (2024)

Diffusion Actor-Critic with Entropy Regulator
by: Wang, Yinuo, et al.
Published: (2024)

Revisiting Discrete Soft Actor-Critic
by: Zhou, Haibin, et al.
Published: (2022)

Average-Reward Soft Actor-Critic
by: Adamczyk, Jacob, et al.
Published: (2025)

MAGICS: Adversarial RL with Minimax Actors Guided by Implicit Critic Stackelberg for Convergent Neural Synthesis of Robot Safety
by: Wang, Justin, et al.
Published: (2024)

Offline Actor-Critic Reinforcement Learning Scales to Large Models
by: Springenberg, Jost Tobias, et al.
Published: (2024)

Flow Actor-Critic for Offline Reinforcement Learning
by: Chae, Jongseong, et al.
Published: (2026)

Distributional Soft Actor-Critic with Diffusion Policy
by: Liu, Tong, et al.
Published: (2025)

A Safety Modulator Actor-Critic Method in Model-Free Safe Reinforcement Learning and Application in UAV Hovering
by: Qi, Qihan, et al.
Published: (2024)

RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$
by: Bhatia, Abhinav, et al.
Published: (2023)

You Need Reasoning to Learn Reasoning: The Limitations of Label-Free RL in Weak Base Models
by: Roy, Shuvendu, et al.
Published: (2025)

Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone
by: Mark, Max Sobol, et al.
Published: (2024)

Relational Object-Centric Actor-Critic
by: Ugadiarov, Leonid, et al.
Published: (2023)

RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization
by: Dong, Yihong, et al.
Published: (2025)

Scalable Neighborhood-Based Multi-Agent Actor-Critic
by: Goppelsroeder, Tim, et al.
Published: (2026)

Revisiting Mixture Policies in Entropy-Regularized Actor-Critic
by: He, Jiamin, et al.
Published: (2026)

SACn: Soft Actor-Critic with n-step Returns
by: Łyskawa, Jakub, et al.
Published: (2025)

Actor-Critics Can Achieve Optimal Sample Efficiency
by: Tan, Kevin, et al.
Published: (2025)