| Main Author: | Perrin-Gilbert, Nicolas |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2404.16159 |
Similar Items
Solving Bayesian inverse problems with diffusion priors and off-policy RL
by: Scimeca, Luca, et al.
Published: (2025)
The Role of Deep Learning Regularizations on Actors in Offline RL
by: Tarasov, Denis, et al.
Published: (2024)
Guided Flow Policy: Learning from High-Value Actions in Offline Reinforcement Learning
by: Tiofack, Franki Nguimatsia, et al.
Published: (2025)
Combining LLM decision and RL action selection to improve RL policy for adaptive interventions
by: Karine, Karine, et al.
Published: (2025)
SAC-GLAM: Improving Online RL for LLM agents with Soft Actor-Critic and Hindsight Relabeling
by: Gaven, Loris, et al.
Published: (2024)
RL-GPT: Integrating Reinforcement Learning and Code-as-policy
by: Liu, Shaoteng, et al.
Published: (2024)
Relative Importance Sampling for off-Policy Actor-Critic in Deep Reinforcement Learning
by: Humayoo, Mahammad, et al.
Published: (2018)
Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL
by: Luo, Yu, et al.
Published: (2024)
Dissecting Deep RL with High Update Ratios: Combatting Value Divergence
by: Hussing, Marcel, et al.
Published: (2024)
Intelligent Switching for Reset-Free RL
by: Patil, Darshan, et al.
Published: (2024)
Improving Zero-Shot Offline RL via Behavioral Task Sampling
by: Bendib, Nazim, et al.
Published: (2026)
Forager: a lightweight testbed for continual learning with partial observability in RL
by: Tang, Steven, et al.
Published: (2026)
Counterfactual experience augmented off-policy reinforcement learning
by: Lee, Sunbowen, et al.
Published: (2025)
Actor-Free Continuous Control via Structurally Maximizable Q-Functions
by: Korkmaz, Yigit, et al.
Published: (2025)
ELMUR: External Layer Memory with Update/Rewrite for Long-Horizon RL Problems
by: Cherepanov, Egor, et al.
Published: (2025)
Cost Trade-offs in Matrix Inversion Updates for Streaming Outlier Detection
by: Grivet, Florian, et al.
Published: (2026)
Robust off-policy Reinforcement Learning via Soft Constrained Adversary
by: Nakanishi, Kosuke, et al.
Published: (2024)
Investigating Memory in Model-Free RL with POPGym Arcade
by: Wang, Zekang, et al.
Published: (2025)
Action-Free Offline-to-Online RL via Discretised State Policies
by: Neggatu, Natinael Solomon, et al.
Published: (2026)
Verifier-Free RL for LLMs via Intrinsic Gradient-Norm Reward
by: Wen, Xuexiang, et al.
Published: (2026)
RL for Reasoning by Adaptively Revealing Rationales
by: Amani, Mohammad Hossein, et al.
Published: (2025)
PROMA: Projected Microbatch Accumulation for Reference-Free Proximal Policy Updates
by: Abrahamsen, Nilin
Published: (2026)
Value Improved Actor Critic Algorithms
by: Oren, Yaniv, et al.
Published: (2024)
Diffusion Actor-Critic with Entropy Regulator
by: Wang, Yinuo, et al.
Published: (2024)
Revisiting Discrete Soft Actor-Critic
by: Zhou, Haibin, et al.
Published: (2022)
Average-Reward Soft Actor-Critic
by: Adamczyk, Jacob, et al.
Published: (2025)
MAGICS: Adversarial RL with Minimax Actors Guided by Implicit Critic Stackelberg for Convergent Neural Synthesis of Robot Safety
by: Wang, Justin, et al.
Published: (2024)
Offline Actor-Critic Reinforcement Learning Scales to Large Models
by: Springenberg, Jost Tobias, et al.
Published: (2024)
Flow Actor-Critic for Offline Reinforcement Learning
by: Chae, Jongseong, et al.
Published: (2026)
Distributional Soft Actor-Critic with Diffusion Policy
by: Liu, Tong, et al.
Published: (2025)
A Safety Modulator Actor-Critic Method in Model-Free Safe Reinforcement Learning and Application in UAV Hovering
by: Qi, Qihan, et al.
Published: (2024)
RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$
by: Bhatia, Abhinav, et al.
Published: (2023)
You Need Reasoning to Learn Reasoning: The Limitations of Label-Free RL in Weak Base Models
by: Roy, Shuvendu, et al.
Published: (2025)
Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone
by: Mark, Max Sobol, et al.
Published: (2024)
Relational Object-Centric Actor-Critic
by: Ugadiarov, Leonid, et al.
Published: (2023)
RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization
by: Dong, Yihong, et al.
Published: (2025)
Scalable Neighborhood-Based Multi-Agent Actor-Critic
by: Goppelsroeder, Tim, et al.
Published: (2026)
Revisiting Mixture Policies in Entropy-Regularized Actor-Critic
by: He, Jiamin, et al.
Published: (2026)
SACn: Soft Actor-Critic with n-step Returns
by: Łyskawa, Jakub, et al.
Published: (2025)
Actor-Critics Can Achieve Optimal Sample Efficiency
by: Tan, Kevin, et al.
Published: (2025)