Staff View: A KL-regularization Framework for Learning to Plan with Adaptive Priors :: Library Catalog

Bibliographic Details
Main Authors:	Serra-Gomez, Álvaro; Ornia, Daniel Jarne; Tirumala, Dhruva; Moerland, Thomas
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Robotics
Online Access:	https://arxiv.org/abs/2510.04280

_version_	1866914583920246784
author	Serra-Gomez, Álvaro; Ornia, Daniel Jarne; Tirumala, Dhruva; Moerland, Thomas
author_facet	Serra-Gomez, Álvaro; Ornia, Daniel Jarne; Tirumala, Dhruva; Moerland, Thomas
contents	Effective exploration remains a central challenge in model-based reinforcement learning (MBRL), particularly in high-dimensional continuous control tasks where sample efficiency is crucial. A prominent line of recent work leverages learned policies as proposal distributions for Model-Predictive Path Integral (MPPI) planning. Initial approaches update the sampling policy independently of the planner distribution, typically maximizing a learned value function with deterministic policy gradient and entropy regularization. However, because the states encountered during training depend on the MPPI planner, aligning the sampling policy with the planner improves the accuracy of value estimation and long-term performance. To this end, recent methods update the sampling policy by minimizing KL divergence to the planner distribution or by introducing planner-guided regularization into the policy update. In this work, we unify these MPPI-based reinforcement learning methods under a single framework by introducing Policy Optimization-Model Predictive Control (PO-MPC), a family of KL-regularized MBRL methods that integrate the planner's action distribution as a prior in policy optimization. By aligning the learned policy with the planner's behavior, PO-MPC allows more flexibility in the policy updates to trade off return maximization and KL divergence minimization. We clarify how prior approaches emerge as special cases of this family, and we explore previously unstudied variations. Our experiments show that these extended configurations yield significant performance improvements, advancing the state of the art in MPPI-based RL.
format	Preprint
id	arxiv_https___arxiv_org_abs_2510_04280
institution	arXiv
publishDate	2025
record_format	arxiv
title	A KL-regularization Framework for Learning to Plan with Adaptive Priors
topic	Machine Learning; Artificial Intelligence; Robotics
url	https://arxiv.org/abs/2510.04280
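
The abstract in the contents field above describes a policy update that trades off return maximization against KL divergence to the planner's action distribution. As a rough illustration only, the sketch below computes such a trade-off for diagonal Gaussian action distributions. It is not the paper's objective: the function names, the Gaussian form, the direction of the KL term (policy relative to planner), and the weight `beta` are all assumptions made for this example.

```python
import numpy as np


def diag_gaussian_kl(mu_p, std_p, mu_q, std_q):
    """KL(p || q) between diagonal Gaussians, summed over action dimensions."""
    mu_p, std_p, mu_q, std_q = (
        np.asarray(x, dtype=float) for x in (mu_p, std_p, mu_q, std_q)
    )
    var_p, var_q = std_p ** 2, std_q ** 2
    per_dim = (
        np.log(std_q / std_p)
        + (var_p + (mu_p - mu_q) ** 2) / (2.0 * var_q)
        - 0.5
    )
    return float(np.sum(per_dim))


def kl_regularized_loss(value_estimate, policy, planner_prior, beta):
    """Hypothetical loss: negative value estimate plus a weighted KL penalty.

    policy and planner_prior are (mean, std) pairs over the action dimensions.
    beta is an assumed trade-off weight; it does not come from the source.
    """
    kl = diag_gaussian_kl(policy[0], policy[1], planner_prior[0], planner_prior[1])
    return -value_estimate + beta * kl


# Example: a 2-D action, with the policy mean one unit away from the planner mean.
policy = ([0.0, 0.0], [1.0, 1.0])
planner_prior = ([1.0, 0.0], [1.0, 1.0])
loss = kl_regularized_loss(value_estimate=2.0, policy=policy,
                           planner_prior=planner_prior, beta=0.5)
```

In this sketch, setting `beta` to zero leaves only the value term, while a very large `beta` makes the loss dominated by the distance to the planner distribution. This loosely mirrors the two ends of the trade-off mentioned in the abstract, without claiming to reproduce any specific method from the paper.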

Similar Items