Bibliographic Details
Main Authors:	Banker, Thomas; Lawrence, Nathan P.; Mesbah, Ali
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Systems and Control
Online Access:	https://arxiv.org/abs/2604.01477

_version_	1866908933024645120
author	Banker, Thomas; Lawrence, Nathan P.; Mesbah, Ali
contents	Reinforcement learning (RL) and model predictive control (MPC) offer complementary strengths, yet combining them at scale remains computationally challenging. We propose soft MPCritic, an RL-MPC framework that learns in (soft) value space while using sample-based planning for both online control and value target generation. Soft MPCritic instantiates MPC through model predictive path integral (MPPI) control and trains a terminal Q-function with fitted value iteration, aligning the learned value function with the planner and implicitly extending the effective planning horizon. We introduce an amortized warm-start strategy that recycles planned open-loop action sequences from online observations when computing batched MPPI-based value targets. This makes soft MPCritic computationally practical while preserving solution quality. Soft MPCritic plans in a scenario-based fashion with an ensemble of dynamics models trained for next-step prediction accuracy. Together, these ingredients enable soft MPCritic to learn effectively through robust, short-horizon planning on classic and complex control tasks. These results establish soft MPCritic as a practical and scalable blueprint for synthesizing MPC policies in settings where policy extraction and direct, long-horizon planning may fail.
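The abstract combines several standard pieces: MPPI sampling around a warm-start action sequence, scenario rollouts over an ensemble of dynamics models, a terminal Q-function that bootstraps the truncated horizon, and a "soft" value computed from the sampled returns. The sketch below illustrates only those generic pieces in plain Python. It is not the authors' implementation, and it makes several assumptions: the scalar system, the quadratic reward, the ensemble gains, the fixed `terminal_q` placeholder, and all function names (`make_model`, `rollout_return`, `mppi_step`, `shift`, `demo`) are hypothetical. The value target is assumed to be the usual MPPI free-energy estimate, lam * log(mean(exp(R / lam))). The warm start is a simple plan shift, not the batched recycling scheme the abstract describes. The fitted-value-iteration training of the Q-function is not shown.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical model type: maps (state, action) -> next state for a scalar system.
Model = Callable[[float, float], float]


def make_model(b: float, dt: float = 0.1) -> Model:
    """Assumed dynamics x' = x + b*u*dt; ensemble members differ only in b."""
    return lambda x, u: x + b * u * dt


def reward(x: float, u: float) -> float:
    """Assumed quadratic regulation reward (not taken from the paper)."""
    return -(x * x + 0.1 * u * u)


def terminal_q(x: float) -> float:
    """Placeholder for the learned terminal value; a fixed quadratic guess here."""
    return -5.0 * x * x


def rollout_return(models: Sequence[Model], x0: float,
                   actions: Sequence[float], gamma: float = 0.99) -> float:
    """Discounted return of one action sequence, averaged over the ensemble."""
    total = 0.0
    for f in models:  # each ensemble member acts as one scenario
        x, ret, disc = x0, 0.0, 1.0
        for u in actions:
            ret += disc * reward(x, u)
            x = f(x, u)
            disc *= gamma
        ret += disc * terminal_q(x)  # bootstrap the tail with the terminal value
        total += ret
    return total / len(models)


def mppi_step(models: Sequence[Model], x0: float, nominal: List[float],
              num_samples: int = 256, sigma: float = 0.5, lam: float = 1.0,
              rng: Optional[random.Random] = None) -> Tuple[List[float], float]:
    """One MPPI update around a warm-start sequence.

    Returns the updated action sequence and the soft (free-energy) value
    lam * log(mean(exp(R / lam))) of the sampled returns at x0.
    """
    if rng is None:
        rng = random.Random(0)
    horizon = len(nominal)
    noises, returns = [], []
    for _ in range(num_samples):
        eps = [rng.gauss(0.0, sigma) for _ in range(horizon)]
        noises.append(eps)
        returns.append(rollout_return(models, x0, [u + e for u, e in zip(nominal, eps)]))
    r_max = max(returns)  # subtract the max so exp() cannot overflow
    weights = [math.exp((r - r_max) / lam) for r in returns]
    z = sum(weights)
    # Exponentially weighted average of the perturbations, added to the nominal plan.
    updated = [u + sum(w * eps[t] for w, eps in zip(weights, noises)) / z
               for t, u in enumerate(nominal)]
    soft_value = r_max + lam * math.log(z / num_samples)
    return updated, soft_value


def shift(plan: List[float]) -> List[float]:
    """Warm start: drop the executed action and repeat the last one."""
    return plan[1:] + plan[-1:]


def demo(steps: int = 30) -> float:
    """Closed loop on an assumed 'true' system with b = 1.0; returns final x."""
    rng = random.Random(0)
    models = [make_model(b) for b in (0.9, 1.0, 1.1)]
    true_system = make_model(1.0)
    x, plan = 2.0, [0.0] * 10
    for _ in range(steps):
        plan, _ = mppi_step(models, x, plan, rng=rng)
        x = true_system(x, plan[0])  # execute only the first planned action
        plan = shift(plan)           # reuse the rest of the plan next step
    return x
```

Because each call to `mppi_step` refines the shifted plan from the previous step, every executed action has been updated several times, even though only one MPPI iteration runs per control step. This is the simplest form of reusing earlier planning work. Subtracting `r_max` before exponentiating keeps the weights in (0, 1], and it also yields the log-mean-exp soft value without overflow.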
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_01477
institution	arXiv
publishDate	2026
record_format	arxiv
title	Soft MPCritic: Amortized Model Predictive Value Iteration
topic	Machine Learning; Systems and Control
url	https://arxiv.org/abs/2604.01477
