Staff View: Deterministic Policy Gradient Primal-Dual Methods for Continuous-Space Constrained MDPs :: Library Catalog

Bibliographic Details
Main Authors:	Rozada, Sergio; Ding, Dongsheng; Marques, Antonio G.; Ribeiro, Alejandro
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence; Optimization and Control
Online Access:	https://arxiv.org/abs/2408.10015

author	Rozada, Sergio; Ding, Dongsheng; Marques, Antonio G.; Ribeiro, Alejandro
contents	We study the problem of computing deterministic optimal policies for constrained Markov decision processes (MDPs) with continuous state and action spaces, which are widely encountered in constrained dynamical systems. Designing deterministic policy gradient methods in continuous state and action spaces is particularly challenging due to the lack of enumerable state-action pairs and the adoption of deterministic policies, hindering the application of existing policy gradient methods. To this end, we develop a deterministic policy gradient primal-dual method to find an optimal deterministic policy with non-asymptotic convergence. Specifically, we leverage regularization of the Lagrangian of the constrained MDP to propose a deterministic policy gradient primal-dual (D-PGPD) algorithm that updates the deterministic policy via a quadratic-regularized gradient ascent step and the dual variable via a quadratic-regularized gradient descent step. We prove that the primal-dual iterates of D-PGPD converge at a sub-linear rate to an optimal regularized primal-dual pair. We instantiate D-PGPD with function approximation and prove that the primal-dual iterates of D-PGPD converge at a sub-linear rate to an optimal regularized primal-dual pair, up to a function approximation error. Furthermore, we demonstrate the effectiveness of our method in two continuous control problems: robot navigation and fluid control. This appears to be the first work that proposes a deterministic policy search method for continuous-space constrained MDPs.
format	Preprint
id	arxiv_https___arxiv_org_abs_2408_10015
institution	arXiv
publishDate	2024
record_format	arxiv
title	Deterministic Policy Gradient Primal-Dual Methods for Continuous-Space Constrained MDPs
topic	Artificial Intelligence; Optimization and Control
url	https://arxiv.org/abs/2408.10015
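
The update scheme summarized in the contents field (a quadratic-regularized gradient ascent step on the policy, a quadratic-regularized gradient descent step on the dual variable) can be illustrated on a small hypothetical problem. This is a minimal sketch under stated assumptions, not the authors' D-PGPD implementation: the single state, the scalar deterministic action a, the reward r(a) = -(a - 3)^2, the constraint g(a) = -a >= -1, the regularized Lagrangian, the step size eta, the weight tau, and the name regularized_primal_dual are all illustrative assumptions. Exact gradients replace value-function estimates and function approximation.

```python
"""Hypothetical toy illustration of a regularized primal-dual iteration.

This is NOT the D-PGPD implementation from the paper. It assumes a single
state, a deterministic scalar action a, and exact gradients:
    maximize   r(a) = -(a - 3)**2
    subject to g(a) = -a >= b,  with b = -1   (equivalently a <= 1)
Assumed regularized Lagrangian, with tau > 0:
    L(a, lam) = r(a) + lam * (g(a) - b) - (tau / 2) * a**2 + (tau / 2) * lam**2
"""


def reward_grad(a: float) -> float:
    """Derivative of r(a) = -(a - 3)**2."""
    return -2.0 * (a - 3.0)


def constraint_value(a: float) -> float:
    """Constraint utility g(a) = -a."""
    return -a


CONSTRAINT_GRAD = -1.0  # derivative of g(a) = -a


def regularized_primal_dual(eta: float = 0.05, tau: float = 0.01,
                            b: float = -1.0, steps: int = 2000) -> tuple[float, float]:
    """Run simultaneous primal ascent and projected dual descent on L(a, lam)."""
    a, lam = 0.0, 0.0
    for _ in range(steps):
        # Gradient of L in the action; -tau * a comes from the quadratic regularizer.
        grad_a = reward_grad(a) + lam * CONSTRAINT_GRAD - tau * a
        # Gradient of L in the dual variable; tau * lam makes L strongly convex in lam.
        grad_lam = constraint_value(a) - b + tau * lam
        # Primal: argmax_x grad_a * x - (x - a)**2 / (2 * eta) has closed form a + eta * grad_a.
        # Dual: gradient descent step projected onto lam >= 0. Both use the old (a, lam).
        a, lam = a + eta * grad_a, max(0.0, lam - eta * grad_lam)
    return a, lam


if __name__ == "__main__":
    a, lam = regularized_primal_dual()
    print(f"a = {a:.4f}, lambda = {lam:.4f}")
```

Setting both gradients to zero gives the regularized saddle point a = (1 + 6 tau)/(1 + tau)^2 and lambda = (a - 1)/tau, roughly (1.039, 3.911) for tau = 0.01. The iterates should approach this pair rather than the unregularized solution (1, 4); the O(tau) offset is the toy analogue of converging to an optimal regularized primal-dual pair.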

Similar Items