Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Müller, Johannes, Montúfar, Guido
Format:	Preprint
Published:	2022
Subjects:	Optimization and Control Machine Learning Systems and Control 90C40, 53B12, 90C53
Online Access:	https://arxiv.org/abs/2211.02105
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866909112702337024
author	Müller, Johannes Montúfar, Guido
author_facet	Müller, Johannes Montúfar, Guido
contents	We study the convergence of several natural policy gradient (NPG) methods in infinite-horizon discounted Markov decision processes with regular policy parametrizations. For a variety of NPGs and reward functions we show that the trajectories in state-action space are solutions of gradient flows with respect to Hessian geometries, based on which we obtain global convergence guarantees and convergence rates. In particular, we show linear convergence for unregularized and regularized NPG flows with the metrics proposed by Kakade and Morimura and co-authors by observing that these arise from the Hessian geometries of conditional entropy and entropy respectively. Further, we obtain sublinear convergence rates for Hessian geometries arising from other convex functions like log-barriers. Finally, we interpret the discrete-time NPG methods with regularized rewards as inexact Newton methods if the NPG is defined with respect to the Hessian geometry of the regularizer. This yields local quadratic convergence rates of these methods for step size equal to the penalization strength.
format	Preprint
id	arxiv_https___arxiv_org_abs_2211_02105
institution	arXiv
publishDate	2022
record_format	arxiv
spellingShingle	Geometry and convergence of natural policy gradient methods Müller, Johannes Montúfar, Guido Optimization and Control Machine Learning Systems and Control 90C40, 53B12, 90C53 We study the convergence of several natural policy gradient (NPG) methods in infinite-horizon discounted Markov decision processes with regular policy parametrizations. For a variety of NPGs and reward functions we show that the trajectories in state-action space are solutions of gradient flows with respect to Hessian geometries, based on which we obtain global convergence guarantees and convergence rates. In particular, we show linear convergence for unregularized and regularized NPG flows with the metrics proposed by Kakade and Morimura and co-authors by observing that these arise from the Hessian geometries of conditional entropy and entropy respectively. Further, we obtain sublinear convergence rates for Hessian geometries arising from other convex functions like log-barriers. Finally, we interpret the discrete-time NPG methods with regularized rewards as inexact Newton methods if the NPG is defined with respect to the Hessian geometry of the regularizer. This yields local quadratic convergence rates of these methods for step size equal to the penalization strength.
title	Geometry and convergence of natural policy gradient methods
topic	Optimization and Control Machine Learning Systems and Control 90C40, 53B12, 90C53
url	https://arxiv.org/abs/2211.02105

Similar Items