Staff View: Policy Gradient in Robust MDPs with Global Convergence Guarantee :: Library Catalog

Bibliographic Details
Main Authors:	Wang, Qiuhao; Ho, Chin Pang; Petrik, Marek
Format:	Preprint
Published:	2022
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2212.10439

_version_	1866914794815094784
author	Wang, Qiuhao; Ho, Chin Pang; Petrik, Marek
author_facet	Wang, Qiuhao; Ho, Chin Pang; Petrik, Marek
contents	Robust Markov decision processes (RMDPs) provide a promising framework for computing reliable policies in the face of model errors. Many successful reinforcement learning algorithms build on variations of policy-gradient methods, but adapting these methods to RMDPs has been challenging. As a result, the applicability of RMDPs to large, practical domains remains limited. This paper proposes a new Double-Loop Robust Policy Gradient (DRPG), the first generic policy gradient method for RMDPs. In contrast with prior robust policy gradient algorithms, DRPG monotonically reduces approximation errors to guarantee convergence to a globally optimal policy in tabular RMDPs. We introduce a novel parametric transition kernel and solve the inner-loop problem (finding the worst-case transition kernel for the current policy) via a gradient-based method. Finally, our numerical results demonstrate the utility of our new algorithm and confirm its global convergence properties.
format	Preprint
id	arxiv_https___arxiv_org_abs_2212_10439
institution	arXiv
publishDate	2022
record_format	arxiv
title	Policy Gradient in Robust MDPs with Global Convergence Guarantee
topic	Machine Learning
url	https://arxiv.org/abs/2212.10439

Similar Items