| Main Authors: | Wang, Qiuhao; Ho, Chin Pang; Petrik, Marek |
|---|---|
| Format: | Preprint |
| Published: | 2022 |
| Subjects: | Machine Learning |
| Online Access: | https://arxiv.org/abs/2212.10439 |
| _version_ | 1866914794815094784 |
|---|---|
| author | Wang, Qiuhao; Ho, Chin Pang; Petrik, Marek |
| contents | Robust Markov decision processes (RMDPs) provide a promising framework for computing reliable policies in the face of model errors. Many successful reinforcement learning algorithms build on variations of policy gradient methods, but adapting these methods to RMDPs has been challenging. As a result, the applicability of RMDPs to large, practical domains remains limited. This paper proposes Double-Loop Robust Policy Gradient (DRPG), the first generic policy gradient method for RMDPs. In contrast with prior robust policy gradient algorithms, DRPG monotonically reduces approximation errors to guarantee convergence to a globally optimal policy in tabular RMDPs. We introduce a novel parametric transition kernel and solve the inner-loop problem, computing the worst-case transition kernel for the current policy, with a gradient-based method. Finally, our numerical results demonstrate the utility of the new algorithm and confirm its global convergence properties. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2212_10439 |
| institution | arXiv |
| publishDate | 2022 |
| record_format | arxiv |
| title | Policy Gradient in Robust MDPs with Global Convergence Guarantee |
| topic | Machine Learning |
| url | https://arxiv.org/abs/2212.10439 |
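
The abstract describes DRPG as two nested loops: an inner loop that finds the worst-case transition kernel for the current policy, and an outer loop that takes a policy gradient step against that kernel. The Python sketch below illustrates only that double-loop structure on a toy tabular RMDP, under simplifying assumptions that do not come from the paper: the uncertainty set is a small finite list of candidate kernels, the inner loop enumerates them exactly instead of running the paper's gradient-based method over its parametric kernel, rewards are maximized, and the policy uses the direct (simplex) parameterization with projected gradient ascent. All function names (`project_simplex`, `evaluate`, `occupancy`, `policy_value`, `worst_case`, `drpg_sketch`) and the toy numbers are hypothetical.

```python
# Hypothetical sketch, NOT the paper's DRPG: a finite uncertainty set replaces
# the paper's parametric kernel, and the inner loop enumerates it exactly.


def project_simplex(y):
    """Euclidean projection of the vector y onto the probability simplex."""
    u = sorted(y, reverse=True)
    css, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        css += ui
        t = (css - 1.0) / i
        if ui - t > 0:
            theta = t
    return [max(yi - theta, 0.0) for yi in y]


def evaluate(pi, P, r, gamma, tol=1e-10):
    """State values v and action values Q of policy pi under kernel P."""
    n, m = len(r), len(r[0])
    v = [0.0] * n
    while True:  # fixed-point iteration on v = r_pi + gamma * P_pi v
        new_v = [sum(pi[s][a] * (r[s][a] + gamma * sum(P[s][a][t] * v[t] for t in range(n)))
                     for a in range(m)) for s in range(n)]
        delta = max(abs(x - y) for x, y in zip(new_v, v))
        v = new_v
        if delta < tol:
            break
    q = [[r[s][a] + gamma * sum(P[s][a][t] * v[t] for t in range(n)) for a in range(m)]
         for s in range(n)]
    return v, q


def occupancy(pi, P, rho, gamma, tol=1e-10):
    """Unnormalized discounted state occupancy d = rho^T (I - gamma * P_pi)^(-1)."""
    n, m = len(rho), len(pi[0])
    d = list(rho)
    while True:
        new_d = [rho[t] + gamma * sum(d[s] * pi[s][a] * P[s][a][t]
                                      for s in range(n) for a in range(m))
                 for t in range(n)]
        delta = max(abs(x - y) for x, y in zip(new_d, d))
        d = new_d
        if delta < tol:
            return d


def policy_value(pi, P, r, rho, gamma):
    """Expected discounted return J(pi, P) from initial distribution rho."""
    v, _ = evaluate(pi, P, r, gamma)
    return sum(p * vs for p, vs in zip(rho, v))


def worst_case(pi, kernels, r, rho, gamma):
    """Inner loop (simplified): index and value of the worst kernel for pi."""
    values = [policy_value(pi, P, r, rho, gamma) for P in kernels]
    k = min(range(len(kernels)), key=values.__getitem__)
    return k, values[k]


def drpg_sketch(kernels, r, rho, gamma=0.9, step=0.01, iters=100):
    """Outer loop: projected policy gradient ascent on the robust value."""
    n, m = len(r), len(r[0])
    pi = [[1.0 / m] * m for _ in range(n)]  # start from the uniform policy
    k, value = worst_case(pi, kernels, r, rho, gamma)
    best_pi, best_value, init_value = pi, value, value
    for _ in range(iters):
        P = kernels[k]
        _, q = evaluate(pi, P, r, gamma)
        d = occupancy(pi, P, rho, gamma)
        # Direct parameterization: dJ/dpi(a|s) = d(s) * Q(s, a) under kernel P.
        pi = [project_simplex([pi[s][a] + step * d[s] * q[s][a] for a in range(m)])
              for s in range(n)]
        k, value = worst_case(pi, kernels, r, rho, gamma)
        if value > best_value:  # the robust objective is nonsmooth; keep the best
            best_pi, best_value = pi, value
    return best_pi, best_value, init_value


# Toy 2-state, 2-action RMDP with two candidate kernels (made-up numbers).
P1 = [[[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.1, 0.9]]]
P2 = [[[0.5, 0.5], [0.1, 0.9]], [[0.5, 0.5], [0.1, 0.9]]]
R = [[1.0, 0.0], [0.0, 0.5]]
RHO = [1.0, 0.0]
policy, robust, start = drpg_sketch([P1, P2], R, RHO)
print(f"robust value: uniform policy {start:.3f}, best found {robust:.3f}")
```

The outer step follows the gradient of the value under the current worst-case kernel, a Danskin-style (sub)gradient of the robust objective. Because the minimum over kernels is nonsmooth, the sketch keeps the best policy seen rather than assuming every step improves it; the paper's global convergence guarantee relies on its own inner-loop error schedule, which this simplification does not reproduce.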