Saved in:
| Main Authors: | , , , , |
|---|---|
| Format: | Preprint |
| Published: |
2024
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2405.16601 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| _version_ | 1866911888341729280 |
|---|---|
| author | Khattar, Vanshaj Ding, Yuhao Sel, Bilgehan Lavaei, Javad Jin, Ming |
| author_facet | Khattar, Vanshaj Ding, Yuhao Sel, Bilgehan Lavaei, Javad Jin, Ming |
| contents | Meta-reinforcement learning has widely been used as a learning-to-learn framework to solve unseen tasks with limited experience. However, the aspect of constraint violations has not been adequately addressed in the existing works, making their application restricted in real-world settings. In this paper, we study the problem of meta-safe reinforcement learning (Meta-SRL) through the CMDP-within-online framework to establish the first provable guarantees in this important setting. We obtain task-averaged regret bounds for the reward maximization (optimality gap) and constraint violations using gradient-based meta-learning and show that the task-averaged optimality gap and constraint satisfaction improve with task-similarity in a static environment or task-relatedness in a dynamic environment. Several technical challenges arise when making this framework practical. To this end, we propose a meta-algorithm that performs inexact online learning on the upper bounds of within-task optimality gap and constraint violations estimated by off-policy stationary distribution corrections. Furthermore, we enable the learning rates to be adapted for every task and extend our approach to settings with a competing dynamically changing oracle. Finally, experiments are conducted to demonstrate the effectiveness of our approach. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2405_16601 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| spellingShingle | A CMDP-within-online framework for Meta-Safe Reinforcement Learning Khattar, Vanshaj Ding, Yuhao Sel, Bilgehan Lavaei, Javad Jin, Ming Machine Learning Meta-reinforcement learning has widely been used as a learning-to-learn framework to solve unseen tasks with limited experience. However, the aspect of constraint violations has not been adequately addressed in the existing works, making their application restricted in real-world settings. In this paper, we study the problem of meta-safe reinforcement learning (Meta-SRL) through the CMDP-within-online framework to establish the first provable guarantees in this important setting. We obtain task-averaged regret bounds for the reward maximization (optimality gap) and constraint violations using gradient-based meta-learning and show that the task-averaged optimality gap and constraint satisfaction improve with task-similarity in a static environment or task-relatedness in a dynamic environment. Several technical challenges arise when making this framework practical. To this end, we propose a meta-algorithm that performs inexact online learning on the upper bounds of within-task optimality gap and constraint violations estimated by off-policy stationary distribution corrections. Furthermore, we enable the learning rates to be adapted for every task and extend our approach to settings with a competing dynamically changing oracle. Finally, experiments are conducted to demonstrate the effectiveness of our approach. |
| title | A CMDP-within-online framework for Meta-Safe Reinforcement Learning |
| topic | Machine Learning |
| url | https://arxiv.org/abs/2405.16601 |