| Main Authors: | Tian, Haozhe; Hamedmoghadam, Homayoun; Shorten, Robert; Ferraro, Pietro |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Machine Learning |
| Online Access: | https://arxiv.org/abs/2404.15199 |
| _version_ | 1866917824268599296 |
|---|---|
| author | Tian, Haozhe; Hamedmoghadam, Homayoun; Shorten, Robert; Ferraro, Pietro |
| author_facet | Tian, Haozhe; Hamedmoghadam, Homayoun; Shorten, Robert; Ferraro, Pietro |
| contents | Reinforcement Learning (RL) is a powerful method for controlling dynamic systems, but its learning mechanism can lead to unpredictable actions that undermine the safety of critical systems. Here, we propose RL with Adaptive Regularization (RL-AR), an algorithm that enables safe RL exploration by combining the RL policy with a policy regularizer that hard-codes the safety constraints. RL-AR performs policy combination via a "focus module," which determines the appropriate combination depending on the state, relying more on the safe policy regularizer for less-exploited states while allowing unbiased convergence for well-exploited states. In a series of critical control applications, we demonstrate that RL-AR not only ensures safety during training but also achieves a return competitive with the standards of model-free RL that disregards safety. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2404_15199 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | Reinforcement Learning with Adaptive Regularization for Safe Control of Critical Systems |
| topic | Machine Learning |
| url | https://arxiv.org/abs/2404.15199 |
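
The abstract describes a state-dependent combination of an RL policy with a safe policy regularizer. The following is a minimal sketch of that idea only, not the authors' implementation. It assumes the combination is a convex blend of the two actions, and it replaces the "focus module" with a hypothetical visit-count heuristic; the names `focus_weight`, `combine_actions`, and `scale` are illustrative and do not come from the record.

```python
import math


def focus_weight(visit_count, scale=10.0):
    """Hypothetical stand-in for the focus module.

    Returns a weight in (0, 1] that is 1.0 for a never-visited state and
    decays toward 0 as the state is visited more often. The exponential
    form and `scale` are assumptions made for illustration.
    """
    return math.exp(-visit_count / scale)


def combine_actions(a_rl, a_safe, beta):
    """Blend the RL action with the safe regularizer action.

    Assumes a convex combination: beta = 1 uses only the safe action,
    beta = 0 uses only the RL action.
    """
    beta = min(1.0, max(0.0, beta))  # clamp the weight to [0, 1]
    return [beta * s + (1.0 - beta) * r for r, s in zip(a_rl, a_safe)]


# Usage: a rarely visited state leans on the safe action,
# a frequently visited one leans on the RL action.
rare = combine_actions([2.0], [0.0], focus_weight(0))
frequent = combine_actions([2.0], [0.0], focus_weight(100))
```

Under these assumptions, the weight on the safe action shrinks as a state becomes better explored, which mirrors the behavior the abstract attributes to the focus module.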