Bibliographic Details
Main Authors: Tian, Haozhe, Hamedmoghadam, Homayoun, Shorten, Robert, Ferraro, Pietro
Format: Preprint
Published: 2024
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2404.15199
_version_ 1866917824268599296
author Tian, Haozhe
Hamedmoghadam, Homayoun
Shorten, Robert
Ferraro, Pietro
contents Reinforcement Learning (RL) is a powerful method for controlling dynamic systems, but its learning mechanism can lead to unpredictable actions that undermine the safety of critical systems. Here, we propose RL with Adaptive Regularization (RL-AR), an algorithm that enables safe RL exploration by combining the RL policy with a policy regularizer that hard-codes the safety constraints. RL-AR performs policy combination via a "focus module," which determines the appropriate combination depending on the state: it relies more on the safe policy regularizer for less-exploited states while allowing unbiased convergence for well-exploited states. In a series of critical control applications, we demonstrate that RL-AR not only ensures safety during training but also achieves a return competitive with standard model-free RL methods that disregard safety.
format Preprint
id arxiv_https___arxiv_org_abs_2404_15199
institution arXiv
publishDate 2024
record_format arxiv
title Reinforcement Learning with Adaptive Regularization for Safe Control of Critical Systems
topic Machine Learning
url https://arxiv.org/abs/2404.15199
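
The state-dependent policy combination described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the record does not give the combination formula, so the convex combination below, the visit-count heuristic standing in for the focus module, and the names focus_weight, combine_actions, and scale are all assumptions made for illustration only.

```python
import numpy as np


def focus_weight(visit_count, scale=10.0):
    """Hypothetical focus weight in (0, 1].

    Assumption: a state's degree of exploitation is approximated by a visit
    count. The weight is 1 for an unvisited state and decays toward 0 as the
    state is visited more often. A real focus module would likely be learned.
    """
    return float(np.exp(-visit_count / scale))


def combine_actions(a_rl, a_reg, beta):
    """Assumed convex combination of two actions.

    beta = 1 returns the regularizer's action (safe fallback);
    beta = 0 returns the RL policy's action (no bias from the regularizer).
    """
    a_rl = np.asarray(a_rl, dtype=float)
    a_reg = np.asarray(a_reg, dtype=float)
    return beta * a_reg + (1.0 - beta) * a_rl


if __name__ == "__main__":
    a_rl = [1.0, -0.5]   # action proposed by the RL policy (made-up values)
    a_reg = [0.2, 0.0]   # action proposed by the safe regularizer (made-up values)
    for visits in (0, 10, 100):
        beta = focus_weight(visits)
        print(visits, beta, combine_actions(a_rl, a_reg, beta))
```

Under these assumptions, a rarely visited state yields a weight near 1, so the combined action stays close to the regularizer's output, while a heavily visited state yields a weight near 0, so the RL policy's action passes through almost unchanged.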