Staff View :: Library Catalog


Bibliographic Details
Main Authors:	Tian, Haozhe; Hamedmoghadam, Homayoun; Shorten, Robert; Ferraro, Pietro
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2404.15199

_version_	1866917824268599296
author	Tian, Haozhe; Hamedmoghadam, Homayoun; Shorten, Robert; Ferraro, Pietro
author_facet	Tian, Haozhe; Hamedmoghadam, Homayoun; Shorten, Robert; Ferraro, Pietro
contents	Reinforcement Learning (RL) is a powerful method for controlling dynamic systems, but its learning mechanism can lead to unpredictable actions that undermine the safety of critical systems. Here, we propose RL with Adaptive Regularization (RL-AR), an algorithm that enables safe RL exploration by combining the RL policy with a policy regularizer that hard-codes the safety constraints. RL-AR performs policy combination via a "focus module," which determines the appropriate combination depending on the state, relying more on the safe policy regularizer for less-exploited states while allowing unbiased convergence for well-exploited states. In a series of critical control applications, we demonstrate that RL-AR not only ensures safety during training but also achieves a return competitive with the standards of model-free RL that disregards safety.
format	Preprint
id	arxiv_https___arxiv_org_abs_2404_15199
institution	arXiv
publishDate	2024
record_format	arxiv
title	Reinforcement Learning with Adaptive Regularization for Safe Control of Critical Systems
topic	Machine Learning
url	https://arxiv.org/abs/2404.15199
