MARC View :: Library Catalog

Bibliographic Details
Main Authors:	Remman, Sindre Benjamin; Kristiansen, Bjørn Andreas; Lekkas, Anastasios M.
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Systems and Control
Online Access:	https://arxiv.org/abs/2406.01178

_version_	1866909216560644096
author	Remman, Sindre Benjamin; Kristiansen, Bjørn Andreas; Lekkas, Anastasios M.
author_facet	Remman, Sindre Benjamin; Kristiansen, Bjørn Andreas; Lekkas, Anastasios M.
contents	In this work, we use optimal control to change the behavior of a deep reinforcement learning policy by optimizing directly in the policy's latent space. We hypothesize that distinct behavioral patterns, termed behavioral modes, can be identified within certain regions of a deep reinforcement learning policy's latent space, meaning that specific actions or strategies are preferred within these regions. We identify these behavioral modes using latent space dimension-reduction with Pairwise Controlled Manifold Approximation (PaCMAP). Using the actions generated by the optimal control procedure, we move the system from one behavioral mode to another. We subsequently utilize these actions as a filter for interpreting the neural network policy. The results show that this approach can impose desired behavioral modes in the policy, demonstrated by showing how a failed episode can be made successful and vice versa using the lunar lander reinforcement learning environment.
format	Preprint
id	arxiv_https___arxiv_org_abs_2406_01178
institution	arXiv
publishDate	2024
record_format	arxiv
title	Deep Reinforcement Learning Behavioral Mode Switching Using Optimal Control Based on a Latent Space Objective
topic	Machine Learning; Systems and Control
url	https://arxiv.org/abs/2406.01178
