| Main authors: | Remman, Sindre Benjamin; Kristiansen, Bjørn Andreas; Lekkas, Anastasios M. |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Machine Learning; Systems and Control |
| Online access: | https://arxiv.org/abs/2406.01178 |
| _version_ | 1866909216560644096 |
|---|---|
| author | Remman, Sindre Benjamin; Kristiansen, Bjørn Andreas; Lekkas, Anastasios M. |
| author_facet | Remman, Sindre Benjamin; Kristiansen, Bjørn Andreas; Lekkas, Anastasios M. |
| contents | In this work, we use optimal control to change the behavior of a deep reinforcement learning policy by optimizing directly in the policy's latent space. We hypothesize that distinct behavioral patterns, termed behavioral modes, can be identified within certain regions of a deep reinforcement learning policy's latent space, meaning that specific actions or strategies are preferred within these regions. We identify these behavioral modes using latent space dimension reduction with Pairwise Controlled Manifold Approximation (PaCMAP). Using the actions generated by the optimal control procedure, we move the system from one behavioral mode to another. We subsequently utilize these actions as a filter for interpreting the neural network policy. The results show that this approach can impose desired behavioral modes in the policy, demonstrated by showing how a failed episode can be made successful and vice versa using the lunar lander reinforcement learning environment. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2406_01178 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | Deep Reinforcement Learning Behavioral Mode Switching Using Optimal Control Based on a Latent Space Objective |
| topic | Machine Learning; Systems and Control |
| url | https://arxiv.org/abs/2406.01178 |