Bibliographic Details
Main Authors: Pappalardo, Octavio, Ramele, Rodrigo, Santos, Juan Miguel
Format: Preprint
Published: 2024
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2407.21546
_version_ 1866912940568870912
author Pappalardo, Octavio
Ramele, Rodrigo
Santos, Juan Miguel
author_facet Pappalardo, Octavio
Ramele, Rodrigo
Santos, Juan Miguel
contents The broader application of reinforcement learning (RL) is limited by challenges including data efficiency, generalization capability, and ability to learn in sparse-reward environments. Meta-learning has emerged as a promising approach to address these issues by optimizing components of the learning algorithm to meet desired characteristics. Additionally, a different line of work has extensively studied the use of intrinsic rewards to enhance the exploration capabilities of algorithms. This work investigates how meta-learning can improve the training signal received by RL agents. We introduce a method to learn intrinsic rewards within a reinforcement learning framework that bypasses the typical computation of meta-gradients through an optimization process by treating policy updates as black boxes. We validate our approach against training with extrinsic rewards, demonstrating its effectiveness, and additionally compare it to the use of a meta-learned advantage function. Experiments are carried out on distributions of continuous control tasks with both parametric and non-parametric variations. Furthermore, only sparse rewards are used during evaluation. Code is available at: https://github.com/Octavio-Pappalardo/Meta-learning-rewards
format Preprint
id arxiv_https___arxiv_org_abs_2407_21546
institution arXiv
publishDate 2024
record_format arxiv
spellingShingle Black Box Meta-Learning Intrinsic Rewards
Pappalardo, Octavio
Ramele, Rodrigo
Santos, Juan Miguel
Machine Learning
title Black Box Meta-Learning Intrinsic Rewards
topic Machine Learning
url https://arxiv.org/abs/2407.21546