Staff View: Black Box Meta-Learning Intrinsic Rewards :: Library Catalog

Bibliographic Details
Main Authors:	Pappalardo, Octavio; Ramele, Rodrigo; Santos, Juan Miguel
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2407.21546

_version_	1866912940568870912
author	Pappalardo, Octavio; Ramele, Rodrigo; Santos, Juan Miguel
author_facet	Pappalardo, Octavio; Ramele, Rodrigo; Santos, Juan Miguel
contents	The broader application of reinforcement learning (RL) is limited by challenges including data efficiency, generalization capability, and ability to learn in sparse-reward environments. Meta-learning has emerged as a promising approach to address these issues by optimizing components of the learning algorithm to meet desired characteristics. Additionally, a different line of work has extensively studied the use of intrinsic rewards to enhance the exploration capabilities of algorithms. This work investigates how meta-learning can improve the training signal received by RL agents. We introduce a method to learn intrinsic rewards within a reinforcement learning framework that bypasses the typical computation of meta-gradients through an optimization process by treating policy updates as black boxes. We validate our approach against training with extrinsic rewards, demonstrating its effectiveness, and additionally compare it to the use of a meta-learned advantage function. Experiments are carried out on distributions of continuous control tasks with both parametric and non-parametric variations. Furthermore, only sparse rewards are used during evaluation. Code is available at: https://github.com/Octavio-Pappalardo/Meta-learning-rewards
format	Preprint
id	arxiv_https___arxiv_org_abs_2407_21546
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	Black Box Meta-Learning Intrinsic Rewards Pappalardo, Octavio Ramele, Rodrigo Santos, Juan Miguel Machine Learning The broader application of reinforcement learning (RL) is limited by challenges including data efficiency, generalization capability, and ability to learn in sparse-reward environments. Meta-learning has emerged as a promising approach to address these issues by optimizing components of the learning algorithm to meet desired characteristics. Additionally, a different line of work has extensively studied the use of intrinsic rewards to enhance the exploration capabilities of algorithms. This work investigates how meta-learning can improve the training signal received by RL agents. We introduce a method to learn intrinsic rewards within a reinforcement learning framework that bypasses the typical computation of meta-gradients through an optimization process by treating policy updates as black boxes. We validate our approach against training with extrinsic rewards, demonstrating its effectiveness, and additionally compare it to the use of a meta-learned advantage function. Experiments are carried out on distributions of continuous control tasks with both parametric and non-parametric variations. Furthermore, only sparse rewards are used during evaluation. Code is available at: https://github.com/Octavio-Pappalardo/Meta-learning-rewards
title	Black Box Meta-Learning Intrinsic Rewards
topic	Machine Learning
url	https://arxiv.org/abs/2407.21546

Similar Items