Bibliographic Details
Main Authors: Lintunen, Erik M., Ady, Nadia M., Guckelsberger, Christian
Format: Preprint
Published: 2024
Subjects: Artificial Intelligence; Machine Learning
Online Access: https://arxiv.org/abs/2411.01521
_version_ 1866916469968732160
author Lintunen, Erik M.
Ady, Nadia M.
Guckelsberger, Christian
author_facet Lintunen, Erik M.
Ady, Nadia M.
Guckelsberger, Christian
contents Non-uniform goal selection has the potential to improve the reinforcement learning (RL) of skills over uniform-random selection. In this paper, we introduce a method for learning a goal-selection policy in intrinsically-motivated goal-conditioned RL: "Diversity Progress" (DP). The learner forms a curriculum based on observed improvement in discriminability over its set of goals. Our proposed method is applicable to the class of discriminability-motivated agents, where the intrinsic reward is computed as a function of the agent's certainty of following the true goal being pursued. This reward can motivate the agent to learn a set of diverse skills without extrinsic rewards. We demonstrate empirically that a DP-motivated agent can learn a set of distinguishable skills faster than previous approaches, and do so without suffering from a collapse of the goal distribution -- a known issue with some prior approaches. We end with plans to take this proof-of-concept forward.
format Preprint
id arxiv_https___arxiv_org_abs_2411_01521
institution arXiv
publishDate 2024
record_format arxiv
spellingShingle Diversity Progress for Goal Selection in Discriminability-Motivated RL
Lintunen, Erik M.
Ady, Nadia M.
Guckelsberger, Christian
Artificial Intelligence
Machine Learning
title Diversity Progress for Goal Selection in Discriminability-Motivated RL
topic Artificial Intelligence
Machine Learning
url https://arxiv.org/abs/2411.01521