| Main Authors: | Lintunen, Erik M.; Ady, Nadia M.; Guckelsberger, Christian |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Artificial Intelligence; Machine Learning |
| Online Access: | https://arxiv.org/abs/2411.01521 |
| _version_ | 1866916469968732160 |
|---|---|
| author | Lintunen, Erik M.; Ady, Nadia M.; Guckelsberger, Christian |
| author_facet | Lintunen, Erik M.; Ady, Nadia M.; Guckelsberger, Christian |
| contents | Non-uniform goal selection has the potential to improve the reinforcement learning (RL) of skills over uniform-random selection. In this paper, we introduce a method for learning a goal-selection policy in intrinsically-motivated goal-conditioned RL: "Diversity Progress" (DP). The learner forms a curriculum based on observed improvement in discriminability over its set of goals. Our proposed method is applicable to the class of discriminability-motivated agents, where the intrinsic reward is computed as a function of the agent's certainty of following the true goal being pursued. This reward can motivate the agent to learn a set of diverse skills without extrinsic rewards. We demonstrate empirically that a DP-motivated agent can learn a set of distinguishable skills faster than previous approaches, and do so without suffering from a collapse of the goal distribution -- a known issue with some prior approaches. We end with plans to take this proof-of-concept forward. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2411_01521 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | Diversity Progress for Goal Selection in Discriminability-Motivated RL |
| topic | Artificial Intelligence; Machine Learning |
| url | https://arxiv.org/abs/2411.01521 |