Bibliographic Details
Main Authors: Åström, Hampus; Topp, Elin Anna; Malec, Jacek
Format: Preprint
Published: 2025
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2511.04598
_version_ 1866914141281714176
author Åström, Hampus
Topp, Elin Anna
Malec, Jacek
contents In this paper we study how transforming regular reinforcement learning environments into goal-conditioned environments can let agents learn to solve tasks autonomously and reward-free. We show that an agent can learn to solve tasks by selecting its own goals in an environment-agnostic way, at training times comparable to externally guided reinforcement learning. Our method is independent of the underlying off-policy learning algorithm. Since our method is environment-agnostic, the agent does not value any goals higher than others, leading to instability in performance for individual goals. However, in our experiments, we show that the average goal success rate improves and stabilizes. An agent trained with this method can be instructed to seek any observations made in the environment, enabling generic training of agents prior to specific use cases.
format Preprint
id arxiv_https___arxiv_org_abs_2511_04598
institution arXiv
publishDate 2025
record_format arxiv
title Environment Agnostic Goal-Conditioning, A Study of Reward-Free Autonomous Learning
topic Machine Learning
url https://arxiv.org/abs/2511.04598
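
The abstract describes turning a regular reinforcement learning environment into a goal-conditioned one in which the agent picks its own goals. The sketch below is one possible reading of that idea, not the authors' implementation. Everything in it is an assumption made for illustration: the toy `ChainEnv`, the `GoalConditionedWrapper` class, the uniform sampling of goals from previously seen observations, and the sparse "observation equals goal" success signal that replaces the environment's own reward.

```python
import random
from typing import Hashable, List, Optional, Tuple


class ChainEnv:
    """Hypothetical toy environment: positions 0..size-1 on a line."""

    def __init__(self, size: int = 5) -> None:
        self.size = size
        self.pos = 0

    def reset(self) -> int:
        self.pos = 0
        return self.pos

    def step(self, action: int) -> Tuple[int, float, bool]:
        # Action 1 moves right, any other action moves left; clipped to the chain.
        delta = 1 if action == 1 else -1
        self.pos = min(self.size - 1, max(0, self.pos + delta))
        # Task reward defined by the environment (reaching the right end).
        return self.pos, float(self.pos == self.size - 1), False


class GoalConditionedWrapper:
    """Assumed wrapper: goals are observations the agent has already seen."""

    def __init__(self, env: ChainEnv, seed: int = 0) -> None:
        self.env = env
        self.rng = random.Random(seed)
        self.seen: List[Hashable] = []  # every distinct observation so far
        self.goal: Optional[Hashable] = None

    def _remember(self, obs: Hashable) -> None:
        if obs not in self.seen:
            self.seen.append(obs)

    def reset(self, goal: Optional[Hashable] = None) -> Tuple[Hashable, Hashable]:
        obs = self.env.reset()
        self._remember(obs)
        # Self-selected goal: uniform over seen observations, so no goal is
        # valued above another. A caller may also pass an explicit goal.
        self.goal = goal if goal is not None else self.rng.choice(self.seen)
        return obs, self.goal

    def step(self, action: int) -> Tuple[Tuple[Hashable, Hashable], float, bool]:
        obs, _env_reward, done = self.env.step(action)  # env reward is discarded
        self._remember(obs)
        reached = obs == self.goal
        # The only learning signal is whether the chosen goal was reached.
        return (obs, self.goal), float(reached), done or reached
```

A goal-conditioned off-policy learner would consume the `(observation, goal)` pairs returned by `step`. After training, calling `reset(goal=...)` with any previously seen observation corresponds to instructing the agent to seek that observation.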