Staff View: :: Library Catalog


Bibliographic Details
Main Authors:	Åström, Hampus; Topp, Elin Anna; Malec, Jacek
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2511.04598

_version_	1866914141281714176
author	Åström, Hampus; Topp, Elin Anna; Malec, Jacek
author_facet	Åström, Hampus; Topp, Elin Anna; Malec, Jacek
contents	In this paper we study how transforming regular reinforcement learning environments into goal-conditioned environments can let agents learn to solve tasks autonomously and reward-free. We show that an agent can learn to solve tasks by selecting its own goals in an environment-agnostic way, at training times comparable to externally guided reinforcement learning. Our method is independent of the underlying off-policy learning algorithm. Since our method is environment-agnostic, the agent does not value any goals higher than others, leading to instability in performance for individual goals. However, in our experiments, we show that the average goal success rate improves and stabilizes. An agent trained with this method can be instructed to seek any observations made in the environment, enabling generic training of agents prior to specific use cases.
format	Preprint
id	arxiv_https___arxiv_org_abs_2511_04598
institution	arXiv
publishDate	2025
record_format	arxiv
title	Environment Agnostic Goal-Conditioning, A Study of Reward-Free Autonomous Learning
topic	Machine Learning
url	https://arxiv.org/abs/2511.04598
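
The contents field above describes turning a regular environment into a goal-conditioned one in which the agent picks its own goals from observations it has made. The following is a minimal sketch of that idea, not the authors' implementation. `ChainEnv`, `GoalConditionedWrapper`, the uniform goal sampling, and the 0/1 success signal are all assumptions made for illustration; the record does not specify any of them.

```python
import random
from typing import List, Optional, Tuple


class ChainEnv:
    """Hypothetical toy environment: walk left/right on a line of cells."""

    def __init__(self, length: int = 5) -> None:
        self.length = length
        self.pos = 0

    def reset(self) -> int:
        self.pos = 0
        return self.pos

    def step(self, action: int) -> int:
        # Action 1 moves right, any other action moves left; position is clamped.
        delta = 1 if action == 1 else -1
        self.pos = min(self.length - 1, max(0, self.pos + delta))
        return self.pos


class GoalConditionedWrapper:
    """Hypothetical wrapper: goals are drawn from observations seen so far."""

    def __init__(self, env: ChainEnv, max_steps: int = 20, seed: int = 0) -> None:
        self.env = env
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.seen: List[int] = []  # every distinct observation made so far
        self.goal: Optional[int] = None
        self.steps = 0

    def _remember(self, obs: int) -> None:
        if obs not in self.seen:
            self.seen.append(obs)

    def reset(self, goal: Optional[int] = None) -> Tuple[int, int]:
        obs = self.env.reset()
        self._remember(obs)
        # Assumption: sample uniformly, so no goal is valued above another.
        # A caller may also pass an explicit goal to instruct the agent.
        self.goal = goal if goal is not None else self.rng.choice(self.seen)
        self.steps = 0
        return obs, self.goal

    def step(self, action: int) -> Tuple[Tuple[int, int], float, bool]:
        obs = self.env.step(action)
        self._remember(obs)
        self.steps += 1
        reached = obs == self.goal
        # Assumption: the only learning signal is whether the goal was reached;
        # no task reward from the underlying environment is used.
        signal = 1.0 if reached else 0.0
        done = reached or self.steps >= self.max_steps
        return (obs, self.goal), signal, done
```

Any off-policy learner could consume the `(observation, goal)` pairs this wrapper returns. Passing `goal=` to `reset` corresponds to instructing a trained agent to seek a specific observation.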
