MARC21: :: Library Catalog

Salvato in:

Dettagli Bibliografici
Autori principali:	Zisselman, Ev, Lavie, Itai, Soudry, Daniel, Tamar, Aviv
Natura:	Preprint
Pubblicazione:	2023
Soggetti:	Machine Learning
Accesso online:	https://arxiv.org/abs/2306.03072
Tags:	Aggiungi Tag Nessun Tag, puoi essere il primo ad aggiungerne!!

_version_	1866916090653704192
author	Zisselman, Ev Lavie, Itai Soudry, Daniel Tamar, Aviv
author_facet	Zisselman, Ev Lavie, Itai Soudry, Daniel Tamar, Aviv
contents	We study zero-shot generalization in reinforcement learning-optimizing a policy on a set of training tasks to perform well on a similar but unseen test task. To mitigate overfitting, previous work explored different notions of invariance to the task. However, on problems such as the ProcGen Maze, an adequate solution that is invariant to the task visualization does not exist, and therefore invariance-based approaches fail. Our insight is that learning a policy that effectively $\textit{explores}$ the domain is harder to memorize than a policy that maximizes reward for a specific task, and therefore we expect such learned behavior to generalize well; we indeed demonstrate this empirically on several domains that are difficult for invariance-based approaches. Our $\textit{Explore to Generalize}$ algorithm (ExpGen) builds on this insight: we train an additional ensemble of agents that optimize reward. At test time, either the ensemble agrees on an action, and we generalize well, or we take exploratory actions, which generalize well and drive us to a novel part of the state space, where the ensemble may potentially agree again. We show that our approach is the state-of-the-art on tasks of the ProcGen challenge that have thus far eluded effective generalization, yielding a success rate of $83\%$ on the Maze task and $74\%$ on Heist with $200$ training levels. ExpGen can also be combined with an invariance based approach to gain the best of both worlds, setting new state-of-the-art results on ProcGen.
format	Preprint
id	arxiv_https___arxiv_org_abs_2306_03072
institution	arXiv
publishDate	2023
record_format	arxiv
spellingShingle	Explore to Generalize in Zero-Shot RL Zisselman, Ev Lavie, Itai Soudry, Daniel Tamar, Aviv Machine Learning We study zero-shot generalization in reinforcement learning-optimizing a policy on a set of training tasks to perform well on a similar but unseen test task. To mitigate overfitting, previous work explored different notions of invariance to the task. However, on problems such as the ProcGen Maze, an adequate solution that is invariant to the task visualization does not exist, and therefore invariance-based approaches fail. Our insight is that learning a policy that effectively $\textit{explores}$ the domain is harder to memorize than a policy that maximizes reward for a specific task, and therefore we expect such learned behavior to generalize well; we indeed demonstrate this empirically on several domains that are difficult for invariance-based approaches. Our $\textit{Explore to Generalize}$ algorithm (ExpGen) builds on this insight: we train an additional ensemble of agents that optimize reward. At test time, either the ensemble agrees on an action, and we generalize well, or we take exploratory actions, which generalize well and drive us to a novel part of the state space, where the ensemble may potentially agree again. We show that our approach is the state-of-the-art on tasks of the ProcGen challenge that have thus far eluded effective generalization, yielding a success rate of $83\%$ on the Maze task and $74\%$ on Heist with $200$ training levels. ExpGen can also be combined with an invariance based approach to gain the best of both worlds, setting new state-of-the-art results on ProcGen.
title	Explore to Generalize in Zero-Shot RL
topic	Machine Learning
url	https://arxiv.org/abs/2306.03072

Documenti analoghi