Bibliographic details
Authors:	Kamat, Anand; Precup, Doina
Format:	Preprint
Published:	2020
Subjects:	Machine Learning; Artificial Intelligence
Online access:	https://arxiv.org/abs/2011.02565

Summary:

Temporal abstraction allows reinforcement learning agents to represent knowledge and develop strategies over different temporal scales. The option-critic framework has been demonstrated to learn temporally extended actions, represented as options, end-to-end in a model-free setting. However, the feasibility of option-critic remains limited by two major challenges: multiple options adopting very similar behavior, and a shrinking set of task-relevant options. These occurrences not only void the need for temporal abstraction, they also degrade performance. In this paper, we tackle these problems by learning a diverse set of options. We introduce an information-theoretic intrinsic reward, which augments the task reward, as well as a novel termination objective, in order to encourage behavioral diversity in the option set. We show empirically that our proposed method is capable of learning options end-to-end on several discrete and continuous control tasks and outperforms option-critic by a wide margin. Furthermore, we show that our approach sustainably generates robust, reusable, reliable and interpretable options, in contrast to option-critic.
