Bibliographic Details
Main Authors: Yao, Zhiyuan; Florescu, Ionut; Lee, Chihoon
Format: Preprint
Published: 2024
Online Access: https://arxiv.org/abs/2402.00313
Table of Contents:
  • We introduce a new reinforcement learning method for control problems in environments with delayed feedback. Unlike previous methods, which use deterministic planning, our method employs stochastic planning, which allows us to embed risk preference in the policy optimization problem. We show that this formulation can recover the optimal policy for problems with deterministic transitions. We contrast our policy with two prior methods from the literature and apply the methodology to simple tasks to understand its features. We then compare the performance of the methods in controlling multiple Atari games.