Bibliographic Details
Main Authors: Naik, Abhishek, Wan, Yi, Tomar, Manan, Sutton, Richard S.
Format: Preprint
Published: 2024
Subjects: Machine Learning, Artificial Intelligence
Online Access: https://arxiv.org/abs/2405.09999
author Naik, Abhishek
Wan, Yi
Tomar, Manan
Sutton, Richard S.
contents We show that discounted methods for solving continuing reinforcement learning problems can perform significantly better if they center their rewards by subtracting out the rewards' empirical average. The improvement is substantial at commonly used discount factors and increases further as the discount factor approaches one. In addition, we show that if a problem's rewards are shifted by a constant, then standard methods perform much worse, whereas methods with reward centering are unaffected. Estimating the average reward is straightforward in the on-policy setting; we propose a slightly more sophisticated method for the off-policy setting. Reward centering is a general idea, so we expect almost every reinforcement-learning algorithm to benefit from the addition of reward centering.
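The abstract's central operation can be illustrated with a minimal sketch, assuming tabular TD(0) on a hypothetical three-state deterministic Markov reward process. The function name `td0_centered`, the toy rewards, and the step sizes `alpha` (values) and `beta` (average reward) are illustrative assumptions, not the authors' code. The average reward is estimated on-policy with a simple exponential moving average of observed rewards, and the TD error uses `r - r_bar` in place of `r`. The off-policy estimator the authors propose is not reproduced here.

```python
def td0_centered(rewards, next_state, gamma=0.9, alpha=0.1, beta=0.01,
                 steps=20_000, shift=0.0, center=True):
    """Tabular TD(0) on a hypothetical deterministic Markov reward process.

    rewards[s]    -- reward received when leaving state s (toy assumption)
    next_state[s] -- deterministic successor of state s (toy assumption)
    shift         -- constant added to every reward
    center        -- if True, subtract a running estimate of the average reward
    """
    n = len(rewards)
    v = [0.0] * n      # discounted value estimates
    r_bar = 0.0        # average-reward estimate (stays 0 when center=False)
    s = 0
    for _ in range(steps):
        r = rewards[s] + shift
        s_next = next_state[s]
        # Centered TD error: reward minus the current average-reward estimate.
        delta = (r - r_bar) + gamma * v[s_next] - v[s]
        v[s] += alpha * delta
        if center:
            # Simple on-policy estimate: exponential moving average of rewards.
            r_bar += beta * (r - r_bar)
        s = s_next
    return v, r_bar


if __name__ == "__main__":
    rewards = [1.0, 0.0, 2.0]   # hypothetical rewards, average 1.0
    next_state = [1, 2, 0]      # deterministic cycle 0 -> 1 -> 2 -> 0
    for center in (True, False):
        v0, _ = td0_centered(rewards, next_state, center=center)
        v1, _ = td0_centered(rewards, next_state, shift=100.0, center=center)
        # Per-state change in the value estimates caused by the reward shift.
        diffs = [round(b - a, 6) for a, b in zip(v0, v1)]
        print("centered" if center else "uncentered", diffs)
```

In this toy run, adding a constant of 100 to every reward leaves the centered value estimates essentially unchanged (only `r_bar` absorbs the shift), while every uncentered estimate grows by about 100 / (1 - 0.9) = 1000, consistent with the abstract's claim about constant reward shifts.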
format Preprint
id arxiv_https___arxiv_org_abs_2405_09999
institution arXiv
publishDate 2024
record_format arxiv
title Reward Centering
topic Machine Learning
Artificial Intelligence
url https://arxiv.org/abs/2405.09999