Bibliographic Details
Main Authors: Bayraktar, Erhan; Huang, Yu-Jui; Wang, Zhenhua; Zhou, Zhou
Format: Preprint
Published: 2023
Subjects: Optimization and Control
Online Access: https://arxiv.org/abs/2307.04227
author Bayraktar, Erhan
Huang, Yu-Jui
Wang, Zhenhua
Zhou, Zhou
contents This paper considers an infinite-horizon Markov decision process (MDP) that allows for general non-exponential discount functions, in both discrete and continuous time. Due to the inherent time inconsistency, we look for a randomized equilibrium policy (i.e., a relaxed equilibrium) in an intra-personal game between an agent's current and future selves. When we modify the MDP by entropy regularization, a relaxed equilibrium is shown to exist via a nontrivial entropy estimate. As the degree of regularization diminishes, the entropy-regularized MDPs approximate the original MDP, which yields the general existence of a relaxed equilibrium in the limit via weak convergence arguments. Unlike prior studies that consider only deterministic policies, our existence result does not require any convexity (or concavity) of the controlled transition probabilities and reward function. Interestingly, this benefit of considering randomized policies is unique to the time-inconsistent case.
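The following is a minimal sketch, not taken from the paper, of two ideas that the abstract relies on. It assumes a hyperbolic discount function 1/(1 + k t) as the example of a non-exponential discount, and a single-step choice among finitely many actions. The first part shows that hyperbolic discounting can reverse a preference as the decision date approaches, which is the source of time inconsistency, while exponential discounting cannot. The second part shows that adding an entropy term with weight lam turns "pick the best action" into a randomized softmax (Gibbs) policy that concentrates on the best action as lam shrinks. All function names, rewards, and parameter values are hypothetical illustrations. This is not the paper's equilibrium construction.

```python
import math

# Hypothetical illustration only; not the construction used in the paper.

def hyperbolic(t, k=1.0):
    """Non-exponential (hyperbolic) discount function 1 / (1 + k t)."""
    return 1.0 / (1.0 + k * t)

def exponential(t, delta=0.9):
    """Exponential discount function delta ** t (time-consistent)."""
    return delta ** t

def prefers_later(discount, delay, small=10.0, large=15.0, gap=1.0):
    """True if, viewed `delay` periods ahead, the larger reward arriving
    `gap` periods after the smaller one has the higher discounted value."""
    return large * discount(delay + gap) > small * discount(delay)

def soft_policy(q, lam):
    """Maximizer over distributions pi of sum_a pi(a) q(a) + lam * H(pi),
    where H is Shannon entropy: pi(a) is proportional to exp(q(a) / lam)."""
    m = max(q)  # subtract the max for numerical stability
    w = [math.exp((x - m) / lam) for x in q]
    s = sum(w)
    return [x / s for x in w]

if __name__ == "__main__":
    for d in (0, 10):
        # Hyperbolic: the preference flips between d = 10 and d = 0.
        # Exponential: the preference is the same at every horizon.
        print(d, prefers_later(hyperbolic, d), prefers_later(exponential, d))
    # As lam shrinks, the randomized policy approaches the argmax action.
    for lam in (1.0, 0.1, 0.01):
        print(lam, soft_policy([1.0, 2.0], lam))
```

With these values, the first loop prints `0 False True` and `10 True True`. Seen from 10 periods away, the hyperbolic agent plans to wait for the larger reward. Once the choice is immediate, it switches to the smaller one. This kind of planned-versus-actual conflict is what motivates looking for an equilibrium between current and future selves instead of a single optimal policy.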
format Preprint
id arxiv_https___arxiv_org_abs_2307_04227
institution arXiv
publishDate 2023
record_format arxiv
title Relaxed Equilibria for Time-Inconsistent Markov Decision Processes
topic Optimization and Control
60J10, 60J27, 91A11
url https://arxiv.org/abs/2307.04227