Staff View: Relaxed Equilibria for Time-Inconsistent Markov Decision Processes :: Library Catalog

Bibliographic Details
Main Authors:	Bayraktar, Erhan; Huang, Yu-Jui; Wang, Zhenhua; Zhou, Zhou
Format:	Preprint
Published:	2023
Subjects:	Optimization and Control 60J10, 60J27, 91A11
Online Access:	https://arxiv.org/abs/2307.04227

_version_	1866917860111024128
author	Bayraktar, Erhan; Huang, Yu-Jui; Wang, Zhenhua; Zhou, Zhou
author_facet	Bayraktar, Erhan; Huang, Yu-Jui; Wang, Zhenhua; Zhou, Zhou
contents	This paper considers an infinite-horizon Markov decision process (MDP) that allows for general non-exponential discount functions, in both discrete and continuous time. Due to the inherent time inconsistency, we look for a randomized equilibrium policy (i.e., relaxed equilibrium) in an intra-personal game between an agent's current and future selves. When we modify the MDP by entropy regularization, a relaxed equilibrium is shown to exist by a nontrivial entropy estimate. As the degree of regularization diminishes, the entropy-regularized MDPs approximate the original MDP, which gives the general existence of a relaxed equilibrium in the limit by weak convergence arguments. As opposed to prior studies that consider only deterministic policies, our existence of an equilibrium does not require any convexity (or concavity) of the controlled transition probabilities and reward function. Interestingly, this benefit of considering randomized policies is unique to the time-inconsistent case.
format	Preprint
id	arxiv_https___arxiv_org_abs_2307_04227
institution	arXiv
publishDate	2023
record_format	arxiv
title	Relaxed Equilibria for Time-Inconsistent Markov Decision Processes
topic	Optimization and Control 60J10, 60J27, 91A11
url	https://arxiv.org/abs/2307.04227