Saved in:
Bibliographic Details
Main Authors: Xi, Jiachen, Garcia, Alfredo, Momcilovic, Petar
Format: Preprint
Published: 2024
Subjects:
Online Access:https://arxiv.org/abs/2401.15196
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866913684131938304
author Xi, Jiachen
Garcia, Alfredo
Momcilovic, Petar
author_facet Xi, Jiachen
Garcia, Alfredo
Momcilovic, Petar
contents Regularized Markov Decision Processes serve as models of sequential decision making under uncertainty wherein the decision maker has limited information processing capacity and/or aversion to model ambiguity. With functional approximation, the convergence properties of learning algorithms for regularized MDPs (e.g. soft Q-learning) are not well understood because the composition of the regularized Bellman operator and a projection onto the span of basis vectors is not a contraction with respect to any norm. In this paper, we consider a bi-level optimization formulation of regularized Q-learning with linear functional approximation. The {\em lower} level optimization problem aims to identify a value function approximation that satisfies Bellman's recursive optimality condition and the {\em upper} level aims to find the projection onto the span of basis vectors. This formulation motivates a single-loop algorithm with finite time convergence guarantees. The algorithm operates on two time-scales: updates to the projection of state-action values are `slow' in that they are implemented with a step size that is smaller than the one used for `faster' updates of approximate solutions to Bellman's recursive optimality equation. We show that, under certain assumptions, the proposed algorithm converges to a stationary point in the presence of Markovian noise. In addition, we provide a performance guarantee for the policies derived from the proposed algorithm.
format Preprint
id arxiv_https___arxiv_org_abs_2401_15196
institution arXiv
publishDate 2024
record_format arxiv
spellingShingle Regularized Q-Learning with Linear Function Approximation
Xi, Jiachen
Garcia, Alfredo
Momcilovic, Petar
Artificial Intelligence
Regularized Markov Decision Processes serve as models of sequential decision making under uncertainty wherein the decision maker has limited information processing capacity and/or aversion to model ambiguity. With functional approximation, the convergence properties of learning algorithms for regularized MDPs (e.g. soft Q-learning) are not well understood because the composition of the regularized Bellman operator and a projection onto the span of basis vectors is not a contraction with respect to any norm. In this paper, we consider a bi-level optimization formulation of regularized Q-learning with linear functional approximation. The {\em lower} level optimization problem aims to identify a value function approximation that satisfies Bellman's recursive optimality condition and the {\em upper} level aims to find the projection onto the span of basis vectors. This formulation motivates a single-loop algorithm with finite time convergence guarantees. The algorithm operates on two time-scales: updates to the projection of state-action values are `slow' in that they are implemented with a step size that is smaller than the one used for `faster' updates of approximate solutions to Bellman's recursive optimality equation. We show that, under certain assumptions, the proposed algorithm converges to a stationary point in the presence of Markovian noise. In addition, we provide a performance guarantee for the policies derived from the proposed algorithm.
title Regularized Q-Learning with Linear Function Approximation
topic Artificial Intelligence
url https://arxiv.org/abs/2401.15196