
Bibliographic Details
Main Authors:	Lim, Han-Dong; Lee, Donghwan
Format:	Preprint
Published:	2023
Subjects:	Machine Learning; Optimization and Control
Online Access:	https://arxiv.org/abs/2310.00638

Table of Contents:

The goal of this paper is to investigate distributed temporal difference (TD) learning for a networked multi-agent Markov decision process. The proposed approach is based on distributed optimization algorithms, which can be interpreted as primal-dual ordinary differential equation (ODE) dynamics subject to null-space constraints. Based on the exponential convergence behavior of these primal-dual ODE dynamics, we examine the behavior of the final iterate in various distributed TD-learning scenarios, considering both constant and diminishing step-sizes and both i.i.d. and Markovian observation models. Unlike existing methods, the proposed algorithm does not require the assumption that the underlying communication network structure is characterized by a doubly stochastic matrix.
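To make the idea of primal-dual dynamics under a null-space constraint concrete, the following is a minimal sketch, not the authors' algorithm. It assumes linear function approximation, an undirected communication graph with Laplacian `L`, and noise-free (expected) TD updates in place of sampled ones. Each agent `i` keeps a parameter `theta_i` and a dual variable `w_i`; the constraint `L theta = 0` forces consensus. All names (`build_example`, `distributed_td_primal_dual`), the toy Markov chain, the features, and the rewards are hypothetical choices made for illustration only.

```python
import numpy as np


def build_example(gamma=0.9):
    """Toy problem (hypothetical): 4 states, 2 features, 3 agents."""
    n_states = 4
    eye = np.eye(n_states)
    # Lazy ring walk: stay with prob. 0.5, move to the next state with prob. 0.5.
    # Chosen only so that the stationary distribution is uniform.
    P = 0.5 * eye + 0.5 * np.roll(eye, 1, axis=1)
    D = eye / n_states  # diagonal matrix of the stationary distribution
    Phi = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]])
    # Each agent observes only its own local reward vector (one row per agent).
    rewards = np.array([[1.0, 0.0, 0.0, 0.0],
                        [0.0, 2.0, 0.0, 0.0],
                        [0.0, 0.0, 0.0, 3.0]])
    A = Phi.T @ D @ (eye - gamma * P) @ Phi  # shared TD matrix
    b_local = rewards @ D @ Phi              # row i is Phi^T D r_i
    # Laplacian of the path graph 0 - 1 - 2 (assumed communication network).
    L = np.array([[1.0, -1.0, 0.0],
                  [-1.0, 2.0, -1.0],
                  [0.0, -1.0, 1.0]])
    return A, b_local, L


def distributed_td_primal_dual(A, b_local, L, step=0.02, iters=20000):
    """Euler discretisation of
        d(theta)/dt = (b_i - A theta_i) - L theta - L w
        d(w)/dt     = L theta
    using expected TD directions (no sampling noise)."""
    theta = np.zeros_like(b_local)
    w = np.zeros_like(b_local)
    for _ in range(iters):
        td_dir = b_local - theta @ A.T  # row i: b_i - A theta_i
        theta_next = theta + step * (td_dir - L @ theta - L @ w)
        w = w + step * (L @ theta)      # dual ascent on the consensus constraint
        theta = theta_next
    return theta


A, b_local, L = build_example()
theta = distributed_td_primal_dual(A, b_local, L)
# Centralised reference: TD fixed point for the average of the local rewards.
theta_star = np.linalg.solve(A, b_local.mean(axis=0))
print(np.max(np.abs(theta - theta_star)))
```

In this sketch every agent's row of `theta` should approach the same vector `theta_star`, the TD fixed point for the network-averaged reward, even though no agent sees the other agents' rewards. The step size and iteration count are illustrative values for this toy problem, not recommendations from the paper.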

Similar Items