Saved in:
Bibliographic Details
Main Authors: Borkar, Vivek, Chen, Shuhang, Devraj, Adithya, Kontoyiannis, Ioannis, Meyn, Sean
Format: Preprint
Published: 2021
Subjects:
Online Access:https://arxiv.org/abs/2110.14427
Tags: Add Tag
No Tags, Be the first to tag this record!
Table of Contents:
  • The paper concerns the $d$-dimensional stochastic approximation recursion, $$ θ_{n+1}= θ_n + α_{n + 1} f(θ_n, Φ_{n+1}) $$ where $ \{ Φ_n \}$ is a stochastic process on a general state space, satisfying a conditional Markov property that allows for parameter-dependent noise. The main results are established under additional conditions on the mean flow and a version of the Donsker-Varadhan Lyapunov drift condition known as (DV3): (i) An appropriate Lyapunov function is constructed that implies convergence of the estimates in $L_4$. (ii) A functional central limit theorem (CLT) is established, as well as the usual one-dimensional CLT for the normalized error. Moment bounds combined with the CLT imply convergence of the normalized covariance $\textsf{E}[ z_n z_n^T ]$ to the asymptotic covariance in the CLT, where $z_n =: (θ_n-θ^*)/\sqrt{α_n}$. (iii) The CLT holds for the normalized version $z^{\text{PR}}_n =: \sqrt{n} [θ^{\text{PR}}_n -θ^*]$, of the averaged parameters $θ^{\text{PR}}_n =:n^{-1} \sum_{k=1}^nθ_k$, subject to standard assumptions on the step-size. Moreover, the covariance in the CLT coincides with the minimal covariance of Polyak and Ruppert. (iv) An example is given where $f$ and $\bar{f}$ are linear in $θ$, and $Φ$ is a geometrically ergodic Markov chain but does not satisfy (DV3). While the algorithm is convergent, the second moment of $θ_n$ is unbounded and in fact diverges. This arXiv version represents a major extension of the results in prior versions.The main results now allow for parameter-dependent noise, as is often the case in applications to reinforcement learning.