Staff View: Towards Parameter-Free Temporal Difference Learning :: Library Catalog

Bibliographic Details
Main Authors:	Li, Yunxiang; Schmidt, Mark; Babanezhad, Reza; Vaswani, Sharan
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2603.02577

_version_	1866912939124981760
author	Li, Yunxiang; Schmidt, Mark; Babanezhad, Reza; Vaswani, Sharan
author_facet	Li, Yunxiang; Schmidt, Mark; Babanezhad, Reza; Vaswani, Sharan
contents	Temporal difference (TD) learning is a fundamental algorithm for estimating value functions in reinforcement learning. Recent finite-time analyses of TD with linear function approximation quantify its theoretical convergence rate. However, they often require setting the algorithm parameters using problem-dependent quantities that are difficult to estimate in practice -- such as the minimum eigenvalue of the feature covariance (\(ω\)) or the mixing time of the underlying Markov chain (\(τ_{\text{mix}}\)). In addition, some analyses rely on nonstandard and impractical modifications, exacerbating the gap between theory and practice. To address these limitations, we use an exponential step-size schedule with the standard TD(0) algorithm. We analyze the resulting method under two sampling regimes: independent and identically distributed (i.i.d.) sampling from the stationary distribution, and the more practical Markovian sampling along a single trajectory. In the i.i.d. setting, the proposed algorithm does not require knowledge of problem-dependent quantities such as \(ω\), and attains the optimal bias-variance trade-off for the last iterate. In the Markovian setting, we propose a regularized TD(0) algorithm with an exponential step-size schedule. The resulting algorithm achieves a comparable convergence rate to prior works, without requiring projections, iterate averaging, or knowledge of \(τ_{\text{mix}}\) or \(ω\).
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_02577
institution	arXiv
publishDate	2026
record_format	arxiv
title	Towards Parameter-Free Temporal Difference Learning
topic	Machine Learning
url	https://arxiv.org/abs/2603.02577
