
Bibliographic Details
Main Authors:	Mandal, Saptarshi; Murthy, Yashaswini; Srikant, R.
Format:	Preprint
Published:	2025
Subjects:	Machine Learning I.2.6
Online Access:	https://arxiv.org/abs/2510.01721

_version_	1866918391008198656
author	Mandal, Saptarshi; Murthy, Yashaswini; Srikant, R.
author_facet	Mandal, Saptarshi; Murthy, Yashaswini; Srikant, R.
contents	Distributionally robust reinforcement learning (DRRL) focuses on designing policies that achieve good performance under model uncertainties. The goal is to maximize the worst-case long-term discounted reward, where the data for RL comes from a nominal model while the deployed environment can deviate from the nominal model within a prescribed uncertainty set. Existing convergence guarantees for DRRL are limited to tabular MDPs or depend on restrictive discount factor assumptions when function approximation is used. We present a convergence result for a robust Q-learning algorithm with linear function approximation without any discount factor restrictions. Robustness is measured with respect to the total-variation distance uncertainty set. Our model-free algorithm does not require generative access to the MDP and achieves an $\tilde{\mathcal{O}}(1/\epsilon^{4})$ sample complexity for an $\epsilon$-accurate value estimate. Our results close a key gap between the empirical success of robust RL algorithms and the non-asymptotic guarantees enjoyed by their non-robust counterparts. The key ideas in the paper also extend in a relatively straightforward fashion to robust Temporal-Difference (TD) learning with function approximation. The robust TD learning algorithm is discussed in the Appendix.
format	Preprint
id	arxiv_https___arxiv_org_abs_2510_01721
institution	arXiv
publishDate	2025
record_format	arxiv
title	Convergence of Distributionally Robust Q-Learning with Linear Function Approximation
topic	Machine Learning I.2.6
url	https://arxiv.org/abs/2510.01721
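
The abstract refers to maximizing a worst-case reward over a total-variation uncertainty set around a nominal model. As a rough illustration of that inner worst-case step only, the sketch below computes the smallest expected next-state value over all distributions within a given total-variation radius of a known nominal distribution, then uses it in a single tabular robust backup. This is not the paper's algorithm, which is model-free and uses linear function approximation. The function names, the tabular setting, the known nominal distribution, and the convention that total variation is half the L1 distance are all assumptions made for this example.

```python
import numpy as np


def worst_case_expectation_tv(p_nominal, values, radius):
    """Minimise E_p[values] over distributions p with TV(p, p_nominal) <= radius.

    Assumption of this sketch: TV is half the L1 distance, so at most
    `radius` probability mass can be moved in total.
    """
    p = np.asarray(p_nominal, dtype=float).copy()
    v = np.asarray(values, dtype=float)
    worst = int(np.argmin(v))  # state that receives the moved mass
    budget = float(radius)
    # Greedy solution of the linear program: take mass from the
    # highest-value states first and place it on the lowest-value state.
    for s in np.argsort(-v):
        if budget <= 0.0 or s == worst:
            continue
        moved = min(p[s], budget)
        p[s] -= moved
        p[worst] += moved
        budget -= moved
    return float(p @ v), p


def robust_backup(reward, gamma, p_nominal, q_next, radius):
    """One tabular robust Bellman target (hypothetical helper).

    q_next has shape (num_states, num_actions); the greedy value of each
    next state is averaged under the worst-case transition distribution.
    """
    v_next = np.max(np.asarray(q_next, dtype=float), axis=1)
    worst_value, _ = worst_case_expectation_tv(p_nominal, v_next, radius)
    return reward + gamma * worst_value


if __name__ == "__main__":
    p0 = [0.5, 0.3, 0.2]
    q = [[1.0, 0.0], [2.0, 1.0], [4.0, 3.0]]
    print(robust_backup(reward=1.0, gamma=0.9, p_nominal=p0, q_next=q, radius=0.25))
```

With a radius of zero the backup reduces to the ordinary (non-robust) Bellman target, which is a convenient sanity check for a sketch like this.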