Salvato in:
Dettagli Bibliografici
Autori principali: Mandal, Saptarshi, Murthy, Yashaswini, Srikant, R.
Natura: Preprint
Pubblicazione: 2025
Soggetti:
Accesso online:https://arxiv.org/abs/2510.01721
Tags: Aggiungi Tag
Nessun Tag, puoi essere il primo ad aggiungerne!!
_version_ 1866918391008198656
author Mandal, Saptarshi
Murthy, Yashaswini
Srikant, R.
author_facet Mandal, Saptarshi
Murthy, Yashaswini
Srikant, R.
contents Distributionally robust reinforcement learning (DRRL) focuses on designing policies that achieve good performance under model uncertainties. The goal is to maximize the worst-case long-term discounted reward, where the data for RL comes from a nominal model while the deployed environment can deviate from the nominal model within a prescribed uncertainty set. Existing convergence guarantees for DRRL are limited to tabular MDPs or are dependent on restrictive discount factor assumptions when function approximation is used. We present a convergence result for a robust Q-learning algorithm with linear function approximation without any discount factor restrictions. In this paper, the robustness is measured with respect to the total-variation distance uncertainty set. Our model free algorithm does not require generative access to the MDP and achieves an $\tilde{\mathcal{O}}(1/ε^{4})$ sample complexity for an $ε$-accurate value estimate. Our results close a key gap between the empirical success of robust RL algorithms and the non-asymptotic guarantees enjoyed by their non-robust counterparts. The key ideas in the paper also extend in a relatively straightforward fashion to robust Temporal-Difference (TD) learning with function approximation. The robust TD learning algorithm is discussed in the Appendix.
format Preprint
id arxiv_https___arxiv_org_abs_2510_01721
institution arXiv
publishDate 2025
record_format arxiv
spellingShingle Convergence of Distributionally Robust Q-Learning with Linear Function Approximation
Mandal, Saptarshi
Murthy, Yashaswini
Srikant, R.
Machine Learning
I.2.6
Distributionally robust reinforcement learning (DRRL) focuses on designing policies that achieve good performance under model uncertainties. The goal is to maximize the worst-case long-term discounted reward, where the data for RL comes from a nominal model while the deployed environment can deviate from the nominal model within a prescribed uncertainty set. Existing convergence guarantees for DRRL are limited to tabular MDPs or are dependent on restrictive discount factor assumptions when function approximation is used. We present a convergence result for a robust Q-learning algorithm with linear function approximation without any discount factor restrictions. In this paper, the robustness is measured with respect to the total-variation distance uncertainty set. Our model free algorithm does not require generative access to the MDP and achieves an $\tilde{\mathcal{O}}(1/ε^{4})$ sample complexity for an $ε$-accurate value estimate. Our results close a key gap between the empirical success of robust RL algorithms and the non-asymptotic guarantees enjoyed by their non-robust counterparts. The key ideas in the paper also extend in a relatively straightforward fashion to robust Temporal-Difference (TD) learning with function approximation. The robust TD learning algorithm is discussed in the Appendix.
title Convergence of Distributionally Robust Q-Learning with Linear Function Approximation
topic Machine Learning
I.2.6
url https://arxiv.org/abs/2510.01721