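Staff View: Revisiting Subgradient Dominance in Robust MDPs: Counterexamples, Hardness, and Sufficient Conditions :: Library Catalog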

Bibliographic Details
Main Authors:	Kitamura, Toshinori; Ghosh, Arnob; Ayoub, Alex; Chu, Thang D.; Szepesvári, Csaba
Format:	Preprint
Published:	2026
Subjects:	Optimization and Control
Online Access:	https://arxiv.org/abs/2604.21177

author	Kitamura, Toshinori; Ghosh, Arnob; Ayoub, Alex; Chu, Thang D.; Szepesvári, Csaba
contents	Projected subgradient descent (PSD) has gained popularity for solving robust Markov decision processes (RMDPs) because it applies to a broader class of uncertainty sets than traditional dynamic programming. Existing work claims that RMDPs with a general compact uncertainty set satisfy the subgradient dominance property, under which exact PSD converges to an $\varepsilon$-optimal policy in a polynomial number of updates (e.g., Wang et al., 2023). We show that these claims are incorrect. Even when the uncertainty set has cardinality two, the RMDP objective is not subgradient-dominant and can admit suboptimal strict local minima. Moreover, we prove that finding an $\varepsilon$-optimal policy can be NP-hard even in settings where subgradients are efficiently computable: (i) finite transition uncertainty sets and (ii) $sa$-rectangular finite transition uncertainty sets with finite cost uncertainty sets. Finally, we identify two conditions under which RMDPs do satisfy subgradient dominance: when, for each policy, either the worst-case transition kernel or the worst-case action-value function is unique.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_21177
institution	arXiv
publishDate	2026
record_format	arxiv
title	Revisiting Subgradient Dominance in Robust MDPs: Counterexamples, Hardness, and Sufficient Conditions
topic	Optimization and Control
url	https://arxiv.org/abs/2604.21177
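
The abstract in the contents field concerns projected subgradient descent (PSD) on an RMDP whose objective is a worst case over an uncertainty set, and notes that subgradients are efficiently computable when the transition uncertainty set is finite. As an illustration only, here is a minimal Python sketch of that update, assuming a tabular cost-minimization setting, direct policy parameterization, and a finite list of transition kernels. It uses the policy gradient under one maximizing (worst-case) kernel as the subgradient. The toy instance, the function names (`evaluate`, `occupancy`, `project_simplex`, `robust_value_and_subgradient`, `psd`), and the step size are hypothetical choices, not the authors' construction.

```python
"""Minimal sketch (hypothetical toy instance, not the authors' code):
projected subgradient descent on a tabular robust MDP whose transition
uncertainty set is a finite list of kernels. Costs are minimized, so the
robust objective is the worst case (max) over kernels."""


def q_values(P, c, V, gamma):
    """Q(s,a) = c(s,a) + gamma * sum_t P(s,a,t) V(t)."""
    S, A = len(c), len(c[0])
    return [[c[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(S))
             for a in range(A)] for s in range(S)]


def evaluate(P, c, pi, gamma, iters=300):
    """Policy evaluation by fixed-point iteration; returns (V, Q)."""
    S, A = len(c), len(c[0])
    V = [0.0] * S
    for _ in range(iters):
        Q = q_values(P, c, V, gamma)
        V = [sum(pi[s][a] * Q[s][a] for a in range(A)) for s in range(S)]
    return V, q_values(P, c, V, gamma)


def occupancy(P, pi, mu, gamma, iters=300):
    """Unnormalized discounted state occupancy d = mu + gamma * P_pi^T d."""
    S, A = len(mu), len(pi[0])
    d = list(mu)
    for _ in range(iters):
        d = [mu[t] + gamma * sum(d[s] * pi[s][a] * P[s][a][t]
                                 for s in range(S) for a in range(A))
             for t in range(S)]
    return d


def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        css += ui
        t = (css - 1.0) / i
        if ui - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def robust_value_and_subgradient(kernels, c, pi, mu, gamma):
    """J(pi) = max_k V^pi_{P_k}(mu); the subgradient used is the policy
    gradient d(s) * Q(s, a) under the first maximizing (worst-case) kernel."""
    best = None
    for P in kernels:
        V, Q = evaluate(P, c, pi, gamma)
        val = sum(m * v for m, v in zip(mu, V))
        if best is None or val > best[0]:
            best = (val, P, Q)
    val, P, Q = best
    d = occupancy(P, pi, mu, gamma)
    grad = [[d[s] * Q[s][a] for a in range(len(Q[0]))] for s in range(len(Q))]
    return val, grad


def psd(kernels, c, mu, gamma, step=0.01, iters=100):
    """Projected subgradient descent starting from the uniform policy."""
    S, A = len(c), len(c[0])
    pi = [[1.0 / A] * A for _ in range(S)]
    history = []
    for _ in range(iters):
        val, g = robust_value_and_subgradient(kernels, c, pi, mu, gamma)
        history.append(val)
        # Descent step on the cost, then project each state's row back to the simplex.
        pi = [project_simplex([pi[s][a] - step * g[s][a] for a in range(A)])
              for s in range(S)]
    return pi, history


# Hypothetical 2-state, 2-action instance with an uncertainty set of size two.
KERNELS = [
    [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.1, 0.9]]],
    [[[0.1, 0.9], [0.8, 0.2]], [[0.9, 0.1], [0.6, 0.4]]],
]
COST = [[1.0, 0.5], [0.0, 0.8]]
MU = [0.5, 0.5]
GAMMA = 0.9

if __name__ == "__main__":
    pi, history = psd(KERNELS, COST, MU, GAMMA)
    print("final policy:", pi)
    print("robust cost: first", history[0], "last", history[-1])
```

When several kernels attain the maximum, this sketch simply keeps the first one; the abstract's sufficient conditions concern exactly the cases where, for each policy, the worst-case kernel or the worst-case action-value function is unique. Since the abstract shows that the worst-case objective need not be subgradient-dominant, the policy this loop returns may be a suboptimal local minimum; the sketch illustrates the update, not a convergence guarantee.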
