Bibliographic Details
Main Authors: Kitamura, Toshinori; Ghosh, Arnob; Ayoub, Alex; Chu, Thang D.; Szepesvári, Csaba
Format: Preprint
Published: 2026
Subjects: Optimization and Control
Online Access: https://arxiv.org/abs/2604.21177
author Kitamura, Toshinori
Ghosh, Arnob
Ayoub, Alex
Chu, Thang D.
Szepesvári, Csaba
contents Projected subgradient descent (PSD) has gained popularity for solving robust Markov decision processes (RMDPs) because it applies to a broader class of uncertainty sets than traditional dynamic programming. Existing work claims that RMDPs with a general compact uncertainty set satisfy the subgradient dominance property, under which exact PSD converges to an $\varepsilon$-optimal policy in a polynomial number of updates (e.g., Wang et al., 2023). We show that these claims are incorrect. Even when the uncertainty set has cardinality two, the RMDP objective is not subgradient-dominant and can admit suboptimal strict local minima. Moreover, we prove that finding an $\varepsilon$-optimal policy can be NP-hard even in settings where subgradients are efficiently computable: (i) finite transition uncertainty sets and (ii) $sa$-rectangular finite transition uncertainty sets with finite cost uncertainty sets. Finally, we identify two conditions under which RMDPs do satisfy subgradient dominance: when, for each policy, either the worst-case transition kernel or the worst-case action-value function is unique.
format Preprint
id arxiv_https___arxiv_org_abs_2604_21177
institution arXiv
publishDate 2026
record_format arxiv
title Revisiting Subgradient Dominance in Robust MDPs: Counterexamples, Hardness, and Sufficient Conditions
topic Optimization and Control
url https://arxiv.org/abs/2604.21177