Bibliographic Details
Main Author: Ramani, Sivaramakrishnan
Format: Preprint
Published: 2026
Subjects: Optimization and Control; Machine Learning
Online Access: https://arxiv.org/abs/2603.08979
_version_ 1866917327872720896
author Ramani, Sivaramakrishnan
author_facet Ramani, Sivaramakrishnan
contents We consider Markov decision processes (MDPs) with an unknown disturbance distribution and address this problem using the robust Markov decision process (RMDP) approach. We construct the empirical distribution of the unknown disturbance and characterize our ambiguity set of distributions as the sublevel set of a nonnegative distance function from the empirical distribution. By connecting the weak convergence of distributions to convergence with respect to the distance function, we prove that the robust optimal value function and the out-of-sample value function converge to the true optimal value function as the sample size increases. We establish that, for finite sample sizes, the robust optimal value function serves as a high-probability upper bound on the out-of-sample value function. We also obtain probabilistic convergence rates, sample complexity bounds, and out-of-distribution performance bounds. The finite-sample performance guarantees rely on the distance function satisfying a certain concentration-type inequality. Several well-studied distances in the literature meet the requirements imposed on the distance function. We also analyze the data-driven properties of empirical MDPs and demonstrate that, unlike our data-driven RMDPs, empirical MDPs fail to satisfy some of the finite-sample performance guarantees.
format Preprint
id arxiv_https___arxiv_org_abs_2603_08979
institution arXiv
publishDate 2026
record_format arxiv
title Data-driven robust Markov decision processes on Borel spaces: performance guarantees via an axiomatic approach
topic Optimization and Control
Machine Learning
url https://arxiv.org/abs/2603.08979