Staff View: :: Library Catalog

Bibliographic Details
Main Author:	Ramani, Sivaramakrishnan
Format:	Preprint
Published:	2026
Subjects:	Optimization and Control; Machine Learning
Online Access:	https://arxiv.org/abs/2603.08979

_version_	1866917327872720896
author	Ramani, Sivaramakrishnan
author_facet	Ramani, Sivaramakrishnan
contents	We consider Markov decision processes (MDPs) with an unknown disturbance distribution and address this problem using the robust Markov decision process (RMDP) approach. We construct the empirical distribution of the disturbance and characterize our ambiguity set of distributions as the sublevel set of a nonnegative distance function from the empirical distribution. By connecting weak convergence of distributions to convergence with respect to the distance function, we prove that the robust optimal value function and the out-of-sample value function converge to the true optimal value function as the sample size increases. We establish that, for finite sample sizes, the robust optimal value function serves as a high-probability upper bound on the out-of-sample value function. We also obtain probabilistic convergence rates, sample complexity bounds, and out-of-distribution performance bounds. The finite-sample performance guarantees rely on the distance function satisfying a certain concentration-type inequality. Several well-studied distances in the literature meet the requirements imposed on the distance function. We also analyze the data-driven properties of empirical MDPs and demonstrate that, unlike our data-driven RMDPs, empirical MDPs fail to satisfy some of the finite-sample performance guarantees.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_08979
institution	arXiv
publishDate	2026
record_format	arxiv
title	Data-driven robust Markov decision processes on Borel spaces: performance guarantees via an axiomatic approach
topic	Optimization and Control; Machine Learning
url	https://arxiv.org/abs/2603.08979