Staff View: Adaptive Estimation and Optimal Control in Offline Contextual MDPs without Stationarity :: Library Catalog

Bibliographic Details
Main Authors:	Bhattacharyya, Riddhiman, Chakrabarty, Sayak, Banerjee, Imon
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2605.03393

_version_	1866918482919030784
author	Bhattacharyya, Riddhiman Chakrabarty, Sayak Banerjee, Imon
author_facet	Bhattacharyya, Riddhiman Chakrabarty, Sayak Banerjee, Imon
contents	Contextual MDPs are powerful tools with wide applicability in areas from biostatistics to machine learning. However, specializing them to offline datasets has been challenging due to a lack of robust, theoretically backed methods. Our work tackles this problem by introducing a new approach towards adaptive estimation and cost optimization of contextual MDPs. This estimator, to the best of our knowledge, is the first of its kind, and is endowed with strong optimality guarantees. We achieve this by overcoming the key technical challenges evolving from the endogenous properties of contextual MDPs; such as non-stationarity, or model irregularity. Our guarantees are established under complete generality by utilizing the relatively recent and powerful statistical technique of $T$-estimation (Baraud, 2011). We first provide a procedure for selecting an estimator given a sample from a contextual MDP and use it to derive oracle risk bounds under two distinct, but nevertheless meaningful, loss functions. We then consider the problem of determining the optimal control with the aid of the aforementioned density estimate and provide finite sample guarantees for the cost function.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_03393
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	Adaptive Estimation and Optimal Control in Offline Contextual MDPs without Stationarity Bhattacharyya, Riddhiman Chakrabarty, Sayak Banerjee, Imon Machine Learning Contextual MDPs are powerful tools with wide applicability in areas from biostatistics to machine learning. However, specializing them to offline datasets has been challenging due to a lack of robust, theoretically backed methods. Our work tackles this problem by introducing a new approach towards adaptive estimation and cost optimization of contextual MDPs. This estimator, to the best of our knowledge, is the first of its kind, and is endowed with strong optimality guarantees. We achieve this by overcoming the key technical challenges evolving from the endogenous properties of contextual MDPs; such as non-stationarity, or model irregularity. Our guarantees are established under complete generality by utilizing the relatively recent and powerful statistical technique of $T$-estimation (Baraud, 2011). We first provide a procedure for selecting an estimator given a sample from a contextual MDP and use it to derive oracle risk bounds under two distinct, but nevertheless meaningful, loss functions. We then consider the problem of determining the optimal control with the aid of the aforementioned density estimate and provide finite sample guarantees for the cost function.
title	Adaptive Estimation and Optimal Control in Offline Contextual MDPs without Stationarity
topic	Machine Learning
url	https://arxiv.org/abs/2605.03393
