Bibliographic Details
Main Authors: Bhattacharyya, Riddhiman, Chakrabarty, Sayak, Banerjee, Imon
Format: Preprint
Published: 2026
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2605.03393
author Bhattacharyya, Riddhiman
Chakrabarty, Sayak
Banerjee, Imon
contents Contextual Markov decision processes (MDPs) are powerful tools with wide applicability in areas from biostatistics to machine learning. However, specializing them to offline datasets has been challenging due to a lack of robust, theoretically backed methods. Our work tackles this problem by introducing a new approach to adaptive estimation and cost optimization of contextual MDPs. The resulting estimator is, to the best of our knowledge, the first of its kind and is endowed with strong optimality guarantees. We achieve this by overcoming the key technical challenges arising from the endogenous properties of contextual MDPs, such as non-stationarity and model irregularity. Our guarantees are established in complete generality using the relatively recent and powerful statistical technique of $T$-estimation (Baraud, 2011). We first provide a procedure for selecting an estimator given a sample from a contextual MDP and use it to derive oracle risk bounds under two distinct, but nevertheless meaningful, loss functions. We then consider the problem of determining the optimal control with the aid of the aforementioned density estimate and provide finite-sample guarantees for the cost function.
format Preprint
id arxiv_https___arxiv_org_abs_2605_03393
institution arXiv
publishDate 2026
record_format arxiv
title Adaptive Estimation and Optimal Control in Offline Contextual MDPs without Stationarity
topic Machine Learning
url https://arxiv.org/abs/2605.03393