Staff View: A Covering Framework for Offline POMDPs Learning using Belief Space Metric :: Library Catalog

Bibliographic Details
Main Authors:	Zhu, Youheng; Lu, Yiping
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Optimization and Control
Online Access:	https://arxiv.org/abs/2603.03191

_version_	1866914366148837376
author	Zhu, Youheng; Lu, Yiping
author_facet	Zhu, Youheng; Lu, Yiping
contents	In off-policy evaluation (OPE) for partially observable Markov decision processes (POMDPs), an agent must infer hidden states from past observations, which exacerbates both the curse of horizon and the curse of memory in existing OPE methods. This paper introduces a novel covering analysis framework that exploits the intrinsic metric structure of the belief space (distributions over latent states) to relax traditional coverage assumptions. By assuming value-relevant functions are Lipschitz continuous in the belief space, we derive error bounds that mitigate exponential blow-ups in horizon and memory length. Our unified analysis technique applies to a broad class of OPE algorithms, yielding concrete error bounds and coverage requirements expressed in terms of belief-space metrics rather than raw history coverage. We illustrate the improved sample efficiency of this framework via two case studies: the double-sampling Bellman error minimization algorithm and memory-based future-dependent value functions (FDVF). In both cases, our coverage definition based on the belief-space metric yields tighter bounds.
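The abstract's central idea, that coverage can be measured in belief space rather than over raw histories, can be illustrated with a toy computation. This is a minimal sketch under stated assumptions, not the paper's algorithm or analysis: the two-state model (`T`, `O`, `b0`), the L1 metric, the greedy cover, and every function name below are hypothetical choices made for illustration. It maps every action-observation history of length H to its belief with a Bayes filter, then groups the beliefs with a greedy eps-cover. The number of histories grows as (|A||O|)^H, while the number of cover centers is bounded by the geometry of the belief simplex. If a value-relevant function were L-Lipschitz in this metric, its values on two histories assigned to the same center would differ by at most 2·L·eps.

```python
from itertools import product

# Hypothetical toy POMDP: 2 latent states, 2 actions, 2 observations.
# T[a][s][s2] = P(s2 | s, a); O[a][s2][o] = P(o | s2, a).
T = [
    [[0.9, 0.1], [0.1, 0.9]],  # action 0: latent state tends to persist
    [[0.5, 0.5], [0.5, 0.5]],  # action 1: latent state is resampled uniformly
]
O = [
    [[0.8, 0.2], [0.3, 0.7]],
    [[0.8, 0.2], [0.3, 0.7]],
]
b0 = [0.5, 0.5]  # initial belief over latent states


def belief_update(b, a, o):
    """One Bayes-filter step: b2(s2) is proportional to O[a][s2][o] * sum_s b(s) * T[a][s][s2]."""
    n = len(b)
    unnorm = [O[a][s2][o] * sum(b[s] * T[a][s][s2] for s in range(n)) for s2 in range(n)]
    z = sum(unnorm)
    if z == 0:
        raise ValueError("observation has zero probability under this belief")
    return [u / z for u in unnorm]


def belief_from_history(history):
    """Map an action-observation history to the belief it induces."""
    b = list(b0)
    for a, o in history:
        b = belief_update(b, a, o)
    return b


def l1(p, q):
    """L1 distance between beliefs (twice the total variation distance)."""
    return sum(abs(x - y) for x, y in zip(p, q))


def greedy_cover(points, eps):
    """Greedy eps-net: every point lies within eps of its assigned center,
    and distinct centers are more than eps apart."""
    centers, assignment = [], []
    for p in points:
        for i, c in enumerate(centers):
            if l1(p, c) <= eps:
                assignment.append(i)
                break
        else:
            centers.append(p)
            assignment.append(len(centers) - 1)
    return centers, assignment


H, eps = 6, 0.1
# Every history is a tuple of H (action, observation) pairs.
histories = list(product(product(range(2), range(2)), repeat=H))
beliefs = [belief_from_history(h) for h in histories]
centers, assignment = greedy_cover(beliefs, eps)
print(f"{len(histories)} histories of length {H} -> {len(centers)} belief-space cover centers")
```

Each additional step multiplies the number of histories by 4, whereas with two latent states every belief lies on a line segment, so an eps-separated set of centers can never exceed about 2/eps + 1 points, whatever the value of H.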
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_03191
institution	arXiv
publishDate	2026
record_format	arxiv
title	A Covering Framework for Offline POMDPs Learning using Belief Space Metric
topic	Machine Learning; Optimization and Control
url	https://arxiv.org/abs/2603.03191

Similar Items