Bibliographic Details
Main Authors: Nie, Allen, Chandak, Yash, Yuan, Christina J., Badrinath, Anirudhan, Flet-Berliac, Yannis, Brunskil, Emma
Format: Preprint
Published: 2024
Subjects: Machine Learning, Artificial Intelligence
Online Access: https://arxiv.org/abs/2405.17708
contents Offline policy evaluation (OPE) estimates the performance of a new sequential decision-making policy from historical interaction data collected by other policies. Evaluating a new policy online without a confident estimate of its performance can lead to costly, unsafe, or hazardous outcomes, especially in education and healthcare. Several OPE estimators have been proposed over the last decade, many of which have hyperparameters and require training. Unfortunately, it remains unclear how to choose the best OPE algorithm for each task and domain. In this paper, we propose a new algorithm that, given a dataset, adaptively blends a set of OPE estimators using a statistical procedure rather than explicitly selecting a single one. We prove that our estimator is consistent and satisfies several desirable properties for policy evaluation. Additionally, we demonstrate that, compared with alternative approaches, our estimator can be used to select higher-performing policies in healthcare and robotics. Our work contributes to making a general-purpose, estimator-agnostic off-policy evaluation framework for offline RL easier to use.
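The abstract describes blending several OPE estimators instead of picking one. As a hedged illustration only, the sketch below assumes the blend is a linear combination whose weights sum to one and minimise a bootstrap estimate of the estimators' joint error matrix (the classic minimum-variance combination, w proportional to E^{-1} 1). The names `blend_estimates`, `solve_linear`, and `Estimator`, the toy "dataset" of per-episode samples, and the use of `statistics.mean`/`statistics.median` as stand-in estimators are all hypothetical; the bootstrap matrix here captures variance but not bias, and this is not the authors' OPERA implementation.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: an OPE estimator maps a dataset (here simplified to a
# list of per-episode samples) to a scalar estimate of the target policy's value.
Estimator = Callable[[Sequence[float]], float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def blend_estimates(
    data: Sequence[float],
    estimators: Sequence[Estimator],
    n_boot: int = 200,
    ridge: float = 1e-8,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Return (weights, blended estimate); the weights sum to 1."""
    rng = random.Random(seed)
    k = len(estimators)
    full = [est(data) for est in estimators]  # estimates on the full dataset
    # Bootstrap estimate of the k x k error matrix, centred on each estimator's
    # full-data value (assumption: this measures variance, not bias).
    err = [[0.0] * k for _ in range(k)]
    for _ in range(n_boot):
        sample = rng.choices(data, k=len(data))  # resample with replacement
        dev = [est(sample) - f for est, f in zip(estimators, full)]
        for i in range(k):
            for j in range(k):
                err[i][j] += dev[i] * dev[j] / n_boot
    for i in range(k):
        err[i][i] += ridge  # small ridge keeps the matrix invertible
    # Minimise w^T err w subject to sum(w) = 1, so w = err^{-1} 1 / (1^T err^{-1} 1).
    raw = solve_linear(err, [1.0] * k)
    total = sum(raw)
    weights = [r / total for r in raw]
    blended = sum(w * f for w, f in zip(weights, full))
    return weights, blended


if __name__ == "__main__":
    # Toy data standing in for per-episode returns; both estimators are placeholders.
    demo_rng = random.Random(1)
    returns = [demo_rng.gauss(1.0, 1.0) for _ in range(100)]
    w, value = blend_estimates(returns, [statistics.mean, statistics.median])
    print("weights:", w, "blended estimate:", value)
```

Because the weights are only constrained to sum to one, individual weights may be negative; a variant could add non-negativity constraints, at the cost of needing a quadratic-programming solver instead of a single linear solve.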
institution arXiv
title OPERA: Automatic Offline Policy Evaluation with Re-weighted Aggregates of Multiple Estimators
topic Machine Learning
Artificial Intelligence