Bibliographic Details
Main Authors: Liao, Luofeng, Fu, Zuyue, Yang, Zhuoran, Wang, Yixin, Kolar, Mladen, Wang, Zhaoran
Format: Preprint
Published: 2021
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2102.09907
author Liao, Luofeng
Fu, Zuyue
Yang, Zhuoran
Wang, Yixin
Kolar, Mladen
Wang, Zhaoran
contents In offline reinforcement learning (RL), an optimal policy is learned solely from previously collected observational data. However, in observational data, actions are often confounded by unobserved variables. Instrumental variables (IVs), in the context of RL, are variables whose influence on the state variables is entirely mediated by the action. When a valid instrument is present, we can recover the confounded transition dynamics from observational data. We study a confounded Markov decision process whose transition dynamics admit an additive nonlinear functional form. Using IVs, we derive a conditional moment restriction through which the transition dynamics can be identified from observational data. We propose a provably efficient IV-aided Value Iteration (IVVI) algorithm based on a primal-dual reformulation of the conditional moment restriction. To our knowledge, this is the first provably efficient algorithm for instrument-aided offline RL.
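The abstract's central claim is that a valid instrument lets one recover confounded transition dynamics from observational data. The sketch below illustrates only that idea, in the simplest possible setting, and is not the authors' IVVI algorithm. It assumes a hypothetical scalar, linear, one-step system with no state variable and made-up coefficients, and it uses the classical Wald (IV ratio) estimator instead of the paper's primal-dual reformulation or value iteration. The function names `simulate`, `naive_estimate`, and `iv_estimate` are illustrative only.

```python
import random


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def simulate(n, theta, seed=0):
    """Draw (z, a, s_next) samples from a HYPOTHETICAL confounded linear system.

    Assumed model (not taken from the paper):
      u      ~ N(0, 1)                      unobserved confounder
      z      ~ N(0, 1)                      instrument, independent of u
      a      = z + u + small noise          action depends on z and on u
      s_next = theta * a + 2 * u + noise    u also shifts the next state
    z affects s_next only through a, which is what makes it a valid instrument.
    """
    rng = random.Random(seed)
    zs, actions, next_states = [], [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        u = rng.gauss(0.0, 1.0)
        a = z + u + 0.1 * rng.gauss(0.0, 1.0)
        s_next = theta * a + 2.0 * u + 0.1 * rng.gauss(0.0, 1.0)
        zs.append(z)
        actions.append(a)
        next_states.append(s_next)
    return zs, actions, next_states


def naive_estimate(actions, next_states):
    # OLS slope of s_next on a; it absorbs the confounder's effect, so it is biased.
    return covariance(actions, next_states) / covariance(actions, actions)


def iv_estimate(zs, actions, next_states):
    # Wald / IV ratio: keeps only the variation in a that is driven by z.
    return covariance(zs, next_states) / covariance(zs, actions)


if __name__ == "__main__":
    zs, actions, next_states = simulate(50_000, theta=0.5)
    print("naive:", naive_estimate(actions, next_states))
    print("iv:   ", iv_estimate(zs, actions, next_states))
```

With `theta=0.5`, the naive slope comes out well above 0.5 because the confounder `u` pushes the action and the next state in the same direction, while the IV ratio lands close to 0.5. The design choice is the point of the example: the instrument isolates the part of the action's variation that shares no common cause with the next state.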
format Preprint
id arxiv_https___arxiv_org_abs_2102_09907
institution arXiv
publishDate 2021
record_format arxiv
title Instrumental Variable Value Iteration for Causal Offline Reinforcement Learning
topic Machine Learning
url https://arxiv.org/abs/2102.09907