Staff View: Instrumental Variable Value Iteration for Causal Offline Reinforcement Learning :: Library Catalog

Bibliographic Details
Main Authors:	Liao, Luofeng; Fu, Zuyue; Yang, Zhuoran; Wang, Yixin; Kolar, Mladen; Wang, Zhaoran
Format:	Preprint
Published:	2021
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2102.09907

_version_	1866916438721167360
author	Liao, Luofeng; Fu, Zuyue; Yang, Zhuoran; Wang, Yixin; Kolar, Mladen; Wang, Zhaoran
author_facet	Liao, Luofeng; Fu, Zuyue; Yang, Zhuoran; Wang, Yixin; Kolar, Mladen; Wang, Zhaoran
contents	In offline reinforcement learning (RL) an optimal policy is learned solely from a priori collected observational data. However, in observational data, actions are often confounded by unobserved variables. Instrumental variables (IVs), in the context of RL, are the variables whose influence on the state variables is all mediated by the action. When a valid instrument is present, we can recover the confounded transition dynamics through observational data. We study a confounded Markov decision process where the transition dynamics admit an additive nonlinear functional form. Using IVs, we derive a conditional moment restriction through which we can identify transition dynamics based on observational data. We propose a provably efficient IV-aided Value Iteration (IVVI) algorithm based on a primal-dual reformulation of the conditional moment restriction. To our knowledge, this is the first provably efficient algorithm for instrument-aided offline RL.
format	Preprint
id	arxiv_https___arxiv_org_abs_2102_09907
institution	arXiv
publishDate	2021
record_format	arxiv
title	Instrumental Variable Value Iteration for Causal Offline Reinforcement Learning
topic	Machine Learning
url	https://arxiv.org/abs/2102.09907

Similar Items