Staff View: Asymptotic Theory for IV-Based Reinforcement Learning with Potential Endogeneity :: Library Catalog

Bibliographic Details
Main Authors:	Li, Jin; Luo, Ye; Wang, Zigan; Zhang, Xiaowei
Format:	Preprint
Published:	2021
Subjects:	Machine Learning; Econometrics; Optimization and Control
Online Access:	https://arxiv.org/abs/2103.04021

_version_	1866910760080244736
author	Li, Jin; Luo, Ye; Wang, Zigan; Zhang, Xiaowei
author_facet	Li, Jin; Luo, Ye; Wang, Zigan; Zhang, Xiaowei
contents	In the standard data analysis framework, data is collected (once and for all), and then data analysis is carried out. However, with the advancement of digital technology, decision-makers constantly analyze past data and generate new data through their decisions. We model this as a Markov decision process and show that the dynamic interaction between data generation and data analysis leads to a new type of bias -- reinforcement bias -- that exacerbates the endogeneity problem in standard data analysis. We propose a class of instrumental variable (IV)-based reinforcement learning (RL) algorithms to correct for the bias and establish their theoretical properties by incorporating them into a stochastic approximation (SA) framework. Our analysis accommodates iterate-dependent Markovian structures and, therefore, can be used to study RL algorithms with policy improvement. We also provide formulas for inference on optimal policies of the IV-RL algorithms. These formulas highlight how intertemporal dependencies of the Markovian environment affect the inference.
format	Preprint
id	arxiv_https___arxiv_org_abs_2103_04021
institution	arXiv
publishDate	2021
record_format	arxiv
title	Asymptotic Theory for IV-Based Reinforcement Learning with Potential Endogeneity
topic	Machine Learning; Econometrics; Optimization and Control
url	https://arxiv.org/abs/2103.04021

Similar Items