Bibliographic Details
Main Authors: Wu, Peng, Luo, Shanshan, Geng, Zhi
Format: Preprint
Published: 2023
Online Access: https://arxiv.org/abs/2311.00528
author Wu, Peng
Luo, Shanshan
Geng, Zhi
contents There is growing interest in exploring causal effects in target populations via data combination. However, most approaches are tailored to specific settings and lack comprehensive comparative analyses. In this article, we focus on a typical scenario involving a source dataset and a target dataset. We first design six settings under covariate shift and conduct a comparative analysis by deriving the semiparametric efficiency bounds for the average treatment effect (ATE) in the target population. We then extend this analysis to six new settings that incorporate both covariate shift and posterior drift. Our study uncovers the key factors that influence efficiency gains and the "effective sample size" when combining two datasets, with a particular emphasis on the roles of the variance ratio of potential outcomes between datasets and the derivatives of the posterior drift function. To the best of our knowledge, this is the first paper that explicitly explores the role of posterior drift functions in causal inference. We also propose novel methods for conducting sensitivity analysis to address violations of transportability between the two datasets. We empirically validate our findings by constructing locally efficient estimators and conducting extensive simulations. We demonstrate the proposed methods in two real-world applications.
format Preprint
id arxiv_https___arxiv_org_abs_2311_00528
institution arXiv
publishDate 2023
record_format arxiv
title On the Comparative Analysis of Average Treatment Effects Estimation via Data Combination
topic Methodology
url https://arxiv.org/abs/2311.00528