Bibliographic Details
Main Authors: Lipman, Erin; Rodriguez, Abel
Format: Preprint
Published: 2024
Subjects: Methodology
Online Access: https://arxiv.org/abs/2402.04461
contents The most common approach to implementing data analysis pipelines involves obtaining point estimates from the upstream modules and then treating these as known quantities when working with the downstream ones. This approach is straightforward, but it is likely to underestimate the overall uncertainty associated with any final estimates. An alternative approach involves estimating parameters from the modules jointly using a Bayesian hierarchical model, which has the advantage of propagating upstream uncertainty into the downstream estimates. However, when modules are misspecified, such a joint model can behave in unexpected ways. Furthermore, hierarchical models require the development of ad-hoc computational implementations that can be laborious and computationally expensive. Cut inference modifies the posterior distribution to prevent information flow between certain parameters and provides a third alternative for statistical inference in data analysis pipelines. This paper presents a unified framework that encompasses two-step, cut, and joint inference in the context of data analysis pipelines with two modules and uses two examples to illustrate the tradeoffs associated with these approaches. Our work shows that cut inference provides both some level of robustness and ease of implementation for data analysis pipelines at a lower cost in terms of statistical inference.
id arxiv_https___arxiv_org_abs_2402_04461
institution arXiv
title On Data Analysis Pipelines and Modular Bayesian Modeling
topic Methodology