Internal format :: Library Catalog

Bibliographic details
Main authors:	Lipman, Erin; Rodriguez, Abel
Format:	Preprint
Published:	2024
Subjects:	Methodology
Online access:	https://arxiv.org/abs/2402.04461

author	Lipman, Erin; Rodriguez, Abel
contents	The most common approach to implementing data analysis pipelines involves obtaining point estimates from the upstream modules and then treating these as known quantities when working with the downstream ones. This approach is straightforward, but it is likely to underestimate the overall uncertainty associated with any final estimates. An alternative approach involves estimating parameters from the modules jointly using a Bayesian hierarchical model, which has the advantage of propagating upstream uncertainty into the downstream estimates. However, when modules are misspecified, such a joint model can behave in unexpected ways. Furthermore, hierarchical models require the development of ad-hoc computational implementations that can be laborious and computationally expensive. Cut inference modifies the posterior distribution to prevent information flow between certain parameters and provides a third alternative for statistical inference in data analysis pipelines. This paper presents a unified framework that encompasses two-step, cut, and joint inference in the context of data analysis pipelines with two modules and uses two examples to illustrate the tradeoffs associated with these approaches. Our work shows that cut inference provides both some level of robustness and ease of implementation for data analysis pipelines at a lower cost in terms of statistical inference.
format	Preprint
id	arxiv_https___arxiv_org_abs_2402_04461
institution	arXiv
publishDate	2024
record_format	arxiv
title	On Data Analysis Pipelines and Modular Bayesian Modeling
topic	Methodology
url	https://arxiv.org/abs/2402.04461
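
The abstract contrasts two-step, cut, and joint inference for a two-module pipeline. The Python sketch below illustrates that contrast on a hypothetical conjugate Gaussian toy model chosen for illustration. It is not one of the paper's two examples, and all function names, priors, and data-generating values are assumptions. Module 1 informs an upstream parameter theta; module 2 informs a downstream parameter phi through theta. Two-step inference plugs in the posterior mean of theta. The cut posterior draws theta from module 1 alone and then phi given theta, so downstream data never update theta. Joint inference uses the full posterior of (theta, phi).

```python
import math
import random
import statistics

# Hypothetical two-module toy pipeline (an illustration, not the paper's examples):
#   module 1 (upstream):   z_i ~ N(theta, s1^2),        prior theta ~ N(0, v0)
#   module 2 (downstream): y_j ~ N(theta + phi, s2^2),  prior phi   ~ N(0, v0)
# Each inference function returns (posterior mean, posterior variance) of phi.


def module1_posterior(z, s1, v0):
    """p(theta | z): conjugate normal update using module 1 data only."""
    prec = 1.0 / v0 + len(z) / s1**2
    return (sum(z) / s1**2) / prec, 1.0 / prec


def module2_conditional(y, theta, s2, v0):
    """p(phi | y, theta): module 2 with theta held fixed."""
    prec = 1.0 / v0 + len(y) / s2**2
    return ((sum(y) - len(y) * theta) / s2**2) / prec, 1.0 / prec


def two_step(z, y, s1, s2, v0):
    """Plug in the upstream posterior mean as if theta were known."""
    mean_theta, _ = module1_posterior(z, s1, v0)
    return module2_conditional(y, mean_theta, s2, v0)


def cut(z, y, s1, s2, v0, draws=20000, seed=0):
    """Cut posterior by nested Monte Carlo: theta is drawn from module 1
    alone (y never updates theta), then phi is drawn given that theta."""
    rng = random.Random(seed)
    mean_theta, var_theta = module1_posterior(z, s1, v0)
    phis = []
    for _ in range(draws):
        theta = rng.gauss(mean_theta, math.sqrt(var_theta))
        m, v = module2_conditional(y, theta, s2, v0)
        phis.append(rng.gauss(m, math.sqrt(v)))
    return statistics.fmean(phis), statistics.variance(phis)


def joint(z, y, s1, s2, v0):
    """Full joint posterior of (theta, phi); here y also informs theta."""
    a = len(y) / s2**2
    q11 = 1.0 / v0 + len(z) / s1**2 + a  # entries of the 2x2 posterior precision Q
    q12 = a
    q22 = 1.0 / v0 + a
    b1 = sum(z) / s1**2 + sum(y) / s2**2
    b2 = sum(y) / s2**2
    det = q11 * q22 - q12**2
    # phi component of Q^{-1} b, and the phi diagonal entry of Q^{-1}
    return (q11 * b2 - q12 * b1) / det, q11 / det


data_rng = random.Random(42)
z = [data_rng.gauss(1.0, 1.0) for _ in range(10)]        # upstream data
y = [data_rng.gauss(1.0 + 2.0, 0.5) for _ in range(50)]  # downstream data
S1, S2, V0 = 1.0, 0.5, 100.0
results = {
    "two-step": two_step(z, y, S1, S2, V0),
    "cut": cut(z, y, S1, S2, V0),
    "joint": joint(z, y, S1, S2, V0),
}
for name, (m, v) in results.items():
    print(f"{name:8s} mean(phi)={m:.3f} var(phi)={v:.4f}")
```

In this toy setting the two-step variance for phi is far smaller than the cut and joint variances, which mirrors the underestimation of uncertainty the abstract attributes to two-step pipelines. The cut and joint results are close here because, with a diffuse prior on phi, the downstream data carry almost no additional information about theta. The sketch does not model misspecification, which is where the abstract locates the robustness advantage of cut inference.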

Similar items