Staff View: Merging versus Ensembling in Multi-Study Prediction: Theoretical Insight from Random Effects :: Library Catalog

Bibliographic Details
Main Authors:	Guan, Zoe; Parmigiani, Giovanni; Patil, Prasad
Format:	Preprint
Published:	2019
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/1905.07382

_version_	1866917866481123328
author	Guan, Zoe; Parmigiani, Giovanni; Patil, Prasad
author_facet	Guan, Zoe; Parmigiani, Giovanni; Patil, Prasad
contents	A critical decision point when training predictors using multiple studies is whether studies should be combined or treated separately. We compare two multi-study prediction approaches in the presence of potential heterogeneity in predictor-outcome relationships across datasets: 1) merging all of the datasets and training a single learner, and 2) multi-study ensembling, which involves training a separate learner on each dataset and combining the predictions resulting from each learner. For ridge regression, we show analytically and confirm via simulation that merging yields lower prediction error than ensembling when the predictor-outcome relationships are relatively homogeneous across studies. However, as cross-study heterogeneity increases, there exists a transition point beyond which ensembling outperforms merging. We provide analytic expressions for the transition point in various scenarios, study asymptotic properties, and illustrate how transition point theory can be used for deciding when studies should be combined with an application from metagenomics.
format	Preprint
id	arxiv_https___arxiv_org_abs_1905_07382
institution	arXiv
publishDate	2019
record_format	arxiv
title	Merging versus Ensembling in Multi-Study Prediction: Theoretical Insight from Random Effects
topic	Machine Learning
url	https://arxiv.org/abs/1905.07382
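
The comparison described in the contents field can be illustrated with a small simulation. The sketch below is not the authors' code: the function names (`ridge_fit`, `simulate_study`, `compare`), the Gaussian design, the equal ensemble weights, the study sizes, and every default parameter are assumptions chosen for illustration only. Each study's coefficients are a shared vector plus a study-level random effect with standard deviation `sigma_het`, which stands in for the cross-study heterogeneity the abstract refers to.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge coefficients without intercept: (X'X + lam*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def simulate_study(rng, n, beta, sigma_het, sigma_eps):
    """Draw one study whose coefficients are beta plus a study-level random effect."""
    p = beta.shape[0]
    beta_k = beta + rng.normal(0.0, sigma_het, size=p)  # heterogeneity (assumed Gaussian)
    X = rng.normal(size=(n, p))
    y = X @ beta_k + rng.normal(0.0, sigma_eps, size=n)
    return X, y


def compare(sigma_het, sizes=(40, 60, 80, 100), p=5, lam=1.0,
            sigma_eps=1.0, n_test=200, n_rep=200, seed=0):
    """Return (merged MSE, ensemble MSE) on a new study, averaged over n_rep replicates."""
    rng = np.random.default_rng(seed)
    beta = np.ones(p)  # shared mean coefficients (arbitrary choice)
    err_merge = 0.0
    err_ens = 0.0
    for _ in range(n_rep):
        studies = [simulate_study(rng, n, beta, sigma_het, sigma_eps) for n in sizes]
        X_test, y_test = simulate_study(rng, n_test, beta, sigma_het, sigma_eps)

        # Approach 1: merge all training studies and fit a single ridge learner.
        X_all = np.vstack([X for X, _ in studies])
        y_all = np.concatenate([y for _, y in studies])
        pred_merge = X_test @ ridge_fit(X_all, y_all, lam)

        # Approach 2: fit one ridge learner per study, then average the predictions
        # with equal weights (one simple weighting choice among many).
        preds = [X_test @ ridge_fit(X, y, lam) for X, y in studies]
        pred_ens = np.mean(preds, axis=0)

        err_merge += np.mean((y_test - pred_merge) ** 2)
        err_ens += np.mean((y_test - pred_ens) ** 2)
    return err_merge / n_rep, err_ens / n_rep


if __name__ == "__main__":
    for sigma_het in (0.0, 0.5, 2.0):
        mse_merge, mse_ens = compare(sigma_het)
        print(f"sigma_het={sigma_het}: merged={mse_merge:.3f}, ensemble={mse_ens:.3f}")
```

Sweeping `sigma_het` over a finer grid and recording where the two error curves cross gives a rough empirical counterpart to the transition point mentioned in the abstract. Where that crossing falls depends on the assumed study sizes, penalty, and weights, so it should not be read as reproducing the paper's analytic expressions.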

Similar Items