Saved in:
Bibliographic Details
Main Authors: Lei, Haochen, Li, Yan, Cao, Hongyuan
Format: Preprint
Published: 2023
Subjects:
Online Access:https://arxiv.org/abs/2310.09701
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866918369047871488
author Lei, Haochen
Li, Yan
Cao, Hongyuan
author_facet Lei, Haochen
Li, Yan
Cao, Hongyuan
contents Identifying signals that replicate across multiple studies is essential for establishing robust scientific evidence, yet existing methods for high-dimensional replicability analysis either rely on restrictive modeling assumptions, are limited to two-study settings, or lack statistical power. We propose a general empirical Bayes framework for multi-study replicability analysis that jointly models summary-level $p$-values while explicitly accounting for between-study heterogeneity. Within each study, non-null $p$-value densities are estimated nonparametrically under monotonicity constraints, enabling flexible and tuning-free inference. For two studies, we develop a local false discovery rate (Lfdr) statistic for the composite null of non-replicability and establish identifiability, consistency, and a cubic-rate convergence of the nonparametric MLE, along with minimax optimality. Extending replicability analysis to $n$ studies typically requires estimating $2^n$ latent configurations, which is computationally infeasible. To address this challenge, we introduce a scalable pairwise rejection strategy that decomposes the exponentially large composite null into disjoint components, yielding linear complexity in the number of studies. We prove asymptotic FDR control under mild regularity conditions and show that Lfdr-based thresholding is power-optimal. Extensive simulations demonstrate that our method provides substantial power gains while maintaining valid FDR control, outperforming state-of-the-art alternatives across a wide range of scenarios. Applying our framework to East Asian- and European-ancestry genome-wide association studies of type 2 diabetes reveals replicable genetic associations that competing approaches fail to detect, illustrating the method's practical utility in large-scale biomedical research.
format Preprint
id arxiv_https___arxiv_org_abs_2310_09701
institution arXiv
publishDate 2023
record_format arxiv
spellingShingle A robust and powerful method for assessing replicability of high dimensional data
Lei, Haochen
Li, Yan
Cao, Hongyuan
Methodology
Identifying signals that replicate across multiple studies is essential for establishing robust scientific evidence, yet existing methods for high-dimensional replicability analysis either rely on restrictive modeling assumptions, are limited to two-study settings, or lack statistical power. We propose a general empirical Bayes framework for multi-study replicability analysis that jointly models summary-level $p$-values while explicitly accounting for between-study heterogeneity. Within each study, non-null $p$-value densities are estimated nonparametrically under monotonicity constraints, enabling flexible and tuning-free inference. For two studies, we develop a local false discovery rate (Lfdr) statistic for the composite null of non-replicability and establish identifiability, consistency, and a cubic-rate convergence of the nonparametric MLE, along with minimax optimality. Extending replicability analysis to $n$ studies typically requires estimating $2^n$ latent configurations, which is computationally infeasible. To address this challenge, we introduce a scalable pairwise rejection strategy that decomposes the exponentially large composite null into disjoint components, yielding linear complexity in the number of studies. We prove asymptotic FDR control under mild regularity conditions and show that Lfdr-based thresholding is power-optimal. Extensive simulations demonstrate that our method provides substantial power gains while maintaining valid FDR control, outperforming state-of-the-art alternatives across a wide range of scenarios. Applying our framework to East Asian- and European-ancestry genome-wide association studies of type 2 diabetes reveals replicable genetic associations that competing approaches fail to detect, illustrating the method's practical utility in large-scale biomedical research.
title A robust and powerful method for assessing replicability of high dimensional data
topic Methodology
url https://arxiv.org/abs/2310.09701