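Staff View: Testing composite null hypotheses with high-dimensional dependent data: a computationally scalable FDR-controlling procedure :: Library Catalog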

Bibliographic Details
Main Authors:	Lyu, Pengfei; Zhang, Xianyang; Cao, Hongyuan
Format:	Preprint
Published:	2024
Subjects:	Methodology
Online Access:	https://arxiv.org/abs/2404.05808

_version_	1866908337976639488
author	Lyu, Pengfei; Zhang, Xianyang; Cao, Hongyuan
author_facet	Lyu, Pengfei; Zhang, Xianyang; Cao, Hongyuan
contents	Testing composite null hypotheses arises in various applications, such as mediation and replicability analyses. The problem becomes more challenging in high-throughput experiments where tens of thousands of features are examined simultaneously. Existing large-scale inference methods for composite null hypothesis testing often fail to explicitly incorporate the dependence structure, producing overly conservative or overly liberal results. In this work, we first develop a four-state hidden Markov model (HMM) to model a bivariate $p$-value sequence from replicability analysis with two studies, accounting for local feature dependence and study heterogeneity. Building on the HMM, we propose a multiple testing procedure that controls the false discovery rate (FDR). Extending the HMM to model the $p$-values from $n$ studies incurs a computational cost that grows exponentially in $n$. To address this challenge, we introduce a novel e-value framework that reduces the computational cost to quadratic in the number of studies while maintaining FDR control. We show that the proposed method asymptotically controls the FDR and exhibits higher power numerically than competing methods at the same FDR level. In a real data application to genome-wide association studies (GWAS), our method reveals new biological insights that are overlooked by existing methods.
format	Preprint
id	arxiv_https___arxiv_org_abs_2404_05808
institution	arXiv
publishDate	2024
record_format	arxiv
title	Testing composite null hypotheses with high-dimensional dependent data: a computationally scalable FDR-controlling procedure
topic	Methodology
url	https://arxiv.org/abs/2404.05808

Similar Items