Bibliographic Details
Main Authors: Roy, Asmita; Chen, Jun; Zhang, Xianyang
Format: Preprint
Published: 2022
Online Access: https://arxiv.org/abs/2205.11617
Abstract:
  • Genomic data are subject to various sources of confounding, such as demographic variables, biological heterogeneity, and batch effects. To identify genomic features associated with a variable of interest in the presence of confounders, the traditional approach fits a confounder-adjusted regression model to each genomic feature and then applies a multiplicity correction. This study shows that the traditional approach is suboptimal and proposes a new two-dimensional false discovery rate control framework (2dFDR+) that provides a significant power improvement over the conventional method and applies to a wide range of settings. 2dFDR+ uses marginal independence test statistics as auxiliary information to filter out less promising features, and then performs FDR control on the remaining features using conditional independence test statistics. 2dFDR+ provides (asymptotically) valid inference from samples in settings where the conditional distribution of the genomic variables given the covariate of interest and the confounders is arbitrary and completely unknown. To achieve this goal, our method requires the conditional distribution of the covariate given the confounders to be known or estimable from the data. We develop a new procedure that simultaneously selects the two cutoff values for the marginal and conditional independence test statistics. 2dFDR+ is proved to offer asymptotic FDR control and to dominate the traditional procedure in power. Promising finite-sample performance is demonstrated via extensive simulations and real data applications.
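The filter-then-test idea in the abstract can be illustrated with a deliberately simplified sketch. This is not the 2dFDR+ procedure: the paper selects the marginal and conditional cutoffs simultaneously, whereas the code below assumes a fixed, user-supplied marginal cutoff and substitutes the standard Benjamini-Hochberg step-up rule for the second stage, so it does not carry the paper's FDR guarantee. The function names, the `marginal_cutoff` parameter, and the toy numbers are all hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float], alpha: float = 0.05) -> List[int]:
    """Return the indices rejected by the Benjamini-Hochberg step-up rule."""
    m = len(pvalues)
    if m == 0:
        return []
    # Sort feature indices by p-value, smallest first.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank k with p_(k) <= alpha * k / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            k_max = rank
    return sorted(order[:k_max])


def filter_then_bh(
    marginal_stats: Sequence[float],
    conditional_pvalues: Sequence[float],
    marginal_cutoff: float,
    alpha: float = 0.05,
) -> List[int]:
    """Naive two-stage screen (illustrative only, not 2dFDR+).

    Stage 1 keeps features whose absolute marginal statistic reaches the
    cutoff; stage 2 runs Benjamini-Hochberg on the conditional p-values
    of the kept features only.
    """
    if len(marginal_stats) != len(conditional_pvalues):
        raise ValueError("inputs must have one entry per feature")
    kept = [i for i, s in enumerate(marginal_stats) if abs(s) >= marginal_cutoff]
    rejected_local = benjamini_hochberg([conditional_pvalues[i] for i in kept], alpha)
    # Map positions within the kept subset back to original feature indices.
    return [kept[j] for j in rejected_local]


# Toy inputs (made up for illustration): six features.
marginal = [3.1, 0.2, 2.8, 0.1, 0.4, 2.5]
conditional_p = [0.001, 0.04, 0.015, 0.5, 0.9, 0.03]

traditional = benjamini_hochberg(conditional_p, alpha=0.05)
two_stage = filter_then_bh(marginal, conditional_p, marginal_cutoff=2.0, alpha=0.05)
print(traditional, two_stage)
```

On these toy inputs, Benjamini-Hochberg over all six features rejects features 0 and 2, while filtering first leaves a smaller multiplicity burden and feature 5 is rejected as well. This shows only the intuition behind the power gain; naive filtering of this kind can break FDR control when the marginal and conditional statistics are dependent, which is the issue the joint cutoff selection in the paper is designed to address.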