Bibliographic Details
Main Authors: Clausen, David S, Teichman, Sarah, Willis, Amy D
Format: Preprint
Published: 2024
Subjects: Methodology, Applications
Online Access: https://arxiv.org/abs/2402.05231
author Clausen, David S
Teichman, Sarah
Willis, Amy D
contents We consider the problem of estimating fold-changes in the expected value of a multivariate outcome observed with unknown sample-specific and category-specific perturbations. This challenge arises in high-throughput sequencing studies of the abundance of microbial taxa because microbes are systematically over- and under-detected relative to their true abundances. Our model admits a partially identifiable estimand, and we establish full identifiability by imposing interpretable parameter constraints. To reduce bias and guarantee the existence of estimators in the presence of sparse observations, we apply an asymptotically negligible and constraint-invariant penalty to our estimating function. We develop a fast coordinate descent algorithm for estimation, and an augmented Lagrangian algorithm for estimation under null hypotheses. We construct a model-robust score test and demonstrate valid inference even for small sample sizes and violated distributional assumptions. The flexibility of the approach and comparisons to related methods are illustrated through a meta-analysis of microbial associations with colorectal cancer.
format Preprint
id arxiv_https___arxiv_org_abs_2402_05231
institution arXiv
publishDate 2024
record_format arxiv
title Estimating Fold Changes from Partially Observed Outcomes with Applications in Microbial Metagenomics
topic Methodology
Applications
url https://arxiv.org/abs/2402.05231