Bibliographic Details
Main Authors: Andrew, Bailey; Harris, Erica L.; Poulter, James A.; Westhead, David R.; Cutillo, Luisa
Format: Preprint
Published: 2024
Subjects: Machine Learning; Genomics
Online Access: https://arxiv.org/abs/2407.19892
author Andrew, Bailey
Harris, Erica L.
Poulter, James A.
Westhead, David R.
Cutillo, Luisa
contents Motivation: Networks underlie the generation and interpretation of many biological datasets: gene networks shed light on the regulatory structure of the genome, and cell networks can capture the structure of the tumor microenvironment. However, most methods that learn such networks make the faulty 'independence assumption': to learn the gene network, they assume that no cell network exists. 'Multi-axis' methods, which do not make this assumption, fail to scale beyond a few thousand cells or genes. This limits their applicability to only the smallest datasets. Results: We develop a multi-axis method capable of processing million-cell datasets within minutes. This was previously impossible and unlocks the use of such methods on modern scRNA-seq datasets, as well as on more complex datasets. We show that our method yields novel biological insights from real single-cell data and compares favorably to the existing hdWGCNA methodology. In particular, it identifies long non-coding RNA genes that potentially have a regulatory or functional role in neuronal development. Availability and implementation: Our methodology is available as a Python package, GmGM, on PyPI (https://pypi.org/project/GmGM/0.5.3/). The code for all experiments performed in this paper is available on GitHub (https://github.com/BaileyAndrew/GmGM-Bioinformatics). Contact: sceba@leeds.ac.uk Supplementary information: Our proofs, and some additional experiments, are available in the supplementary material. Keywords: Gaussian graphical models, multi-axis models, transcriptomics, multi-omics, scalability
format Preprint
id arxiv_https___arxiv_org_abs_2407_19892
institution arXiv
publishDate 2024
record_format arxiv
title Making Multi-Axis Gaussian Graphical Models Scalable to Millions of Cells
topic Machine Learning
Genomics
url https://arxiv.org/abs/2407.19892