Staff View: Making Multi-Axis Gaussian Graphical Models Scalable to Millions of Cells :: Library Catalog

Bibliographic Details
Main Authors:	Andrew, Bailey; Harris, Erica L.; Poulter, James A.; Westhead, David R.; Cutillo, Luisa
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Genomics
Online Access:	https://arxiv.org/abs/2407.19892

_version_	1866917349028790272
author	Andrew, Bailey; Harris, Erica L.; Poulter, James A.; Westhead, David R.; Cutillo, Luisa
author_facet	Andrew, Bailey; Harris, Erica L.; Poulter, James A.; Westhead, David R.; Cutillo, Luisa
contents	Motivation: Networks underlie the generation and interpretation of many biological datasets: gene networks shed light on the regulatory structure of the genome, and cell networks can capture structure of the tumor micro-environment. However, most methods that learn such networks make the faulty 'independence assumption'; to learn the gene network, they assume that no cell network exists. 'Multi-axis' methods, which do not make this assumption, fail to scale beyond a few thousand cells or genes. This limits their applicability to only the smallest datasets. Results: We develop a multi-axis method capable of processing million-cell datasets within minutes. This was previously impossible, and unlocks the use of such methods on modern scRNA-seq datasets, as well as more complex datasets. We show that our method yields novel biological insights from real single-cell data, and compares favorably to the existing hdWGCNA methodology. In particular, it identifies long non-coding RNA genes that potentially have a regulatory or functional role in neuronal development. Availability and implementation: Our methodology is available as a Python package GmGM on PyPI (https://pypi.org/project/GmGM/0.5.3/). The code for all experiments performed in this paper is available on GitHub (https://github.com/BaileyAndrew/GmGM-Bioinformatics). Contact: sceba@leeds.ac.uk Supplementary information: Our proofs, and some additional experiments, are available in the supplementary material. Keywords: gaussian graphical models, multi-axis models, transcriptomics, multi-omics, scalability
format	Preprint
id	arxiv_https___arxiv_org_abs_2407_19892
institution	arXiv
publishDate	2024
record_format	arxiv
title	Making Multi-Axis Gaussian Graphical Models Scalable to Millions of Cells
topic	Machine Learning; Genomics
url	https://arxiv.org/abs/2407.19892
