| Main Authors: | Andrew, Bailey; Harris, Erica L.; Poulter, James A.; Westhead, David R.; Cutillo, Luisa |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Machine Learning; Genomics |
| Online Access: | https://arxiv.org/abs/2407.19892 |
| _version_ | 1866917349028790272 |
|---|---|
| author | Andrew, Bailey; Harris, Erica L.; Poulter, James A.; Westhead, David R.; Cutillo, Luisa |
| contents | Motivation: Networks underlie the generation and interpretation of many biological datasets: gene networks shed light on the regulatory structure of the genome, and cell networks can capture the structure of the tumor microenvironment. However, most methods that learn such networks make the faulty 'independence assumption': to learn the gene network, they assume that no cell network exists. 'Multi-axis' methods avoid this assumption but fail to scale beyond a few thousand cells or genes, which limits them to only the smallest datasets.
Results: We develop a multi-axis method capable of processing million-cell datasets within minutes. This was previously impossible, and it unlocks the use of multi-axis methods on modern scRNA-seq datasets as well as on more complex datasets. We show that our method yields novel biological insights from real single-cell data and compares favorably with the existing hdWGCNA methodology. In particular, it identifies long non-coding RNA genes that may have a regulatory or functional role in neuronal development.
Availability and implementation: Our methodology is available as the Python package GmGM on PyPI (https://pypi.org/project/GmGM/0.5.3/). The code for all experiments in this paper is available on GitHub (https://github.com/BaileyAndrew/GmGM-Bioinformatics).
Contact: sceba@leeds.ac.uk
Supplementary information: Our proofs and some additional experiments are available in the supplementary material.
Keywords: Gaussian graphical models, multi-axis models, transcriptomics, multi-omics, scalability |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2407_19892 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | Making Multi-Axis Gaussian Graphical Models Scalable to Millions of Cells |
| topic | Machine Learning; Genomics |
| url | https://arxiv.org/abs/2407.19892 |