Gespeichert in:
Bibliographic Details
Main Authors: Wixson, Troy P., Shaby, Benjamin A., Philtron, Daisy L., International Parkinson Disease Genomics Consortium, Lima, Leandro A., Wyman, Stacia K., Kaye, Julia A., Finkbeiner, Steven
Format: Preprint
Published: 2024
Online Access: https://arxiv.org/abs/2406.05262
author Wixson, Troy P.
Shaby, Benjamin A.
Philtron, Daisy L.
International Parkinson Disease Genomics Consortium
Lima, Leandro A.
Wyman, Stacia K.
Kaye, Julia A.
Finkbeiner, Steven
contents We seek to identify genes involved in Parkinson's Disease (PD) by combining information across different experiment types. Each experiment, taken individually, may contain too little information to distinguish some important genes from incidental ones. However, when experiments are combined using the proposed statistical framework, additional power emerges. The fundamental building block of the family of statistical models that we propose is a hierarchical three-group mixture of distributions. Each gene is modeled probabilistically as belonging to either a null group that is unassociated with PD, a deleterious group, or a beneficial group. This three-group formalism has two key features. By apportioning prior probability of group assignments with a Dirichlet distribution, the resultant posterior group probabilities automatically account for the multiplicity inherent in analyzing many genes simultaneously. By building models for experimental outcomes conditionally on the group labels, any number of data modalities may be combined in a single coherent probability model, allowing information sharing across experiment types. These two features result in parsimonious inference with few false positives, while simultaneously enhancing power to detect signals. Simulations show that our three-groups approach performs at least as well as commonly used tools for GWAS and RNA-seq, and in some cases it performs better. We apply our proposed approach to publicly available GWAS and RNA-seq datasets, discovering novel genes that are potential therapeutic targets.
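The two mechanisms described in the abstract (a Dirichlet prior on group weights, and modality-specific models combined conditionally on the group label) can be illustrated with a toy Gibbs sampler. This is a minimal sketch under loudly stated assumptions, not the authors' model or fitting procedure. Everything named here is hypothetical: the functions `gibbs_three_groups`, `group_log_likelihood`, and `sample_dirichlet`; the shifted-normal components N(0, 1), N(+3, 1), and N(-3, 1), which stand in for the paper's non-local densities; the assumption that modalities are conditionally independent given the label; and the sign convention that a positive statistic points in the deleterious direction in every modality. Gibbs sampling itself is a swapped-in choice, since the abstract does not say how the model is fit.

```python
import math
import random

# Hypothetical label encoding: 0 = null, 1 = deleterious, 2 = beneficial.
GROUPS = ("null", "deleterious", "beneficial")

# ASSUMPTION: illustrative component means. A positive statistic is taken to
# point in the deleterious direction in every modality. These shifted normals
# stand in for, and are NOT, the paper's non-local densities.
GROUP_MEANS = (0.0, 3.0, -3.0)


def log_normal_pdf(x, mean):
    """Log density of Normal(mean, 1) at x."""
    return -0.5 * (x - mean) ** 2 - 0.5 * math.log(2.0 * math.pi)


def group_log_likelihood(stats, group):
    """Joint log-likelihood of one gene's statistics given its group label.

    ASSUMPTION: modalities are conditionally independent given the label,
    so their log densities add (the likelihoods multiply).
    """
    return sum(log_normal_pdf(x, GROUP_MEANS[group]) for x in stats)


def sample_dirichlet(alphas, rng):
    """Draw from Dirichlet(alphas) by normalizing independent Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def gibbs_three_groups(data, alpha=(1.0, 1.0, 1.0), n_iter=2000, burn_in=500, seed=0):
    """Estimate per-gene posterior group probabilities with a Gibbs sampler.

    data: list of per-gene tuples holding one statistic per modality.
    Returns [P(null), P(deleterious), P(beneficial)] for each gene, averaged
    over post-burn-in iterations of the full-conditional label probabilities.
    """
    if burn_in >= n_iter:
        raise ValueError("burn_in must be smaller than n_iter")
    rng = random.Random(seed)
    # Likelihoods do not depend on the mixture weights, so compute them once.
    loglik = [[group_log_likelihood(s, g) for g in range(3)] for s in data]
    weights = sample_dirichlet(alpha, rng)
    totals = [[0.0, 0.0, 0.0] for _ in data]
    kept = 0
    for it in range(n_iter):
        counts = [0, 0, 0]
        for i, ll in enumerate(loglik):
            # Full conditional of the label: prior weight times likelihood,
            # normalized in log space for numerical stability.
            logs = [math.log(w) + lik for w, lik in zip(weights, ll)]
            top = max(logs)
            unnorm = [math.exp(v - top) for v in logs]
            norm = sum(unnorm)
            probs = [u / norm for u in unnorm]
            label = rng.choices(range(3), weights=probs)[0]
            counts[label] += 1
            if it >= burn_in:
                for g in range(3):
                    totals[i][g] += probs[g]
        if it >= burn_in:
            kept += 1
        # Conjugate step: weights | labels ~ Dirichlet(alpha + counts). When
        # most genes look null, the null weight grows and shrinks every gene's
        # non-null probability, which is the multiplicity adjustment.
        weights = sample_dirichlet([a + c for a, c in zip(alpha, counts)], rng)
    return [[t / kept for t in row] for row in totals]
```

Averaging the full-conditional probabilities, rather than counting sampled labels, gives smoother estimates from the same draws. Under these toy densities, a gene with moderate statistics (2.0, 2.0) in both modalities always receives a higher deleterious probability than a gene with (2.0, 0.0), which mirrors the information sharing across experiment types that the abstract describes.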
format Preprint
id arxiv_https___arxiv_org_abs_2406_05262
institution arXiv
publishDate 2024
record_format arxiv
title A Three-groups Non-local Model for Combining Heterogeneous Data Sources to Identify Genes Associated with Parkinson's Disease
topic Applications
url https://arxiv.org/abs/2406.05262