Bibliographic Details
Main Authors: Young, Adamo; Wang, Fei; Wishart, David S; Wang, Bo; Greiner, Russell; Röst, Hannes
Format: Preprint
Published: 2024
Subjects: Machine Learning; Biomolecules
Online Access: https://arxiv.org/abs/2404.02360
author Young, Adamo
Wang, Fei
Wishart, David S
Wang, Bo
Greiner, Russell
Röst, Hannes
contents Compound identification from tandem mass spectrometry (MS/MS) data is a critical step in the analysis of complex mixtures. Typical solutions for the MS/MS spectrum to compound (MS2C) problem involve comparing the unknown spectrum against a library of known spectrum-molecule pairs, an approach that is limited by incomplete library coverage. Compound to MS/MS spectrum (C2MS) models can improve retrieval rates by augmenting real libraries with predicted MS/MS spectra. Unfortunately, many existing C2MS models suffer from problems with mass accuracy, generalization, or interpretability. We develop a new probabilistic method for C2MS prediction, FraGNNet, that can efficiently and accurately simulate MS/MS spectra with high mass accuracy. Our approach formulates the C2MS problem as learning a distribution over molecule fragments. FraGNNet achieves state-of-the-art performance in terms of prediction error and surpasses existing C2MS models as a tool for retrieval-based MS2C.
format Preprint
id arxiv_https___arxiv_org_abs_2404_02360
institution arXiv
publishDate 2024
record_format arxiv
title FraGNNet: A Deep Probabilistic Model for Tandem Mass Spectrum Prediction
topic Machine Learning
Biomolecules
url https://arxiv.org/abs/2404.02360