Staff View: Foundation model for mass spectrometry proteomics :: Library Catalog

Bibliographic Details
Main Authors:	Sanders, Justin; Yilmaz, Melih; Russell, Jacob H.; Bittremieux, Wout; Fondrie, William E.; Riley, Nicholas M.; Oh, Sewoong; Noble, William Stafford
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2505.10848

_version_	1866918023283081216
author	Sanders, Justin; Yilmaz, Melih; Russell, Jacob H.; Bittremieux, Wout; Fondrie, William E.; Riley, Nicholas M.; Oh, Sewoong; Noble, William Stafford
author_facet	Sanders, Justin; Yilmaz, Melih; Russell, Jacob H.; Bittremieux, Wout; Fondrie, William E.; Riley, Nicholas M.; Oh, Sewoong; Noble, William Stafford
contents	Mass spectrometry is the dominant technology in the field of proteomics, enabling high-throughput analysis of the protein content of complex biological samples. Due to the complexity of the instrumentation and resulting data, sophisticated computational methods are required for the processing and interpretation of acquired mass spectra. Machine learning has shown great promise to improve the analysis of mass spectrometry data, with numerous purpose-built methods for improving specific steps in the data acquisition and analysis pipeline reaching widespread adoption. Here, we propose unifying various spectrum prediction tasks under a single foundation model for mass spectra. To this end, we pre-train a spectrum encoder using de novo sequencing as a pre-training task. We then show that using these pre-trained spectrum representations improves our performance on the four downstream tasks of spectrum quality prediction, chimericity prediction, phosphorylation prediction, and glycosylation status prediction. Finally, we perform multi-task fine-tuning and find that this approach improves the performance on each task individually. Overall, our work demonstrates that a foundation model for tandem mass spectrometry proteomics trained on de novo sequencing learns generalizable representations of spectra, improves performance on downstream tasks where training data is limited, and can ultimately enhance data acquisition and analysis in proteomics experiments.
format	Preprint
id	arxiv_https___arxiv_org_abs_2505_10848
institution	arXiv
publishDate	2025
record_format	arxiv
title	Foundation model for mass spectrometry proteomics
topic	Machine Learning
url	https://arxiv.org/abs/2505.10848

Similar Items