Bibliographic Details
Main Authors: Xu, Ziyang; Zhong, Haitian; He, Bingrui; Wang, Xueying; Lu, Tianchi
Format: Preprint
Published: 2023
Subjects: Quantitative Methods; Machine Learning
Online Access: https://arxiv.org/abs/2308.05115
_version_ 1866929289138536448
author Xu, Ziyang
Zhong, Haitian
He, Bingrui
Wang, Xueying
Lu, Tianchi
author_facet Xu, Ziyang
Zhong, Haitian
He, Bingrui
Wang, Xueying
Lu, Tianchi
contents Phosphorylation is pivotal in numerous fundamental cellular processes and plays a significant role in the onset and progression of various diseases. Accurate identification of phosphorylation sites is crucial for unraveling the molecular mechanisms within cells and during viral infections, and may lead to the discovery of novel therapeutic targets. In this study, we develop PTransIPs, a new deep learning framework for identifying phosphorylation sites. Independent testing shows that PTransIPs outperforms existing state-of-the-art (SOTA) methods, achieving AUCs of 0.9232 and 0.9660 for identifying phosphorylated S/T and Y sites, respectively. PTransIPs makes three contributions. 1) It is the first to apply protein pre-trained language model (PLM) embeddings to this task: it uses ProtTrans and EMBER2 to extract sequence and structure embeddings, respectively, as additional model inputs, which mitigates the effects of limited dataset size and overfitting and thereby improves performance; 2) It is built on the Transformer architecture, optimized by integrating convolutional neural networks and the TIM loss function, providing practical insights for model design and training; 3) Its encoding of amino acids allows it to serve as a universal framework for other peptide bioactivity tasks, as shown by its strong performance in the extended experiments of this paper. Our code, data, and models are publicly available at https://github.com/StatXzy7/PTransIPs.
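The abstract reports performance as AUC (area under the ROC curve). As a rough illustration of what that number measures, the sketch below computes AUC as the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as half. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the PTransIPs code or data.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    labels: iterable of 0/1 class labels (1 = phosphorylated site, by assumption)
    scores: iterable of predicted scores, same length as labels
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy, made-up predictions purely for illustration.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

Under this reading, an AUC of 0.9232 would mean that a randomly chosen phosphorylated site is scored above a randomly chosen non-phosphorylated site roughly 92% of the time. The pairwise loop is quadratic and is meant for clarity only; a rank-based computation would be preferable for large test sets.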
format Preprint
id arxiv_https___arxiv_org_abs_2308_05115
institution arXiv
publishDate 2023
record_format arxiv
title PTransIPs: Identification of phosphorylation sites enhanced by protein PLM embeddings
topic Quantitative Methods
Machine Learning
url https://arxiv.org/abs/2308.05115