Bibliographic Details
Main Authors: Nishant Kumar; Choudhury, Shubham; Raghava, Gajendra
Format: Digital resource
Language:
Published: Zenodo 2026
Online Access: https://doi.org/10.5281/zenodo.19916375
author Nishant Kumar
Choudhury, Shubham
Raghava, Gajendra
contents <p class="ds-markdown-paragraph"><strong>Title:</strong><br>AFProPred Dataset – Experimentally validated antifreeze proteins (AFPs) and non‑AFPs from reviewed UniProt entries</p> <p class="ds-markdown-paragraph"><strong>Description:</strong></p> <p class="ds-markdown-paragraph"><strong>Project:</strong> AFProPred – Prediction of antifreeze proteins using machine learning and evolutionary information</p> <p class="ds-markdown-paragraph"><strong>Publication:</strong> Kumar, N., Patiyal, S., Choudhury, S., Bajiya, N., &amp; Raghava, G.P.S. (2025). AFProPred: Prediction of antifreeze proteins using machine learning and evolutionary information. <em>Proteomics</em>, e202400157. <a href="https://doi.org/10.1002/pmic.202400157" rel="noopener noreferrer">https://doi.org/10.1002/pmic.202400157</a></p> <p class="ds-markdown-paragraph"><strong>Overview:</strong> This dataset accompanies AFProPred, a machine learning method for predicting antifreeze proteins (AFPs). AFPs allow organisms such as fish, insects, fungi, and bacteria to survive sub‑zero temperatures through thermal hysteresis and inhibition of ice recrystallisation, and they have applications in food preservation, medicine, and cryosurgery. Whereas existing methods were evaluated on unreviewed data, this study benchmarks on an independent validation set of <strong>reviewed (Swiss‑Prot)</strong> AFPs and non‑AFPs.</p> <p class="ds-markdown-paragraph"><strong>Content:</strong></p> <div class="ds-scroll-area ds-scroll-area--show-on-focus-within _1210dd7 c03cafe9 _5ac647c"> <div class="ds-scroll-area__gutters"> <div class="ds-scroll-area__vertical-gutter"> </div> </div> <table> <thead><tr> <th>Dataset</th> <th>AFPs</th> <th>Non‑AFPs</th> <th>Source</th> </tr> </thead><tbody> <tr> <td><strong>Main (training)</strong></td> <td>8,134</td> <td>9,439</td> <td>UniProt (unreviewed) + AFP‑Pred</td> </tr> <tr> <td><strong>Validation (independent)</strong></td> <td>80</td> <td>73</td> <td><strong>Swiss‑Prot (reviewed)</strong>; AFPs retrieved with keyword "antifreeze protein", non‑AFPs with "NOT_antifreeze_protein"</td> </tr> </tbody> </table> </div> <p class="ds-markdown-paragraph"><strong>Validation set length range:</strong> 16–2,439 amino acids (redundancy reduced with CD‑HIT at 40% sequence identity)</p> <p class="ds-markdown-paragraph"><strong>Key Findings – Compositional analysis:</strong> AFPs are enriched in alanine (A), isoleucine (I), valine (V), and threonine (T). Threonine increases AFP activity by contributing additional hydrogen bonds at the protein surface.</p> <p class="ds-markdown-paragraph"><strong>Best Model Performance (validation set: 80 AFPs + 73 non‑AFPs, reviewed):</strong></p> <div class="ds-scroll-area ds-scroll-area--show-on-focus-within _1210dd7 c03cafe9 _5ac647c"> <div class="ds-scroll-area__gutters"> <div class="ds-scroll-area__vertical-gutter"> </div> </div> <table> <thead><tr> <th>Model</th> <th>Features</th> <th>AUC</th> <th>MCC</th> <th>Accuracy</th> </tr> </thead><tbody> <tr> <td><strong>ET</strong></td> <td>PSSM + AAC</td> <td><strong>0.93</strong></td> <td><strong>0.77</strong></td> <td><strong>88.2%</strong></td> </tr> <tr> <td>RF</td> <td>PSSM + AAC</td> <td>0.91</td> <td>0.64</td> <td>81.7%</td> </tr> <tr> <td>ET</td> <td>150 features selected by mRMR</td> <td>0.90</td> <td>0.69</td> <td>84.3%</td> </tr> <tr> <td>XGB</td> <td>AAC only</td> <td>0.89</td> <td>0.63</td> <td>81.7%</td> </tr> </tbody> </table> </div> <p class="ds-markdown-paragraph">ET: Extra Trees; RF: Random Forest; XGB: XGBoost; PSSM: position‑specific scoring matrix; AAC: amino acid composition; mRMR: minimum redundancy maximum relevance; AUC: area under the ROC curve; MCC: Matthews correlation coefficient.</p> <p class="ds-markdown-paragraph"><strong>Comparison with existing methods (same reviewed validation set):</strong></p> <div class="ds-scroll-area ds-scroll-area--show-on-focus-within _1210dd7 c03cafe9 _5ac647c"> <div class="ds-scroll-area__gutters"> <div class="ds-scroll-area__vertical-gutter"> </div> </div> <table> <thead><tr> <th>Method</th> <th>AUC</th> <th>MCC</th> <th>Accuracy</th> </tr> </thead><tbody> <tr> <td><strong>AFProPred (ET + PSSM+AAC)</strong></td> <td><strong>0.93</strong></td> <td><strong>0.77</strong></td> <td><strong>88.2%</strong></td> </tr> <tr> <td>AFP‑CKSAAP (2019)</td> <td>0.89</td> <td>0.65</td> <td>82.0%</td> </tr> <tr> <td>AFP‑LSE (2020)</td> <td>n/a</td> <td>0.48</td> <td>74.0%</td> </tr> <tr> <td>CryoProtect (2017)</td> <td>0.61</td> <td>0.23</td> <td>60.1%</td> </tr> <tr> <td>AFP‑SRC (2022)</td> <td>n/a</td> <td>0.14</td> <td>57.0%</td> </tr> </tbody> </table> </div> <p class="ds-markdown-paragraph">n/a: AUC not available for this method.</p> <p class="ds-markdown-paragraph"><strong>Alignment‑based methods (BLAST similarity search and MERCI motif search) performed poorly because of low coverage, so machine learning models are essential for this task.</strong></p> <p class="ds-markdown-paragraph"><strong>Data Curation &amp; Quality Control:</strong></p> <ul> <li> <p class="ds-markdown-paragraph"><strong>Validation set:</strong> Swiss‑Prot reviewed entries only (manually curated)</p> </li> <li> <p class="ds-markdown-paragraph"><strong>Training set:</strong> Unreviewed UniProt + AFP‑Pred (redundancy reduced with CD‑HIT at 40% sequence identity)</p> </li> <li> <p class="ds-markdown-paragraph"><strong>Length filter:</strong> 16–2,439 amino acids</p> </li> <li> <p class="ds-markdown-paragraph"><strong>Evolutionary features:</strong> PSSM generated with PSI‑BLAST against Swiss‑Prot (3 iterations)</p> </li> <li> <p class="ds-markdown-paragraph"><strong>Feature selection:</strong> mRMR (minimum redundancy maximum relevance)</p> </li> </ul> <p class="ds-markdown-paragraph"><strong>Usage:</strong> Predicting antifreeze proteins for food preservation, cryopreservation, and medical applications; scanning protein sequences for AFP regions; and designing AFP mutants.</p> <p class="ds-markdown-paragraph"><strong>Related Resources:</strong> Web server: <a href="https://webs.iiitd.edu.in/raghava/afpropred/" rel="noopener noreferrer">https://webs.iiitd.edu.in/raghava/afpropred/</a> | GitHub: <a href="https://github.com/raghavagps/afpropred" rel="noopener noreferrer">https://github.com/raghavagps/afpropred</a></p> <p class="ds-markdown-paragraph"><strong>Contact:</strong> raghava@iiitd.ac.in (Gajendra P. S. Raghava)</p>
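The AAC features used by the AFProPred models are amino acid composition: the share of each of the 20 standard residues in a sequence, which is also the quantity behind the Ala/Ile/Val/Thr enrichment finding. The Python sketch below is a minimal illustration under stated assumptions: it reports percentages, ignores non‑standard letters such as X, and uses hypothetical function names (`amino_acid_composition`, `aac_vector`). It is not the AFProPred package API, whose exact normalisation may differ.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (fixed feature order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> dict[str, float]:
    """Return the percentage of each standard amino acid in `sequence`.

    Assumption (not stated in the record): non-standard letters such as
    X, B, Z or U are dropped before counting, so the 20 values sum to 100.
    """
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    counts = Counter(residues)
    return {aa: 100.0 * counts[aa] / len(residues) for aa in AMINO_ACIDS}


def aac_vector(sequence: str) -> list[float]:
    """AAC as a fixed-order, 20-dimensional feature vector for a classifier."""
    composition = amino_acid_composition(sequence)
    return [composition[aa] for aa in AMINO_ACIDS]


if __name__ == "__main__":
    # Hypothetical toy sequence, not an entry from the dataset.
    toy = "ATAATAAIVT"
    comp = amino_acid_composition(toy)
    print({aa: pct for aa, pct in comp.items() if pct > 0})
```

For the toy sequence `ATAATAAIVT`, alanine makes up 50% and threonine 30% of the residues; a fixed residue order keeps the vector columns aligned across all sequences.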
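Accuracy and MCC in the performance tables follow from the confusion matrix on the 153‑sequence validation set (80 AFPs as positives, 73 non‑AFPs as negatives), and AUC is the probability that a randomly chosen AFP scores higher than a randomly chosen non‑AFP. The Python sketch below implements these standard definitions; the function names are hypothetical, and AUC is computed by exhaustive pairwise comparison (the Mann–Whitney form), which may handle ties differently from whatever library the authors used.

```python
import math
from typing import Sequence


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Threshold-based metrics from a binary confusion matrix.

    Positives are AFPs and negatives are non-AFPs.
    """
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on AFPs
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on non-AFPs
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: MCC is 0 when any row or column of the matrix is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the fraction of (AFP, non-AFP) pairs ranked correctly.

    Ties count as half a correct pair. The O(n_pos * n_neg) loop is fine
    for a validation set of about 150 sequences.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

As a sanity check on the reported figures, 88.2% accuracy on 153 sequences corresponds to 135 correct predictions (135/153 ≈ 0.882); the record does not state how the 18 errors split between false positives and false negatives, so the MCC cannot be recomputed from it.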
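The mRMR step behind the 150‑feature ET model greedily adds the feature whose relevance to the class label, minus its average redundancy with the features already chosen, is highest; both terms are mutual information. The sketch below implements that difference (MID) form of mRMR in plain Python under two assumptions not stated in the record: features are already discretised into integer bins, and ties go to the lowest feature index. Function names are hypothetical, and the authors' exact mRMR variant and discretisation may differ.

```python
import math
from collections import Counter
from typing import Sequence


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Mutual information I(X; Y) in nats between two discrete variables."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # Sum of p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts.
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def mrmr_select(features: Sequence[Sequence[int]], labels: Sequence[int],
                k: int) -> list[int]:
    """Greedy mRMR (MID criterion): maximise I(f; y) - mean_s I(f; f_s).

    Assumption: `features[j]` holds the discretised values of feature j
    for every sample. Returns feature indices in the order they were chosen.
    """
    relevance = [mutual_information(f, labels) for f in features]
    redundancy_sum = [0.0] * len(features)  # running sum of I(f_j; f_s)
    selected: list[int] = []
    remaining = set(range(len(features)))
    while remaining and len(selected) < k:
        def score(j: int) -> float:
            if not selected:
                return relevance[j]  # first pick: relevance only
            return relevance[j] - redundancy_sum[j] / len(selected)
        # sorted() makes ties resolve to the lowest index deterministically.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
        for j in remaining:
            redundancy_sum[j] += mutual_information(features[j], features[best])
    return selected
```

A call such as `mrmr_select(binned_features, labels, k=150)`, with `binned_features` a hypothetical matrix of discretised PSSM and AAC columns, would mirror the 150‑feature selection in spirit, although the binning choice will change which features are picked. The redundancy term is what stops a near‑duplicate of an already selected feature from being chosen again.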
format Digital resource
id zenodo_https___doi_org_10_5281_zenodo_19916375
institution Zenodo
language
publishDate 2026
publisher Zenodo
record_format zenodo
title AFProPred – Prediction of antifreeze proteins using machine learning and evolutionary information
url https://doi.org/10.5281/zenodo.19916375