| Main Authors: | , , |
|---|---|
| Format: | Digital resource |
| Language: | |
| Published: | Zenodo, 2026 |
| Online Access: | https://doi.org/10.5281/zenodo.19916375 |
Table of Contents:
- <p class="ds-markdown-paragraph"><strong>Title:</strong><br>AFProPred Dataset – Experimentally validated antifreeze proteins (AFPs) and non‑AFPs from reviewed UniProt entries</p> <p class="ds-markdown-paragraph"><strong>Description:</strong></p> <p class="ds-markdown-paragraph"><strong>Project:</strong> AFProPred – Prediction of antifreeze proteins using machine learning and evolutionary information</p> <p class="ds-markdown-paragraph"><strong>Publication:</strong> Kumar, N., Patiyal, S., Choudhury, S., Bajiya, N., &amp; Raghava, G.P.S. (2025). AFProPred: Prediction of antifreeze proteins using machine learning and evolutionary information. <em>Proteomics</em>, e202400157. <a href="https://doi.org/10.1002/pmic.202400157" rel="noopener noreferrer">https://doi.org/10.1002/pmic.202400157</a></p> <p class="ds-markdown-paragraph"><strong>Overview:</strong> This dataset accompanies AFProPred, a machine learning method for predicting antifreeze proteins (AFPs). AFPs allow organisms such as fish, insects, fungi, and bacteria to survive at sub‑zero temperatures through thermal hysteresis and ice recrystallisation inhibition, and they have applications in food preservation, medicine, and cryosurgery. Unlike earlier methods, which were evaluated on unreviewed data, this study uses an independent validation dataset of <strong>reviewed (Swiss‑Prot)</strong> AFPs and non‑AFPs.</p> <p class="ds-markdown-paragraph"><strong>Content:</strong></p> <div class="ds-scroll-area ds-scroll-area--show-on-focus-within _1210dd7 c03cafe9 _5ac647c"> <div class="ds-scroll-area__gutters"> <div class="ds-scroll-area__vertical-gutter"> </div> </div> <table> <thead><tr> <th>Dataset</th> <th>AFPs</th> <th>Non‑AFPs</th> <th>Source</th> </tr> </thead><tbody> <tr> <td><strong>Main (training)</strong></td> <td>8,134</td> <td>9,439</td> <td>UniProt (unreviewed) + AFP‑Pred</td> </tr> <tr> <td><strong>Validation (independent)</strong></td> <td>80</td> <td>73</td> <td><strong>Swiss‑Prot (reviewed)</strong> – keyword: "antifreeze protein" vs. "NOT_antifreeze_protein"</td> </tr> </tbody> </table> </div> <p class="ds-markdown-paragraph"><strong>Validation set length range:</strong> 16–2,439 amino acids (CD‑HIT redundancy reduction at 40%)</p> <p class="ds-markdown-paragraph"><strong>Key findings (compositional analysis):</strong> AFPs are enriched in alanine (A), isoleucine (I), valine (V), and threonine (T). Threonine is reported to increase AFP activity by contributing hydrogen bonds at the protein surface.</p> <p class="ds-markdown-paragraph"><strong>Best model performance (validation set: 80 AFPs + 73 non‑AFPs, reviewed).</strong> ET = Extra Trees, RF = Random Forest, XGB = XGBoost, AAC = amino acid composition, PSSM = position‑specific scoring matrix, MCC = Matthews correlation coefficient.</p> <div class="ds-scroll-area ds-scroll-area--show-on-focus-within _1210dd7 c03cafe9 _5ac647c"> <div class="ds-scroll-area__gutters"> <div class="ds-scroll-area__vertical-gutter"> </div> </div> <table> <thead><tr> <th>Model</th> <th>Features</th> <th>AUC</th> <th>MCC</th> <th>Accuracy</th> </tr> </thead><tbody> <tr> <td><strong>ET</strong></td> <td>PSSM + AAC</td> <td><strong>0.93</strong></td> <td><strong>0.77</strong></td> <td><strong>88.2%</strong></td> </tr> <tr> <td>RF</td> <td>PSSM + AAC</td> <td>0.91</td> <td>0.64</td> <td>81.7%</td> </tr> <tr> <td>ET</td> <td>150 selected (mRMR)</td> <td>0.90</td> <td>0.69</td> <td>84.3%</td> </tr> <tr> <td>XGB</td> <td>AAC only</td> <td>0.89</td> <td>0.63</td> <td>81.7%</td> </tr> </tbody> </table> </div> <p class="ds-markdown-paragraph"><strong>Comparison with existing methods (same reviewed validation dataset):</strong></p> <div class="ds-scroll-area ds-scroll-area--show-on-focus-within _1210dd7 c03cafe9 _5ac647c"> <div class="ds-scroll-area__gutters"> <div class="ds-scroll-area__vertical-gutter"> </div> </div> <table> <thead><tr> <th>Method</th> <th>AUC</th> <th>MCC</th> <th>Accuracy</th> </tr> </thead><tbody> <tr> <td><strong>AFProPred (ET + PSSM+AAC)</strong></td> <td><strong>0.93</strong></td> <td><strong>0.77</strong></td> <td><strong>88.2%</strong></td> </tr> <tr> <td>AFP‑CKSAAP (2019)</td> <td>0.89</td> <td>0.65</td> <td>82.0%</td> </tr> <tr> <td>AFP‑LSE (2020)</td> <td>N/A</td> <td>0.48</td> <td>74.0%</td> </tr> <tr> <td>CryoProtect (2017)</td> <td>0.61</td> <td>0.23</td> <td>60.1%</td> </tr> <tr> <td>AFP‑SRC (2022)</td> <td>N/A</td> <td>0.14</td> <td>57.0%</td> </tr> </tbody> </table> </div> <p class="ds-markdown-paragraph"><strong>Alignment‑based methods (BLAST, MERCI motifs) performed poorly because of low coverage, which motivates the use of ML models.</strong></p> <p class="ds-markdown-paragraph"><strong>Data Curation &amp; Quality Control:</strong></p> <ul> <li> <p class="ds-markdown-paragraph"><strong>Validation set:</strong> Swiss‑Prot reviewed entries only (manually curated)</p> </li> <li> <p class="ds-markdown-paragraph"><strong>Training set:</strong> Unreviewed UniProt + AFP‑Pred (CD‑HIT 40% identity)</p> </li> <li> <p class="ds-markdown-paragraph"><strong>Length filter:</strong> 16–2,439 amino acids</p> </li> <li> <p class="ds-markdown-paragraph"><strong>Evolutionary features:</strong> PSSM profiles generated with PSI‑BLAST against Swiss‑Prot (3 iterations)</p> </li> <li> <p class="ds-markdown-paragraph"><strong>Feature selection:</strong> mRMR (minimum redundancy maximum relevance)</p> </li> </ul> <p class="ds-markdown-paragraph"><strong>Usage:</strong> Predicting antifreeze proteins for food preservation, cryopreservation, and medical applications; scanning protein sequences for AFP regions; designing AFP mutants.</p> <p class="ds-markdown-paragraph"><strong>Related Resources:</strong> Web server: <a href="https://webs.iiitd.edu.in/raghava/afpropred/" rel="noopener noreferrer">https://webs.iiitd.edu.in/raghava/afpropred/</a> | GitHub: <a href="https://github.com/raghavagps/afpropred" rel="noopener noreferrer">https://github.com/raghavagps/afpropred</a></p> <p class="ds-markdown-paragraph"><strong>Contact:</strong> raghava@iiitd.ac.in (Gajendra P. S. Raghava)</p>
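The record refers to amino acid composition (AAC) features and reports accuracy and MCC, but does not show how these quantities are computed. The following is a minimal Python sketch under stated assumptions, not the AFProPred implementation: the function names `aac`, `accuracy`, and `mcc` are hypothetical, AAC is taken here as the percentage of each of the 20 standard residues, and the example sequence and confusion-matrix counts are made-up toy values rather than results from the dataset.

```python
from collections import Counter
from math import sqrt
from typing import Dict

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(sequence: str) -> Dict[str, float]:
    """Return the percentage of each standard amino acid in a sequence.

    Non-standard characters (e.g. X, B, Z, gaps) are ignored; this is an
    assumption of the sketch, not something the record specifies.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of correct predictions from confusion-matrix counts."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical toy sequence rich in A and T, for illustration only.
    toy_sequence = "ATAATAAVIT"
    composition = aac(toy_sequence)
    for residue in "AIVT":
        print(residue, composition[residue])

    # Hypothetical confusion-matrix counts, NOT taken from the paper.
    print("accuracy:", accuracy(tp=8, tn=7, fp=2, fn=3))
    print("MCC:", mcc(tp=8, tn=7, fp=2, fn=3))
```

In a setup like the one described, the 20 AAC values would form one part of the feature vector (the best model in the record combines them with PSSM-derived features). MCC is a useful companion to accuracy because it takes all four confusion-matrix cells into account.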