| Main Authors: | , |
|---|---|
| Format: | Digital resource |
| Language: | |
| Published: | Zenodo, 2026 |
| Online Access: | https://doi.org/10.5281/zenodo.19941115 |
Table of Contents:
- <p class="ds-markdown-paragraph"><strong>Title:</strong><br>MRSLpred Dataset – Multi‑label subcellular localization of human mRNA sequences (6 compartments)</p> <p class="ds-markdown-paragraph"><strong>Description:</strong></p> <p class="ds-markdown-paragraph"><strong>Project:</strong> MRSLpred – A hybrid approach for predicting multi‑label subcellular localization of mRNA at the genome scale</p> <p class="ds-markdown-paragraph"><strong>Publication:</strong> Choudhury, S., Bajiya, N., Patiyal, S., & Raghava, G.P.S. (2024). MRSLpred – a hybrid approach for predicting multi‑label subcellular localization of mRNA at the genome scale. <em>Frontiers in Bioinformatics</em>, 4, 1341479. <a href="https://doi.org/10.3389/fbinf.2024.1341479" rel="noopener noreferrer">https://doi.org/10.3389/fbinf.2024.1341479</a></p> <p class="ds-markdown-paragraph"><strong>Overview:</strong> This dataset accompanies MRSLpred, a multi‑label classifier for predicting subcellular localization of human mRNA. Unlike traditional single‑label methods, MRSLpred assigns multiple locations to each mRNA (e.g., nucleus AND exosome), reflecting real‑world biology. mRNA localization controls protein synthesis spatially and is critical for neuronal maturation, embryonic patterning, cell migration, and stress adaptation. 
The dataset is derived from RNALocate v2 and pre‑processed by DM3Loc, with redundancy reduced at 80% similarity.</p> <p class="ds-markdown-paragraph"><strong>Content:</strong> 17,277 non‑exclusive human mRNA sequences distributed across 6 subcellular compartments:</p> <div class="ds-scroll-area ds-scroll-area--show-on-focus-within _1210dd7 c03cafe9 _5ac647c"> <div class="ds-scroll-area__gutters"> <div class="ds-scroll-area__vertical-gutter"> </div> </div> <table> <tbody> <tr> <th>Compartment</th> <th>Number of mRNAs</th> </tr> </tbody> <tbody> <tr> <td>Exosome</td> <td>17,156</td> </tr> <tr> <td>Nucleus</td> <td>11,923</td> </tr> <tr> <td>Ribosome</td> <td>5,210</td> </tr> <tr> <td>Membrane</td> <td>3,232</td> </tr> <tr> <td>Cytosol</td> <td>2,338</td> </tr> <tr> <td>Endoplasmic reticulum (ER)</td> <td>1,976</td> </tr> </tbody> </table> </div> <p class="ds-markdown-paragraph"><strong>Train/validation split:</strong> 80/20 (stratified by location labels)</p> <p class="ds-markdown-paragraph"><strong>Best Model Performance (XGBoost + motif module – validation set):</strong></p> <div class="ds-scroll-area ds-scroll-area--show-on-focus-within _1210dd7 c03cafe9 _5ac647c"> <div class="ds-scroll-area__gutters"> <div class="ds-scroll-area__vertical-gutter"> </div> </div> <table> <tbody> <tr> <th>Location</th> <th>Sensitivity</th> <th>Specificity</th> <th>AUC</th> <th>MCC</th> </tr> </tbody> <tbody> <tr> <td>Exosome</td> <td>0.696</td> <td>0.692</td> <td><strong>0.816</strong></td> <td>0.073</td> </tr> <tr> <td>Membrane</td> <td>0.675</td> <td>0.676</td> <td>0.736</td> <td>0.280</td> </tr> <tr> <td>Nucleus</td> <td>0.671</td> <td>0.671</td> <td>0.736</td> <td>0.319</td> </tr> <tr> <td>Ribosome</td> <td>0.664</td> <td>0.664</td> <td>0.728</td> <td>0.304</td> </tr> <tr> <td>Cytosol</td> <td>0.650</td> <td>0.650</td> <td>0.708</td> <td>0.211</td> </tr> <tr> <td>ER</td> <td>0.657</td> <td>0.656</td> <td>0.727</td> <td>0.205</td> </tr> <tr> <td><strong>Average</strong></td> 
<td><strong>0.669</strong></td> <td><strong>0.668</strong></td> <td><strong>0.742</strong></td> <td><strong>0.232</strong></td> </tr> </tbody> </table> </div> <p class="ds-markdown-paragraph"><strong>Features (CDK3 + RDK4 combined) – XGBoost alone (validation):</strong> Average AUC = 0.710, Average MCC = 0.216</p> <p class="ds-markdown-paragraph"><strong>Benchmarking comparison (validation set):</strong></p> <div class="ds-scroll-area ds-scroll-area--show-on-focus-within _1210dd7 c03cafe9 _5ac647c"> <div class="ds-scroll-area__gutters"> <div class="ds-scroll-area__vertical-gutter"> </div> </div> <table> <tbody> <tr> <th>Method</th> <th>Average AUC</th> <th>Multi‑label?</th> <th>Notes</th> </tr> </tbody> <tbody> <tr> <td><strong>MRSLpred</strong></td> <td><strong>0.742</strong></td> <td>✅</td> <td>Fast, genome‑scale</td> </tr> <tr> <td>DM3Loc</td> <td>0.704</td> <td>✅</td> <td>Deep learning, slow, high resource</td> </tr> <tr> <td>iLoc‑mRNA</td> <td>~0.50</td> <td>❌</td> <td>Single‑label only</td> </tr> <tr> <td>mRNALoc</td> <td>~0.46</td> <td>❌</td> <td>Single‑label only</td> </tr> </tbody> </table> </div> <p class="ds-markdown-paragraph"><strong>Computational efficiency (500 mRNA sequences):</strong></p> <div class="ds-scroll-area ds-scroll-area--show-on-focus-within _1210dd7 c03cafe9 _5ac647c"> <div class="ds-scroll-area__gutters"> <div class="ds-scroll-area__vertical-gutter"> </div> </div> <table> <tbody> <tr> <th>Method</th> <th>Real time</th> <th>User time</th> </tr> </tbody> <tbody> <tr> <td><strong>MRSLpred</strong></td> <td><strong>~23 sec</strong></td> <td><strong>~5 sec</strong></td> </tr> <tr> <td>DM3Loc</td> <td>~32 min</td> <td>~28 min</td> </tr> </tbody> </table> </div> <p class="ds-markdown-paragraph"><strong>MRSLpred is >80× faster than DM3Loc with comparable performance.</strong></p> <p class="ds-markdown-paragraph"><strong>Motif coverage (training set):</strong> Exosome (1,655 sequences), Nucleus (95), ER (65), Ribosome (32), Membrane (29), 
Cytosol (25)</p> <p class="ds-markdown-paragraph"><strong>Data Curation & Quality Control:</strong></p> <ul> <li> <p class="ds-markdown-paragraph"><strong>Source:</strong> RNALocate v2 (via DM3Loc pre‑processed dataset)</p> </li> <li> <p class="ds-markdown-paragraph"><strong>Organism:</strong> Homo sapiens</p> </li> <li> <p class="ds-markdown-paragraph"><strong>Redundancy reduction:</strong> CD‑HIT at 80% sequence identity</p> </li> <li> <p class="ds-markdown-paragraph"><strong>Labels:</strong> Multi‑hot encoding (6 locations)</p> </li> <li> <p class="ds-markdown-paragraph"><strong>Features:</strong> Nfeature (CDK3 + RDK4) → 200 features</p> </li> <li> <p class="ds-markdown-paragraph"><strong>Motifs:</strong> MERCI (discriminative location‑specific motifs)</p> </li> <li> <p class="ds-markdown-paragraph"><strong>Train/validation split:</strong> 80/20 stratified</p> </li> </ul> <p class="ds-markdown-paragraph"><strong>Usage:</strong> Multi‑label mRNA subcellular localization prediction (genome scale), identifying location‑specific sequence motifs, understanding mRNA spatial regulation, developmental biology research.</p> <p class="ds-markdown-paragraph"><strong>Related Resources:</strong> Web server: <a href="https://webs.iiitd.edu.in/raghava/mrslpred/" rel="noopener noreferrer">https://webs.iiitd.edu.in/raghava/mrslpred/</a> | GitHub: <a href="https://github.com/raghavagps/mrslpred" rel="noopener noreferrer">https://github.com/raghavagps/mrslpred</a></p> <p class="ds-markdown-paragraph"><strong>Contact:</strong> raghava@iiitd.ac.in (Gajendra P. S. Raghava)</p>
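
<p class="ds-markdown-paragraph"><strong>Illustrative sketch (not the authors' code):</strong> The curation list above mentions a 200-dimensional feature vector (CDK3 + RDK4) and multi‑hot labels over 6 locations. The Python sketch below shows one way these could be computed, to make the arithmetic concrete: 64 plain 3‑mers plus 136 reverse‑complement‑collapsed 4‑mers give 200 features. The function names, the frequency normalization, the U→T conversion, the skipping of ambiguous bases, and the label order are all assumptions made for illustration; the actual Nfeature implementation may differ.</p>

```python
from itertools import product

# Assumed label order (taken from the compartment table); the real dataset may use another.
LOCATIONS = ["Exosome", "Nucleus", "Ribosome", "Membrane", "Cytosol", "ER"]

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of a DNA k-mer."""
    return kmer.translate(_COMPLEMENT)[::-1]


def all_kmers(k):
    """All 4**k k-mers over A, C, G, T in lexicographic order."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


def canonical_kmers(k):
    """k-mers collapsed with their reverse complements (136 of them for k=4)."""
    return sorted({min(km, reverse_complement(km)) for km in all_kmers(k)})


def kmer_composition(seq, k, collapse_rc=False):
    """Relative k-mer frequencies; windows with non-ACGT bases are skipped."""
    seq = seq.upper().replace("U", "T")  # assumption: treat mRNA as cDNA
    keys = canonical_kmers(k) if collapse_rc else all_kmers(k)
    counts = dict.fromkeys(keys, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if collapse_rc:
            kmer = min(kmer, reverse_complement(kmer))
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    return [counts[key] / total if total else 0.0 for key in keys]


def sequence_features(seq):
    """CDK3 (64 values) followed by RDK4 (136 values): 200 features in total."""
    return kmer_composition(seq, 3) + kmer_composition(seq, 4, collapse_rc=True)


def multi_hot(labels):
    """Encode a collection of location names as a 6-element 0/1 vector."""
    present = set(labels)
    return [1 if loc in present else 0 for loc in LOCATIONS]


if __name__ == "__main__":
    example = "AUGGCCAUUGUAAUGGGCCGCUGA"  # made-up sequence, not from the dataset
    print(len(sequence_features(example)))   # 200
    print(multi_hot(["Nucleus", "Exosome"]))  # [1, 1, 0, 0, 0, 0]
```

<p class="ds-markdown-paragraph">A feature matrix built this way, paired with the multi‑hot label matrix, could then be passed to one binary classifier per location, which is a common way to set up multi‑label prediction; whether MRSLpred trains its XGBoost models exactly like this is not stated in this record.</p>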