| Main Authors: | Fang, Yimiao; Wang, Jie; Hong, Mutian; Zheng, Jie |
|---|---|
| Format: | Digital resource |
| Language: | |
| Published: | Zenodo, 2025 |
| Online Access: | https://doi.org/10.5281/zenodo.17098066 |
| _version_ | 1866901934263238656 |
|---|---|
| author | Fang, Yimiao; Wang, Jie; Hong, Mutian; Zheng, Jie |
| author_facet | Fang, Yimiao; Wang, Jie; Hong, Mutian; Zheng, Jie |
| contents | <div> <h1>MuSL: Multimodal deep learning for generalizable prediction of synthetic lethality from sequence, transcriptomic, and network data</h1> <br> <h2>Model Architecture</h2> <br> <div> <p dir="auto">The MuSL framework integrates multimodal biological data through a tri-branch deep learning architecture to predict synthetic lethal (SL) gene pairs. As illustrated in the figure above, the model jointly learns from transcriptomic images, statistical features, and protein interaction networks.</p> <h3>1. Feature Learning Pathways</h3> <p dir="auto">MuSL processes input data through three complementary pathways:</p> <ul> <li> <p dir="auto"><strong>Image-Based Expression Branch (CNN):</strong></p> <ul> <li>Transforms the transcriptome profiles of each gene pair into a <strong>two-dimensional joint density distribution map</strong> (a <span>32×32</span> histogram).</li> <li>Uses a <strong>Convolutional Neural Network (CNN)</strong> to extract spatial features automatically, capturing complex patterns such as functional decoupling and mutual exclusivity directly from the data.</li> </ul> </li> <li> <p dir="auto"><strong>Statistical Feature Branch (MLP):</strong></p> <ul> <li>Extracts <strong>35 handcrafted statistical features</strong> that capture explicit expression patterns (e.g., correlation, variation) from the same transcriptomic data.</li> <li>Projects these features through a dedicated fully connected layer (MLP) to form a representation that complements the deep image features.</li> </ul> </li> <li> <p dir="auto"><strong>Graph-Based Network Branch (GNN):</strong></p> <ul> <li>Models the Protein-Protein Interaction (PPI) network using a <strong>Graph Neural Network (GNN)</strong>.</li> <li><strong>Crucial Innovation:</strong> Initializes node features with <strong>ESM2 protein language model embeddings</strong>, integrating evolutionary and sequence-level priors to generate robust topological relationship features.</li> </ul> </li> </ul> <h3>2. Adaptive Fusion & Prediction</h3> <ul> <li><strong>Cross-Modal Integration:</strong> The framework employs a <strong>cross-attention and gating fusion mechanism</strong> to dynamically weight and integrate features from all three pathways (CNN, statistical, and GNN) based on their contextual importance.</li> <li><strong>Multi-Head Prediction:</strong> Four specialized classifiers generate predictions: the modality-specific heads and a head on the final fused representation.</li> </ul> <h3>3. Training Strategy</h3> <ul> <li><strong>Composite Loss Function:</strong> The model optimizes a joint objective that combines: <ul> <li>Classification losses from all four prediction heads.</li> <li><strong>Contrastive learning components</strong> to enforce semantic alignment and improve cross-modal consistency.</li> </ul> </li> </ul> </div> </div> <div> <h3>Required Data Files</h3> <p dir="auto">The following data files must be downloaded and placed in the <code>processed_data/</code> directory:</p> <ul> <li><code>a549_cell_line_imputed.h5ad</code> - A549 cell line single-cell RNA-seq data</li> <li><code>k562_cell_line_imputed.h5ad</code> - K562 cell line single-cell RNA-seq data</li> <li><code>tcga_all.h5ad</code> - TCGA bulk RNA-seq data</li> <li><code>protein_embeddings.pt</code> - Embeddings from ESM2</li> <li><code>all_emb_genept.pkl</code> - Embeddings from GenePT</li> </ul> </div> |
| format | Digital resource |
| id | zenodo_https___doi_org_10_5281_zenodo_17098066 |
| institution | Zenodo |
| language | |
| publishDate | 2025 |
| publisher | Zenodo |
| record_format | zenodo |
| title | MuSL: Multimodal deep learning for generalizable prediction of synthetic lethality from sequence, transcriptomic, and network data |
| url | https://doi.org/10.5281/zenodo.17098066 |
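
The first two feature pathways described in the record (a 32×32 joint density map per gene pair, plus handcrafted statistics such as correlation and variation) can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function names `joint_density_map` and `pair_statistics` are hypothetical, the bin range and normalization are assumptions, and only three illustrative statistics are shown rather than the 35 features the record mentions.

```python
import numpy as np


def joint_density_map(expr_a, expr_b, bins=32):
    """Build a bins x bins joint density map for two expression vectors.

    Assumption: bins span each gene's observed min..max and the map is
    normalized to sum to 1. The record does not state these details.
    """
    a = np.asarray(expr_a, dtype=float)
    b = np.asarray(expr_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("expression vectors must have the same shape")
    counts, _, _ = np.histogram2d(a, b, bins=bins)
    total = counts.sum()
    return counts / total if total > 0 else counts


def pair_statistics(expr_a, expr_b):
    """Compute a few illustrative pairwise statistics (hypothetical subset)."""
    a = np.asarray(expr_a, dtype=float)
    b = np.asarray(expr_b, dtype=float)
    # Pearson correlation is undefined for a constant vector; report 0.0
    # in that case (a choice made for this sketch only).
    if a.std() == 0 or b.std() == 0:
        pearson = 0.0
    else:
        pearson = float(np.corrcoef(a, b)[0, 1])

    def coeff_of_variation(x):
        mean = x.mean()
        return float(x.std() / mean) if mean != 0 else 0.0

    return {
        "pearson_r": pearson,
        "cv_a": coeff_of_variation(a),
        "cv_b": coeff_of_variation(b),
    }


# Synthetic expression values standing in for two genes across 500 samples.
rng = np.random.default_rng(0)
gene_a = rng.gamma(shape=2.0, scale=1.0, size=500)
gene_b = rng.gamma(shape=2.0, scale=1.0, size=500)

density = joint_density_map(gene_a, gene_b)  # image-branch input
stats = pair_statistics(gene_a, gene_b)      # statistical-branch input
print(density.shape, sorted(stats))
```

In a pipeline of this kind, the density map would be fed to the CNN branch as a single-channel image and the statistics vector to the MLP branch; how MuSL itself scales or transforms these inputs is not specified in the record.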