| Main authors: | , , |
|---|---|
| Format: | Digital resource |
| Language: | |
| Published: | Zenodo, 2025 |
| Online access: | https://doi.org/10.5281/zenodo.15496012 |
Summary:
- <h3>This dataset is based on a simple Bayesian network built from publicly available TCGA and LINCS gene expression data.<br>For more information, see the sections below.</h3> <p>Producer name withheld due to blind review in progress.</p> <h3><br>## Data</h3> <p>### 1. GRN Dataset & Preprocessing<br>All graph data objects live under `data/GRN_dataset/` and are generated by the preprocessing script `mk_GRN.py`:</p> <p>```plaintext<br><br>└── data/<br> └── GRN_dataset/<br> ├── Breast/<br> │ ├── Edge_feature/<br> │ │ ├── Breast_tcga_ecv.csv # TCGA patient-specific edge contribution values (ECv)<br> │ │ └── Breast_lincs_kd_ecv.csv # LINCS knockdown edge contribution values<br> │ ├── Node_feature/<br> │ │ ├── Breast_TCGA_exp.csv # TCGA patient-specific gene expression levels<br> │ │ └── Breast_LINCS_KD_exp.csv # LINCS knockdown gene expression levels<br> │ └── make_GRN_dataset/<br> │ └── mk_GRN.py # builds per-sample PyG graphs and pickles them<br> ├── Colorectal/… # same structure for colorectal cancer<br> └── Lung/… # same structure for lung cancer<br>```</p> <p>- Edge_feature: 1D scalar “edge contribution values” (ECv) per sample<br>- Node_feature: 1D scalar gene expression per sample<br>- mk_GRN.py: combines node and edge features into torch_geometric.data.Data objects and serializes them.</p> <p>### 2. Label Data for Finetuning<br>All task labels live under `data/labels/`:</p> <p>```plaintext<br><br>└── data/<br> └── labels/<br> ├── BP_data/<br> │ └── gene_with_BP_multilabels.csv # GO-BP multilabels (shared)<br> ├── CC_data/<br> │ └── gene_with_CC_multilabels.csv # GO-CC multilabels (shared)<br> ├── Cancer_rel_data/<br> │ └── gene_with_cancer_relation.csv # Cancer-relation labels (shared)<br> ├── Subtype_data/<br> │ └── Breast/subtype.csv # Breast cancer subtype per patient<br> └── Survival_data/<br> ├── Breast/tcga_survival_time.csv # OS time & event for hazard prediction<br> ├── Colorectal/…<br> └── Lung/…<br>```</p> <p>### 3. 
Metadata<br>Helper files for mapping and filtering samples live under `data/meta_data/`:</p> <p>```plaintext<br>SupGCL/<br>└── data/<br> └── meta_data/<br> └── Breast/<br> ├── Breast_LINCS_KD_graphs_metadata.pkl # Order of LINCS KD graphs<br> ├── LINCS_sampleID_KDgene_metadata.pkl # Map LINCS sample → knocked-down gene<br> └── Breast_tcga_graphs_metadata.pkl # Order of TCGA patient graphs<br>```<br>- LINCS metadata: used by SupGCL pretraining to match teacher (knockdown) graphs<br>- TCGA metadata: used in finetuning to keep only patients with survival/subtype annotations (via --meta)</p> <p><br>## About TCGA Datasets<br>The Cancer Genome Atlas (TCGA) Research Network.<br>TCGA TARGET GTEx data were accessed through the UCSC Xena Browser.<br>Derived data available at: https://xenabrowser.net/datapages/?cohort=TCGA%2520TARGET%2520GTEx&removeHub=https%253A%252F%252Fxena.treehouse.gi.ucsc.edu%253A443</p> <p>## About LINCS Datasets<br>Subramanian A, et al. "A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles." Cell, 2017.<br>LINCS Program, NIH. L1000 data available at GEO: GSE92742<br><br>## About the Bayesian Network Algorithm<br><br>Yoshinori Tamada, Teppei Shimamura, Rui Yamaguchi, Seiya Imoto, Masao Nagasaki, and Satoru Miyano. SiGN: Large-Scale Gene Network Estimation Environment for High Performance Computing. Genome Informatics, 25(1):40–52, 2011.<br><br>Seiya Imoto, Takao Goto, and Satoru Miyano. Estimation of genetic networks and functional structures between genes by using Bayesian networks and nonparametric regression. 
Pacific Symposium on Biocomputing, pages 175–186, 2002.</p> <p>## Acknowledgments<br>This work was supported by JST Moonshot R&D (JPMJMS2021, JPMJMS2024), JST Research and Development Program for Next-generation Edge AI Semiconductors (JPMJES2511), JSPS KAKENHI (JP25K00148, JP25H02626, JP26K14994), and a project (JPNP14004) commissioned by the New Energy and Industrial Technology Development Organization (NEDO).<br>This work used computational resources of the supercomputer Fugaku provided by RIKEN through the HPCI System Research Project (Project IDs: hp150272, ra000018).<br>Taisei Tosaki received financial support from RIKEN Jr. Research-associated Programs.</p>
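<p>The TCGA metadata bullet under "3. Metadata" describes keeping only patients that have survival or subtype annotations. As a rough illustration only, the sketch below assumes that the graph metadata pickle holds a list of sample IDs in graph order and that the label CSV has a sample ID column. The function name `filter_annotated`, the column names (`sample_id`, `os_time`, `os_event`), and the toy IDs are all hypothetical and not taken from the dataset; the actual files and the behavior of `--meta` may differ.</p>

```python
import csv
import io
import pickle


def filter_annotated(sample_order, label_rows, id_column="sample_id"):
    """Return (graph_index, sample_id) pairs for samples that have a label row.

    sample_order: sample IDs in the same order as the stored graphs (assumed).
    label_rows:   dict rows from a label CSV (column names are hypothetical).
    """
    annotated = {row[id_column] for row in label_rows}
    return [(i, sid) for i, sid in enumerate(sample_order) if sid in annotated]


# Toy stand-ins for the metadata pickle and the survival CSV (not real data).
metadata_bytes = pickle.dumps(["TCGA-A", "TCGA-B", "TCGA-C"])
survival_csv = "sample_id,os_time,os_event\nTCGA-A,1200,0\nTCGA-C,340,1\n"

sample_order = pickle.loads(metadata_bytes)
label_rows = list(csv.DictReader(io.StringIO(survival_csv)))

# Keep only graphs whose sample appears in the label table.
kept = filter_annotated(sample_order, label_rows)
print(kept)  # [(0, 'TCGA-A'), (2, 'TCGA-C')]
```

<p>Returning the graph index alongside the sample ID keeps the link to the graph order, so the same indices could be used to select the matching graphs from a pickled list.</p>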