Bibliographic Details
Main Authors: Tosaki, Taisei; Oshima, Sho; Okamoto, Yuji
Format: Digital resource
Language:
Published: Zenodo 2025
Online Access: https://doi.org/10.5281/zenodo.15496012
Summary:
  • <h3>This dataset is based on a simple Bayesian network built from publicly available TCGA and LINCS gene expression data.<br>For more information, please refer to the following.</h3> <p>Producer name withheld due to blind review in progress.</p> <h3><br>## Data</h3> <p>### 1. GRN Dataset & Preprocessing  <br>All graph data objects live under `data/GRN_dataset/` and are generated by the preprocessing script:</p> <p>```plaintext<br><br>└── data/<br>    └── GRN_dataset/<br>        ├── Breast/<br>        │   ├── Edge_feature/<br>        │   │   ├── Breast_tcga_ecv.csv      # TCGA patient-specific edge contribution values (ECv)<br>        │   │   └── Breast_lincs_kd_ecv.csv  # LINCS knockdown edge contribution values<br>        │   ├── Node_feature/<br>        │   │   ├── Breast_TCGA_exp.csv       # TCGA patient-specific gene expression levels<br>        │   │   └── Breast_LINCS_KD_exp.csv   # LINCS knockdown gene expression levels<br>        │   └── make_GRN_dataset/<br>        │       └── mk_GRN.py                 # builds per-sample PyG graphs and pickles them<br>        ├── Colorectal/…                       # same structure for colorectal cancer<br>        └── Lung/…                             # same structure for lung cancer<br>```</p> <p>- Edge_feature: 1D scalar “edge contribution values” (ECv) per sample<br>- Node_feature: 1D scalar gene expression per sample<br>- mk_GRN.py: combines node and edge features into torch_geometric.data.Data objects and serializes them.</p> <p>### 2. 
Label Data for Finetuning<br>All task labels live under `data/labels/`:</p> <p>```plaintext<br><br>└── data/<br>    └── labels/<br>        ├── BP_data/<br>        │   └── gene_with_BP_multilabels.csv    # GO-BP multilabels (shared)<br>        ├── CC_data/<br>        │   └── gene_with_CC_multilabels.csv    # GO-CC multilabels (shared)<br>        ├── Cancer_rel_data/<br>        │   └── gene_with_cancer_relation.csv   # Cancer-relation labels (shared)<br>        ├── Subtype_data/<br>        │   └── Breast/subtype.csv              # Breast cancer subtype per patient<br>        └── Survival_data/<br>            ├── Breast/tcga_survival_time.csv   # OS time & event for hazard prediction<br>            ├── Colorectal/…<br>            └── Lung/…<br>```</p> <p>### 3. Metadata<br>Helper files for mapping and filtering samples live under `data/meta_data/`:</p> <p>```plaintext<br>SupGCL/<br>└── data/<br>    └── meta_data/<br>        └── Breast/<br>            ├── Breast_LINCS_KD_graphs_metadata.pkl   # Order of LINCS KD graphs<br>            ├── LINCS_sampleID_KDgene_metadata.pkl    # Map LINCS sample → knocked-down gene<br>            └── Breast_tcga_graphs_metadata.pkl       # Order of TCGA patient graphs<br>```<br>- LINCS metadata: used by SupGCL pretraining to match teacher (knockdown) graphs<br>- TCGA metadata: used in finetuning to keep only patients with survival/subtype annotations (via --meta)</p> <p><br>## About TCGA Datasets<br>The Cancer Genome Atlas (TCGA) Research Network.<br>TCGA TARGET GTEx data were accessed through the UCSC Xena Browser.<br>Derived data are available at: https://xenabrowser.net/datapages/?cohort=TCGA%2520TARGET%2520GTEx&removeHub=https%253A%252F%252Fxena.treehouse.gi.ucsc.edu%253A443</p> <p>## About LINCS Datasets<br>Subramanian A, et al. "A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles." Cell 2017.<br>LINCS Program, NIH. 
L1000 data available at GEO: GSE92742<br><br>## About the Bayesian Network Algorithm<br><br>Yoshinori Tamada, Teppei Shimamura, Rui Yamaguchi, Seiya Imoto, Masao Nagasaki, and Satoru Miyano. SiGN: Large-Scale Gene Network Estimation Environment for High Performance Computing. Genome Informatics, 25(1):40–52, 2011.<br><br>Seiya Imoto, Takao Goto, and Satoru Miyano. Estimation of genetic networks and functional structures between genes by using Bayesian networks and nonparametric regression. Pacific Symposium on Biocomputing, pages 175–186, 2002.</p> <p>## Acknowledgments<br>This work was supported by JST Moonshot R&D (JPMJMS2021, JPMJMS2024), JST Research and Development Program for Next-generation Edge AI Semiconductors (JPMJES2511), JSPS KAKENHI (JP25K00148, JP25H02626, JP26K14994), and a project (JPNP14004) commissioned by the New Energy and Industrial Technology Development Organization (NEDO).<br>This work used computational resources of the supercomputer Fugaku provided by RIKEN through the HPCI System Research Project (Project IDs: hp150272, ra000018).<br>Taisei Tosaki received financial support from RIKEN Jr. Research-associated Programs.</p>
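<p>## Example: assembling per-sample graphs (illustrative sketch)<br>Section 1 describes `mk_GRN.py` as combining node and edge features into per-sample graph objects and pickling them. The following is a minimal sketch of that idea using only the Python standard library. It is not the authors' script. The CSV layouts (a `gene` column, `source`/`target` columns, one column per sample), the sample IDs, the values, and the function name `build_graphs` are all assumptions made for illustration, as is the choice to share one edge list across samples.</p>

```python
import csv
import io
import pickle

# Hypothetical CSV layouts (NOT confirmed by the dataset description):
#   node features: first column "gene", then one column per sample
#   edge features: columns "source", "target", then one column per sample
NODE_CSV = """gene,S1,S2
TP53,1.5,0.2
MYC,0.7,2.1
EGFR,3.0,1.1
"""
EDGE_CSV = """source,target,S1,S2
TP53,MYC,0.4,-0.1
MYC,EGFR,1.2,0.3
"""


def build_graphs(node_file, edge_file):
    """Return {sample_id: graph dict} using PyG-style field names."""
    node_rows = list(csv.DictReader(node_file))
    edge_rows = list(csv.DictReader(edge_file))

    # Map each gene name to a node index, in file order.
    gene_index = {row["gene"]: i for i, row in enumerate(node_rows)}
    samples = [col for col in node_rows[0] if col != "gene"]

    # Assumption: topology is shared by all samples; only features differ.
    edge_index = [
        [gene_index[row["source"]] for row in edge_rows],
        [gene_index[row["target"]] for row in edge_rows],
    ]

    graphs = {}
    for sample in samples:
        graphs[sample] = {
            # One scalar expression value per node.
            "x": [[float(row[sample])] for row in node_rows],
            "edge_index": edge_index,
            # One scalar edge contribution value (ECv) per edge.
            "edge_attr": [[float(row[sample])] for row in edge_rows],
        }
    return graphs


graphs = build_graphs(io.StringIO(NODE_CSV), io.StringIO(EDGE_CSV))

# Round-trip through pickle, mirroring the "serializes them" step.
restored = pickle.loads(pickle.dumps(graphs))
```

<p>With PyTorch Geometric installed, each dictionary could be converted by wrapping the three fields in tensors and passing them to `torch_geometric.data.Data(x=..., edge_index=..., edge_attr=...)`, with `edge_index` as a `torch.long` tensor of shape `[2, num_edges]`.</p>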