| Main Author: | |
|---|---|
| Format: | Digital resource |
| Language: | |
| Published: | Zenodo 2017 |
| Subjects: | |
| Online Access: | https://doi.org/10.5281/zenodo.16412818 |
Table of Contents:
- <p>The PCAWG GATK Co-cleaning workflow was developed by the Broad Institute (https://www.broadinstitute.org). It consists of two pre-processing steps for tumor/normal BAM files: indel realignment and base quality score recalibration (BQSR). The workflow has been dockerized and packaged using the Common Workflow Language (CWL); the source code is available on GitHub at https://github.com/ICGC-TCGA-PanCancer/pcawg-gatk-cocleaning.</p>
<h2>Run the workflow with your own data</h2>
<h3>Prepare compute environment and install software packages</h3>
<p>The workflow has been tested in an Ubuntu 16.04 Linux environment with the following hardware and software settings.</p>
<h4>Hardware requirements (assuming 30X coverage whole genome sequence)</h4>
<ul>
<li>CPU cores: 16</li>
<li>Memory: 64GB</li>
<li>Disk space: 1TB</li>
</ul>
<h4>Software installation</h4>
<ul>
<li>Docker (1.12.6): follow the instructions at https://docs.docker.com/engine/installation to install Docker</li>
<li>CWL tool</li>
</ul>
<pre><code>pip install cwltool==1.0.20170217172322
</code></pre>
<h3>Prepare input data</h3>
<h4>Input aligned tumor / normal BAM files</h4>
<p>The workflow takes a pair of aligned BAM files as input, one for the tumor and one for the normal sample, both from the same donor. Here we assume the files are named <em>tumor_sample.bam</em> and <em>normal_sample.bam</em> and are located in the <em>bams</em> subfolder.</p>
<h4>Reference data files</h4>
<p>The workflow also uses the following reference files, which can be downloaded from the ICGC Data Portal:</p>
<ul>
<li>Under https://dcc.icgc.org/releases/PCAWG/reference_data/pcawg-bwa-mem
<ul>
<li>genome.fa.gz</li>
<li>genome.dict</li>
</ul>
</li>
<li>Under https://dcc.icgc.org/releases/PCAWG/reference_data/pcawg-gatk-cocleaning
<ul>
<li>1000G_phase1.indels.hg19.sites.fixed.vcf.gz</li>
<li>Mills_and_1000G_gold_standard.indels.hg19.sites.fixed.vcf.gz</li>
<li>dbsnp_132_b37.leftAligned.vcf.gz</li>
</ul>
</li>
</ul>
<p>We assume the reference files are located in the <em>reference</em> subfolder. Note that the job file below refers to <em>reference/genome.fa</em>, so <em>genome.fa.gz</em> presumably has to be decompressed (for example with <code>gunzip</code>) before the workflow is run.</p>
<h4>Job JSON file for CWL</h4>
<p>Finally, prepare a JSON file that specifies the input and reference files. Replace the locations given for the <em>tumor_bam</em> and <em>normal_bam</em> parameters with the paths to your own BAM files.</p>
<p>Name the JSON file: <em>pcawg-gatk-cocleaning.job.json</em></p>
<pre><code>{
  "tumor_bam": {
    "class": "File",
    "location": "bams/tumor_sample.bam"
  },
  "normal_bam": {
    "class": "File",
    "location": "bams/normal_sample.bam"
  },
  "reference": {
    "class": "File",
    "location": "reference/genome.fa"
  },
  "knownIndels": [
    {
      "class": "File",
      "location": "reference/1000G_phase1.indels.hg19.sites.fixed.vcf.gz"
    },
    {
      "class": "File",
      "location": "reference/Mills_and_1000G_gold_standard.indels.hg19.sites.fixed.vcf.gz"
    }
  ],
  "knownSites": [
    {
      "class": "File",
      "location": "reference/dbsnp_132_b37.leftAligned.vcf.gz"
    }
  ]
}
</code></pre>
<h3>Run the workflow</h3>
<h4>Option 1: Run with CWL tool</h4>
<ul>
<li>Download CWL workflow definition files</li>
</ul>
<pre><code># Download release 0.1.1 and save it under an explicit file name
wget -O pcawg-gatk-cocleaning-0.1.1.tar.gz https://github.com/ICGC-TCGA-PanCancer/pcawg-gatk-cocleaning/archive/0.1.1.tar.gz
# Unpack into the pcawg-gatk-cocleaning-0.1.1 directory
tar xvf pcawg-gatk-cocleaning-0.1.1.tar.gz
</code></pre>
<ul>
<li>Run <code>cwltool</code> to execute the workflow</li>
</ul>
<pre><code># Run in the background and send all output to a log file
nohup cwltool --debug --non-strict \
  pcawg-gatk-cocleaning-0.1.1/gatk-cocleaning-workflow.cwl \
  pcawg-gatk-cocleaning.job.json > pcawg-gatk-cocleaning.log 2>&1 &
</code></pre>
<h4>Option 2: Run with the Dockstore CLI</h4>
<p>See the <em>Launch with</em> section below for details.</p>
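<p>For either option, the job file described under <em>Job JSON file for CWL</em> can also be generated with a short script instead of being edited by hand. The following is a minimal sketch, not part of the workflow itself: the helper names (<code>file_entry</code>, <code>build_job</code>, <code>iter_locations</code>, <code>missing_files</code>, <code>write_job</code>) are hypothetical, and it assumes the <em>bams</em> and <em>reference</em> folder layout and the file names used above.</p>

```python
import json
from pathlib import Path


def file_entry(path):
    """Return a CWL File object (class + location) for the given path."""
    return {"class": "File", "location": Path(path).as_posix()}


def build_job(tumor_bam, normal_bam, reference_dir="reference"):
    """Assemble a job dictionary with the same keys as the JSON example above.

    Assumption: the reference files keep the names listed in this document
    and all live in `reference_dir`.
    """
    ref = Path(reference_dir)
    return {
        "tumor_bam": file_entry(tumor_bam),
        "normal_bam": file_entry(normal_bam),
        "reference": file_entry(ref / "genome.fa"),
        "knownIndels": [
            file_entry(ref / "1000G_phase1.indels.hg19.sites.fixed.vcf.gz"),
            file_entry(ref / "Mills_and_1000G_gold_standard.indels.hg19.sites.fixed.vcf.gz"),
        ],
        "knownSites": [
            file_entry(ref / "dbsnp_132_b37.leftAligned.vcf.gz"),
        ],
    }


def iter_locations(job):
    """Yield every file location in the job, flattening the list-valued inputs."""
    for value in job.values():
        entries = value if isinstance(value, list) else [value]
        for entry in entries:
            yield entry["location"]


def missing_files(job):
    """Return the locations in the job that do not exist on disk."""
    return [loc for loc in iter_locations(job) if not Path(loc).exists()]


def write_job(job, out_path="pcawg-gatk-cocleaning.job.json"):
    """Write the job dictionary as indented JSON."""
    Path(out_path).write_text(json.dumps(job, indent=2) + "\n")
```

<p>A typical call would be <code>job = build_job("bams/tumor_sample.bam", "bams/normal_sample.bam")</code>, followed by a look at <code>missing_files(job)</code> to catch wrong paths before a long run starts, and then <code>write_job(job)</code> to produce <em>pcawg-gatk-cocleaning.job.json</em>.</p>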