Bibliographic details
Main author: sevenbridges-openworkflows
Format: Digital resource
Published: Zenodo 2019
Online access: https://doi.org/10.5281/zenodo.15650660
Table of contents:
<p>This workflow represents the GATK Best Practices for SNP and INDEL calling on RNA-Seq data.</p>
<p>Starting from an unmapped BAM file, it aligns reads to the reference genome, then marks duplicates, reassigns mapping qualities, recalibrates base quality scores, calls variants, and filters them. More detailed information about calling variants in RNA-Seq is available on the <a href="https://software.broadinstitute.org/gatk/documentation/article.php?id=3891">GATK website</a>.</p>
<h3>Common Use Cases</h3>
<ul>
<li>If you have raw sequencing reads in FASTQ format, convert them to an unmapped BAM file with the <strong>Picard FastqToSam</strong> app before running the workflow.</li>
<li><strong>BaseRecalibrator</strong> uses the <strong>Known INDELs</strong> and <strong>Known SNPs</strong> databases to mask out polymorphic sites when building the model used to adjust base quality scores. <strong>HaplotypeCaller</strong> also uses the <strong>Known SNPs</strong> database to populate the ID column of the output VCF.</li>
<li>The <strong>HaplotypeCaller</strong> app uses the <strong>Intervals list</strong> to restrict processing to specific genomic intervals. Set the <strong>Scatter count</strong> value to split the <strong>Intervals list</strong> into smaller intervals. <strong>HaplotypeCaller</strong> processes these intervals in parallel, which can significantly reduce workflow execution time.</li>
<li>You can provide either a pre-generated <strong>STAR</strong> reference index file or a genome reference file to the <strong>Reference or STAR index</strong> input.</li>
<li><strong>Running a batch task</strong>: Batching is performed by the <strong>Sample ID</strong> metadata field on the <strong>Unmapped BAM</strong> input port.
To run analyses in batches, you must set the <strong>Sample ID</strong> metadata on each unmapped BAM file.</li>
</ul>
<h3>Changes Introduced by Seven Bridges</h3>
<p>This workflow represents the GATK Best Practices for SNP and INDEL calling on RNA-Seq data, with no modifications to the original workflow.</p>
<h3>Common Issues and Important Notes</h3>
<ul>
<li>Because <code>--known-sites</code> is a required option of the GATK <strong>BaseRecalibrator</strong> tool, you must provide at least one database file to the <strong>Known INDELs</strong> or <strong>Known SNPs</strong> input port.</li>
<li>If you provide a pre-generated STAR reference index, make sure it was created with a compatible version of STAR (check the STAR version in the original <a href="https://github.com/gatk-workflows/gatk3-4-rnaseq-germline-snps-indels/blob/master/rna-germline-variant-calling.wdl">WDL file</a>).</li>
<li>When converting FASTQ files to an unmapped BAM file with <strong>Picard FastqToSam</strong>, you must set the <strong>Platform</strong> (<code>PLATFORM=</code>) parameter.</li>
<li>This workflow processes one sample per task execution. To process more than one sample, run multiple task executions in batch mode. More about batch analyses can be found <a href="https://docs.sevenbridges.com/docs/about-batch-analyses">here</a>.</li>
</ul>
<h3>Performance Benchmarking</h3>
<p>The default memory and CPU requirements for each app in the workflow are the same as in the original <a href="https://github.com/gatk-workflows/gatk3-4-rnaseq-germline-snps-indels/blob/master/rna-germline-variant-calling.wdl">GATK Best Practices WDL</a>.
You can change the default runtime requirements for the <strong>STAR GenomeGenerate</strong> and <strong>STAR Align</strong> apps.</p>
<table>
<thead>
<tr><th>Experiment type</th><th>Input size</th><th>Paired-end</th><th>Number of reads</th><th>Read length (bp)</th><th>Duration</th><th>AWS instance cost (spot)</th><th>AWS instance cost (on-demand)</th></tr>
</thead>
<tbody>
<tr><td>RNA-Seq</td><td>1.3 GB</td><td>Yes</td><td>16M</td><td>101</td><td>2h44min</td><td>$0.79</td><td>$1.79</td></tr>
<tr><td>RNA-Seq</td><td>3.9 GB</td><td>Yes</td><td>50M</td><td>101</td><td>4h38min</td><td>$1.29</td><td>$2.71</td></tr>
<tr><td>RNA-Seq</td><td>6.5 GB</td><td>Yes</td><td>82M</td><td>101</td><td>6h44min</td><td>$1.85</td><td>$3.84</td></tr>
<tr><td>RNA-Seq</td><td>12.9 GB</td><td>Yes</td><td>164M</td><td>101</td><td>12h4min</td><td>$3.30</td><td>$6.99</td></tr>
</tbody>
</table>
<h3>API Python Implementation</h3>
<p>You can also submit the workflow's draft task through the API. To learn how to get your authentication token and API endpoint for the corresponding platform, visit our <a href="https://github.com/sbg/sevenbridges-python#authentication-and-configuration">documentation</a>.</p>
<pre><code>from sevenbridges import Api

authentication_token, api_endpoint = "enter_your_token", "enter_api_endpoint"
api = Api(token=authentication_token, url=api_endpoint)

# Get project_id/workflow_id from your address bar. Example: https://igor.sbgenomics.com/u/your_username/project/workflow
project_id = "your_username/project"
workflow_id = "your_username/project/workflow"

# Get file names from files in your project.
# Inputs indexed with [0] take the first matching file; the others receive the whole query result (a list of files).
inputs = {
    "input": api.files.query(project=project_id, names=['Homo_sapiens_assembly19_1000genomes_decoy.whole_genome.interval_list']),
    "in_alignments": api.files.query(project=project_id, names=['G26234.HCC1187_1Mreads.bam'])[0],
    "in_reference": api.files.query(project=project_id, names=['Homo_sapiens_assembly19_1000genomes_decoy.fasta'])[0],
    "in_gene_annotation": api.files.query(project=project_id, names=['star.gencode.v19.transcripts.patched_contigs.gtf'])[0],
    "in_reference_or_index": api.files.query(project=project_id, names=['Homo_sapiens_assembly19_1000genomes_decoy.star.gencode.v19.transcripts.patched_contigs.star-2.5.3a_modified-index-archive.tar'])[0],
    "known_indels": api.files.query(project=project_id, names=['Mills_and_1000G_gold_standard.indels.b37.sites.vcf', 'Homo_sapiens_assembly19_1000genomes_decoy.known_indels.vcf']),
    "known_snps": api.files.query(project=project_id, names=['Homo_sapiens_assembly19_1000genomes_decoy.dbsnp138.vcf']),
}

# run=False creates a draft task without starting it.
task = api.tasks.create(
    name='GATK4 RNA-Seq Workflow - API Example',
    project=project_id,
    app=workflow_id,
    inputs=inputs,
    run=False,
)

# For running a batch task: group the unmapped BAM input by its Sample ID metadata.
task = api.tasks.create(
    name='GATK4 RNA-Seq Workflow - API Batch Example',
    project=project_id,
    app=workflow_id,
    inputs=inputs,
    run=False,
    batch_input='in_alignments',
    batch_by={'type': 'CRITERIA', 'criteria': ['metadata.sample_id']},
)
</code></pre>
<p>Instructions for installing and configuring the API Python client are provided on GitHub. For more information about using the API Python client, consult the <a href="http://sevenbridges-python.readthedocs.io/en/latest/">sevenbridges-python documentation</a>. More examples are available <a href="https://github.com/sbg/okAPI">here</a>.</p>
<p>Additionally, <a href="https://github.com/sbg/sevenbridges-r">API R</a> and <a href="https://github.com/sbg/sevenbridges-java">API Java</a> clients are available.
To learn more about these API clients, refer to the <a href="https://sbg.github.io/sevenbridges-r/">API R client documentation</a> and the <a href="https://docs.sevenbridges.com/docs/java-library-quickstart">API Java client documentation</a>.</p>
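<p>The FASTQ-to-unmapped-BAM conversion mentioned under Common Use Cases and Common Issues can be sketched as a command builder. This is a minimal sketch, not the platform's Picard FastqToSam app. It assumes a local <code>picard.jar</code> and Picard's legacy <code>KEY=value</code> argument syntax, and the function name <code>fastq_to_sam_args</code> and the file names are hypothetical. It refuses to build a command without <code>PLATFORM</code>, which the workflow requires.</p>

```python
from typing import List, Optional


def fastq_to_sam_args(fastq1: str, output: str, sample_name: str, platform: str,
                      fastq2: Optional[str] = None) -> List[str]:
    """Build a hypothetical Picard FastqToSam command line (legacy KEY=value syntax).

    Assumes picard.jar sits in the working directory. PLATFORM is checked up
    front because the workflow needs it set on the unmapped BAM.
    """
    if not platform:
        raise ValueError("PLATFORM must be set, e.g. 'ILLUMINA'")
    args = [
        "java", "-jar", "picard.jar", "FastqToSam",
        f"FASTQ={fastq1}",
        f"OUTPUT={output}",
        f"SAMPLE_NAME={sample_name}",
        f"PLATFORM={platform}",
    ]
    # Paired-end data: the second mate file goes to FASTQ2.
    if fastq2:
        args.append(f"FASTQ2={fastq2}")
    return args


cmd = fastq_to_sam_args("S1_R1.fastq.gz", "S1.unmapped.bam", "S1", "ILLUMINA",
                        fastq2="S1_R2.fastq.gz")
print(" ".join(cmd))
```

<p>On a machine with Java and Picard installed, the list could be passed to <code>subprocess.run(cmd, check=True)</code>. Building a list instead of a shell string avoids quoting problems with unusual file names.</p>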
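<p>The <code>--known-sites</code> requirement described under Common Issues can be checked before a task is submitted. This is a minimal sketch that assumes GATK4 command-line syntax (<code>gatk BaseRecalibrator -R ref.fasta -I in.bam --known-sites sites.vcf -O recal.table</code>). It merges the Known INDELs and Known SNPs inputs into repeated <code>--known-sites</code> arguments and fails early if both are empty. The function name and the BAM and output file names are hypothetical, and the workflow's own command may add further options.</p>

```python
from typing import List, Sequence


def base_recalibrator_args(reference: str, bam: str, known_indels: Sequence[str],
                           known_snps: Sequence[str], output: str) -> List[str]:
    """Build a GATK4 BaseRecalibrator command with one --known-sites per database.

    Raises ValueError when no database is given, since --known-sites is required.
    """
    sites = list(known_indels) + list(known_snps)
    if not sites:
        raise ValueError("Provide at least one VCF to Known INDELs or Known SNPs")
    args = ["gatk", "BaseRecalibrator", "-R", reference, "-I", bam, "-O", output]
    # Each known-sites database is passed with its own flag.
    for vcf in sites:
        args += ["--known-sites", vcf]
    return args


cmd = base_recalibrator_args(
    "Homo_sapiens_assembly19_1000genomes_decoy.fasta",
    "sample.bam",
    ["Mills_and_1000G_gold_standard.indels.b37.sites.vcf",
     "Homo_sapiens_assembly19_1000genomes_decoy.known_indels.vcf"],
    ["Homo_sapiens_assembly19_1000genomes_decoy.dbsnp138.vcf"],
    "recal_data.table",
)
print(" ".join(cmd))
```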
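<p>The <strong>Scatter count</strong> behavior described under Common Use Cases can be illustrated with a small partitioning sketch. This is not the workflow's own scattering step. It is a hypothetical greedy splitter, <code>scatter_intervals</code>, that assumes intervals are given as <code>(contig, start, end)</code> tuples with 1-based inclusive coordinates, as in Picard interval lists. It keeps whole intervals together, so group sizes are only approximately equal.</p>

```python
from typing import List, Tuple

# A genomic interval as (contig, start, end), 1-based inclusive coordinates.
Interval = Tuple[str, int, int]


def scatter_intervals(intervals: List[Interval], scatter_count: int) -> List[List[Interval]]:
    """Split intervals into at most `scatter_count` groups of roughly equal total length.

    Greedy sketch: walk the intervals in order and open a new group once the
    current group has reached the per-group target length. Intervals are never
    split, so the balance is approximate.
    """
    if scatter_count < 1:
        raise ValueError("scatter_count must be >= 1")
    total = sum(end - start + 1 for _, start, end in intervals)
    target = total / scatter_count
    groups: List[List[Interval]] = [[]]
    current = 0
    for interval in intervals:
        _, start, end = interval
        # Start a new group only if the current one is full and groups remain.
        if current >= target and len(groups) < scatter_count:
            groups.append([])
            current = 0
        groups[-1].append(interval)
        current += end - start + 1
    return groups


intervals = [("1", 1, 100), ("1", 201, 300), ("2", 1, 100), ("2", 201, 300)]
for i, group in enumerate(scatter_intervals(intervals, 2)):
    print(i, group)
```

<p>Each group would then be handed to a separate HaplotypeCaller job, which is where the parallel speed-up comes from.</p>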
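<p>Batching by <strong>Sample ID</strong>, described under Common Use Cases and set through <code>batch_by</code> in the API example, can be pictured as grouping input files by their <code>sample_id</code> metadata, with each distinct value becoming one child task. The sketch below is a local illustration only. It assumes each file is a plain dict with <code>name</code> and <code>metadata</code> keys, and the function name <code>group_by_sample_id</code> is hypothetical. On the platform, the metadata itself is typically set in sevenbridges-python by assigning to <code>file.metadata['sample_id']</code> and calling <code>file.save()</code>. Check the client documentation for your version.</p>

```python
from collections import defaultdict
from typing import Dict, List


def group_by_sample_id(files: List[dict]) -> Dict[str, List[str]]:
    """Group file names by 'sample_id' metadata, one group per batch child task.

    Each file is assumed to be a dict like {"name": ..., "metadata": {...}}.
    This sketch rejects files that lack a sample_id, since batching needs it.
    """
    missing = [f["name"] for f in files if not f.get("metadata", {}).get("sample_id")]
    if missing:
        raise ValueError(f"Missing sample_id metadata: {missing}")
    groups: Dict[str, List[str]] = defaultdict(list)
    for f in files:
        groups[f["metadata"]["sample_id"]].append(f["name"])
    return dict(groups)


files = [
    {"name": "S1.unmapped.bam", "metadata": {"sample_id": "S1"}},
    {"name": "S2.unmapped.bam", "metadata": {"sample_id": "S2"}},
]
print(group_by_sample_id(files))
```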
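<p>The Performance Benchmarking table can also be read per million reads. The arithmetic below uses only the figures transcribed from that table, and the helper name <code>to_minutes</code> is hypothetical. Across the four runs, both runtime and spot cost per million reads decrease as input size grows. Spot instances cost roughly half as much as on-demand instances, a saving of about 52 to 56%.</p>

```python
import re

# Rows transcribed from the Performance Benchmarking table:
# (reads in millions, duration, spot cost in USD, on-demand cost in USD)
ROWS = [
    (16, "2h44min", 0.79, 1.79),
    (50, "4h38min", 1.29, 2.71),
    (82, "6h44min", 1.85, 3.84),
    (164, "12h4min", 3.30, 6.99),
]


def to_minutes(duration: str) -> int:
    """Convert a duration such as '2h44min' to whole minutes."""
    match = re.fullmatch(r"(\d+)h(\d+)min", duration)
    if match is None:
        raise ValueError(f"Unexpected duration format: {duration}")
    hours, minutes = map(int, match.groups())
    return hours * 60 + minutes


for reads_m, duration, spot, on_demand in ROWS:
    minutes = to_minutes(duration)
    savings = 1 - spot / on_demand
    print(f"{reads_m:>4}M reads: {minutes / reads_m:.2f} min per M reads, "
          f"${spot / reads_m:.4f} per M reads (spot), spot saves {savings:.0%}")
```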