Bibliographic Details
Main Author: Hélène Zuber
Format: Digital resource
Language:
Published: Zenodo 2025
Online Access: https://doi.org/10.5281/zenodo.17316262
author Hélène Zuber
contents <p>The pipeline analyzes FASTQ files using an approach adapted from https://doi.org/10.3389/fpls.2018.01438. It consists of scripts written in Python (v3) that use the Biopython (v1.79) and regex (v2022.9.13) libraries. Reads containing low-quality bases (≤ Q10) within the 15-base random sequence of read 2 or within the 30 bases downstream of the delimiter sequence were filtered out. Reads with identical 15-base random sequences were deduplicated. Next, reads 1 were searched for the sequence of the forward PCR2 primer to identify the corresponding target, tolerating one mismatch. Matched reads 1 and their corresponding reads 2 were extracted for further analysis. Reads 2 containing the delimiter sequence were selected, and their random and delimiter sequences were then trimmed. Because some target sequences were short, some reads 2 ran into the 5′ forward PCR primer, whose sequence was also removed. Reads 2 shorter than 20 nucleotides after these trimming steps were excluded from further analysis. To map the 3′ extremities of target RNAs, the 20 nucleotides downstream of the read-2 delimiter sequence were mapped to the corresponding reference sequence, which runs from the first nucleotide of the transcript matching the forward PCR2 primer to the end of the target RNA. Up to two mismatches were tolerated, except at the first nucleotide downstream of the mapping site, which had to match perfectly. To map the 3′ end position of reads 2 carrying untemplated tails, unmatched reads 2 were successively trimmed from their 3′ end, one nucleotide at a time, until they could be mapped to the reference sequence or until a maximum of 30 nucleotides had been removed.
For each successfully mapped read 2, the untemplated nucleotides at the 3′ end were extracted and annotated according to their composition as: U-tails (only Us), U-rich tails (more than 70% Us), A-tails (only As), A-rich tails (more than 70% As), C-tails (only Cs), C-rich tails (more than 70% Cs), G-tails (only Gs), and G-rich tails (more than 70% Gs).</p>
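The read pre-processing steps described above (quality filtering, deduplication on the random sequence, primer matching with one mismatch, delimiter and primer trimming, length filtering) could look roughly like the following sketch. It is an illustration under assumptions, not the code of the hzuber67/RACEseq_3ETS_ITS1 repository. It assumes a read-2 layout of [15-nt random sequence][delimiter][insert] with the delimiter at a fixed position, Phred+33 quality encoding, exact matching of the delimiter and of the primer run-through, and an antisense read 2. It also uses a plain Hamming-distance scan in place of whatever fuzzy matching the original scripts perform with the regex library. The names `preprocess`, `has_low_quality`, `contains_with_mismatch`, and the `Pair` record type are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

UMI_LEN = 15       # 15-base random sequence at the start of read 2
QUAL_WINDOW = 30   # bases downstream of the delimiter that are quality-checked
MIN_QUAL = 10      # a base with Phred Q <= 10 counts as low quality
MIN_LEN = 20       # shortest read-2 insert kept after trimming

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

# (read1 sequence, read2 sequence, read2 quality string): hypothetical record type
Pair = Tuple[str, str, str]


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def has_low_quality(qual: str, delimiter: str, offset: int = 33) -> bool:
    """True if Q <= 10 occurs in the random sequence or in the 30 bases after
    the delimiter (delimiter assumed to start right after the random sequence)."""
    scores = [ord(c) - offset for c in qual]
    start = UMI_LEN + len(delimiter)
    checked = scores[:UMI_LEN] + scores[start:start + QUAL_WINDOW]
    return any(q <= MIN_QUAL for q in checked)


def contains_with_mismatch(primer: str, seq: str, max_mm: int = 1) -> bool:
    """Hamming-distance scan: does primer occur in seq with <= max_mm substitutions?"""
    n = len(primer)
    return any(sum(a != b for a, b in zip(primer, seq[i:i + n])) <= max_mm
               for i in range(len(seq) - n + 1))


def preprocess(pairs: Iterable[Pair], primers: Dict[str, str],
               delimiter: str) -> Dict[str, List[str]]:
    """Return the trimmed read-2 inserts kept for each target."""
    seen_umis = set()
    kept: Dict[str, List[str]] = {name: [] for name in primers}
    for read1, read2, qual2 in pairs:
        if has_low_quality(qual2, delimiter):
            continue
        umi = read2[:UMI_LEN]
        if umi in seen_umis:                 # duplicate random sequence
            continue
        seen_umis.add(umi)
        # first target whose forward PCR2 primer occurs in read 1 (assumption)
        target = next((name for name, primer in primers.items()
                       if contains_with_mismatch(primer, read1)), None)
        if target is None:
            continue
        if read2[UMI_LEN:UMI_LEN + len(delimiter)] != delimiter:
            continue
        insert = read2[UMI_LEN + len(delimiter):]
        # read 2 is assumed antisense, so a run-through into the forward primer
        # appears as its reverse complement; drop it and anything after it
        cut = insert.find(revcomp(primers[target]))
        if cut >= 0:
            insert = insert[:cut]
        if len(insert) >= MIN_LEN:
            kept[target].append(insert)
    return kept
```

Keeping the first read pair seen for each random sequence is the simplest deduplication policy; the original scripts may choose the retained copy differently.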
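The 3′-end mapping, one-nucleotide iterative trimming, and tail annotation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's implementation. It reads "the first nucleotide downstream of the mapping site" as the 3′-terminal base of the 20-nt query. It scans every reference window by brute force and returns the first acceptable hit. It reports tails with U in place of T. The labels "no tail" and "mixed tail" are hypothetical, as are the names `map_3prime_end`, `classify_tail`, and `Hit`. Read-2 inserts are assumed to be already trimmed of their random, delimiter, and primer sequences. They are reverse-complemented to transcript sense so that untemplated nucleotides sit at the 3′ end.

```python
from typing import NamedTuple, Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


class Hit(NamedTuple):
    end: int   # 1-based reference coordinate of the mapped 3' end
    tail: str  # untemplated 3' nucleotides, written with U instead of T


def window_ok(query: str, ref_window: str, max_mm: int) -> bool:
    """Accept if the 3'-terminal base matches exactly and the window has
    at most max_mm mismatches overall (assumed reading of the rule)."""
    if query[-1] != ref_window[-1]:
        return False
    return sum(a != b for a, b in zip(query, ref_window)) <= max_mm


def map_3prime_end(read2_insert: str, reference: str, window: int = 20,
                   max_mm: int = 2, max_trim: int = 30) -> Optional[Hit]:
    """Map the 3' end of one read-2 insert (hypothetical helper)."""
    sense = revcomp(read2_insert)          # transcript orientation, tail at 3' end
    for trim in range(max_trim + 1):       # 0 = untrimmed, then 1..30 nt removed
        stop = len(sense) - trim
        if stop < window:                  # too short to form a 20-nt query
            break
        query = sense[stop - window:stop]
        for start in range(len(reference) - window + 1):
            if window_ok(query, reference[start:start + window], max_mm):
                tail = sense[stop:].replace("T", "U")
                return Hit(end=start + window, tail=tail)
    return None


def classify_tail(tail: str, threshold: float = 0.70) -> str:
    """Label a tail as X-tail (only X) or X-rich tail (more than 70% X)."""
    if not tail:
        return "no tail"                   # hypothetical label
    for base in "UACG":
        if set(tail) == {base}:
            return f"{base}-tail"
    for base in "UACG":
        if tail.count(base) / len(tail) > threshold:
            return f"{base}-rich tail"
    return "mixed tail"                    # hypothetical label


if __name__ == "__main__":
    ref = "GCATGCTAGCCGATAGGCTACGATCGAAGC"           # made-up toy reference
    read = revcomp("GCTAGCCGATAGGCTACGAT" + "TTT")   # toy read 2 with a UUU tail
    hit = map_3prime_end(read, ref)
    print(hit, classify_tail(hit.tail))   # Hit(end=24, tail='UUU') U-tail
```

In the toy run, the untrimmed query and the queries trimmed by one or two nucleotides each have three mismatches against the only window with a matching terminal base, so they are rejected. The third trim maps exactly, leaving three untemplated Us. Checking exact homopolymers before the 70% rule keeps a pure U-tail from being reported as U-rich.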
format Digital resource
id zenodo_https___doi_org_10_5281_zenodo_17316262
institution Zenodo
language
publishDate 2025
publisher Zenodo
record_format zenodo
title hzuber67/RACEseq_3ETS_ITS1: Bioinformatic pipeline for mapping pre-rRNA 3' ends using 3' RACE-seq
url https://doi.org/10.5281/zenodo.17316262