| Main Author: | Hélène Zuber |
|---|---|
| Format: | Digital resource |
| Language: | |
| Published: | Zenodo, 2025 |
| Online Access: | https://doi.org/10.5281/zenodo.17316262 |
| _version_ | 1866902055384252416 |
|---|---|
| author | Hélène Zuber |
| contents | <p>The pipeline, adapted from https://doi.org/10.3389/fpls.2018.01438, analyzes FASTQ files and consists of scripts written in Python 3 using the Biopython (v1.79) and regex (v2022.9.13) libraries. Reads were filtered out if they contained a low-quality base (≤ Q10) within the 15-base random sequence of read 2 or within the 30 bases downstream of the delimiter sequence. Reads with identical 15-base random sequences were deduplicated. Next, reads 1 were searched for the nucleotide sequence of each forward PCR2 primer, tolerating one mismatch, to identify the corresponding target. Matched reads 1 and their mate reads 2 were extracted for further analysis. Reads 2 containing the delimiter sequence were selected, and their random and delimiter sequences were trimmed. Because some target sequences were short, some reads 2 ran into the 5′ forward PCR primer, whose sequence was also removed. Reads 2 shorter than 20 nucleotides after these trimming steps were excluded from further analysis. To map the 3′ ends of target RNAs, the 20-nucleotide sequence downstream of the read-2 delimiter was mapped to the corresponding reference sequence, which spans from the first nucleotide of the transcript matching the forward PCR2 primer to the end of the target RNA. Up to two mismatches were tolerated, except at the first nucleotide downstream of the mapping site, which had to match perfectly. To map the 3′ end position of reads 2 carrying untemplated tails, unmatched reads 2 were trimmed from their 3′ end one nucleotide at a time until they mapped to the reference sequence or until a maximum of 30 nucleotides had been removed.</p><p>For each successfully mapped read 2, the untemplated nucleotides at the 3′ end were extracted and annotated by tail composition as U-tails (Us only), U-rich tails (more than 70% Us), A-tails (As only), A-rich tails (more than 70% As), C-tails (Cs only), C-rich tails (more than 70% Cs), G-tails (Gs only), or G-rich tails (more than 70% Gs).</p> |
| format | Digital resource |
| id | zenodo_https___doi_org_10_5281_zenodo_17316262 |
| institution | Zenodo |
| language | |
| publishDate | 2025 |
| publisher | Zenodo |
| record_format | zenodo |
| title | hzuber67/RACEseq_3ETS_ITS1: Bioinformatic pipeline for mapping pre-rRNA 3' ends using 3' RACE-seq |
| url | https://doi.org/10.5281/zenodo.17316262 |
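
The read-level quality filter and random-sequence deduplication described in the contents field can be sketched in plain Python. This is a minimal sketch, not the deposited code. It assumes Phred scores are already decoded into integer lists (Biopython exposes them as `record.letter_annotations["phred_quality"]` for records read with `SeqIO.parse(path, "fastq")`), that the random sequence occupies the first 15 bases of read 2, and that the delimiter follows it. The names `ReadPair`, `passes_quality`, and `deduplicate` are hypothetical.

```python
from typing import Iterable, Iterator, List, NamedTuple

RANDOM_LEN = 15   # random sequence at the 5' end of read 2 (from the description)
MIN_Q = 10        # bases scoring <= Q10 count as low quality
WINDOW = 30       # bases checked downstream of the delimiter


class ReadPair(NamedTuple):
    """Hypothetical container for one read pair."""
    r1_seq: str
    r2_seq: str
    r2_qual: List[int]  # decoded Phred scores of read 2, one per base


def passes_quality(pair: ReadPair, delimiter: str) -> bool:
    """False if any base <= Q10 lies in the random sequence or in the
    WINDOW bases that follow the delimiter of read 2."""
    if any(q <= MIN_Q for q in pair.r2_qual[:RANDOM_LEN]):
        return False
    pos = pair.r2_seq.find(delimiter, RANDOM_LEN)
    if pos == -1:
        # No delimiter: left for the later delimiter-selection step.
        return True
    start = pos + len(delimiter)
    return all(q > MIN_Q for q in pair.r2_qual[start:start + WINDOW])


def deduplicate(pairs: Iterable[ReadPair]) -> Iterator[ReadPair]:
    """Keep only the first pair seen for each 15-base random sequence."""
    seen = set()
    for pair in pairs:
        umi = pair.r2_seq[:RANDOM_LEN]
        if umi not in seen:
            seen.add(umi)
            yield pair
```

Keeping the first occurrence of each random sequence is an arbitrary choice in this sketch; the deposited scripts may select the representative read differently.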
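
Target assignment, by searching reads 1 for each forward PCR2 primer with at most one mismatch, could look like the stdlib sketch below. The record lists the regex library among the dependencies, and its fuzzy-matching syntax (for example `(?:ACGT){s<=1}`) is one way to express such a search, but the record does not say how the match is implemented. This sketch counts substitutions only, scans the whole read, and leaves a read that matches several primers unassigned; those choices and the function names are assumptions.

```python
from typing import Dict, Optional


def hamming(a: str, b: str) -> int:
    """Number of substitutions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def contains_primer(read: str, primer: str, max_mm: int = 1) -> bool:
    """True if primer occurs anywhere in read with <= max_mm substitutions."""
    k = len(primer)
    return any(hamming(read[i:i + k], primer) <= max_mm
               for i in range(len(read) - k + 1))


def assign_target(read1: str, primers: Dict[str, str],
                  max_mm: int = 1) -> Optional[str]:
    """Return the single target whose forward PCR2 primer matches read 1,
    or None when no primer (or more than one) matches."""
    hits = [name for name, primer in primers.items()
            if contains_primer(read1, primer, max_mm)]
    return hits[0] if len(hits) == 1 else None
```

For example, with a hypothetical mapping such as `{"3ETS": "ACGTACGTAC", "ITS1": "TTGGCCAATT"}` (placeholder sequences, not the real primers), a read 1 starting with `ACGTTCGTAC` is assigned to `3ETS` despite its single substitution.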
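
Selecting reads 2 that carry the delimiter, trimming the random and delimiter sequences, removing read-through into the forward primer, and applying the 20-nucleotide length cutoff might be sketched as follows. Assumptions: the delimiter is searched for after the 15-base random sequence, read-through appears in read 2 as the exact reverse complement of the forward PCR2 primer, and a partial primer at the very end of a read is not clipped. `revcomp` and `trim_read2` are hypothetical names.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def trim_read2(r2: str, delimiter: str, fwd_primer: str,
               random_len: int = 15, min_len: int = 20) -> Optional[str]:
    """Return the trimmed insert of read 2, or None if the read is discarded."""
    pos = r2.find(delimiter, random_len)
    if pos == -1:
        return None                       # no delimiter: read not selected
    insert = r2[pos + len(delimiter):]    # drop random + delimiter sequences
    cut = insert.find(revcomp(fwd_primer))
    if cut != -1:
        insert = insert[:cut]             # drop read-through into the primer
    return insert if len(insert) >= min_len else None
```

A more tolerant variant could also clip a 3′-terminal prefix of the reverse-complemented primer, or allow mismatches as in the primer search.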
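
The 3′-end mapping (up to two mismatches) and the iterative one-nucleotide trimming that exposes untemplated tails can be sketched as below. This sketch assumes the trimmed insert has already been reverse-complemented into transcript (sense) orientation, so the RNA 3′ end is the last character of the string. It maps the 3′-terminal 20-nucleotide window, interprets "the first nucleotide downstream of the mapping site" as that window's 3′-terminal base and requires it to match exactly, and returns the leftmost hit without checking for multiple hits. `map_window` and `map_3prime_end` are hypothetical names, and the returned coordinate is 1-based on the reference.

```python
from typing import Optional, Tuple

SEED = 20      # length of the 3'-terminal window that is mapped
MAX_MM = 2     # mismatches tolerated within the window
MAX_TRIM = 30  # at most this many 3' nucleotides are trimmed as a putative tail


def map_window(window: str, reference: str) -> Optional[int]:
    """1-based reference coordinate of the window's 3'-terminal base, or None.

    The terminal base must match exactly; up to MAX_MM mismatches are
    allowed elsewhere. The leftmost qualifying position is returned."""
    k = len(window)
    for start in range(len(reference) - k + 1):
        ref = reference[start:start + k]
        if ref[-1] != window[-1]:
            continue
        if sum(a != b for a, b in zip(window, ref)) <= MAX_MM:
            return start + k
    return None


def map_3prime_end(insert: str, reference: str) -> Optional[Tuple[int, str]]:
    """Trim the sense-oriented insert one 3' base at a time until its
    terminal SEED-nt window maps; return (3' end coordinate, tail)."""
    for trim in range(MAX_TRIM + 1):
        core = insert[:len(insert) - trim]
        if len(core) < SEED:
            break                            # too short to map after trimming
        end = map_window(core[-SEED:], reference)
        if end is not None:
            return end, insert[len(core):]   # trimmed bases = untemplated tail
    return None
```

Because the terminal base must match, a run of non-templated nucleotides is peeled off one base at a time until the remaining 3′ end lands on the reference; the peeled bases are returned as the tail.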
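
The tail annotation rule can be written as a small classifier. Assumptions: tails arrive as DNA-alphabet strings (T is read as U), the 70% threshold is applied strictly so a tail that is exactly 70% U is not U-rich, and the labels "no tail" and "other" for empty or mixed tails are hypothetical additions not named in the description. `classify_tail` is a hypothetical name.

```python
def classify_tail(tail: str) -> str:
    """Annotate an untemplated 3' tail by nucleotide composition."""
    tail = tail.upper().replace("T", "U")  # sequencing reads report U as T
    if not tail:
        return "no tail"                    # hypothetical label
    n = len(tail)
    for base in "UACG":
        count = tail.count(base)
        if count == n:
            return f"{base}-tail"
        if count * 10 > n * 7:              # strictly more than 70%, integer math
            return f"{base}-rich tail"
    return "other"                          # hypothetical label for mixed tails
```

Integer arithmetic avoids floating-point edge cases at the 70% boundary; at most one base can exceed that threshold, so the order in which bases are tested does not affect the result.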