Staff View: hzuber67/RACEseq_3ETS_ITS1: Bioinformatic pipeline for mapping pre-rRNA 3' ends using 3' RACE-seq :: Library Catalog


Bibliographic Details
Main Author:	Hélène Zuber
Format:	Digital resource
Language:
Published:	Zenodo 2025
Online Access:	https://doi.org/10.5281/zenodo.17316262

_version_	1866902055384252416
author	Hélène Zuber
author_facet	Hélène Zuber
contents	<p>The pipeline analyzes fastq files and is adapted from the approach described in https://doi.org/10.3389/fpls.2018.01438. It consists of scripts written in Python (v3) using the Biopython (v1.79) and regex (v2022.9.13) libraries. Reads with low-quality bases (≤ Q10) within the 15-base random sequence of read 2 or within the 30 bases downstream of the delimiter sequence were filtered out. Reads with identical 15-base random sequences were deduplicated. Next, reads 1 were searched for the sequence of the forward PCR2 primer to identify the corresponding target, tolerating one mismatch. Matched reads 1 and their corresponding reads 2 were extracted for further analysis. Reads 2 containing the delimiter sequence were selected and then trimmed of their random and delimiter sequences. Because some target sequences were short, some reads 2 ran into the 5′ forward PCR primer, whose sequence was also removed. Reads 2 shorter than 20 nucleotides after these trimming steps were excluded from further analysis. To map the 3′ extremities of target RNAs, the 20-nucleotide sequence downstream of the read-2 delimiter was mapped to the corresponding reference sequence, which spans from the first nucleotide of the transcript matching the forward PCR2 primer to the end of the target RNA. Up to two mismatches were tolerated, except that the first nucleotide downstream of the mapping site had to match perfectly. To map the 3′ end position of reads 2 carrying untemplated tails, unmatched reads 2 were trimmed from their 3′ end one nucleotide at a time until they could be mapped to the reference sequence, or until a maximum of 30 nucleotides had been removed.
For each successfully mapped read 2, the untemplated nucleotides at the 3′ end were extracted and annotated according to their composition as U-tails (only Us), U-rich tails (>70% Us), A-tails (only As), A-rich tails (>70% As), C-tails (only Cs), C-rich tails (>70% Cs), G-tails (only Gs), or G-rich tails (>70% Gs).</p>
format	Digital resource
id	zenodo_https___doi_org_10_5281_zenodo_17316262
institution	Zenodo
language
publishDate	2025
publisher	Zenodo
record_format	zenodo
title	hzuber67/RACEseq_3ETS_ITS1: Bioinformatic pipeline for mapping pre-rRNA 3' ends using 3' RACE-seq
url	https://doi.org/10.5281/zenodo.17316262
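As a minimal sketch of the read-2 quality filter and deduplication steps described in the contents field, the Python code below assumes that read 2 begins with the 15-base random sequence, that qualities are Phred+33 encoded, and that the delimiter can be located by exact string search. `DELIMITER` is a hypothetical placeholder (the real delimiter sequence is not given in this record), and all helper names are illustrative rather than taken from the RACEseq_3ETS_ITS1 scripts. Reads whose delimiter cannot be found are rejected here as a simplification.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple

RANDOM_LEN = 15        # random sequence at the 5' start of read 2
DOWNSTREAM_LEN = 30    # bases after the delimiter that are quality-checked
MAX_BAD_QUAL = 10      # a base with Phred quality <= 10 fails the read
# Hypothetical placeholder: the real delimiter sequence is not given in this record.
DELIMITER = "GTCAGC"


class Read(NamedTuple):
    name: str
    seq: str
    qual: str  # assumed Phred+33 encoding


def phred_scores(qual: str) -> list[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in qual]


def passes_quality(read2: Read, delimiter: str = DELIMITER) -> bool:
    """Reject read 2 if its random sequence or the 30 bases after the
    delimiter contain any base with quality <= Q10."""
    if len(read2.seq) < RANDOM_LEN:
        return False
    scores = phred_scores(read2.qual)
    if any(q <= MAX_BAD_QUAL for q in scores[:RANDOM_LEN]):
        return False
    start = read2.seq.find(delimiter, RANDOM_LEN)
    if start == -1:
        # Simplification: reads without a delimiter are discarded later anyway.
        return False
    after = start + len(delimiter)
    return all(q > MAX_BAD_QUAL for q in scores[after:after + DOWNSTREAM_LEN])


def deduplicate(pairs: Iterable[Tuple[Read, Read]]) -> Iterator[Tuple[Read, Read]]:
    """Keep the first read pair seen for each distinct 15-base random sequence."""
    seen: set[str] = set()
    for read1, read2 in pairs:
        umi = read2.seq[:RANDOM_LEN]
        if umi not in seen:
            seen.add(umi)
            yield read1, read2
```

Keeping the first pair per random sequence is one possible deduplication policy; the record does not say which copy was retained.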
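The primer-based target assignment described in the contents field (one mismatch tolerated) could look like the sketch below. It counts substitutions only in a sliding window; whether the original regex-based scripts also tolerated insertions or deletions is not stated. The primer table, target names, and the rule of discarding reads that match several primers are hypothetical.

```python
from typing import Optional

# Hypothetical primer table: target names and sequences are placeholders,
# not the primers used in the original study.
FORWARD_PCR2_PRIMERS = {
    "target_A": "ACGTTGCA",
    "target_B": "GGCCTTAA",
}


def mismatches(a: str, b: str) -> int:
    """Count substitutions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def find_primer(read: str, primer: str, max_mismatches: int = 1) -> int:
    """Return the leftmost start index where `primer` matches `read` with at
    most `max_mismatches` substitutions, or -1 if there is no such position."""
    k = len(primer)
    for start in range(len(read) - k + 1):
        if mismatches(read[start:start + k], primer) <= max_mismatches:
            return start
    return -1


def assign_target(read1: str, primers: dict[str, str] = FORWARD_PCR2_PRIMERS,
                  max_mismatches: int = 1) -> Optional[str]:
    """Assign read 1 to the single target whose primer it contains; return
    None when no primer matches or when several do (an assumed policy)."""
    hits = [name for name, primer in primers.items()
            if find_primer(read1, primer, max_mismatches) != -1]
    return hits[0] if len(hits) == 1 else None
```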
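The 3′-end mapping and iterative trimming steps can be sketched as below, under several assumptions not stated in the record: the read-2 insert has already been trimmed and converted to the sense orientation of the RNA (DNA alphabet), so untemplated nucleotides sit at the 3′ end of the string; "the first nucleotide downstream of the mapping site" is read as the 3′-terminal nucleotide of the 20-nucleotide seed; only substitutions count as mismatches; and the leftmost acceptable hit is reported. The class and function names are illustrative.

```python
from typing import NamedTuple, Optional

SEED_LEN = 20        # length of the sequence mapped to the reference
MAX_MISMATCHES = 2   # substitutions tolerated within the seed
MAX_TRIM = 30        # maximum number of 3' nucleotides trimmed off


class Mapping(NamedTuple):
    end_pos: int  # 0-based reference index of the last templated nucleotide
    tail: str     # nucleotides trimmed from the 3' end (putative untemplated tail)


def seed_matches(seed: str, window: str) -> bool:
    """Accept up to MAX_MISMATCHES substitutions, but require the 3'-terminal
    nucleotide to match exactly (assumed reading of the original rule)."""
    if seed[-1] != window[-1]:
        return False
    return sum(a != b for a, b in zip(seed, window)) <= MAX_MISMATCHES


def map_3prime_end(insert: str, reference: str) -> Optional[Mapping]:
    """Trim the insert one nucleotide at a time from its 3' end until its last
    SEED_LEN nucleotides map to the reference, or MAX_TRIM nucleotides are gone."""
    for trimmed in range(MAX_TRIM + 1):
        core = insert[:len(insert) - trimmed]
        if len(core) < SEED_LEN:
            return None
        seed = core[-SEED_LEN:]
        # Leftmost acceptable hit wins (a simplification).
        for start in range(len(reference) - SEED_LEN + 1):
            if seed_matches(seed, reference[start:start + SEED_LEN]):
                return Mapping(end_pos=start + SEED_LEN - 1, tail=insert[len(core):])
    return None
```

Because trimming stops at the first length that maps, a tail nucleotide that happens to match the reference at that position is counted as templated in this sketch.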
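A sketch of the tail classification described in the contents field, assuming the tail is given as DNA (T is treated as U) and that ">70%" is a strict threshold. The labels "no tail" and "mixed tail" for reads that fit none of the eight categories are hypothetical; the record does not say how such reads were reported.

```python
def classify_tail(tail: str) -> str:
    """Annotate an untemplated 3' tail as X-tail (only X) or X-rich tail
    (more than 70% X) for X in U, A, C, G."""
    tail = tail.upper().replace("T", "U")  # reads are DNA; report U instead of T
    if not tail:
        return "no tail"  # hypothetical label for reads without untemplated nucleotides
    for base in "UACG":
        count = tail.count(base)
        if count == len(tail):
            return f"{base}-tail"
        if count / len(tail) > 0.70:
            return f"{base}-rich tail"
    return "mixed tail"  # hypothetical label; not one of the eight named categories
```

At most one base can exceed 70% of a tail, so the order in which the bases are tested does not change the result.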
