| Main Authors: | Wright, Robyn J; Fisher, Benjamin R; Comeau, André M; Langille, Morgan G I |
|---|---|
| Format: | Scientific article |
| Language: | en |
| Published: | Microbial genomics, 2026 |
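| Subjects: | Metagenomics; Humans; Metagenome; Bacteria; Genome, Bacterial; Sequence Analysis, DNA; Computational Biology; Microbiota |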
| Online Access: | https://pubmed.ncbi.nlm.nih.gov/42200521/ |
Table of Contents:
- From classification to confirmation: verifying taxonomic classifications by mapping metagenomic reads to reference genomes.
  Authors: Wright, Robyn J; Fisher, Benjamin R; Comeau, André M; Langille, Morgan G I.
  Subjects: Metagenomics; Humans; Metagenome; Bacteria; Genome, Bacterial; Sequence Analysis, DNA; Computational Biology; Microbiota.
  Abstract: Obtaining high precision while maintaining high recall is an ongoing problem for metagenomic taxonomic classification in microbial ecology research. Parameter adjustments can achieve this in simulated samples, but in real samples, especially those from environments such as marine and soil, the proportion of classified reads drops sharply as precision increases. We therefore suggest verifying metagenomic taxonomic classifications obtained from a tool such as Kraken by mapping the assigned reads to reference genomes to assess genome coverage. In simulations, filtering the identified species to only those with ≥0.5% reference genome coverage removed 99.7% of false-positive taxa. Applying this method to real datasets requires a more nuanced approach that considers sequencing depth, whether the samples are high- or low-microbial biomass, and how complete the database is with respect to the sampled environment. Nevertheless, we show that clinically relevant Kraken-identified taxa, such as those identified in human stool samples, lack any reads mapping to their reference genomes and are likely false positives driven by contaminating phage sequences within reference genomes. Similarly, in human blood and lung tumour datasets, only 18 and 11 species, respectively, have ≥1% reference genome coverage, and these likely represent sample collection or sequencing contaminants. Marine and soil samples pose additional challenges because they are less well represented in reference databases, which leads to low nucleotide identity between sequenced reads and reference genomes and to similarity only at higher taxonomic ranks. We recommend genome coverage checking to researchers in all fields of microbial ecology and provide an open-source pipeline, GeCoCheck, on GitHub: https://github.com/R-Wright-1/GeCoCheck.
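The verification step described in the abstract comes down to breadth of coverage: the fraction of a reference genome's positions covered by at least one mapped read, with taxa below a threshold discarded (≥0.5% in the simulations). The Python sketch below illustrates that idea on hypothetical in-memory alignment intervals. It is not GeCoCheck's implementation, and the function names, the taxa `taxon_A` and `taxon_B`, and the example data are invented for illustration. In practice the intervals would come from read alignments (for example a BAM file; `samtools coverage` reports a comparable per-reference percentage of covered bases). As an extra assumption not taken from the record, the sketch also computes the breadth expected if the same reads landed uniformly at random, 1 - exp(-nL/G) for n reads of length L on a genome of length G. This offers one rough way to factor in sequencing depth: a taxon whose reads pile onto one small region (such as a phage-derived segment) shows far less breadth than its read count would predict.

```python
import math

# Alignment intervals are 0-based, half-open (start, end) positions on one
# reference genome. Parsing BAM/SAM files is out of scope for this sketch;
# all names and data here are hypothetical.


def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals into disjoint ones."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or abuts) the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def breadth_of_coverage(intervals, genome_length):
    """Fraction of reference positions covered by at least one read."""
    covered = sum(end - start for start, end in merge_intervals(intervals))
    return covered / genome_length


def expected_breadth(n_reads, read_length, genome_length):
    """Breadth expected if reads were placed uniformly at random
    (Lander-Waterman / Poisson approximation)."""
    return 1.0 - math.exp(-n_reads * read_length / genome_length)


def filter_by_coverage(hits, genome_lengths, min_coverage=0.005):
    """Keep taxa whose breadth of coverage is at least min_coverage
    (0.005 = 0.5%, the simulation threshold quoted in the abstract)."""
    kept = {}
    for taxon, intervals in hits.items():
        breadth = breadth_of_coverage(intervals, genome_lengths[taxon])
        if breadth >= min_coverage:
            kept[taxon] = breadth
    return kept


if __name__ == "__main__":
    READ_LEN = 150
    # Hypothetical data: reads spread along taxon_A versus piled up on one
    # ~1 kb region of taxon_B (e.g. a shared phage-like segment).
    hits = {
        "taxon_A": [(i * 500, i * 500 + READ_LEN) for i in range(200)],
        "taxon_B": [(i * 5, i * 5 + READ_LEN) for i in range(200)],
    }
    genome_lengths = {"taxon_A": 100_000, "taxon_B": 1_000_000}

    for taxon, intervals in hits.items():
        observed = breadth_of_coverage(intervals, genome_lengths[taxon])
        expected = expected_breadth(len(intervals), READ_LEN, genome_lengths[taxon])
        print(f"{taxon}: observed {observed:.4%}, expected if uniform {expected:.4%}")

    print(filter_by_coverage(hits, genome_lengths))
```

With this toy data, taxon_A (reads spread along a 100 kb genome) passes the 0.5% filter at 30% breadth, while taxon_B (200 reads stacked in a ~1.1 kb window of a 1 Mb genome) reaches only about 0.11%, well under the roughly 3% expected for uniformly placed reads. The threshold is left as a parameter because, as the abstract notes, real samples require sequencing depth, biomass and database completeness to be taken into account, so a single fixed cutoff should not be assumed to transfer between datasets.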