Bibliographic Details
Main Authors: Wright, Robyn J, Fisher, Benjamin R, Comeau, André M, Langille, Morgan G I
Format: Scientific article
Language: en
Published: Microbial genomics 2026
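Subjects: Metagenomics; Humans; Metagenome; Bacteria; Genome, Bacterial; Sequence Analysis, DNA; Computational Biology; Microbiota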
Online Access: https://pubmed.ncbi.nlm.nih.gov/42200521/
_version_ 1868266044073705473
author Wright, Robyn J
Fisher, Benjamin R
Comeau, André M
Langille, Morgan G I
author_facet Wright, Robyn J
Fisher, Benjamin R
Comeau, André M
Langille, Morgan G I
collection PubMed - marine biology
contents From classification to confirmation: verifying taxonomic classifications by mapping metagenomic reads to reference genomes.
Wright, Robyn J; Fisher, Benjamin R; Comeau, André M; Langille, Morgan G I
Metagenomics; Humans; Metagenome; Bacteria; Genome, Bacterial; Sequence Analysis, DNA; Computational Biology; Microbiota
Obtaining high precision while maintaining high recall is an ongoing problem for metagenomic taxonomic classification in microbial ecology research. Parameter adjustments can achieve this in simulated samples, but in real samples, especially from environments like marine and soil, the proportion of classified reads drops sharply as precision increases. We therefore suggest verifying metagenomic taxonomic classifications obtained from a tool like Kraken by mapping the reads assigned to each taxon to its reference genome and assessing genomic coverage. In simulations, filtering the identified species to only those with ≥0.5% reference genome coverage removed 99.7% of false-positive taxa. Applying this method to samples from real datasets requires a more nuanced approach that considers sequencing depth, whether the samples are high- or low-microbial biomass, and database completeness with respect to the sampled environment. Nevertheless, we show that clinically relevant Kraken-identified taxa, such as identified in human stool samples, lack any reads mapping to their reference genome and are likely false positives driven by contaminating phage sequences within reference genomes. Similarly, in human blood and lung tumour datasets, only 18 and 11 species, respectively, have ≥1% reference genome coverage and likely represent sample collection or sequencing contaminants. Marine and soil samples pose additional challenges due to lower representation in reference databases, leading to low nucleotide identity between sequenced reads and reference genomes and similarity only at higher taxonomic ranks.
We recommend genome coverage checking to researchers in all fields of microbial ecology and provide an open-source pipeline on GitHub (GeCoCheck): https://github.com/R-Wright-1/GeCoCheck.
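The coverage filter described in the abstract can be illustrated with a minimal Python sketch. This is not the GeCoCheck implementation: the function names (`covered_bases`, `coverage_fraction`, `filter_by_coverage`), the half-open interval representation of aligned reads, and the toy taxa are all assumptions made for illustration. Only the 0.5% threshold comes from the abstract, where it is reported for simulated samples.

```python
from typing import Dict, Iterable, List, Tuple

# Assumption: each aligned read is a half-open interval [start, end)
# on the reference genome of the taxon it was classified as.
Interval = Tuple[int, int]


def covered_bases(intervals: Iterable[Interval]) -> int:
    """Count reference positions covered by at least one aligned read."""
    total = 0
    block_start = block_end = None
    for start, end in sorted(intervals):
        if block_end is None or start > block_end:
            # Disjoint from the running block: close it, start a new one.
            if block_end is not None:
                total += block_end - block_start
            block_start, block_end = start, end
        else:
            # Overlapping or adjacent: extend the running block.
            block_end = max(block_end, end)
    if block_end is not None:
        total += block_end - block_start
    return total


def coverage_fraction(intervals: Iterable[Interval], genome_length: int) -> float:
    """Fraction of the reference genome covered by at least one read."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return covered_bases(intervals) / genome_length


def filter_by_coverage(
    alignments: Dict[str, List[Interval]],
    genome_lengths: Dict[str, int],
    min_fraction: float = 0.005,  # 0.5%, the simulation threshold in the abstract
) -> Dict[str, float]:
    """Keep only taxa whose reference genome coverage meets the threshold."""
    kept = {}
    for taxon, intervals in alignments.items():
        fraction = coverage_fraction(intervals, genome_lengths[taxon])
        if fraction >= min_fraction:
            kept[taxon] = fraction
    return kept


# Toy data (hypothetical taxa and coordinates, not from the paper).
genome_lengths = {"taxon_A": 10_000, "taxon_B": 10_000}
alignments = {
    "taxon_A": [(0, 150), (100, 250), (5000, 5150)],  # 400 bases covered
    "taxon_B": [(0, 30), (10, 40)],                   # 40 bases covered
}
verified = filter_by_coverage(alignments, genome_lengths)
print(verified)
```

In this toy input, `taxon_A` covers 4% of its reference and is retained, while `taxon_B` covers 0.4% and is dropped. As the abstract notes, a fixed threshold suited simulations; real samples may need a cutoff chosen with sequencing depth, biomass, and database completeness in mind.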
format Scientific article
id pubmed_42200521
institution PubMed
language en
publishDate 2026
publisher Microbial genomics
record_format pubmed
title From classification to confirmation: verifying taxonomic classifications by mapping metagenomic reads to reference genomes.
topic Metagenomics
Humans
Metagenome
Bacteria
Genome, Bacterial
Sequence Analysis, DNA
Computational Biology
Microbiota
url https://pubmed.ncbi.nlm.nih.gov/42200521/