Bibliographic Details
Main Author: Louca, Stilianos
Format: Scientific article
Language: en
Published: NAR Genomics and Bioinformatics, 2025
Subjects:
Online Access: https://pubmed.ncbi.nlm.nih.gov/40585302/
Table of Contents:
  • Machine learning models for delineating marine microbial taxa. Louca, Stilianos.
    Subjects: Machine Learning; Archaea; Bacteria; Metagenome; Phylogeny; Genome, Bacterial; Aquatic Organisms.
    Abstract: The relationship between gene content differences and microbial taxonomic divergence remains poorly understood, and algorithms for delineating novel microbial taxa above genus level based on multiple genome similarity metrics are lacking. Addressing these gaps is important for macroevolutionary theory, biodiversity assessments, and discovery of novel taxa in metagenomes. Here, I develop machine learning classifier models, based on multiple genome similarity metrics, to determine whether any two marine bacterial and archaeal (prokaryotic) metagenome-assembled genomes (MAGs) belong to the same taxon, from the genus up to the phylum level. Metrics include average amino acid and nucleotide identities, and fractions of shared genes within various categories, applied to 14 390 previously published non-redundant MAGs. At all taxonomic levels, the balanced accuracy (the average of the true-positive and true-negative rates) of the classifiers exceeded 92%, suggesting that simple genome similarity metrics serve as good taxon differentiators. Predictor selection and sensitivity analyses revealed gene categories, e.g. those involved in the metabolism of cofactors and vitamins, that are particularly correlated with taxon divergence. Predicted taxon delineations were further used to enumerate marine prokaryotic taxa. Statistical analyses of those enumerations suggest that over half of extant marine prokaryotic phyla, classes, and orders have already been recovered by genome-resolved metagenomic surveys.
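
The abstract defines balanced accuracy as the average of the true-positive and true-negative rates. As a minimal sketch of that definition only (not the author's code), the Python snippet below computes it for a binary "same taxon / different taxon" decision. The function name `balanced_accuracy` and the ten example labels are hypothetical and chosen purely for illustration; they are not data from the study.

```python
def balanced_accuracy(y_true, y_pred):
    """Return the mean of the true-positive rate and the true-negative rate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # same taxon, predicted same
    fn = sum(1 for t, p in pairs if t and not p)      # same taxon, predicted different
    tn = sum(1 for t, p in pairs if not t and not p)  # different taxon, predicted different
    fp = sum(1 for t, p in pairs if not t and p)      # different taxon, predicted same
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return (tpr + tnr) / 2


# Hypothetical labels for ten genome pairs (True = the pair shares a taxon).
y_true = [True, True, False, False, False, False, False, False, False, False]
y_pred = [True, False, False, False, False, False, False, False, False, True]

score = balanced_accuracy(y_true, y_pred)
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"balanced accuracy = {score}, plain accuracy = {plain}")
```

In this made-up example the true-positive rate is 1/2 and the true-negative rate is 7/8, so the balanced accuracy is 0.6875 while plain accuracy is 0.8. That gap is one reason a balanced measure can suit pairwise comparisons in which most genome pairs belong to different taxa.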