Bibliographic Details
Main Authors: Schmitt, Matthew S, Lee, Kiseok, Bunbury, Freddy, Landsittel, Joseph A, Vitelli, Vincenzo, Kuehn, Seppe
Format: Preprint
Published: 2026
Subjects: Biological Physics, Genomics
Online Access: https://arxiv.org/abs/2603.03547
author Schmitt, Matthew S
Lee, Kiseok
Bunbury, Freddy
Landsittel, Joseph A
Vitelli, Vincenzo
Kuehn, Seppe
contents From soil to the gut, communities composed of thousands of microbes perform functions such as carbon sequestration and immune system regulation. Here, we introduce a data-driven approach that explains how community function can be traced to just a few groups of microbes or genes. In gut communities, our neural-network-based clustering algorithm correctly recovers known functional groups. In the ocean metagenome, it distills ~500 gene modules down to three sparse groups highlighting survival strategies at different depths. In soils, it distills ~4400 bacterial species into two groups that enter a mathematical model of nitrate metabolism. By combining interpretable ML with strain isolation and sequencing experiments, we connect the metabolic specialization of each group to community-wide responses to perturbations. This integrated approach yields simple structure-function maps of microbiomes, allowing the discovery of molecular mechanisms underlying human and environmental health. More broadly, we illustrate how to do function-informed dimensionality reduction in biology.
format Preprint
id arxiv_https___arxiv_org_abs_2603_03547
institution arXiv
publishDate 2026
record_format arxiv
title Learning functional groups in complex microbiomes
topic Biological Physics
Genomics
url https://arxiv.org/abs/2603.03547