Bibliographic Details
Main Author: Malfertheiner, Lukas
Format: Digital resource
Language:
Published: Zenodo 2025
Online Access: https://doi.org/10.5281/zenodo.15689424
author Malfertheiner, Lukas
contents <p dir="auto">This repository (community_conservatism-main.zip) contains the custom scripts used in the manuscript "Community conservatism is widespread throughout microbial phyla and environments".</p> <p dir="auto">The following packages and programs are needed:</p> <div> <pre><code>MAPseq v.2.2.1
FastTree 2.1.10
HPC-CLUST v1.1.0
FlashWeave v.0.19.0
Python 3.7.6
Python libraries:
  numpy 1.18.1
  Pandas 1.0.3
  bokeh 2.2.3
  ete3 3.1.2
  holoviews 0.11.6
  scipy 1.4.1
  WordCloud 1.5.0
</code></pre> </div> <p dir="auto">For the workflow, we show here the general trend for all fully annotated microbes (Bacteria and Archaea). The workflow was then repeated separately for each environment shown (animal, marine, freshwater, plant and soil), as well as for all phyla that contain more than 500 99% OTUs.</p> <p dir="auto">First, phylogenetic trees are constructed with the "make_all_trees" Jupyter notebook (.ipynb). These trees are converted into a pairwise tree branch-length matrix, which is then used to select OTU pairs following a uniform distribution of branch lengths. This step is reproducible via the "select_taxa" script, which uses the distances to define bins, fills each bin with different OTU pairs, and saves the corresponding files. The OTU pairs are then used to compute the β-similarity values of all sample-sample comparisons for the two OTUs of each pair, as well as additional metrics such as sequence similarity and main environment (see Methods of the paper).</p> <p dir="auto">All of these are combined into one file, which is then further processed to plot most of the results.</p> <p dir="auto">The file "microbes_result_combined.csv" contains the main results for the general microbial trend and is needed to reproduce the main result figures. Follow the script "main_figure_plots" to obtain these plots, and experiment with different percentiles etc.<br><br>All raw data (OTU tables in HDF5 format) are found in otu99.h5.gz and otu90.h5.gz, which hold the read counts of 99% OTUs and 90% OTUs, respectively. The "open_h5_file.ipynb" notebook contains the function to open these files.<br>otu.99.tree is the phylogenetic tree from which we calculated the tree branch lengths ("Phylogenetic Distance") between OTU pairs.<br><br>Additionally, we provide two matrices for the example taxon A90_104: "combined", for the combined clustering (sequence similarity combined with community similarity; see Methods for the formula), and "seq_dissimilarity", which contains the pairwise values of "2 - sequence similarity" of the 16S rRNA for all OTU pairs therein. </p>
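The OTU-pair selection step (binning pairwise branch lengths and filling each bin with OTU pairs) can be illustrated with a minimal sketch. It assumes the pairwise tree branch-length matrix is already available as a square nested list aligned with a list of OTU labels, that bins are equal-width intervals over the observed distance range, and that a fixed number of pairs is drawn per bin. The function name `select_uniform_pairs` and its parameters are hypothetical; this is not the repository's "select_taxa" implementation.

```python
import random
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# (OTU a, OTU b, branch-length distance)
PairRecord = Tuple[str, str, float]


def select_uniform_pairs(
    labels: Sequence[str],
    dist: Sequence[Sequence[float]],
    n_bins: int = 10,
    pairs_per_bin: int = 100,
    seed: int = 0,
) -> Dict[int, List[PairRecord]]:
    """Hypothetical sketch: bin all OTU pairs by pairwise branch length and
    draw up to `pairs_per_bin` pairs from each bin, so the selected pairs
    are spread roughly uniformly over the distance range."""
    n = len(labels)
    if n < 2:
        raise ValueError("need at least two OTUs")
    # All unordered OTU pairs with their distance (upper triangle only).
    pairs = [(labels[i], labels[j], dist[i][j]) for i, j in combinations(range(n), 2)]
    d_min = min(d for _, _, d in pairs)
    d_max = max(d for _, _, d in pairs)
    # Equal-width bins; fall back to width 1.0 if all distances are identical.
    width = (d_max - d_min) / n_bins or 1.0
    bins: Dict[int, List[PairRecord]] = {k: [] for k in range(n_bins)}
    for otu_a, otu_b, d in pairs:
        # Clamp so the maximum distance lands in the last bin.
        idx = min(int((d - d_min) / width), n_bins - 1)
        bins[idx].append((otu_a, otu_b, d))
    rng = random.Random(seed)
    # Sample without replacement; sparse bins contribute all of their pairs.
    return {
        idx: rng.sample(members, min(pairs_per_bin, len(members)))
        for idx, members in bins.items()
    }


if __name__ == "__main__":
    labels = ["OTU1", "OTU2", "OTU3", "OTU4"]
    dist = [
        [0.0, 0.1, 0.5, 0.9],
        [0.1, 0.0, 0.4, 0.8],
        [0.5, 0.4, 0.0, 0.3],
        [0.9, 0.8, 0.3, 0.0],
    ]
    for idx, chosen in select_uniform_pairs(labels, dist, n_bins=4, pairs_per_bin=1).items():
        print(idx, chosen)
```

Drawing the same number of pairs from every bin flattens the distance distribution of the selected pairs. Bins holding fewer pairs than `pairs_per_bin` contribute all of them, so the sparsest distance ranges can remain under-represented.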
id zenodo_https___doi_org_10_5281_zenodo_15689424
institution Zenodo
record_format zenodo
title Code and Data for the manuscript "Community conservatism is widespread throughout microbial phyla and environments"