Bibliographic Details
Main Authors: Hrab, Pavlo, Zdouc, Mitja Maximilian, Terlouw, Barbara, Loureiro, Catarina, Schorn, Michelle A., de Ridder, Dick, Medema, Marnix H., Sipkema, Detmer
Format: Digital resource
Language:
Published: Zenodo 2026
Subjects:
Online Access: https://doi.org/10.5281/zenodo.18936217
Table of Contents:
  • <h1>Acidobacteriota MAGs and metadata</h1> <h2>Dataset contents</h2> <ul> <li><em>acido.mags.tar.gz</em> - 6,169 metagenome-assembled genomes (MAGs) of Acidobacteriota in FASTA format.</li> <li><em>metadata.tsv</em> - Tab-separated metadata table (6,169 rows, 54 columns) describing each MAG.</li> <li><em>environment_classification.png</em> - Environmental classification categories of the MAGs.</li> </ul> <h2>Column descriptions for metadata.tsv</h2> <h3>Genome identifier</h3> <table> <tbody> <tr> <th>Column</th> <th>Description</th> </tr> <tr> <td><code>record</code></td> <td>Unique genome identifier. Format depends on the source: NCBI Assembly accession (e.g., <code>GCA_000179915.2_ASM17991v2_genomic</code>), JGI taxon OID with bin number (e.g., <code>3300012889_2</code>), mOTUs-db genome name (e.g., <code>TARA_SAMEA2621099_MAG_00000082</code>), or sponge MAG label (e.g., <code>Aply16_bin.10.permissive</code>).</td> </tr> </tbody> </table> <h3>BGC annotations (antiSMASH)</h3> <table> <tbody> <tr> <th>Column</th> <th>Description</th> </tr> <tr> <td><code>bgc_category</code></td> <td>Semicolon-separated list of biosynthetic gene cluster (BGC) types detected by antiSMASH (e.g., <code>terpene;NRPS-like;T3PKS</code>).</td> </tr> <tr> <td><code>bgc_lengths</code></td> <td>Semicolon-separated list of BGC lengths in base pairs, in the same order as <code>bgc_category</code>.</td> </tr> <tr> <td><code>asmod_completeness</code></td> <td>Mean antiSMASH module completeness score across all detected BGCs (0–1).</td> </tr> <tr> <td><code>region_completeness</code></td> <td>Mean region completeness score across all detected BGC regions (0–1).</td> </tr> <tr> <td><code>candidate_completeness</code></td> <td>Mean candidate cluster completeness score (0–1).</td> </tr> <tr> <td><code>proto_cluster_completeness</code></td> <td>Mean proto-cluster completeness score (0–1).</td> </tr> <tr> <td><code>protocore_completeness</code></td> <td>Mean proto-core completeness score 
(0–1).</td> </tr> </tbody> </table> <h3>GTDB taxonomy</h3> <table> <tbody> <tr> <th>Column</th> <th>Description</th> </tr> <tr> <td><code>domain</code></td> <td>GTDB domain classification (all <code>d_Bacteria</code>).</td> </tr> <tr> <td><code>phylum</code></td> <td>GTDB phylum (all <code>p_Acidobacteriota</code>).</td> </tr> <tr> <td><code>class</code></td> <td>GTDB class (e.g., <code>c_Terriglobia</code>, <code>c_Vicinamibacteria</code>).</td> </tr> <tr> <td><code>order</code></td> <td>GTDB order (e.g., <code>o_Terriglobales</code>, <code>o_Bryobacterales</code>).</td> </tr> <tr> <td><code>family</code></td> <td>GTDB family (e.g., <code>f_Acidobacteriaceae</code>, <code>f_Koribacteraceae</code>).</td> </tr> <tr> <td><code>genus</code></td> <td>GTDB genus (e.g., <code>g_Edaphobacter</code>). Empty string if unclassified at this rank.</td> </tr> <tr> <td><code>species</code></td> <td>GTDB species (e.g., <code>s_Koribacter versatilis_A</code>). Empty string if unclassified at this rank.</td> </tr> </tbody> </table> <h3>Genome information</h3> <table> <tbody> <tr> <th>Column</th> <th>Description</th> </tr> <tr> <td><code>genome</code></td> <td>Genome FASTA filename within the archive (e.g., <code>637000001.fna</code>).</td> </tr> <tr> <td><code>secondary_cluster</code></td> <td>dRep secondary cluster assignment (e.g., <code>778_1</code>). 
Groups genomes at ~95% ANI.</td> </tr> <tr> <td><code>drep_cluster_size</code></td> <td>Number of genomes in the dRep secondary cluster.</td> </tr> <tr> <td><code>spec_met_ratio</code></td> <td>Specialized metabolite ratio: proportion of total CDS devoted to BGCs.</td> </tr> </tbody> </table> <h3>Genome quality metrics (CheckM2)</h3> <table> <tbody> <tr> <th>Column</th> <th>Description</th> </tr> <tr> <td><code>completeness</code></td> <td>CheckM2 estimated genome completeness (%).</td> </tr> <tr> <td><code>contamination</code></td> <td>CheckM2 estimated genome contamination (%).</td> </tr> <tr> <td><code>coding_density</code></td> <td>Fraction of the genome that is protein-coding.</td> </tr> <tr> <td><code>contig_n50</code></td> <td>Contig N50 in base pairs.</td> </tr> <tr> <td><code>avg_gene_length</code></td> <td>Average predicted gene length in base pairs.</td> </tr> <tr> <td><code>genome_size</code></td> <td>Total genome assembly size in base pairs.</td> </tr> <tr> <td><code>gc_content</code></td> <td>GC content as a proportion (0–1).</td> </tr> <tr> <td><code>total_cds</code></td> <td>Total number of predicted protein-coding sequences.</td> </tr> <tr> <td><code>total_contigs</code></td> <td>Total number of contigs in the assembly.</td> </tr> <tr> <td><code>max_contig_length</code></td> <td>Length of the longest contig in base pairs.</td> </tr> </tbody> </table> <h3>Environmental classification</h3> <table> <tbody> <tr> <th>Column</th> <th>Description</th> </tr> <tr> <td><code>broad_environment</code></td> <td>High-level environment: <code>terrestrial</code>, <code>aquatic</code>, or <code>Unknown</code>.</td> </tr> <tr> <td><code>narrow_environment</code></td> <td>Intermediate environment: <code>soil</code>, <code>freshwater</code>, <code>marine</code>, <code>plants</code>, <code>rhizosphere</code>, <code>other organisms</code>, or <code>Unknown</code>.</td> </tr> <tr> <td><code>specific_environment</code></td> <td>Detailed environment annotation where 
available (e.g., <code>wastewater</code>, <code>peatland</code>), or <code>Unknown</code>.</td> </tr> <tr> <td><code>host_associated</code></td> <td>Whether the genome is from a host-associated environment: <code>host-associated</code> or <code>Unknown</code>.</td> </tr> <tr> <td><code>sediment</code></td> <td>Whether the genome is from a sediment environment: <code>sediment</code> or <code>Unknown</code>.</td> </tr> </tbody> </table> <p><strong>For the environment classification, please see the environment_classification.png figure.</strong></p> <h3>BGC counts by type</h3> <table> <tbody> <tr> <th>Column</th> <th>Description</th> </tr> <tr> <td><code>bgc_count</code></td> <td>Total number of BGCs detected per genome.</td> </tr> <tr> <td><code>NRPS</code></td> <td>Number of non-ribosomal peptide synthetase (NRPS and NRPS-like) BGCs.</td> </tr> <tr> <td><code>terpene</code></td> <td>Number of terpene BGCs.</td> </tr> <tr> <td><code>PKS</code></td> <td>Number of polyketide synthase (PKS) BGCs (T1PKS, T3PKS, etc.).</td> </tr> <tr> <td><code>saccharide</code></td> <td>Number of saccharide BGCs.</td> </tr> <tr> <td><code>hybrid</code></td> <td>Number of hybrid BGCs.</td> </tr> <tr> <td><code>other</code></td> <td>Number of BGCs not classified into the above categories.</td> </tr> <tr> <td><code>RiPP</code></td> <td>Number of ribosomally synthesized and post-translationally modified peptide (RiPP) BGCs.</td> </tr> <tr> <td><code>halogenated</code></td> <td>Number of halogenated BGCs.</td> </tr> </tbody> </table> <h3>BGC ratios by type</h3> <table> <tbody> <tr> <th>Column</th> <th>Description</th> </tr> <tr> <td><code>RiPP_ratio</code></td> <td>Fraction of total BGCs that are RiPPs.</td> </tr> <tr> <td><code>halogenated_ratio</code></td> <td>Fraction of total BGCs that are halogenated.</td> </tr> <tr> <td><code>NRPS_ratio</code></td> <td>Fraction of total BGCs that are NRPS/NRPS-like.</td> </tr> <tr> <td><code>terpene_ratio</code></td> <td>Fraction of total BGCs that are 
terpenes.</td> </tr> <tr> <td><code>PKS_ratio</code></td> <td>Fraction of total BGCs that are PKS.</td> </tr> <tr> <td><code>saccharide_ratio</code></td> <td>Fraction of total BGCs that are saccharides.</td> </tr> <tr> <td><code>hybrid_ratio</code></td> <td>Fraction of total BGCs that are hybrids.</td> </tr> <tr> <td><code>other_ratio</code></td> <td>Fraction of total BGCs in the "other" category.</td> </tr> </tbody> </table> <h3>Source provenance</h3> <table> <tbody> <tr> <th>Column</th> <th>Description</th> </tr> <tr> <td><code>source</code></td> <td>Metadata origin for the genome: <code>JGI</code> (3,132 genomes from IMG/M), <code>NCBI</code> (1,638 genomes from NCBI Assembly), <code>mOTUs-db</code> (1,278 genomes from the mOTUs database), or <code>Sponge</code> (121 genomes from a local sponge metagenome collection).</td> </tr> <tr> <td><code>project</code></td> <td>Project or study identifier. Contains BioProject accession for NCBI genomes (e.g., <code>PRJNA48971</code>), IMG taxon OID for JGI genomes (e.g., <code>3300012889</code>), or publication URL for mOTUs-db genomes. Empty for Sponge genomes.</td> </tr> <tr> <td><code>sample</code></td> <td>BioSample accession for NCBI genomes (e.g., <code>SAMN00100755</code>). Empty for all other sources.</td> </tr> </tbody> </table>
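<p>The column descriptions above say that <code>bgc_category</code> and <code>bgc_lengths</code> are semicolon-separated lists in the same order, and that each <code>*_ratio</code> column is a fraction of the total BGC count. A minimal Python sketch of reading those fields is given below. It runs on a two-row sample whose record names and values are invented for illustration. It also assumes that empty cells mean "no BGCs" and that counts and lengths are stored as plain integers, neither of which is stated in the description. The helper names <code>split_list</code>, <code>parse_bgcs</code>, and <code>type_fraction</code> are hypothetical.</p>

```python
import csv
import io

# Hypothetical excerpt using a few of the documented columns.
# Record names and values are invented; they are NOT taken from metadata.tsv.
SAMPLE_TSV = (
    "record\tbgc_category\tbgc_lengths\tbgc_count\tterpene\n"
    "example_genome_1\tterpene;NRPS-like;T3PKS\t21000;43000;41000\t3\t1\n"
    "example_genome_2\t\t\t0\t0\n"
)


def split_list(cell):
    """Split a semicolon-separated cell; an empty cell yields an empty list."""
    return [item for item in cell.split(";") if item]


def parse_bgcs(row):
    """Pair each BGC type with its length in bp (the two lists share one order)."""
    types = split_list(row["bgc_category"])
    # Assumption: lengths are written as plain integers.
    lengths = [int(value) for value in split_list(row["bgc_lengths"])]
    if len(types) != len(lengths):
        raise ValueError(f"{row['record']}: type and length lists differ in size")
    return list(zip(types, lengths))


def type_fraction(row, column):
    """Recompute a per-type fraction as count / bgc_count (None when no BGCs).

    Assumption: this mirrors how the *_ratio columns were derived, based only
    on their descriptions ("fraction of total BGCs").
    """
    total = int(row["bgc_count"])
    return int(row[column]) / total if total else None


# csv.DictReader with a tab delimiter maps each row to {column name: cell text}.
rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))
bgcs = {row["record"]: parse_bgcs(row) for row in rows}
```

<p>To apply the same parsing to the real table, the in-memory sample could be swapped for <code>open("metadata.tsv", newline="")</code>. If the file encodes missing values differently (for example as <code>NA</code>), <code>split_list</code> would need an extra check.</p>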