| Main author: | |
|---|---|
| Format: | Digital resource |
| Language: | |
| Published: | Zenodo, 2026 |
| Online access: | https://doi.org/10.5281/zenodo.19225269 |
Table of Contents:
<h2>What's New</h2>
<h3><code>bam2tensor-inspect</code> command</h3>
<p>New CLI tool to inspect <code>.methylation.npz</code> output files without writing Python:</p>
<pre><code>$ bam2tensor-inspect sample.methylation.npz
sample.methylation.npz
Genome: hg38
Chromosomes: 24 (chr1, chr2, ... chrX, chrY)
Reads: 1,423,891
CpG sites: 28,217,448
Data points: 12,847,322 (sparsity: 99.97%)
CpG index CRC32: a1b2c3d4
bam2tensor: v2.3
File size: 14.2 MB
</code></pre>
<p>Accepts multiple files and also works on outputs from older versions; metadata fields those files lack are simply omitted from the report.</p>
<h3>Embedded provenance metadata in <code>.npz</code> files</h3>
<p>Each output file now contains a <code>metadata.json</code> entry inside the ZIP archive with:</p>
<ul>
<li><code>bam2tensor_version</code>: version that produced the file</li>
<li><code>genome_name</code>: reference genome identifier (e.g., <code>hg38</code>)</li>
<li><code>expected_chromosomes</code>: chromosome list defining the column mapping</li>
<li><code>total_cpg_sites</code>: number of CpG columns</li>
<li><code>cpg_index_crc32</code>: CRC32 checksum of the CpG positions (two files with the same CRC32 have identical column semantics and can be stacked or compared directly)</li>
</ul>
<p><code>scipy.sparse.load_npz</code> ignores this entry, so existing code is unaffected. Read the metadata via <code>bam2tensor.metadata.read_npz_metadata()</code> or <code>unzip -p file.npz metadata.json</code>.</p>
<h3>Improved output format documentation</h3>
<p>The README now explicitly documents that column indices are determined by the reference genome's CpG sites and that <code>GenomeMethylationEmbedding</code> is needed to map columns back to genomic coordinates.</p>
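Because the provenance metadata is stored as an ordinary `metadata.json` member of the `.npz` ZIP archive, it can be read with nothing but the Python standard library, mirroring the `unzip -p file.npz metadata.json` route mentioned in the release notes. The sketch below is a minimal illustration, not the package's own code: `read_metadata_entry` and `write_toy_npz` are hypothetical helpers, and the toy archive contains an empty placeholder instead of a real sparse matrix. In practice the library's `bam2tensor.metadata.read_npz_metadata()` is the supported path.

```python
import json
import os
import tempfile
import zipfile
from typing import Any, Dict, Optional


def read_metadata_entry(npz_path: str) -> Optional[Dict[str, Any]]:
    """Return the parsed metadata.json entry of an .npz archive, or None.

    Hypothetical stdlib-only counterpart of `unzip -p file.npz metadata.json`.
    Archives written by older versions have no such entry, so None is returned.
    """
    with zipfile.ZipFile(npz_path) as archive:
        if "metadata.json" not in archive.namelist():
            return None  # older output: no embedded provenance
        with archive.open("metadata.json") as handle:
            return json.load(handle)  # json accepts the raw UTF-8 bytes


def write_toy_npz(path: str, metadata: Optional[Dict[str, Any]]) -> None:
    """Hypothetical helper: write a stand-in archive for demonstration only.

    The "data.npy" member is an empty placeholder, not a real sparse matrix.
    """
    with zipfile.ZipFile(path, "w") as archive:
        archive.writestr("data.npy", b"")
        if metadata is not None:
            archive.writestr("metadata.json", json.dumps(metadata))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        new_file = os.path.join(tmp, "new.methylation.npz")
        old_file = os.path.join(tmp, "old.methylation.npz")
        write_toy_npz(new_file, {"genome_name": "hg38", "total_cpg_sites": 3})
        write_toy_npz(old_file, None)  # simulates output from an older version
        print(read_metadata_entry(new_file))  # {'genome_name': 'hg38', 'total_cpg_sites': 3}
        print(read_metadata_entry(old_file))  # None
```

Returning `None` for archives without the entry matches the stated goal of handling outputs from older versions gracefully rather than raising.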
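The `cpg_index_crc32` field works because a CRC32 over the ordered list of CpG positions changes whenever the column layout changes, so equal checksums signal that two matrices can be stacked column-for-column. The sketch below illustrates that idea under a loudly stated assumption: bam2tensor's actual byte serialization of the CpG index is not described in these notes, so encoding each site as `"chrom:pos\n"` is purely hypothetical, as are the names `cpg_index_crc32` (used here as a function) and `same_column_layout`. The 8-digit lowercase hex formatting matches the style of the sample `bam2tensor-inspect` output.

```python
import zlib
from typing import Iterable, Mapping, Tuple


def cpg_index_crc32(cpg_sites: Iterable[Tuple[str, int]]) -> str:
    """Hypothetical: running CRC32 over ordered (chromosome, position) CpG sites.

    The "chrom:pos\\n" encoding is an illustrative assumption, not bam2tensor's
    documented format. The result is an 8-digit lowercase hex string.
    """
    crc = 0
    for chrom, pos in cpg_sites:
        # zlib.crc32(data, value) continues a running checksum from `value`
        crc = zlib.crc32(f"{chrom}:{pos}\n".encode("ascii"), crc)
    return f"{crc:08x}"


def same_column_layout(meta_a: Mapping, meta_b: Mapping) -> bool:
    """Hypothetical check: do two metadata dicts report the same column mapping?"""
    crc_a = meta_a.get("cpg_index_crc32")
    crc_b = meta_b.get("cpg_index_crc32")
    if crc_a is None or crc_b is None:
        return False  # older files without provenance: compatibility unconfirmed
    return crc_a == crc_b


if __name__ == "__main__":
    toy_sites = [("chr1", 10468), ("chr1", 10471), ("chr2", 10500)]  # toy positions
    checksum = cpg_index_crc32(toy_sites)
    print(checksum)
    print(same_column_layout({"cpg_index_crc32": checksum},
                             {"cpg_index_crc32": checksum}))  # True
```

Treating a missing checksum as "not compatible" is a deliberately conservative choice: without provenance, two older files could still share a layout, but nothing in the file proves it.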
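The README note says column indices are fixed by the reference genome's CpG sites and that `GenomeMethylationEmbedding` maps them back to genomic coordinates. To make that concrete, the sketch below assumes one plausible layout: columns enumerate CpG sites chromosome by chromosome in `expected_chromosomes` order, with positions ascending within each chromosome. `ToyColumnMap` and the toy positions are hypothetical stand-ins, not the library's class or data; real code should rely on `GenomeMethylationEmbedding` for the actual ordering.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Dict, List, Sequence, Tuple


class ToyColumnMap:
    """Hypothetical stand-in for GenomeMethylationEmbedding's column mapping.

    Assumes columns enumerate CpG sites chromosome by chromosome (in the given
    chromosome order), with positions sorted ascending within each chromosome.
    """

    def __init__(self, chromosomes: Sequence[str], cpg_positions: Dict[str, List[int]]):
        self.chromosomes = list(chromosomes)
        self.positions = cpg_positions
        counts = [len(cpg_positions[c]) for c in self.chromosomes]
        # offsets[i] is the first column index belonging to chromosomes[i]
        self.offsets = [0] + list(accumulate(counts))[:-1]
        self.total = sum(counts)

    def column_to_site(self, column: int) -> Tuple[str, int]:
        """Map a matrix column index back to a (chromosome, position) pair."""
        if not 0 <= column < self.total:
            raise IndexError(f"column {column} out of range [0, {self.total})")
        # Last chromosome whose first column is <= `column`; this also skips
        # chromosomes with zero CpG sites, which share an offset with the next one.
        i = bisect_right(self.offsets, column) - 1
        chrom = self.chromosomes[i]
        return chrom, self.positions[chrom][column - self.offsets[i]]


if __name__ == "__main__":
    toy = ToyColumnMap(
        ["chr1", "chr2"],
        {"chr1": [10468, 10471], "chr2": [10500, 10502, 10510]},  # toy positions
    )
    print(toy.total)              # 5
    print(toy.column_to_site(2))  # ('chr2', 10500)
```

The binary search over per-chromosome offsets keeps lookups logarithmic in the number of chromosomes, which is why a compact chromosome list plus per-chromosome position arrays is enough to recover coordinates for tens of millions of columns.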