| Main authors: | , |
|---|---|
| Format: | Digital resource |
| Language: | |
| Published: | Zenodo, 2026 |
| Subjects: | |
| Online access: | https://doi.org/10.5281/zenodo.19637642 |
Summary:
# simGL

**simGL** simulates genotype likelihoods (GLs) from haplotypic genotype matrices,
given per-sample coverage and sequencing error rates.
It is designed to work seamlessly with
[msprime](https://tskit.dev/msprime/) and [tskit](https://tskit.dev/tskit/)
pipelines, but accepts any NumPy haplotype matrix.

## Installation

```bash
pip install simGL
```

Or from source:

```bash
git clone https://github.com/RacimoLab/simGL.git
cd simGL
pip install -e .
```

## Quick example

```python
import msprime
import numpy as np
import simGL

# 1. Simulate a tree sequence and extract the biallelic genotype matrix
ts = msprime.sim_ancestry(
    samples=10, ploidy=2, sequence_length=100_000,
    recombination_rate=1e-8, population_size=10_000, random_seed=1,
)
ts = msprime.sim_mutations(ts, rate=1e-4, random_seed=1)

gm_full = ts.genotype_matrix()
biallelic = gm_full.max(axis=1) == 1
gm = gm_full[biallelic]  # shape (n_sites, n_haplotypes)

# 2. Get reference and alternative alleles
variants = list(ts.variants())
ref = np.array([v.alleles[0] for v in variants])[biallelic]
alt = np.array([v.alleles[1] for v in variants])[biallelic]

# 3. Simulate allele read counts
arc = simGL.sim_allelereadcounts(
    gm, mean_depth=10., std_depth=2., e=0.01,
    ploidy=2, seed=42, ref=ref, alt=alt,
)
# arc shape: (n_sites, n_individuals, 4), holding A, C, G, T read counts

# 4. Compute genotype likelihoods
GL = simGL.allelereadcounts_to_GL(arc, e=0.01, ploidy=2)
# GL shape: (n_sites, n_individuals, 10), one entry per diploid ACGT genotype

# 5. Subset to biallelic genotypes and write a VCF
Ra = simGL.ref_alt_to_index(ref, alt)
GL_sub = simGL.subset_GL(GL, Ra, ploidy=2)

pos = np.array([int(v.site.position) for v in variants])[biallelic] + 1
names = [f"ind{i}" for i in range(ts.num_individuals)]
simGL.GL_to_vcf(GL_sub, arc, ref, alt, pos, names, "output.vcf")
```

## Documentation

Full documentation (installation, user guide, API reference, and theory) is
available at **https://simgl.readthedocs.io**.

## Citation

If you use simGL in your work, please cite the relevant methodological papers
listed in the [Citation page](https://simgl.readthedocs.io/en/latest/citation.html)
of the documentation.

## License

[ISC](LICENSE)
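## Appendix: genotype likelihood sketch

To make the quick example's step from allele read counts to 10 diploid genotype likelihoods more concrete, here is a minimal NumPy-only sketch of a textbook likelihood model for one individual at one site. It is not simGL's implementation. The function name `diploid_gl_sketch`, the genotype ordering, and the uniform error model (a miscalled read shows each of the three other bases with probability `e / 3`) are all assumptions made for illustration, and constant multinomial factors are dropped.

```python
from itertools import combinations_with_replacement

import numpy as np

BASES = "ACGT"
# The 10 unordered diploid genotypes over A, C, G, T.
# This ordering is an assumption of the sketch, not necessarily simGL's.
GENOTYPES = list(combinations_with_replacement(range(4), 2))


def diploid_gl_sketch(counts, e):
    """Return log10 likelihoods of the 10 diploid genotypes at one site.

    counts: length-4 sequence of A, C, G, T read counts.
    e: per-base sequencing error rate, assumed to satisfy 0 < e < 1.
    """
    counts = np.asarray(counts, dtype=float)

    # P(observed base b | true allele a): 1 - e if b == a, else e / 3.
    p_read_given_allele = np.full((4, 4), e / 3.0)
    np.fill_diagonal(p_read_given_allele, 1.0 - e)

    log_gl = np.empty(len(GENOTYPES))
    for k, (a1, a2) in enumerate(GENOTYPES):
        # Each read comes from either allele copy with probability 1/2.
        p_base = 0.5 * (p_read_given_allele[a1] + p_read_given_allele[a2])
        # Reads are treated as independent, so log-probabilities add up.
        log_gl[k] = np.sum(counts * np.log10(p_base))
    return log_gl


counts = [6, 0, 4, 0]  # 6 reads of A and 4 reads of G
gl = diploid_gl_sketch(counts, e=0.01)
best = GENOTYPES[int(np.argmax(gl))]
print("".join(BASES[i] for i in best))  # prints AG
```

With six A reads and four G reads, the heterozygote AG explains the data far better than either homozygote, because a homozygous genotype would need four or six reads to be sequencing errors.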