| Author: | |
|---|---|
| Format: | Digital resource |
| Language: | English |
| Published: | Zenodo, 2026 |
| Subjects: | |
| Online access: | https://doi.org/10.5281/zenodo.19946459 |
Description:
Trained model checkpoints, normalization statistics, fitted ensemble stacker, and preprocessed graph cache supporting the HIV bioactivity prediction preprint by Agarwal (2026).

Contents:
- best_molformer_fold{0..4}.pth: Five MolFormer-XL checkpoints fine-tuned on MoleculeNet HIV scaffold-CV folds. Each ~170 MB.
- best_gnn_fold{0..4}_v5_desc.pth: Five GATv2-based GNN ("v5b") checkpoints trained from scratch on the same folds.
- global_feature_stats_v5_desc_fold{0..4}.pt: Per-fold means/stds for the RDKit global descriptors (z-score normalization).
- ensemble_stacker.pt: Logistic stacker coefficients, three principled decision thresholds (Youden's J / F1-max / base-rate), and raw out-of-fold prediction arrays for n=24,391 molecules.
- hiv_preprocessed_cache_v5_desc.pt: 41,119 RDKit-parsed molecules as PyTorch Geometric Data objects with atom features (23-dim), bond features (8-dim), global descriptors, and Bemis-Murcko scaffolds. Reproduces the exact deterministic 5-fold scaffold split used in training.

These artifacts reproduce the headline test AUC of 0.806 ± 0.018 on the MoleculeNet HIV scaffold-split benchmark. Source code is at https://github.com/v659/HIV-drug-discovery.

License: MIT (matches the source repository).
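The record names Youden's J and F1-max as two of the decision thresholds stored in ensemble_stacker.pt, but does not say how they were computed. The sketch below shows one common way to derive both from out-of-fold scores, using only the standard definitions (J = TPR - FPR, F1 = 2TP / (2TP + FP + FN)). The toy scores and labels and all function names are illustrative assumptions, not the contents of the archive or the repository's code. The base-rate threshold is left out because the record does not define it.

```python
# Illustrative sketch only: toy data, hypothetical helper names.

def confusion_at(scores, labels, threshold):
    """Count TP, FP, TN, FN when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def youden_j(scores, labels, threshold):
    """Youden's J statistic: true positive rate minus false positive rate."""
    tp, fp, tn, fn = confusion_at(scores, labels, threshold)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr - fpr


def f1_score(scores, labels, threshold):
    """F1 of the positive class at the given threshold."""
    tp, fp, tn, fn = confusion_at(scores, labels, threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(scores, labels, metric):
    """Scan every distinct observed score and keep the one maximizing metric.

    Ties resolve to the lowest threshold, because candidates are sorted
    ascending and max() returns the first maximal element.
    """
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: metric(scores, labels, t))


# Toy out-of-fold predictions (assumed values, class-imbalanced on purpose).
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.60, 0.80, 0.90]
labels = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1]

j_thr = best_threshold(scores, labels, youden_j)
f1_thr = best_threshold(scores, labels, f1_score)
print("Youden's J threshold:", j_thr)
print("F1-max threshold:", f1_thr)
```

On this toy data the two criteria pick different cut-offs. Youden's J rewards the many correctly rejected negatives and settles on the lower threshold, while F1 ignores true negatives and prefers the stricter one. That difference is why imbalanced benchmarks such as HIV often report more than one threshold.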
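The cache is described as reproducing a deterministic 5-fold scaffold split, but the record does not spell out the procedure. As a rough illustration of the general idea, the sketch below groups molecule indices by scaffold string and assigns whole groups to folds, so no scaffold is shared between folds. The greedy largest-group-first rule, the tie-breaking order, the function name, and the toy scaffold strings are all assumptions. The repository's actual split may differ and should be taken from the cache itself.

```python
# Illustrative sketch only: one common scaffold-grouped K-fold assignment.
from collections import defaultdict


def scaffold_kfold(scaffolds, n_folds=5):
    """Assign molecule indices to folds so each scaffold stays in one fold.

    scaffolds: list of scaffold strings, one per molecule (hypothetical input;
    in practice these would be Bemis-Murcko scaffold SMILES).
    """
    groups = defaultdict(list)
    for index, scaffold in enumerate(scaffolds):
        groups[scaffold].append(index)

    # Largest groups first; ties broken by scaffold string so the
    # result does not depend on dictionary or input ordering quirks.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))

    folds = [[] for _ in range(n_folds)]
    for _, members in ordered:
        # Greedy balancing: put the group into the currently smallest fold.
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(members)
    return folds


# Toy scaffold strings (assumed values).
toy_scaffolds = [
    "c1ccccc1", "c1ccccc1", "c1ccncc1", "C1CCCCC1",
    "c1ccncc1", "c1ccccc1", "C1CCOC1", "c1ccc2ccccc2c1",
]

toy_folds = scaffold_kfold(toy_scaffolds, n_folds=3)
for fold_id, members in enumerate(toy_folds):
    print(fold_id, members)
```

Because the function uses no random state, calling it twice on the same input returns the same folds. This is the property the record refers to as a deterministic split.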