Bibliographic Details
Author: Agarwal, Arjun
Format: Digital resource
Language: English
Published: Zenodo 2026
Online Access: https://doi.org/10.5281/zenodo.19946459
Summary:
Trained model checkpoints, normalization statistics, fitted ensemble stacker, and preprocessed graph cache supporting the HIV bioactivity prediction preprint by Agarwal (2026).

Contents:
- best_molformer_fold{0..4}.pth: Five MolFormer-XL checkpoints fine-tuned on MoleculeNet HIV scaffold-CV folds. Each ~170 MB.
- best_gnn_fold{0..4}_v5_desc.pth: Five GATv2-based GNN ("v5b") checkpoints trained from scratch on the same folds.
- global_feature_stats_v5_desc_fold{0..4}.pt: Per-fold means/stds for the RDKit global descriptors (z-score normalization).
- ensemble_stacker.pt: Logistic stacker coefficients, three principled decision thresholds (Youden's J / F1-max / base-rate), and raw out-of-fold prediction arrays for n=24,391 molecules.
- hiv_preprocessed_cache_v5_desc.pt: 41,119 RDKit-parsed molecules as PyTorch Geometric Data objects with atom features (23-dim), bond features (8-dim), global descriptors, and Bemis-Murcko scaffolds. Reproduces the exact deterministic 5-fold scaffold split used in training.

These artifacts reproduce the headline test AUC of 0.806 ± 0.018 on the MoleculeNet HIV scaffold-split benchmark. Source code is at https://github.com/v659/HIV-drug-discovery.

License: MIT (matches the source repository).
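Two of the techniques named in the listing above, z-score normalization with stored means/stds and a Youden's J decision threshold, can be illustrated with a minimal standard-library sketch. This is not the author's code: the function names `zscore` and `youden_threshold`, the `eps` guard, the use of a population standard deviation, and the toy data are all assumptions made for illustration, and the actual layout of the `.pt` files is not described in this record.

```python
from statistics import mean, pstdev


def zscore(values, mu, sigma, eps=1e-8):
    """Standardize one descriptor column with precomputed statistics.

    eps is a hypothetical guard against a zero standard deviation.
    """
    return [(v - mu) / (sigma + eps) for v in values]


def youden_threshold(y_true, y_score):
    """Return (cutoff, J) maximizing Youden's J = TPR - FPR.

    A score >= cutoff is treated as a positive prediction.
    Ties keep the lowest cutoff that reaches the maximum J.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    best_j, best_t = -1.0, None
    for t in sorted(set(y_score)):
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= t)
        j = tp / n_pos - fp / n_neg
        if j > best_j:
            best_j, best_t = j, t
    return best_t, best_j


# Toy data (assumed): statistics come from the training fold only and are
# then applied to held-out values.
train_column = [1.0, 2.0, 3.0]
mu, sigma = mean(train_column), pstdev(train_column)
normalized = zscore([2.0, 4.0], mu, sigma)

# Toy out-of-fold labels and scores (assumed), standing in for stacker output.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.2, 0.35, 0.4, 0.8, 0.9]
threshold, j_stat = youden_threshold(labels, scores)
```

Computing the statistics on the training fold and reusing them on held-out data is what makes per-fold statistics files useful: each fold's model sees descriptors scaled the same way at training and inference time.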