Staff View: Reliable OOD Virtual Screening with Extrapolatory Pseudo-Label Matching :: Library Catalog

Bibliographic Details
Main Authors:	Qu, Yunni; Vaduri, Bhargav; Jatoth, Karthikeya; Wellnitz, James; Dinh, Dzung; Veenbaas, Seth; Chapman, Jonathan; Tropsha, Alexander; Oliva, Junier
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2406.01825

_version_	1866917359836463104
author	Qu, Yunni; Vaduri, Bhargav; Jatoth, Karthikeya; Wellnitz, James; Dinh, Dzung; Veenbaas, Seth; Chapman, Jonathan; Tropsha, Alexander; Oliva, Junier
author_facet	Qu, Yunni; Vaduri, Bhargav; Jatoth, Karthikeya; Wellnitz, James; Dinh, Dzung; Veenbaas, Seth; Chapman, Jonathan; Tropsha, Alexander; Oliva, Junier
contents	Machine learning (ML) models are increasingly deployed for virtual screening in drug discovery, where the goal is to identify novel, chemically diverse scaffolds while minimizing experimental costs. This creates a fundamental challenge: the most valuable discoveries lie in out-of-distribution (OOD) regions beyond the training data, yet ML models often degrade under distribution shift. Standard novelty-rejection strategies ensure reliability within the training domain but limit discovery by rejecting precisely the novel scaffolds most worth finding. Moreover, experimental budgets permit testing only a small fraction of nominated candidates, demanding models that produce reliable confidence estimates. We introduce EXPLOR (Extrapolatory Pseudo-Label Matching for OOD Uncertainty-Based Rejection), a framework that addresses both challenges through extrapolatory pseudo-labeling on latent-space augmentations, requiring only a single labeled training set and no access to unlabeled test compounds, mirroring the realistic conditions of prospective screening campaigns. Through a multi-headed architecture with a novel per-head matching loss, EXPLOR learns to extrapolate to OOD chemical space while producing reliable confidence estimates, with particularly strong performance in high-confidence regions, which is critical for virtual screening where only top-ranked candidates advance to experimental validation. We demonstrate state-of-the-art performance across chemical and tabular benchmarks using different molecular embeddings.
format	Preprint
id	arxiv_https___arxiv_org_abs_2406_01825
institution	arXiv
publishDate	2024
record_format	arxiv
title	Reliable OOD Virtual Screening with Extrapolatory Pseudo-Label Matching
topic	Machine Learning; Artificial Intelligence
url	https://arxiv.org/abs/2406.01825

Similar Items