Bibliographic Details
Main Authors: Rubbi, Andrea, Merchant, Arpit, Ogden, Samuel, Akbarnejad, Amir, Liò, Pietro, Vakili, Sattar, Lotfollahi, Mo
Format: Preprint
Published: 2026
Subjects:
Online Access: https://arxiv.org/abs/2605.10196
_version_ 1866916000200392704
author Rubbi, Andrea
Merchant, Arpit
Ogden, Samuel
Akbarnejad, Amir
Liò, Pietro
Vakili, Sattar
Lotfollahi, Mo
contents High-throughput gene perturbation experiments can test several genetic interventions in parallel, yet experimental budgets remain limited. A central goal is hit discovery: identifying as many perturbations as possible whose phenotypic effect exceeds a predefined threshold. Pure exploration strategies are statistically inefficient, wasting budget on low-value regions. Bayesian optimization methods offer a principled alternative but target a single global optimum, over-exploiting dominant modes while neglecting other high-value regions. We formalize hit discovery as a sequential experimental design problem and propose Probability-of-Hit, an acquisition function that directly targets threshold exceedance by ranking candidates according to their posterior probability of being a hit. We prove asymptotic optimality of this approach and demonstrate strong empirical performance on both synthetic benchmarks and real biological immunology datasets, including up to 6.4% improvement over baselines on the Schmidt IL-2 dataset.
format Preprint
id arxiv_https___arxiv_org_abs_2605_10196
institution arXiv
publishDate 2026
record_format arxiv
title Many Needles in a Haystack: Active Hit Discovery for Perturbation Experiments
topic Machine Learning
url https://arxiv.org/abs/2605.10196