MARC21 :: Library Catalog

Bibliographic Details
Main Authors:	Rubbi, Andrea; Merchant, Arpit; Ogden, Samuel; Akbarnejad, Amir; Liò, Pietro; Vakili, Sattar; Lotfollahi, Mo
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2605.10196

_version_	1866916000200392704
author	Rubbi, Andrea; Merchant, Arpit; Ogden, Samuel; Akbarnejad, Amir; Liò, Pietro; Vakili, Sattar; Lotfollahi, Mo
contents	High-throughput gene perturbation experiments can test several genetic interventions in parallel, yet experimental budgets remain limited. A central goal is hit discovery: identifying as many perturbations as possible whose phenotypic effect exceeds a predefined threshold. Pure exploration strategies are statistically inefficient, wasting budget on low-value regions. Bayesian optimization methods offer a principled alternative but target a single global optimum, over-exploiting dominant modes while neglecting other high-value regions. We formalize hit discovery as a sequential experimental design problem and propose Probability-of-Hit, an acquisition function that directly targets threshold exceedance by ranking candidates according to their posterior probability of being a hit. We prove asymptotic optimality of this approach and demonstrate strong empirical performance on both synthetic benchmarks and real biological immunology datasets, including up to 6.4% improvement over baselines on the Schmidt IL-2 dataset.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_10196
institution	arXiv
publishDate	2026
record_format	arxiv
title	Many Needles in a Haystack: Active Hit Discovery for Perturbation Experiments
topic	Machine Learning
url	https://arxiv.org/abs/2605.10196
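
The Probability-of-Hit criterion summarized in the `contents` field can be illustrated with a minimal sketch. It assumes a Gaussian posterior for each candidate perturbation x (for example, from a Gaussian-process surrogate) with mean μ(x) and standard deviation σ(x). Under that assumption the posterior probability of exceeding the hit threshold τ is Φ((μ(x) − τ)/σ(x)), where Φ is the standard normal CDF. The function names, the Gaussian assumption, the toy numbers, and the simple top-k batch rule are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List, Sequence


def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_of_hit(mu: float, sigma: float, tau: float) -> float:
    """P(f(x) > tau) when the posterior of f(x) is assumed to be N(mu, sigma^2)."""
    if sigma <= 0.0:
        # Degenerate case with no posterior uncertainty: a hit iff the mean exceeds tau.
        return 1.0 if mu > tau else 0.0
    return normal_cdf((mu - tau) / sigma)


def select_batch(mus: Sequence[float], sigmas: Sequence[float], tau: float, k: int) -> List[int]:
    """Indices of the k candidates with the highest probability of being a hit.

    Hypothetical top-k batch rule: each candidate is scored independently.
    """
    scores = [probability_of_hit(m, s, tau) for m, s in zip(mus, sigmas)]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


if __name__ == "__main__":
    # Toy posterior over four candidate perturbations (illustrative numbers only).
    mus = [2.0, 0.9, 0.5, -1.0]
    sigmas = [0.5, 0.1, 1.0, 2.0]
    tau = 1.0  # hit threshold on the phenotypic effect
    print(select_batch(mus, sigmas, tau, k=2))  # prints [0, 2]
```

In this toy example, ranking by posterior mean alone would pick candidates 0 and 1. The probability score instead prefers candidate 2 (about 0.31 versus about 0.16), because its wider posterior places more mass above τ.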

Similar Items