| Main authors: | Rubbi, Andrea; Merchant, Arpit; Ogden, Samuel; Akbarnejad, Amir; Liò, Pietro; Vakili, Sattar; Lotfollahi, Mo |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Machine Learning |
| Online access: | https://arxiv.org/abs/2605.10196 |

| Field | Value |
|---|---|
| _version_ | 1866916000200392704 |
| author | Rubbi, Andrea; Merchant, Arpit; Ogden, Samuel; Akbarnejad, Amir; Liò, Pietro; Vakili, Sattar; Lotfollahi, Mo |
| contents | High-throughput gene perturbation experiments can test several genetic interventions in parallel, yet experimental budgets remain limited. A central goal is hit discovery: identifying as many perturbations as possible whose phenotypic effect exceeds a predefined threshold. Pure exploration strategies are statistically inefficient, wasting budget on low-value regions. Bayesian optimization methods offer a principled alternative but target a single global optimum, over-exploiting dominant modes while neglecting other high-value regions. We formalize hit discovery as a sequential experimental design problem and propose Probability-of-Hit, an acquisition function that directly targets threshold exceedance by ranking candidates according to their posterior probability of being a hit. We prove asymptotic optimality of this approach and demonstrate strong empirical performance on both synthetic benchmarks and real biological immunology datasets, including up to 6.4% improvement over baselines on the Schmidt IL-2 dataset. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2605_10196 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | Many Needles in a Haystack: Active Hit Discovery for Perturbation Experiments |
| topic | Machine Learning |
| url | https://arxiv.org/abs/2605.10196 |
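
The abstract describes Probability-of-Hit as ranking candidates by their posterior probability of exceeding a threshold. A minimal sketch of that ranking step follows. It assumes each candidate has an independent Gaussian posterior with a known mean and standard deviation, which the record does not state. The function names, the example candidates, and the threshold are hypothetical, and this is not the authors' implementation.

```python
import math


def probability_of_hit(mean, std, threshold):
    """P(effect > threshold) under an assumed Gaussian posterior N(mean, std^2)."""
    if std <= 0.0:
        # Degenerate posterior: the effect is treated as known exactly.
        return 1.0 if mean > threshold else 0.0
    z = (threshold - mean) / std
    # 1 - Phi(z), written with the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def rank_by_probability_of_hit(posteriors, threshold, batch_size):
    """Return the batch_size candidates with the highest probability of being a hit.

    posteriors maps a candidate name to a (mean, std) pair (hypothetical format).
    """
    scored = [
        (name, probability_of_hit(mean, std, threshold))
        for name, (mean, std) in posteriors.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:batch_size]


# Hypothetical posterior summaries for three perturbations.
posteriors = {
    "gene_A": (1.2, 0.1),
    "gene_B": (0.9, 0.5),
    "gene_C": (0.2, 0.3),
}
selected = rank_by_probability_of_hit(posteriors, threshold=1.0, batch_size=2)
for name, p_hit in selected:
    print(f"{name}: P(hit) = {p_hit:.3f}")
```

In a sequential design loop, the selected batch would be measured, the posterior updated, and the ranking repeated until the budget is spent. How the posterior is modelled and updated is left open here.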