Affichage MARC: :: Library Catalog

Enregistré dans:

Détails bibliographiques
Auteurs principaux:	Wainrib, Gilles, Bodinier, Barbara, Dakhli, Haitem, Monserrat, Josep, Perez, Almudena Espin, Carpentier, Sabrina, Codato, Roberta, Klein, John
Format:	Preprint
Publié:	2026
Sujets:	Machine Learning
Accès en ligne:	https://arxiv.org/abs/2603.26177
Tags:	Ajouter un tag Pas de tags, Soyez le premier à ajouter un tag!

_version_	1866915894828990464
author	Wainrib, Gilles Bodinier, Barbara Dakhli, Haitem Monserrat, Josep Perez, Almudena Espin Carpentier, Sabrina Codato, Roberta Klein, John
author_facet	Wainrib, Gilles Bodinier, Barbara Dakhli, Haitem Monserrat, Josep Perez, Almudena Espin Carpentier, Sabrina Codato, Roberta Klein, John
contents	Recent work has questioned whether large language models (LLMs) can perform genuine in-context learning (ICL) for scientific experimental design, with prior studies suggesting that LLM-based agents exhibit no sensitivity to experimental feedback. We shed new light on this question by carrying out 800 independently replicated experiments on iterative perturbation discovery in Cell Painting high-content screening. We compare an LLM agent that iteratively updates its hypotheses using experimental feedback to a zero-shot baseline that relies solely on pretraining knowledge retrieval. Access to feedback yields a $+53.4\%$ increase in discoveries per feature on average ($p = 0.003$). To test whether this improvement arises from genuine feedback-driven learning rather than prompt-induced recall of pretraining knowledge, we introduce a random feedback control in which hit/miss labels are permuted. Under this control, the performance gain disappears, indicating that the observed improvement depends on the structure of the feedback signal ($+13.0$ hits, $p = 0.003$). We further examine how model capability affects feedback utilization. Upgrading from Claude Sonnet 4.5 to 4.6 reduces gene hallucination rates from ${\sim}33\%$--$45\%$ to ${\sim}3$--$9\%$, converting a non-significant ICL effect ($+0.8$, $p = 0.32$) into a large and highly significant improvement ($+11.0$, $p=0.003$) for the best ICL strategy. These results suggest that effective in-context learning from experimental feedback emerges only once models reach a sufficient capability threshold.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_26177
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	Can AI Scientist Agents Learn from Lab-in-the-Loop Feedback? Evidence from Iterative Perturbation Discovery Wainrib, Gilles Bodinier, Barbara Dakhli, Haitem Monserrat, Josep Perez, Almudena Espin Carpentier, Sabrina Codato, Roberta Klein, John Machine Learning Recent work has questioned whether large language models (LLMs) can perform genuine in-context learning (ICL) for scientific experimental design, with prior studies suggesting that LLM-based agents exhibit no sensitivity to experimental feedback. We shed new light on this question by carrying out 800 independently replicated experiments on iterative perturbation discovery in Cell Painting high-content screening. We compare an LLM agent that iteratively updates its hypotheses using experimental feedback to a zero-shot baseline that relies solely on pretraining knowledge retrieval. Access to feedback yields a $+53.4\%$ increase in discoveries per feature on average ($p = 0.003$). To test whether this improvement arises from genuine feedback-driven learning rather than prompt-induced recall of pretraining knowledge, we introduce a random feedback control in which hit/miss labels are permuted. Under this control, the performance gain disappears, indicating that the observed improvement depends on the structure of the feedback signal ($+13.0$ hits, $p = 0.003$). We further examine how model capability affects feedback utilization. Upgrading from Claude Sonnet 4.5 to 4.6 reduces gene hallucination rates from ${\sim}33\%$--$45\%$ to ${\sim}3$--$9\%$, converting a non-significant ICL effect ($+0.8$, $p = 0.32$) into a large and highly significant improvement ($+11.0$, $p=0.003$) for the best ICL strategy. These results suggest that effective in-context learning from experimental feedback emerges only once models reach a sufficient capability threshold.
title	Can AI Scientist Agents Learn from Lab-in-the-Loop Feedback? Evidence from Iterative Perturbation Discovery
topic	Machine Learning
url	https://arxiv.org/abs/2603.26177

Documents similaires