Staff View: CodeScientist: End-to-End Semi-Automated Scientific Discovery with Code-based Experimentation :: Library Catalog

Bibliographic Details
Main Authors:	Jansen, Peter; Tafjord, Oyvind; Radensky, Marissa; Siangliulue, Pao; Hope, Tom; Mishra, Bhavana Dalvi; Majumder, Bodhisattwa Prasad; Weld, Daniel S.; Clark, Peter
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2503.22708

_version_	1866909671211663360
author	Jansen, Peter; Tafjord, Oyvind; Radensky, Marissa; Siangliulue, Pao; Hope, Tom; Mishra, Bhavana Dalvi; Majumder, Bodhisattwa Prasad; Weld, Daniel S.; Clark, Peter
author_facet	Jansen, Peter; Tafjord, Oyvind; Radensky, Marissa; Siangliulue, Pao; Hope, Tom; Mishra, Bhavana Dalvi; Majumder, Bodhisattwa Prasad; Weld, Daniel S.; Clark, Peter
contents	Despite the surge of interest in autonomous scientific discovery (ASD) of software artifacts (e.g., improved ML algorithms), current ASD systems face two key limitations: (1) they largely explore variants of existing codebases or similarly constrained design spaces, and (2) they produce large volumes of research artifacts (such as automatically generated papers and code) that are typically evaluated using conference-style paper review with limited evaluation of code. In this work we introduce CodeScientist, a novel ASD system that frames ideation and experiment construction as a form of genetic search jointly over combinations of research articles and codeblocks defining common actions in a domain (like prompting a language model). We use this paradigm to conduct hundreds of automated experiments on machine-generated ideas broadly in the domain of agents and virtual environments, with the system returning 19 discoveries, 6 of which were judged as being both at least minimally sound and incrementally novel after a multi-faceted evaluation beyond that typically conducted in prior work, including external (conference-style) review, code review, and replication attempts. Moreover, the discoveries span new tasks, agents, metrics, and data, suggesting a qualitative shift from benchmark optimization to broader discoveries.
format	Preprint
id	arxiv_https___arxiv_org_abs_2503_22708
institution	arXiv
publishDate	2025
record_format	arxiv
title	CodeScientist: End-to-End Semi-Automated Scientific Discovery with Code-based Experimentation
topic	Artificial Intelligence; Computation and Language
url	https://arxiv.org/abs/2503.22708

Similar Items