Saved in:
| Main Authors: | , , , , |
|---|---|
| Format: | Preprint |
| Published: |
2025
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.07889 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Table of Contents:
- The realization of autonomous scientific experimentation is currently limited by LLMs' struggle to grasp the strict procedural logic and accuracy required by biological protocols. To address this fundamental challenge, we present \textbf{BioProBench}, a comprehensive resource for procedural reasoning in biology. BioProBench is grounded in \textbf{BioProCorpus}, a foundational collection of 27,000 human-written protocols. From this corpus, we systematically constructed a dataset of over 550,000 task instances, offering both a large-scale training resource and a rigorous benchmark with novel metrics. Evaluating 10 mainstream LLMs, we find that while general comprehension is high, performance drops significantly on tasks demanding deep reasoning, quantitative precision, and safety awareness. To demonstrate the value of BioProCorpus in mitigating these issues, we developed \textbf{ProAgent}, grounded in our corpus, ProAgent substantially advances the state-of-the-art. BioProBench provides a rigorous diagnostic benchmark and a foundational resource for developing the next generation of reliable scientific AI. Code and data are available at: https://github.com/YuyangSunshine/bioprotocolbench and https://huggingface.co/datasets/BioProBench/BioProBench.