Bibliographic Details
Main Authors: Liu, Yuyang, Lv, Liuzhenghao, Zhang, Xiancheng, Yuan, Jingya Wang Li, Tian, Yonghong
Format: Preprint
Published: 2025
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2505.07889
author Liu, Yuyang
Lv, Liuzhenghao
Zhang, Xiancheng
Yuan, Jingya Wang Li
Tian, Yonghong
contents The realization of autonomous scientific experimentation is currently limited by the struggle of large language models (LLMs) to grasp the strict procedural logic and accuracy required by biological protocols. To address this fundamental challenge, we present BioProBench, a comprehensive resource for procedural reasoning in biology. BioProBench is grounded in BioProCorpus, a foundational collection of 27,000 human-written protocols. From this corpus, we systematically constructed a dataset of over 550,000 task instances, which serves both as a large-scale training resource and as a rigorous benchmark with novel metrics. Evaluating 10 mainstream LLMs, we find that while general comprehension is high, performance drops significantly on tasks demanding deep reasoning, quantitative precision, and safety awareness. To demonstrate the value of BioProCorpus in mitigating these issues, we developed ProAgent; grounded in our corpus, ProAgent substantially advances the state of the art. BioProBench provides a rigorous diagnostic benchmark and a foundational resource for developing the next generation of reliable scientific AI. Code and data are available at https://github.com/YuyangSunshine/bioprotocolbench and https://huggingface.co/datasets/BioProBench/BioProBench.
format Preprint
id arxiv_https___arxiv_org_abs_2505_07889
institution arXiv
publishDate 2025
record_format arxiv
title BioProBench: Comprehensive Dataset and Benchmark in Biological Protocol Understanding and Reasoning
topic Computation and Language
url https://arxiv.org/abs/2505.07889