Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Liu, Yuyang; Lv, Liuzhenghao; Zhang, Xiancheng; Wang, Jingya; Yuan, Li; Tian, Yonghong
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2505.07889

_version_	1866914270101372928
author	Liu, Yuyang; Lv, Liuzhenghao; Zhang, Xiancheng; Wang, Jingya; Yuan, Li; Tian, Yonghong
contents	The realization of autonomous scientific experimentation is currently limited by the difficulty large language models (LLMs) have in grasping the strict procedural logic and precision that biological protocols require. To address this challenge, we present BioProBench, a comprehensive resource for procedural reasoning in biology. BioProBench is grounded in BioProCorpus, a foundational collection of 27,000 human-written protocols. From this corpus, we systematically constructed a dataset of over 550,000 task instances, which serves both as a large-scale training resource and as a rigorous benchmark with novel metrics. Evaluating 10 mainstream LLMs, we find that while general comprehension is high, performance drops significantly on tasks that demand deep reasoning, quantitative precision, and safety awareness. To demonstrate the value of BioProCorpus in mitigating these issues, we developed ProAgent; grounded in our corpus, ProAgent substantially advances the state of the art. BioProBench provides a rigorous diagnostic benchmark and a foundational resource for developing the next generation of reliable scientific AI. Code and data are available at https://github.com/YuyangSunshine/bioprotocolbench and https://huggingface.co/datasets/BioProBench/BioProBench.
format	Preprint
id	arxiv_https___arxiv_org_abs_2505_07889
institution	arXiv
publishDate	2025
record_format	arxiv
title	BioProBench: Comprehensive Dataset and Benchmark in Biological Protocol Understanding and Reasoning
topic	Computation and Language
url	https://arxiv.org/abs/2505.07889

Similar Items