Bibliographic Details
Main Authors: Tang, Bintao; Yang, Xin; Wang, Yuhao; Qiu, Zixuan; Ji, Zimo; Jiang, Wenyuan
Format: Preprint
Published: 2025
Subjects: Artificial Intelligence
Online Access: https://arxiv.org/abs/2507.21130
_version_ 1866908470234578944
author Tang, Bintao
Yang, Xin
Wang, Yuhao
Qiu, Zixuan
Ji, Zimo
Jiang, Wenyuan
author_facet Tang, Bintao
Yang, Xin
Wang, Yuhao
Qiu, Zixuan
Ji, Zimo
Jiang, Wenyuan
contents We present INTEGRALBENCH, a focused benchmark designed to evaluate Large Language Model (LLM) performance on definite integral problems. INTEGRALBENCH provides both symbolic and numerical ground truth solutions with manual difficulty annotations. Our evaluation of nine state-of-the-art LLMs reveals significant performance gaps and strong correlations between problem difficulty and model accuracy, establishing baseline metrics for this challenging domain. INTEGRALBENCH aims to advance automated mathematical reasoning by providing a rigorous evaluation framework specifically tailored for definite integral computation.
format Preprint
id arxiv_https___arxiv_org_abs_2507_21130
institution arXiv
publishDate 2025
record_format arxiv
title INTEGRALBENCH: Benchmarking LLMs with Definite Integral Problems
topic Artificial Intelligence
url https://arxiv.org/abs/2507.21130