Staff View: INTEGRALBENCH: Benchmarking LLMs with Definite Integral Problems :: Library Catalog

Bibliographic Details
Main Authors:	Tang, Bintao; Yang, Xin; Wang, Yuhao; Qiu, Zixuan; Ji, Zimo; Jiang, Wenyuan
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2507.21130

_version_	1866908470234578944
author	Tang, Bintao; Yang, Xin; Wang, Yuhao; Qiu, Zixuan; Ji, Zimo; Jiang, Wenyuan
author_facet	Tang, Bintao; Yang, Xin; Wang, Yuhao; Qiu, Zixuan; Ji, Zimo; Jiang, Wenyuan
contents	We present INTEGRALBENCH, a focused benchmark designed to evaluate Large Language Model (LLM) performance on definite integral problems. INTEGRALBENCH provides both symbolic and numerical ground truth solutions with manual difficulty annotations. Our evaluation of nine state-of-the-art LLMs reveals significant performance gaps and strong correlations between problem difficulty and model accuracy, establishing baseline metrics for this challenging domain. INTEGRALBENCH aims to advance automated mathematical reasoning by providing a rigorous evaluation framework specifically tailored for definite integral computation.
format	Preprint
id	arxiv_https___arxiv_org_abs_2507_21130
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	INTEGRALBENCH: Benchmarking LLMs with Definite Integral Problems Tang, Bintao Yang, Xin Wang, Yuhao Qiu, Zixuan Ji, Zimo Jiang, Wenyuan Artificial Intelligence We present INTEGRALBENCH, a focused benchmark designed to evaluate Large Language Model (LLM) performance on definite integral problems. INTEGRALBENCH provides both symbolic and numerical ground truth solutions with manual difficulty annotations. Our evaluation of nine state-of-the-art LLMs reveals significant performance gaps and strong correlations between problem difficulty and model accuracy, establishing baseline metrics for this challenging domain. INTEGRALBENCH aims to advance automated mathematical reasoning by providing a rigorous evaluation framework specifically tailored for definite integral computation.
title	INTEGRALBENCH: Benchmarking LLMs with Definite Integral Problems
topic	Artificial Intelligence
url	https://arxiv.org/abs/2507.21130
