

Bibliographic Details
Main Authors:	Yuan, Zike; Liu, Ming; Wang, Hui; Qin, Bing
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2407.02936

_version_	1866910844570304512
author	Yuan, Zike; Liu, Ming; Wang, Hui; Qin, Bing
author_facet	Yuan, Zike; Liu, Ming; Wang, Hui; Qin, Bing
contents	Evaluating the graph comprehension and reasoning abilities of Large Language Models (LLMs) is challenging and often incomplete. Existing benchmarks focus primarily on pure graph understanding, lacking a comprehensive evaluation across all graph types and detailed capability definitions. This paper presents GraCoRe, a benchmark for systematically assessing LLMs' graph comprehension and reasoning. GraCoRe uses a three-tier hierarchical taxonomy to categorize and test models on pure graphs and heterogeneous graphs, subdividing capabilities into 10 distinct areas tested through 19 tasks. Our benchmark includes 11 datasets with 5,140 graphs of varying complexity. We evaluate four closed-source and eight open-source LLMs, conducting thorough analyses from both ability and task perspectives. Key findings reveal that the OpenAI o1 model has amazing comprehension and reasoning capabilities, semantic enrichment enhances reasoning performance, node ordering impacts task success, and the ability to process longer texts does not necessarily improve graph comprehension or reasoning. GraCoRe is open-sourced at https://github.com/ZIKEYUAN/GraCoRe
format	Preprint
id	arxiv_https___arxiv_org_abs_2407_02936
institution	arXiv
publishDate	2024
record_format	arxiv
title	GraCoRe: Benchmarking Graph Comprehension and Complex Reasoning in Large Language Models
topic	Artificial Intelligence; Computation and Language
url	https://arxiv.org/abs/2407.02936
