Bibliographic Details
Main Authors: Robertson, Alex; Liang, Huizhi; Gani, Mahbub; Kumar, Rohit; Rajamohan, Srijith
Format: Preprint
Published: 2026
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2602.19643
author Robertson, Alex
Liang, Huizhi
Gani, Mahbub
Kumar, Rohit
Rajamohan, Srijith
contents Large Language Models (LLMs) possess a remarkable capacity to generate persuasive and intelligible language. However, coherence does not equate to truthfulness, as the responses often contain subtle hallucinations. Existing benchmarks are limited by static and narrow questions, leading to limited coverage and misleading evaluations. We present KGHaluBench, a Knowledge Graph-based hallucination benchmark that assesses LLMs across the breadth and depth of their knowledge, providing a fairer and more comprehensive insight into LLM truthfulness. Our framework utilises the KG to dynamically construct challenging, multifaceted questions, whose difficulty is then statistically estimated to address popularity bias. Our automated verification pipeline detects abstentions and verifies the LLM's response at both conceptual and correctness levels to identify different types of hallucinations. We evaluate 25 frontier models, using novel accuracy and hallucination metrics. The results provide a more interpretable insight into the knowledge factors that cause hallucinations across different model sizes. KGHaluBench is publicly available to support future developments in hallucination mitigation.
format Preprint
id arxiv_https___arxiv_org_abs_2602_19643
institution arXiv
publishDate 2026
record_format arxiv
title KGHaluBench: A Knowledge Graph-Based Hallucination Benchmark for Evaluating the Breadth and Depth of LLM Knowledge
topic Computation and Language
url https://arxiv.org/abs/2602.19643