Bibliographic Details
Main Authors: Choi, Sanghyeok; Jeon, Woosang; Yang, Kyuseok; Kim, Taehyeong
Format: Preprint
Published: 2026
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2601.10003
author Choi, Sanghyeok
Jeon, Woosang
Yang, Kyuseok
Kim, Taehyeong
contents Constructing Knowledge Graphs (KGs) from unstructured text provides a structured framework for knowledge representation and reasoning, yet current LLM-based approaches struggle with a fundamental trade-off: maximizing factual coverage often leads to relational fragmentation, while premature consolidation causes information loss. To address this, we propose SocraticKG, an automated KG construction method that introduces question-answer pairs as a structured intermediate representation to systematically unfold document-level semantics prior to triple extraction. By employing 5W1H-guided QA expansion, SocraticKG captures contextual dependencies and implicit relational links typically lost in direct KG extraction pipelines, providing explicit grounding in the source document that helps mitigate implicit reasoning errors. Evaluation on the MINE benchmark and a downstream HotpotQA task demonstrates that our approach effectively addresses the coverage-connectivity trade-off, achieving superior factual retention and structural cohesion while supporting complex multi-hop reasoning.
format Preprint
id arxiv_https___arxiv_org_abs_2601_10003
institution arXiv
publishDate 2026
record_format arxiv
title SocraticKG: Knowledge Graph Construction via QA-Driven Fact Extraction
topic Computation and Language
url https://arxiv.org/abs/2601.10003
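The abstract describes a pipeline that asks 5W1H (who, what, when, where, why, how) questions about a document, keeps the answers grounded in the source text, and only then extracts triples. It does not specify prompts, templates, or extraction rules. The following is a minimal, hypothetical Python sketch of that idea under stated assumptions, not the authors' implementation. The question templates, relation labels, and the names `TEMPLATES`, `expand_qa`, `qa_to_triples`, and `count_components` are all illustrative inventions. The `answer_fn` callable stands in for the LLM that would answer each question. Counting connected components is used only as a toy proxy for "relational fragmentation"; it is not the MINE metric.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical 5W1H templates: interrogative -> (question template, relation label).
TEMPLATES = {
    "who": ("Who created {e}?", "created_by"),
    "what": ("What is {e}?", "is_a"),
    "when": ("When was {e} created?", "created_in"),
    "where": ("Where is {e} located?", "located_in"),
    "why": ("Why is {e} notable?", "notable_for"),
    "how": ("How does {e} work?", "works_by"),
}

@dataclass(frozen=True)
class QAPair:
    subject: str   # entity the question is about
    wh: str        # which 5W1H interrogative produced the question
    question: str
    answer: str
    evidence: str  # source span the answer is grounded in

@dataclass(frozen=True)
class Triple:
    head: str
    relation: str
    tail: str

# Stand-in for an LLM: (question, document) -> (answer, evidence span), or None.
AnswerFn = Callable[[str, str], Optional[tuple[str, str]]]

def expand_qa(document: str, entities: list[str], answer_fn: AnswerFn) -> list[QAPair]:
    """Unfold a document into QA pairs by asking every 5W1H question about every entity."""
    pairs = []
    for e in entities:
        for wh, (template, _) in TEMPLATES.items():
            q = template.format(e=e)
            result = answer_fn(q, document)
            if result is None:
                continue  # the document does not answer this question
            answer, evidence = result
            if evidence not in document:
                continue  # drop answers whose evidence is not in the source text
            pairs.append(QAPair(e, wh, q, answer, evidence))
    return pairs

def qa_to_triples(pairs: list[QAPair]) -> list[Triple]:
    """Naive extraction: (subject, relation label of the interrogative, answer)."""
    return [Triple(p.subject, TEMPLATES[p.wh][1], p.answer) for p in pairs]

def count_components(triples: list[Triple]) -> int:
    """Connected components of the undirected entity graph (union-find); more = more fragmented."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for t in triples:
        ra, rb = find(t.head), find(t.tail)
        if ra != rb:
            parent[ra] = rb
    return len({find(x) for x in list(parent)})

# Usage with a toy lookup table in place of an LLM.
doc = "The Eiffel Tower is a lattice tower. It stands in Paris. It was completed in 1889."
facts = {
    "What is the Eiffel Tower?": ("lattice tower", "The Eiffel Tower is a lattice tower."),
    "Where is the Eiffel Tower located?": ("Paris", "It stands in Paris."),
    "When was the Eiffel Tower created?": ("1889", "It was completed in 1889."),
    # Evidence not present in doc, so expand_qa discards this answer.
    "Who created the Eiffel Tower?": ("Gustave Eiffel", "Designed by Gustave Eiffel."),
}

def lookup(question: str, document: str) -> Optional[tuple[str, str]]:
    return facts.get(question)

pairs = expand_qa(doc, ["the Eiffel Tower"], lookup)
triples = qa_to_triples(pairs)
```

Because every triple here shares the queried entity as its head, the toy graph forms a single component. Two unrelated triples would form two components, which is the kind of fragmentation the abstract says direct extraction tends to produce.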