Bibliographic Details
Main Authors: Loreti, Andrea, Chen, Kesi, George, Ruby, Firth, Robert, Agnello, Adriano, Tanaka, Shinnosuke
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2504.07738
Table of Contents:
  • We present a multi-step approach to the automated construction of a knowledge graph that structures and represents domain-specific knowledge from large document corpora. We apply the method to build the first knowledge graph of nuclear fusion energy, a highly specialized field characterized by vast scope and heterogeneity. This makes it an ideal benchmark for the key features of our pipeline, including automatic named entity recognition and entity resolution. We show how pre-trained large language models can address these challenges, and we evaluate their performance against Zipf's law, which characterizes human natural language. Additionally, we develop a knowledge-graph retrieval-augmented generation system that uses multiple prompts with large language models to provide contextually relevant answers to natural-language queries, including complex multi-hop questions that require reasoning across interconnected entities.
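
The abstract does not say how the comparison with Zipf's law is carried out. As a rough illustration only, the sketch below assumes one plausible check: count how often each extracted entity is mentioned, rank the entities by frequency, and fit the exponent s in frequency ∝ rank^(-s) by least squares on log-log data. The function name `zipf_exponent` and the toy mention list are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def zipf_exponent(mentions):
    """Estimate s in frequency ~ rank**(-s) via least squares on log-log data.

    `mentions` is a list of entity strings, one per extracted mention.
    This is an illustrative check, not the paper's evaluation procedure.
    """
    # Frequencies sorted from most to least common; rank 1 is the top entity.
    counts = sorted(Counter(mentions).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct entities")

    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # The fitted slope is negative for a decaying distribution; s is its negation.
    return -sxy / sxx


# Toy data (hypothetical): frequencies 12, 6, 4, 3 follow 12 / rank exactly.
toy_mentions = (
    ["tokamak"] * 12 + ["plasma"] * 6 + ["stellarator"] * 4 + ["divertor"] * 3
)
exponent = zipf_exponent(toy_mentions)
print(f"estimated Zipf exponent: {exponent:.3f}")
```

An exponent close to 1 on real extraction output would be consistent with the rank-frequency behavior usually reported for natural language; a large deviation could indicate over-merged or over-split entities, although that interpretation is an assumption here rather than a claim from the paper.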