Staff View: GraphRAFT: Retrieval Augmented Fine-Tuning for Knowledge Graphs on Graph Databases :: Library Catalog

Bibliographic Details
Main Authors:	Clemedtson, Alfred; Shi, Borun
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Computation and Language; Information Retrieval
Online Access:	https://arxiv.org/abs/2504.05478

_version_	1866915612410773504
author	Clemedtson, Alfred; Shi, Borun
author_facet	Clemedtson, Alfred; Shi, Borun
contents	Large language models have shown remarkable language processing and reasoning ability but are prone to hallucinate when asked about private data. Retrieval-augmented generation (RAG) retrieves relevant data that fit into an LLM's context window and prompts the LLM for an answer. GraphRAG extends this approach to structured Knowledge Graphs (KGs) and questions regarding entities multiple hops away. The majority of recent GraphRAG methods either overlook the retrieval step or have ad hoc retrieval processes that are abstract or inefficient. This prevents them from being adopted when the KGs are stored in graph databases supporting graph query languages. In this work, we present GraphRAFT, a retrieve-and-reason framework that finetunes LLMs to generate provably correct Cypher queries to retrieve high-quality subgraph contexts and produce accurate answers. Our method is the first such solution that can be taken off-the-shelf and used on KGs stored in native graph DBs. Benchmarks suggest that our method is sample-efficient and scales with the availability of training data. Our method achieves significantly better results than all state-of-the-art models across all four standard metrics on two challenging Q&As on large text-attributed KGs.
format	Preprint
id	arxiv_https___arxiv_org_abs_2504_05478
institution	arXiv
publishDate	2025
record_format	arxiv
title	GraphRAFT: Retrieval Augmented Fine-Tuning for Knowledge Graphs on Graph Databases
topic	Machine Learning; Computation and Language; Information Retrieval
url	https://arxiv.org/abs/2504.05478

Similar Items