Bibliographic Details
Main Authors:	Wesslund, Dante; Stenström, Ville; Linde, Pontus; Holmberg, Alexander
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2602.11886

_version_	1866918334267654144
author	Wesslund, Dante; Stenström, Ville; Linde, Pontus; Holmberg, Alexander
contents	Corporate financial reports are a valuable source of structured knowledge for Knowledge Graph construction, but the lack of annotated ground truth in this domain makes evaluation difficult. We present a semi-automated pipeline for Subject-Predicate-Object triplet extraction that uses ontology-driven proxy metrics, specifically Ontology Conformance and Faithfulness, instead of ground-truth-based evaluation. We compare a static, manually engineered ontology against a fully automated, document-specific ontology induction approach across different LLMs and two corporate annual reports. The automatically induced ontology achieves 100% schema conformance in all configurations, eliminating the ontology drift observed with the manual approach. We also propose a hybrid verification strategy that combines regex matching with an LLM-as-a-judge check, reducing apparent subject hallucination rates from 65.2% to 1.6% by filtering false positives caused by coreference resolution. Finally, we identify a systematic asymmetry between subject and object hallucinations, which we attribute to passive constructions and omitted agents in financial prose.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_11886
institution	arXiv
publishDate	2026
record_format	arxiv
title	LLM-based Triplet Extraction from Financial Reports
topic	Computation and Language
url	https://arxiv.org/abs/2602.11886

Similar Items