Bibliographic Details
Main Authors: Wesslund, Dante, Stenström, Ville, Linde, Pontus, Holmberg, Alexander
Format: Preprint
Published: 2026
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2602.11886
author Wesslund, Dante
Stenström, Ville
Linde, Pontus
Holmberg, Alexander
contents Corporate financial reports are a valuable source of structured knowledge for Knowledge Graph construction, but the lack of annotated ground truth in this domain makes evaluation difficult. We present a semi-automated pipeline for Subject-Predicate-Object triplet extraction that uses ontology-driven proxy metrics, specifically Ontology Conformance and Faithfulness, instead of ground-truth-based evaluation. We compare a static, manually engineered ontology against a fully automated, document-specific ontology induction approach across different LLMs and two corporate annual reports. The automatically induced ontology achieves 100% schema conformance in all configurations, eliminating the ontology drift observed with the manual approach. We also propose a hybrid verification strategy that combines regex matching with an LLM-as-a-judge check, reducing apparent subject hallucination rates from 65.2% to 1.6% by filtering false positives caused by coreference resolution. Finally, we identify a systematic asymmetry between subject and object hallucinations, which we attribute to passive constructions and omitted agents in financial prose.
format Preprint
id arxiv_https___arxiv_org_abs_2602_11886
institution arXiv
publishDate 2026
record_format arxiv
title LLM-based Triplet Extraction from Financial Reports
topic Computation and Language
url https://arxiv.org/abs/2602.11886
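
The abstract describes two proxy checks: a hybrid verification step (regex matching followed by an LLM-as-a-judge check) and an Ontology Conformance score. The following is a minimal Python sketch of how such checks could be wired together, not the authors' implementation. All names (`Triplet`, `mentioned`, `is_grounded`, `hallucination_rate`, `conformance_rate`) are hypothetical, the judge is passed in as a plain callable rather than a real LLM call, and conformance is simplified here to "the predicate belongs to the ontology's allowed set", which is an assumption about the metric.

```python
import re
from typing import Callable, Iterable, NamedTuple, Set


class Triplet(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical judge signature: (term, source_text) -> True if the term is supported.
Judge = Callable[[str, str], bool]


def mentioned(term: str, text: str) -> bool:
    """Case-insensitive whole-phrase regex match of `term` in `text`."""
    term = term.strip()
    if not term:
        return False
    pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def is_grounded(term: str, text: str, judge: Judge) -> bool:
    """Run the cheap regex check first; consult the judge only on a miss.

    The second stage is where a coreferent subject (for example "It"
    standing for "the Board") could be accepted instead of being counted
    as a hallucination.
    """
    if mentioned(term, text):
        return True
    return judge(term, text)


def hallucination_rate(terms: Iterable[str], text: str, judge: Judge) -> float:
    """Fraction of terms that neither the regex nor the judge can support."""
    items = list(terms)
    if not items:
        return 0.0
    misses = sum(1 for t in items if not is_grounded(t, text, judge))
    return misses / len(items)


def conformance_rate(triplets: Iterable[Triplet], allowed: Set[str]) -> float:
    """Fraction of triplets whose predicate is in the allowed set (assumed metric)."""
    items = list(triplets)
    if not items:
        return 0.0
    return sum(1 for t in items if t.predicate in allowed) / len(items)
```

Passing `lambda term, text: False` as the judge gives the regex-only baseline, so the same function can report the rate before and after the judge stage is added.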