Bibliographic Details
Main Authors: Reichenpfader, Daniel, Knupp, Jonas, Sander, André, Denecke, Kerstin
Format: Preprint
Published: 2024
Subjects: Computation and Language, Artificial Intelligence, J.3
Online Access: https://arxiv.org/abs/2406.15465
author Reichenpfader, Daniel
Knupp, Jonas
Sander, André
Denecke, Kerstin
contents Each year, more than three billion radiography examinations and computed tomography scans worldwide produce radiology reports that consist mostly of unstructured free text. Despite the potential benefits of structured reporting, its adoption is limited by factors such as established processes, resource constraints, and potential loss of information. Structured information is nevertheless needed for various use cases, including automatic analysis, clinical trial matching, and prediction of health outcomes. This study introduces RadEx, an end-to-end framework comprising 15 software components and ten artifacts for developing systems that perform automated information extraction from radiology reports. It covers the complete process from annotating training data to extracting information by offering a consistent generic information model and setting boundaries for model development. Specifically, RadEx allows clinicians to define relevant information for clinical domains (e.g., mammography) and to create report templates. The framework supports both generative and encoder-only models, and the decoupling of information extraction from template filling enables independent model improvements. Developing information extraction systems according to the RadEx framework facilitates implementation and maintenance because components are easily exchangeable, while standardized artifacts ensure interoperability between components.
format Preprint
id arxiv_https___arxiv_org_abs_2406_15465
institution arXiv
publishDate 2024
record_format arxiv
title RadEx: A Framework for Structured Information Extraction from Radiology Reports based on Large Language Models
topic Computation and Language
Artificial Intelligence
J.3
url https://arxiv.org/abs/2406.15465