Saved in:
Bibliographic Details
Main Authors: Wassell, Melinda, Butler-Henderson, Kerryn, Verspoor, Karin
Format: Preprint
Published: 2026
Subjects:
Online Access:https://arxiv.org/abs/2603.00921
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866917302739402752
author Wassell, Melinda
Butler-Henderson, Kerryn
Verspoor, Karin
author_facet Wassell, Melinda
Butler-Henderson, Kerryn
Verspoor, Karin
contents Data quality (DQ) and transparency of secondary data are critical factors that delay the adoption of clinical AI models and affect clinician trust in them. Many DQ studies fail to clarify where, along the lifecycle, quality checks occur, leading to uncertainty about provenance and fitness for reuse. This study develops a framework for transparent reporting of DQ assessments across the clinical electronic health record (EHR) data lifecycle. The reporting framework was developed through iterative analysis to identify actors and phases of the clinical data lifecycle. The framework distinguishes between data-generating organizations and data-receiving organizations to allow users to map DQ parameters to stages across the data lifecycle. The framework defines 5 key lifecycle phases and multiple actors. When applied to the real-world dataset, the framework demonstrated applicability in revealing where DQ issues may originate. The framework provides a structured approach for reporting DQ assessments, which can enhance transparency regarding data fitness for reuse, supporting reliable clinical research, AI model development, and internal organisational governance. This work provides practical guidance for researchers to understand data provenance and for organisations to target DQ improvement efforts across the data lifecycle.
format Preprint
id arxiv_https___arxiv_org_abs_2603_00921
institution arXiv
publishDate 2026
record_format arxiv
spellingShingle A Framework for Transparent Reporting of Data Quality Analysis Across the Clinical Electronic Health Record Data Lifecycle
Wassell, Melinda
Butler-Henderson, Kerryn
Verspoor, Karin
Databases
Computers and Society
Data quality (DQ) and transparency of secondary data are critical factors that delay the adoption of clinical AI models and affect clinician trust in them. Many DQ studies fail to clarify where, along the lifecycle, quality checks occur, leading to uncertainty about provenance and fitness for reuse. This study develops a framework for transparent reporting of DQ assessments across the clinical electronic health record (EHR) data lifecycle. The reporting framework was developed through iterative analysis to identify actors and phases of the clinical data lifecycle. The framework distinguishes between data-generating organizations and data-receiving organizations to allow users to map DQ parameters to stages across the data lifecycle. The framework defines 5 key lifecycle phases and multiple actors. When applied to the real-world dataset, the framework demonstrated applicability in revealing where DQ issues may originate. The framework provides a structured approach for reporting DQ assessments, which can enhance transparency regarding data fitness for reuse, supporting reliable clinical research, AI model development, and internal organisational governance. This work provides practical guidance for researchers to understand data provenance and for organisations to target DQ improvement efforts across the data lifecycle.
title A Framework for Transparent Reporting of Data Quality Analysis Across the Clinical Electronic Health Record Data Lifecycle
topic Databases
Computers and Society
url https://arxiv.org/abs/2603.00921