Bibliographic Details
Main Authors: Soumma, Shovito Barua, Shahriar, Fahim, Mahi, Umme Niraj, Abrar, Md Hasin, Fahad, Md Abdur Rahman, Hoque, Abu Sayed Md. Latiful
Format: Preprint
Published: 2025
Subjects: Information Retrieval
Online Access: https://arxiv.org/abs/2502.16674
author Soumma, Shovito Barua
Shahriar, Fahim
Mahi, Umme Niraj
Abrar, Md Hasin
Fahad, Md Abdur Rahman
Hoque, Abu Sayed Md. Latiful
contents Centralized electronic health record (EHR) repositories are critical for advancing disease surveillance, public health research, and evidence-based policymaking. However, developing countries face persistent challenges in building such repositories because of fragmented healthcare data sources, inconsistent record-keeping practices, and the absence of standardized patient identifiers. These gaps limit reliable record linkage, compromise data interoperability, and restrict scalability, obstacles that are exacerbated by infrastructural constraints and privacy concerns. To address these barriers, this study proposes NCDW, a scalable, privacy-preserving clinical data warehouse designed for heterogeneous EHR integration in resource-limited settings and tested with 1.16 million clinical records. The framework incorporates a wrapper-based data acquisition layer for secure, automated ingestion of multisource health data and introduces a Soundex algorithm to resolve patient identity mismatches in the absence of unique IDs. A modular data mart is designed for disease-specific analytics and demonstrated through a dengue fever case study in Bangladesh, integrating clinical, demographic, and environmental data for outbreak prediction and resource planning. Quantitative assessment of the data mart underscores its utility in strengthening national decision-support systems and highlights the model's adaptability to infectious disease management. A comparative evaluation of database technologies shows that NoSQL outperforms relational SQL by 40-69% in complex query processing, while system load estimates validate the architecture's capacity to manage 19 million daily records (34 TB over 5 years). The framework can be adapted to other healthcare settings across developing nations by modifying the ingestion layer to accommodate standards such as ICD-11 and HL7 FHIR, facilitating interoperability for managing infectious diseases (e.g., COVID, tuberculosis).
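The abstract names Soundex as the phonetic key for matching patient records that lack unique IDs, but this record does not say which Soundex variant NCDW uses or how the code is combined with other fields. The sketch below assumes classic American Soundex in Python. The names `soundex` and `block_by_soundex` are hypothetical, and grouping names by their code alone is only an illustrative blocking step, not the paper's linkage rule.

```python
from collections import defaultdict

# Classic American Soundex digit classes (assumed variant, not confirmed by the source).
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Return the 4-character American Soundex code for a name ('' if no letters)."""
    letters = [c for c in name.upper() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    result = [letters[0]]                 # keep the first letter as-is
    prev = _CODES.get(letters[0], "")     # its code still suppresses an identical next code
    for ch in letters[1:]:
        if ch in "HW":
            continue                      # H and W do not break a run of identical codes
        code = _CODES.get(ch, "")         # vowels and Y map to '' and reset the run
        if code and code != prev:
            result.append(code)
        prev = code
        if len(result) == 4:
            break
    return "".join(result).ljust(4, "0")  # pad short codes with zeros


def block_by_soundex(names):
    """Hypothetical blocking step: group names that share a Soundex code."""
    blocks = defaultdict(list)
    for n in names:
        blocks[soundex(n)].append(n)
    return dict(blocks)
```

For example, the spelling variants `"Rahman"`, `"Rehman"`, and `"Rahaman"` all map to `R550`, and `"Mohammad"` and `"Muhammed"` both map to `M530`, so records entered with either spelling land in the same candidate block. A real linkage step would presumably compare further attributes (for instance date of birth or address) within each block before merging identities.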
format Preprint
id arxiv_https___arxiv_org_abs_2502_16674
institution arXiv
publishDate 2025
record_format arxiv
title Design and Implementation of a Scalable Clinical Data Warehouse for Resource-Constrained Healthcare Systems
topic Information Retrieval
url https://arxiv.org/abs/2502.16674