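Staff View: Improving Extraction of Clinical Event Contextual Properties from Electronic Health Records: A Comparative Study :: Library Catalog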

Bibliographic Details
Main Authors:	Agarwal, Shubham; Searle, Thomas; Ratas, Mart; Shek, Anthony; Teo, James; Dobson, Richard
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2408.17181

_version_	1866909301431336960
author	Agarwal, Shubham; Searle, Thomas; Ratas, Mart; Shek, Anthony; Teo, James; Dobson, Richard
author_facet	Agarwal, Shubham; Searle, Thomas; Ratas, Mart; Shek, Anthony; Teo, James; Dobson, Richard
contents	Electronic Health Records are large repositories of valuable clinical data, with a significant portion stored in unstructured text format. This textual data includes clinical events (e.g., disorders, symptoms, findings, medications and procedures) in context that, if extracted accurately at scale, can unlock valuable downstream applications such as disease prediction. Once identified using an existing Named Entity Recognition and Linking methodology, MedCAT, these concepts need to be further classified (contextualised), for example by their relevance to the patient and their temporal and negated status, to be useful downstream. This study performs a comparative analysis of various natural language models for medical text classification. Extensive experimentation reveals the effectiveness of transformer-based language models, particularly BERT. When combined with class imbalance mitigation techniques, BERT outperforms Bi-LSTM models by up to 28% and the baseline BERT model by up to 16% for recall of the minority classes. The method has been implemented as part of the CogStack/MedCAT framework and made available to the community for further research.
format	Preprint
id	arxiv_https___arxiv_org_abs_2408_17181
institution	arXiv
publishDate	2024
record_format	arxiv
title	Improving Extraction of Clinical Event Contextual Properties from Electronic Health Records: A Comparative Study
topic	Computation and Language
url	https://arxiv.org/abs/2408.17181

Similar Items