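Staff View: Detecting the Clinical Features of Difficult-to-Treat Depression using Synthetic Data from Large Language Models :: Library Catalog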

Bibliographic Details
Main Authors:	Lorge, Isabelle; Joyce, Dan W.; Taylor, Niall; Nevado-Holgado, Alejo; Cipriani, Andrea; Kormilitzin, Andrey
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2402.07645

_version_	1866913231528787968
author	Lorge, Isabelle; Joyce, Dan W.; Taylor, Niall; Nevado-Holgado, Alejo; Cipriani, Andrea; Kormilitzin, Andrey
author_facet	Lorge, Isabelle; Joyce, Dan W.; Taylor, Niall; Nevado-Holgado, Alejo; Cipriani, Andrea; Kormilitzin, Andrey
contents	Difficult-to-treat depression (DTD) has been proposed as a broader and more clinically comprehensive perspective on a person's depressive disorder where, despite treatment, they continue to experience significant burden. We sought to develop a Large Language Model (LLM)-based tool capable of interrogating routinely collected, narrative (free-text) electronic health record (EHR) data to locate published prognostic factors that capture the clinical syndrome of DTD. In this work, we use LLM-generated synthetic data (GPT-3.5) and a Non-Maximum Suppression (NMS) algorithm to train a BERT-based span extraction model. The resulting model is then able to extract and label spans related to a variety of relevant positive and negative factors in real clinical data (i.e., spans of text that increase or decrease the likelihood of a patient matching the DTD syndrome). We show that, by training the model exclusively on synthetic data, it is possible to obtain good overall performance (0.70 F1 across polarity) on real clinical data on a set of as many as 20 different factors, and high performance (0.85 F1 with 0.95 precision) on a subset of important DTD factors such as history of abuse, family history of affective disorder, illness severity and suicidality. Our results show promise for future healthcare applications, especially those where highly confidential medical data and human-expert annotation would traditionally be required.
format	Preprint
id	arxiv_https___arxiv_org_abs_2402_07645
institution	arXiv
publishDate	2024
record_format	arxiv
title	Detecting the Clinical Features of Difficult-to-Treat Depression using Synthetic Data from Large Language Models
topic	Computation and Language
url	https://arxiv.org/abs/2402.07645

Similar Items