Bibliographic Details
Main Authors: Gopalakrishnan, Seethalakshmi, Garbayo, Luciana, Zadrozny, Wlodek
Format: Preprint
Published: 2024
Subjects:
Online Access: https://arxiv.org/abs/2407.10020
_version_ 1866914869743190016
author Gopalakrishnan, Seethalakshmi
Garbayo, Luciana
Zadrozny, Wlodek
author_facet Gopalakrishnan, Seethalakshmi
Garbayo, Luciana
Zadrozny, Wlodek
contents This study explores the potential of natural language models, including large language models, to extract causal relations from medical texts, specifically from Clinical Practice Guidelines (CPGs). The outcomes of causality extraction from Clinical Practice Guidelines for gestational diabetes are presented, marking a first in the field. We report on a set of experiments using variants of BERT (BioBERT, DistilBERT, and BERT) and using Large Language Models (LLMs), namely GPT-4 and LLAMA2. Our experiments show that BioBERT performed better than other models, including the Large Language Models, with an average F1-score of 0.72. GPT-4 and LLAMA2 results show similar performance but less consistency. We also release the code and an annotated corpus of causal statements within the Clinical Practice Guidelines for gestational diabetes.
format Preprint
id arxiv_https___arxiv_org_abs_2407_10020
institution arXiv
publishDate 2024
record_format arxiv
spellingShingle Causality extraction from medical text using Large Language Models (LLMs)
Gopalakrishnan, Seethalakshmi
Garbayo, Luciana
Zadrozny, Wlodek
Computation and Language
Artificial Intelligence
Information Retrieval
This study explores the potential of natural language models, including large language models, to extract causal relations from medical texts, specifically from Clinical Practice Guidelines (CPGs). The outcomes of causality extraction from Clinical Practice Guidelines for gestational diabetes are presented, marking a first in the field. We report on a set of experiments using variants of BERT (BioBERT, DistilBERT, and BERT) and using Large Language Models (LLMs), namely GPT-4 and LLAMA2. Our experiments show that BioBERT performed better than other models, including the Large Language Models, with an average F1-score of 0.72. GPT-4 and LLAMA2 results show similar performance but less consistency. We also release the code and an annotated corpus of causal statements within the Clinical Practice Guidelines for gestational diabetes.
title Causality extraction from medical text using Large Language Models (LLMs)
topic Computation and Language
Artificial Intelligence
Information Retrieval
url https://arxiv.org/abs/2407.10020