Bibliographic Details
Main Authors: Rubchinsky, Maxim, Rabinovich, Ella, Shraibman, Adi, Golan, Netanel, Sahar, Tali, Shweiki, Dorit
Format: Preprint
Published: 2024
Online Access: https://arxiv.org/abs/2407.07373
author Rubchinsky, Maxim
Rabinovich, Ella
Shraibman, Adi
Golan, Netanel
Sahar, Tali
Shweiki, Dorit
contents We present a novel approach to automating the identification of risk factors for diseases from medical literature, leveraging pre-trained models in the bio-medical domain, while tuning them for the specific task. Faced with the challenges of the diverse and unstructured nature of medical articles, our study introduces a multi-step system to first identify relevant articles, then classify them based on the presence of risk factor discussions and, finally, extract specific risk factor information for a disease through a question-answering model. Our contributions include the development of a comprehensive pipeline for the automated extraction of risk factors and the compilation of several datasets, which can serve as valuable resources for further research in this area. These datasets encompass a wide range of diseases, as well as their associated risk factors, meticulously identified and validated through a fine-grained evaluation scheme. We conducted both automatic and thorough manual evaluation, demonstrating encouraging results. We also highlight the importance of improving models and expanding dataset comprehensiveness to keep pace with the rapidly evolving field of medical research.
format Preprint
id arxiv_https___arxiv_org_abs_2407_07373
institution arXiv
publishDate 2024
record_format arxiv
title Automatic Extraction of Disease Risk Factors from Medical Publications
topic Computation and Language
Machine Learning
url https://arxiv.org/abs/2407.07373