
Bibliographic Details
Main Authors:	Li, Mingchen; Huang, Jiatan; Yeung, Jeremy; Blaes, Anne; Johnson, Steven; Liu, Hongfang; Xu, Hua; Zhang, Rui
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2406.10459

_version_	1866910900013760512
author	Li, Mingchen; Huang, Jiatan; Yeung, Jeremy; Blaes, Anne; Johnson, Steven; Liu, Hongfang; Xu, Hua; Zhang, Rui
author_facet	Li, Mingchen; Huang, Jiatan; Yeung, Jeremy; Blaes, Anne; Johnson, Steven; Liu, Hongfang; Xu, Hua; Zhang, Rui
contents	Medical Large Language Models (LLMs) have demonstrated impressive performance on a wide variety of medical NLP tasks; however, there is still no LLM specifically designed for phenotype identification and diagnosis in the cancer domain. Moreover, these LLMs typically have several billion parameters, making them computationally expensive for healthcare systems. Thus, in this study, we propose CancerLLM, a model with 7 billion parameters and a Mistral-style architecture, pre-trained on nearly 2.7M clinical notes and over 515K pathology reports covering 17 cancer types, followed by fine-tuning on two cancer-relevant tasks: cancer phenotype extraction and cancer diagnosis generation. Our evaluation demonstrated that CancerLLM achieves state-of-the-art results, with F1 scores of 91.78% on phenotype extraction and 86.81% on diagnosis generation. It outperformed existing LLMs, with an average F1 score improvement of 9.23%. Additionally, CancerLLM demonstrated efficiency in time and GPU usage, and robustness compared with other LLMs. We demonstrated that CancerLLM can potentially provide an effective and robust solution to advance clinical research and practice in the cancer domain.
format	Preprint
id	arxiv_https___arxiv_org_abs_2406_10459
institution	arXiv
publishDate	2024
record_format	arxiv
title	CancerLLM: A Large Language Model in Cancer Domain
topic	Computation and Language
url	https://arxiv.org/abs/2406.10459
