MARC21: :: Library Catalog


Bibliographic Details
Main Authors:	T, Balamurali B; Chen, Jer-Ming
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2402.01751

_version_	1866913222042320896
author	T, Balamurali B; Chen, Jer-Ming
author_facet	T, Balamurali B; Chen, Jer-Ming
contents	Large language models (LLMs) are finding increasing application in many fields. Here, three LLM chatbots (ChatGPT-3.5, ChatGPT-4 and Bard) are assessed, in their current publicly available form, for their ability to recognize Alzheimer's Dementia (AD) and Cognitively Normal (CN) individuals using textual input derived from spontaneous speech recordings. A zero-shot learning approach is used at two levels of independent queries, with the second query (chain-of-thought prompting) eliciting a more detailed response than the first. Each LLM chatbot's predictions are evaluated in terms of accuracy, sensitivity, specificity, precision and F1 score. The LLM chatbots generated a three-class outcome ("AD", "CN", or "Unsure"). When positively identifying AD, Bard produced the highest true-positives (89% recall) and the highest F1 score (71%), but tended to misidentify CN as AD with high confidence (low "Unsure" rates). When positively identifying CN, GPT-4 produced the highest true-negatives (56%) and the highest F1 score (62%), adopting a diplomatic stance (moderate "Unsure" rates). Overall, the three LLM chatbots distinguish AD from CN at above-chance levels but do not currently meet the requirements for clinical application.
format	Preprint
id	arxiv_https___arxiv_org_abs_2402_01751
institution	arXiv
publishDate	2024
record_format	arxiv
title	Performance Assessment of ChatGPT vs Bard in Detecting Alzheimer's Dementia
topic	Computation and Language; Artificial Intelligence
url	https://arxiv.org/abs/2402.01751
