Staff View: Right Prediction, Wrong Reasoning: Uncovering LLM Misalignment in RA Disease Diagnosis :: Library Catalog

Bibliographic Details
Main Authors:	Maharana, Umakanta; Verma, Sarthak; Agarwal, Avarna; Mruthyunjaya, Prakashini; Mahapatra, Dwarikanath; Ahmed, Sakir; Mandal, Murari
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2504.06581

_version_	1866912317535420416
author	Maharana, Umakanta; Verma, Sarthak; Agarwal, Avarna; Mruthyunjaya, Prakashini; Mahapatra, Dwarikanath; Ahmed, Sakir; Mandal, Murari
author_facet	Maharana, Umakanta; Verma, Sarthak; Agarwal, Avarna; Mruthyunjaya, Prakashini; Mahapatra, Dwarikanath; Ahmed, Sakir; Mandal, Murari
contents	Large language models (LLMs) offer a promising pre-screening tool, improving early disease detection and providing enhanced healthcare access for underprivileged communities. The early diagnosis of various diseases continues to be a significant challenge in healthcare, primarily due to the nonspecific nature of early symptoms, the shortage of expert medical practitioners, and the need for prolonged clinical evaluations, all of which can delay treatment and adversely affect patient outcomes. With impressive prediction accuracy across a range of diseases, LLMs have the potential to revolutionize clinical pre-screening and decision-making for various medical conditions. In this work, we study the diagnostic capability of LLMs for rheumatoid arthritis (RA) using real-world patient data. Patient data was collected alongside diagnoses from medical experts, and the performance of LLMs was evaluated against the expert diagnoses for RA prediction. We notice an interesting pattern in disease diagnosis and find an unexpected misalignment between prediction and explanation. We conduct a series of multi-round analyses using different LLM agents. The best-performing model accurately predicts RA approximately 95% of the time. However, when medical experts evaluated the reasoning generated by the model, they found that nearly 68% of the reasoning was incorrect. This study highlights a clear misalignment between LLMs' high prediction accuracy and their flawed reasoning, raising important questions about relying on LLM explanations in clinical settings. LLMs provide incorrect reasoning to arrive at the correct answer for RA disease diagnosis.
format	Preprint
id	arxiv_https___arxiv_org_abs_2504_06581
institution	arXiv
publishDate	2025
record_format	arxiv
title	Right Prediction, Wrong Reasoning: Uncovering LLM Misalignment in RA Disease Diagnosis
topic	Artificial Intelligence
url	https://arxiv.org/abs/2504.06581

Similar Items