Bibliographic Details
Main Authors: Maharana, Umakanta; Verma, Sarthak; Agarwal, Avarna; Mruthyunjaya, Prakashini; Mahapatra, Dwarikanath; Ahmed, Sakir; Mandal, Murari
Format: Preprint
Published: 2025
Subjects: Artificial Intelligence
Online Access: https://arxiv.org/abs/2504.06581
_version_ 1866912317535420416
author Maharana, Umakanta
Verma, Sarthak
Agarwal, Avarna
Mruthyunjaya, Prakashini
Mahapatra, Dwarikanath
Ahmed, Sakir
Mandal, Murari
contents Large language models (LLMs) offer a promising pre-screening tool that could improve early disease detection and expand healthcare access for underprivileged communities. Early diagnosis of many diseases remains a significant challenge in healthcare, primarily because early symptoms are nonspecific, expert medical practitioners are in short supply, and clinical evaluations are prolonged, all of which can delay treatment and adversely affect patient outcomes. With impressive prediction accuracy across a range of diseases, LLMs have the potential to revolutionize clinical pre-screening and decision-making for various medical conditions. In this work, we study the diagnostic capability of LLMs for rheumatoid arthritis (RA) using real-world patient data. Patient data were collected alongside diagnoses from medical experts, and LLM performance on RA prediction was evaluated against those expert diagnoses. We notice an interesting pattern in disease diagnosis and find an unexpected misalignment between prediction and explanation. We conduct a series of multi-round analyses using different LLM agents. The best-performing model predicts RA correctly approximately 95% of the time. However, when medical experts evaluated the reasoning generated by the model, they found that nearly 68% of it was incorrect. This study highlights a clear misalignment between LLMs' high prediction accuracy and their flawed reasoning, raising important questions about relying on LLM explanations in clinical settings. LLMs provide incorrect reasoning to arrive at the correct answer for RA diagnosis.
format Preprint
id arxiv_https___arxiv_org_abs_2504_06581
institution arXiv
publishDate 2025
record_format arxiv
title Right Prediction, Wrong Reasoning: Uncovering LLM Misalignment in RA Disease Diagnosis
topic Artificial Intelligence
url https://arxiv.org/abs/2504.06581