Bibliographic Details
Main Author: Basu, Sanjay
Format: Preprint
Published: 2026
Subjects: Computers and Society; Artificial Intelligence; Machine Learning
Online Access: https://arxiv.org/abs/2603.00076
author Basu, Sanjay
contents Large language models (LLMs) are entering clinical workflows as decision support tools, yet how they respond to explicit patient value statements -- the core content of shared decision-making -- remains unmeasured. We conducted a factorial experiment using clinical vignettes derived from 98,759 de-identified Medicaid encounter notes. We tested four LLM families (GPT-5.2, Claude 4.5 Sonnet, Gemini 3 Pro, and DeepSeek-R1) across 13 value conditions in two clinical domains, yielding 104 trials. Default value orientations differed across model families (aggressiveness range 2.0 to 3.5 on a 1-to-5 scale). Value sensitivity indices ranged from 0.13 to 0.27, and directional concordance with patient-stated preferences ranged from 0.625 to 1.0. All models acknowledged patient values in 100% of non-control trials, yet actual recommendation shifting remained modest. Decision-matrix and VIM self-report mitigations each improved directional concordance by 0.125 in a 78-trial Phase 2 evaluation. These findings provide empirical data for populating value disclosure labels proposed by clinical AI governance frameworks.
format Preprint
id arxiv_https___arxiv_org_abs_2603_00076
institution arXiv
publishDate 2026
record_format arxiv
title The Value Sensitivity Gap: How Clinical Large Language Models Respond to Patient Preference Statements in Shared Decision-Making
topic Computers and Society
Artificial Intelligence
Machine Learning
92C50, 68T42
J.3; I.2.1
url https://arxiv.org/abs/2603.00076
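The trial count in the abstract follows from a full factorial crossing of its three factors: 4 model families × 13 value conditions × 2 clinical domains = 104 trials, assuming one trial per cell (which the stated total implies). The short Python sketch below enumerates such a design. The model names come from the abstract; the value-condition and domain labels are hypothetical placeholders, because the record does not list them.

```python
from itertools import product

# Model families as named in the abstract.
MODELS = ["GPT-5.2", "Claude 4.5 Sonnet", "Gemini 3 Pro", "DeepSeek-R1"]
# Hypothetical placeholder labels: the record gives only the counts (13 and 2).
VALUE_CONDITIONS = [f"value_condition_{i:02d}" for i in range(1, 14)]
DOMAINS = ["domain_A", "domain_B"]


def full_factorial(*factors):
    """Return every combination of factor levels, one tuple per design cell."""
    return list(product(*factors))


# Assumes one trial per (model, value condition, domain) cell.
trials = full_factorial(MODELS, VALUE_CONDITIONS, DOMAINS)
print(len(trials))  # 104
```

Each tuple is one (model, value condition, domain) cell, so each model family accounts for 13 × 2 = 26 of the 104 trials.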