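Staff View: The Value Sensitivity Gap: How Clinical Large Language Models Respond to Patient Preference Statements in Shared Decision-Making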

Bibliographic Details
Main Author:	Basu, Sanjay
Format:	Preprint
Published:	2026
Subjects:	Computers and Society; Artificial Intelligence; Machine Learning; MSC 92C50, 68T42; ACM J.3, I.2.1
Online Access:	https://arxiv.org/abs/2603.00076

_version_	1866911474417401856
author	Basu, Sanjay
author_facet	Basu, Sanjay
contents	Large language models (LLMs) are entering clinical workflows as decision support tools, yet how they respond to explicit patient value statements -- the core content of shared decision-making -- remains unmeasured. We conducted a factorial experiment using clinical vignettes derived from 98,759 de-identified Medicaid encounter notes. We tested four LLM families (GPT-5.2, Claude 4.5 Sonnet, Gemini 3 Pro, and DeepSeek-R1) across 13 value conditions in two clinical domains, yielding 104 trials. Default value orientations differed across model families (aggressiveness range 2.0 to 3.5 on a 1-to-5 scale). Value sensitivity indices ranged from 0.13 to 0.27, and directional concordance with patient-stated preferences ranged from 0.625 to 1.0. All models acknowledged patient values in 100% of non-control trials, yet actual recommendation shifting remained modest. Decision-matrix and VIM self-report mitigations each improved directional concordance by 0.125 in a 78-trial Phase 2 evaluation. These findings provide empirical data for populating value disclosure labels proposed by clinical AI governance frameworks.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_00076
institution	arXiv
publishDate	2026
record_format	arxiv
title	The Value Sensitivity Gap: How Clinical Large Language Models Respond to Patient Preference Statements in Shared Decision-Making
topic	Computers and Society Artificial Intelligence Machine Learning 92C50, 68T42 J.3; I.2.1
url	https://arxiv.org/abs/2603.00076
