| Main Author: | Basu, Sanjay |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Computers and Society; Artificial Intelligence; Machine Learning |
| Online Access: | https://arxiv.org/abs/2603.00076 |
| _version_ | 1866911474417401856 |
|---|---|
| author | Basu, Sanjay |
| author_facet | Basu, Sanjay |
| contents | Large language models (LLMs) are entering clinical workflows as decision support tools, yet how they respond to explicit patient value statements -- the core content of shared decision-making -- remains unmeasured. We conducted a factorial experiment using clinical vignettes derived from 98,759 de-identified Medicaid encounter notes. We tested four LLM families (GPT-5.2, Claude 4.5 Sonnet, Gemini 3 Pro, and DeepSeek-R1) across 13 value conditions in two clinical domains, yielding 104 trials. Default value orientations differed across model families (aggressiveness range 2.0 to 3.5 on a 1-to-5 scale). Value sensitivity indices ranged from 0.13 to 0.27, and directional concordance with patient-stated preferences ranged from 0.625 to 1.0. All models acknowledged patient values in 100% of non-control trials, yet actual recommendation shifting remained modest. Decision-matrix and VIM self-report mitigations each improved directional concordance by 0.125 in a 78-trial Phase 2 evaluation. These findings provide empirical data for populating value disclosure labels proposed by clinical AI governance frameworks. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2603_00076 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | The Value Sensitivity Gap: How Clinical Large Language Models Respond to Patient Preference Statements in Shared Decision-Making |
| topic | Computers and Society Artificial Intelligence Machine Learning 92C50, 68T42 J.3; I.2.1 |
| url | https://arxiv.org/abs/2603.00076 |
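
The trial count stated in the abstract follows from a fully crossed factorial design: 4 model families by 2 clinical domains by 13 value conditions gives 104 trials. The sketch below only illustrates that arithmetic. The model names come from the abstract, but the domain and condition labels are hypothetical placeholders, since the record gives only their counts, and the full crossing itself is an assumption consistent with the reported total rather than a confirmed description of the study's procedure.

```python
from itertools import product

# Model families as named in the abstract.
MODELS = ["GPT-5.2", "Claude 4.5 Sonnet", "Gemini 3 Pro", "DeepSeek-R1"]

# Hypothetical placeholder labels: the record states only that there are
# two clinical domains and 13 value conditions, not what they are called.
DOMAINS = [f"domain_{i}" for i in range(1, 3)]
VALUE_CONDITIONS = [f"condition_{i}" for i in range(1, 14)]


def enumerate_trials(models, domains, conditions):
    """Return every (model, domain, condition) cell of a full factorial design."""
    return list(product(models, domains, conditions))


trials = enumerate_trials(MODELS, DOMAINS, VALUE_CONDITIONS)
print(len(trials))  # 4 * 2 * 13 cells
```

Each tuple in `trials` stands for one cell of the design. Whether the 13 conditions include a control condition in each domain is not specified in this record beyond the abstract's mention of "non-control trials".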