Bibliographic details
Main authors: Sharoff, Serge; Baker, John; Hunt, David Francis; Simpson, Alan
Format: Preprint
Published: 2026
Subjects:
Online access: https://arxiv.org/abs/2601.01171
author Sharoff, Serge
Baker, John
Hunt, David Francis
Simpson, Alan
contents This study evaluates the linguistic and clinical suitability of synthetic electronic health records in mental health. First, we describe the rationale and the methodology for creating the synthetic corpus. Second, we examine expressions of agency, modality, and information flow across four clinical genres (Assessments, Correspondence, Referrals and Care plans) with the aim of understanding how LLMs grammatically construct medical authority and patient agency through linguistic choices. While LLMs produce coherent, terminology-appropriate texts that approximate clinical practice, systematic divergences remain, including registerial shifts, insufficient clinical specificity, and inaccuracies in medication use and diagnostic procedures. The results show both the potential and the limitations of synthetic corpora for enabling large-scale linguistic research that would otherwise be impossible with genuine patient records.
format Preprint
id arxiv_https___arxiv_org_abs_2601_01171
institution arXiv
publishDate 2026
record_format arxiv
title Almost Clinical: Linguistic properties of synthetic electronic health records
topic Computation and Language
url https://arxiv.org/abs/2601.01171