Vista Equipo: :: Library Catalog

Guardado en:

Detalles Bibliográficos
Autor principal:	Bhujel, Sudip
Formato:	Preprint
Publicado:	2026
Materias:	Computation and Language
Acceso en línea:	https://arxiv.org/abs/2603.03054
Etiquetas:	Agregar Etiqueta Sin Etiquetas, Sea el primero en etiquetar este registro!

_version_	1866910044782592000
author	Bhujel, Sudip
author_facet	Bhujel, Sudip
contents	Large language models are increasingly used for patient-facing medical assistance and clinical decision support, but adapting them to clinical dialogue often requires supervision derived from doctor-patient conversations that may contain sensitive information. Conventional supervised fine-tuning and reinforcement learning from human feedback (RLHF) can amplify memorization, enabling membership inference and disclosure of rare training-set details. We present PrivMedChat (Private Medical Chat), an end-to-end framework for differentially private RLHF (DP-RLHF) for medical dialogue systems. Our approach enforces differential privacy at each training stage that accesses dialogue-derived supervision, combining DP-SGD for supervised fine-tuning and reward model learning from preference pairs, and DP-aware policy optimization for alignment. To avoid costly clinician labeling, we introduce an annotation-free preference construction strategy that pairs physician responses with filtered non-expert generations. We evaluate PrivMedChat across medical dialogue tasks and assess utility, safety, and privacy under consistent privacy accounting, thereby providing a practical pathway to align medical chatbots while offering formal privacy guarantees. We open-source our code at https://github.com/sudip-bhujel/privmedchat.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_03054
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	PrivMedChat: End-to-End Differentially Private RLHF for Medical Dialogue Systems Bhujel, Sudip Computation and Language Large language models are increasingly used for patient-facing medical assistance and clinical decision support, but adapting them to clinical dialogue often requires supervision derived from doctor-patient conversations that may contain sensitive information. Conventional supervised fine-tuning and reinforcement learning from human feedback (RLHF) can amplify memorization, enabling membership inference and disclosure of rare training-set details. We present PrivMedChat (Private Medical Chat), an end-to-end framework for differentially private RLHF (DP-RLHF) for medical dialogue systems. Our approach enforces differential privacy at each training stage that accesses dialogue-derived supervision, combining DP-SGD for supervised fine-tuning and reward model learning from preference pairs, and DP-aware policy optimization for alignment. To avoid costly clinician labeling, we introduce an annotation-free preference construction strategy that pairs physician responses with filtered non-expert generations. We evaluate PrivMedChat across medical dialogue tasks and assess utility, safety, and privacy under consistent privacy accounting, thereby providing a practical pathway to align medical chatbots while offering formal privacy guarantees. We open-source our code at https://github.com/sudip-bhujel/privmedchat.
title	PrivMedChat: End-to-End Differentially Private RLHF for Medical Dialogue Systems
topic	Computation and Language
url	https://arxiv.org/abs/2603.03054

Ejemplares similares