Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Wang, Hanyin; Gao, Chufan; Xu, Qiping; Liu, Bolun; Hussein, Guleid; Korsapati, Hariprasad; Labban, Mohamad El; Iheasirim, Kingsley; Hassan, Mohamed; Anil, Gokhan; Bartlett, Brian; Sun, Jimeng
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2412.12583

_version_	1866911141898223616
author	Wang, Hanyin; Gao, Chufan; Xu, Qiping; Liu, Bolun; Hussein, Guleid; Korsapati, Hariprasad; Labban, Mohamad El; Iheasirim, Kingsley; Hassan, Mohamed; Anil, Gokhan; Bartlett, Brian; Sun, Jimeng
author_facet	Wang, Hanyin; Gao, Chufan; Xu, Qiping; Liu, Bolun; Hussein, Guleid; Korsapati, Hariprasad; Labban, Mohamad El; Iheasirim, Kingsley; Hassan, Mohamed; Anil, Gokhan; Bartlett, Brian; Sun, Jimeng
contents	Process-supervised reward models (PRMs) excel at providing step-by-step verification for large language model (LLM) outputs in domains like mathematics and coding. However, their application to fields lacking ground-truth answers, such as clinical note generation, poses significant challenges. We introduce a novel framework for training PRMs to deliver step-level reward signals for LLM-generated clinical notes. By precisely defining meaningful "steps," injecting realistic "errors" informed by domain expertise, and leveraging LLMs to generate process supervision data at scale, we overcome previous limitations. Our PRM, built on LLaMA-3.1 8B, consistently outperforms proprietary reasoning and non-reasoning models, achieving state-of-the-art performance on two key evaluations: (1) distinguishing gold-standard from error-containing samples with 98.8% accuracy, and (2) selecting physician-preferred clinical notes with 56.2% accuracy. We investigate critical components for effective PRM training, including optimal loss functions and data selection strategies, and present a comprehensive physician reader study identifying predictors of downstream Best-of-N performance. Our study sheds light on unlocking the potential of PRMs for diverse generative tasks across domains.
format	Preprint
id	arxiv_https___arxiv_org_abs_2412_12583
institution	arXiv
publishDate	2024
record_format	arxiv
title	Process-Supervised Reward Models for Verifying Clinical Note Generation: A Scalable Approach Guided by Domain Expertise
topic	Computation and Language
url	https://arxiv.org/abs/2412.12583

Similar Items