Staff View: Enhancing Healthcare LLM Trust with Atypical Presentations Recalibration

Bibliographic Details
Main Authors:	Qin, Jeremy; Liu, Bang; Nguyen, Quoc Dinh
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2409.03225

_version_	1866909305968525312
author	Qin, Jeremy; Liu, Bang; Nguyen, Quoc Dinh
author_facet	Qin, Jeremy; Liu, Bang; Nguyen, Quoc Dinh
contents	Black-box large language models (LLMs) are increasingly deployed in various environments, making it essential for these models to effectively convey their confidence and uncertainty, especially in high-stakes settings. However, these models often exhibit overconfidence, leading to potential risks and misjudgments. Existing techniques for eliciting and calibrating LLM confidence have primarily focused on general reasoning datasets, yielding only modest improvements. Accurate calibration is crucial for informed decision-making and preventing adverse outcomes but remains challenging due to the complexity and variability of tasks these models perform. In this work, we investigate the miscalibration behavior of black-box LLMs within the healthcare setting. We propose a novel method, Atypical Presentations Recalibration, which leverages atypical presentations to adjust the model's confidence estimates. Our approach significantly improves calibration, reducing calibration errors by approximately 60% on three medical question answering datasets and outperforming existing methods such as vanilla verbalized confidence, CoT verbalized confidence and others. Additionally, we provide an in-depth analysis of the role of atypicality within the recalibration framework.
format	Preprint
id	arxiv_https___arxiv_org_abs_2409_03225
institution	arXiv
publishDate	2024
record_format	arxiv
title	Enhancing Healthcare LLM Trust with Atypical Presentations Recalibration
topic	Computation and Language
url	https://arxiv.org/abs/2409.03225
