MARC View :: Library Catalog

Bibliographic Details
Main Authors:	Phillips, Edward; Gustafsson, Fredrik K.; Wu, Sean; Thakur, Anshul; Clifton, David A.
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2603.21172

_version_	1866910062864236544
author	Phillips, Edward; Gustafsson, Fredrik K.; Wu, Sean; Thakur, Anshul; Clifton, David A.
author_facet	Phillips, Edward; Gustafsson, Fredrik K.; Wu, Sean; Thakur, Anshul; Clifton, David A.
contents	Selective prediction systems can mitigate harms resulting from language model hallucinations by abstaining from answering in high-risk cases. Uncertainty quantification techniques are often employed to identify such cases, but are rarely evaluated in the context of the wider selective prediction policy and its ability to operate at low target error rates. We identify a model-dependent failure mode of entropy-based uncertainty methods that leads to unreliable abstention behaviour, and address it by combining entropy scores with a correctness probe signal. We find that across three QA benchmarks (TriviaQA, BioASQ, MedicalQA) and four model families, the combined score generally improves both the risk-coverage trade-off and calibration performance relative to entropy-only baselines. Our results highlight the importance of deployment-facing evaluation of uncertainty methods, using metrics that directly reflect whether a system can be trusted to operate at a stated risk level.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_21172
institution	arXiv
publishDate	2026
record_format	arxiv
title	Entropy Alone is Insufficient for Safe Selective Prediction in LLMs
topic	Computation and Language
url	https://arxiv.org/abs/2603.21172