Staff View: Confidence Calibration in Large Language Models :: Library Catalog

Bibliographic Details
Main Authors:	Michael, Noam; BenShushan, Daniel; Bien, Jacob; Moore, Don A.
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2605.23909

_version_	1866918519162011648
author	Michael, Noam; BenShushan, Daniel; Bien, Jacob; Moore, Don A.
author_facet	Michael, Noam; BenShushan, Daniel; Bien, Jacob; Moore, Don A.
contents	We investigate the calibration of large language models' (LLMs') confidence across diverse tasks. The results of our preregistered study show that the current crop of LLMs are, like people, too sure they are right: confidence exceeds accuracy, on average. Importantly, however, this tendency is moderated by a powerful hard-easy effect, wherein overconfidence is greatest on difficult tests; by contrast, easy tests actually show substantial underconfidence. We develop LifeEval, a test for evaluating model calibration across levels of difficulty.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_23909
institution	arXiv
publishDate	2026
record_format	arxiv
title	Confidence Calibration in Large Language Models
topic	Artificial Intelligence; Machine Learning
url	https://arxiv.org/abs/2605.23909
