Bibliographic Details
Main Authors: Michael, Noam; BenShushan, Daniel; Bien, Jacob; Moore, Don A.
Format: Preprint
Published: 2026
Subjects: Artificial Intelligence, Machine Learning
Online Access: https://arxiv.org/abs/2605.23909
author Michael, Noam
BenShushan, Daniel
Bien, Jacob
Moore, Don A.
contents We investigate the calibration of large language models' (LLMs') confidence across diverse tasks. The results of our preregistered study show that the current crop of LLMs are, like people, too sure they are right: confidence exceeds accuracy, on average. Importantly, however, this tendency is moderated by a powerful hard-easy effect, wherein overconfidence is greatest on difficult tests; by contrast, easy tests actually show substantial underconfidence. We develop LifeEval, a test for evaluating model calibration across levels of difficulty.
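The abstract's central measure, confidence exceeding accuracy, can be illustrated with a short sketch. It assumes the conventional overconfidence score from judgment research (mean stated confidence minus proportion correct, both on a 0 to 1 scale) and a hypothetical grouping of items by difficulty label; this record does not describe how LifeEval actually scores calibration, and the toy values below are invented for illustration, not results from the study.

```python
from collections import defaultdict
from statistics import mean


def overconfidence(confidences, correct):
    """Mean stated confidence minus proportion correct (0-1 scale).

    Positive values indicate overconfidence; negative values, underconfidence.
    """
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need equal-length, non-empty inputs")
    accuracy = mean(1.0 if ok else 0.0 for ok in correct)
    return mean(confidences) - accuracy


def overconfidence_by_difficulty(records):
    """Score each difficulty group separately to expose a hard-easy effect.

    records: iterable of (difficulty_label, confidence, is_correct) tuples.
    The difficulty labels are a hypothetical convention for this sketch.
    """
    groups = defaultdict(lambda: ([], []))
    for label, conf, ok in records:
        groups[label][0].append(conf)
        groups[label][1].append(ok)
    return {label: overconfidence(c, k) for label, (c, k) in groups.items()}


# Hypothetical toy data, not taken from the paper.
toy = [
    ("hard", 0.9, False), ("hard", 0.8, True),
    ("easy", 0.6, True), ("easy", 0.7, True),
]
# The hard group comes out positive (overconfident), the easy group negative.
print(overconfidence_by_difficulty(toy))
```

Scoring each difficulty group separately matters because a single pooled score can mask opposite-signed errors: overconfidence on hard items and underconfidence on easy items partly cancel when averaged together.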
format Preprint
id arxiv_https___arxiv_org_abs_2605_23909
institution arXiv
publishDate 2026
record_format arxiv
title Confidence Calibration in Large Language Models
topic Artificial Intelligence
Machine Learning
url https://arxiv.org/abs/2605.23909