Bibliographic Details
Main Authors: Yeh, Yi-Fan, Tao, Linwei, Dong, Minjing, Huang, Tao, Yu, Jialin, Torr, Philip, Xu, Chang
Format: Preprint
Published: 2026
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2605.19344
author Yeh, Yi-Fan
Tao, Linwei
Dong, Minjing
Huang, Tao
Yu, Jialin
Torr, Philip
Xu, Chang
contents Linguistic cues such as "I believe" and "probably" offer an intuitive interface for communicating confidence, yet a generalisable, principled calibration framework for linguistic confidence expressions remains underexplored. In particular, co-occurring linguistic cues, contextual variation, and subjective audience interpretation pose unique challenges. We therefore model linguistic confidence as a distribution over plausible perceived probability values that a statement is correct, capturing interpretation variability that scalar representations discard. Within this distributional framework, we introduce faithfulness as a complementary evaluation dimension and present Faithfulness Divergence (FD), an information-theoretic metric quantifying the surprise induced in audience beliefs when the truth is revealed. Building on these foundations, we present Retrieval-Augmented Linguistic Calibration (RALC), a lightweight post-hoc pipeline that propagates calibrated confidence signals back into natural language via retrieval-augmented rewriting. Across three QA benchmarks and five LLM families, RALC improves in-domain faithfulness and calibration by up to 66% and 58%, respectively, outperforming black-box and grey-box calibration baselines.
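The abstract does not give the formula for FD or the details of the RALC retrieval step, so the following is only a minimal Python sketch of the distributional idea, under loudly stated assumptions. A phrase's meaning is assumed to be a discrete distribution over ten perceived-probability bins. The surprise score is assumed to be expected surprisal (the average log loss of the revealed outcome under that belief), which is a stand-in and not the paper's FD. The selection rule `pick_phrase` is a hypothetical stand-in for the retrieval step: it picks, from a small invented lexicon, the phrase with the lowest expected surprisal at a given calibrated confidence. The names `GRID`, `normalise`, `mean_confidence`, `expected_surprisal`, `pick_phrase`, and `LEXICON`, and all phrase weights, are illustrative assumptions rather than data.

```python
import math

# Assumption: perceived probabilities are discretised into ten equal-width bins on
# [0, 1], represented by their midpoints (0.05, 0.15, ..., 0.95). Midpoints keep
# every log below finite.
GRID = [(i + 0.5) / 10 for i in range(10)]


def normalise(weights):
    """Scale one non-negative weight per grid point so the weights sum to 1."""
    total = sum(weights)
    if len(weights) != len(GRID) or total <= 0:
        raise ValueError("need one non-negative weight per grid point, not all zero")
    return [w / total for w in weights]


def mean_confidence(belief):
    """Scalar summary of a belief: the expected perceived probability."""
    return sum(w * p for w, p in zip(belief, GRID))


def expected_surprisal(belief, correct):
    """Average surprisal (nats) of the revealed outcome under the audience's belief.

    Each grid value p predicts "correct" with probability p; the surprisal
    -log P(outcome | p) is averaged over the belief's weights. This is a
    stand-in score, not the paper's Faithfulness Divergence.
    """
    return sum(
        -w * math.log(p if correct else 1.0 - p)
        for w, p in zip(belief, GRID)
        if w > 0
    )


def pick_phrase(lexicon, calibrated_p):
    """Hypothetical selection rule: return the phrase whose belief gives the lowest
    expected surprisal when the statement is correct with probability calibrated_p."""
    def risk(belief):
        return (calibrated_p * expected_surprisal(belief, True)
                + (1.0 - calibrated_p) * expected_surprisal(belief, False))
    return min(lexicon, key=lambda phrase: risk(lexicon[phrase]))


# Hypothetical audience readings (illustrative weights, not measured data).
# Bin indices: 0 -> 0.05, ..., 5 -> 0.55, 7 -> 0.75, 9 -> 0.95.
tight = normalise([0, 0, 0, 0, 0, 0, 0, 1, 0, 0])  # every reader maps it to 0.75
split = normalise([0, 0, 0, 0, 0, 1, 0, 0, 0, 1])  # half read 0.55, half read 0.95

LEXICON = {
    "almost certainly": normalise([0, 0, 0, 0, 0, 0, 0, 0, 1, 2]),
    "probably": normalise([0, 0, 0, 0, 0, 0, 1, 1, 0, 0]),
    "possibly": normalise([0, 0, 0, 1, 2, 1, 0, 0, 0, 0]),
}

if __name__ == "__main__":
    for name, belief in [("tight", tight), ("split", split)]:
        print(name, round(mean_confidence(belief), 2),
              round(expected_surprisal(belief, True), 3),
              round(expected_surprisal(belief, False), 3))
    for p in (0.9, 0.7, 0.4):
        print(p, pick_phrase(LEXICON, p))
```

The two hypothetical readings share the same mean of 0.75, yet the split reading scores higher on both outcomes. A single scalar confidence cannot express this difference. In the selection rule, each grid value's contribution is smallest when it sits near the calibrated confidence, so the rule favours phrases whose typical reading matches that confidence. The actual RALC pipeline rewrites text with retrieval augmentation, which this sketch does not attempt.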
format Preprint
id arxiv_https___arxiv_org_abs_2605_19344
institution arXiv
publishDate 2026
record_format arxiv
title Retrieval-Augmented Linguistic Calibration
topic Computation and Language
url https://arxiv.org/abs/2605.19344