Staff View: DRIFT: Detecting Representational Inconsistencies for Factual Truthfulness :: Library Catalog


Bibliographic Details
Main Authors:	Bhatnagar, Rohan; Sun, Youran; Zhang, Chi Andrew; Wen, Yixin; Yang, Haizhao
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2601.14210

_version_	1866917232077963264
author	Bhatnagar, Rohan; Sun, Youran; Zhang, Chi Andrew; Wen, Yixin; Yang, Haizhao
author_facet	Bhatnagar, Rohan; Sun, Youran; Zhang, Chi Andrew; Wen, Yixin; Yang, Haizhao
contents	LLMs often produce fluent but incorrect answers, yet detecting such hallucinations typically requires multiple sampling passes or post-hoc verification, adding significant latency and cost. We hypothesize that intermediate layers encode confidence signals that are lost in the final output layer, and propose a lightweight probe to read these signals directly from hidden states. The probe adds less than 0.1% computational overhead and can run fully in parallel with generation, enabling hallucination detection before the answer is produced. Building on this, we develop an LLM router that answers confident queries immediately while delegating uncertain ones to stronger models. Despite its simplicity, our method achieves SOTA AUROC on 10 out of 12 settings across four QA benchmarks and three LLM families, with gains of up to 13 points over prior methods, and generalizes across dataset shifts without retraining.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_14210
institution	arXiv
publishDate	2026
record_format	arxiv
title	DRIFT: Detecting Representational Inconsistencies for Factual Truthfulness
topic	Computation and Language
url	https://arxiv.org/abs/2601.14210
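The abstract describes reading a confidence signal from hidden states with a lightweight probe, then routing uncertain queries to a stronger model, evaluated by AUROC. The sketch below is a hypothetical illustration of that general pattern, not the authors' DRIFT implementation. It assumes a plain logistic-regression probe, uses random vectors as stand-ins for intermediate-layer hidden states, and routes with a fixed threshold; the names `LinearProbe`, `route`, and `auroc` and all parameter values are illustrative choices.

```python
import math
import random
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LinearProbe:
    """Hypothetical logistic-regression probe over one layer's hidden state.

    Outputs an estimate of P(answer is correct) for a fixed-size vector.
    """

    def __init__(self, dim: int) -> None:
        self.w = [0.0] * dim
        self.b = 0.0

    def predict(self, h: Sequence[float]) -> float:
        return sigmoid(sum(wi * hi for wi, hi in zip(self.w, h)) + self.b)

    def fit(self, X: List[List[float]], y: List[int],
            lr: float = 0.1, epochs: int = 200) -> None:
        # Full-batch gradient descent on binary cross-entropy.
        n = len(X)
        for _ in range(epochs):
            grad_w = [0.0] * len(self.w)
            grad_b = 0.0
            for h, label in zip(X, y):
                err = self.predict(h) - label  # d(loss)/d(logit)
                for j, hj in enumerate(h):
                    grad_w[j] += err * hj
                grad_b += err
            self.w = [wj - lr * gj / n for wj, gj in zip(self.w, grad_w)]
            self.b -= lr * grad_b / n


def route(probe: LinearProbe, h: Sequence[float], threshold: float = 0.5) -> str:
    # Answer directly when the probe is confident; otherwise delegate.
    return "answer" if probe.predict(h) >= threshold else "delegate"


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    # Probability that a random positive outscores a random negative
    # (ties count one half); O(n^2), fine for small evaluation sets.
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic "hidden states": correctness depends mostly on coordinate 0.
    X = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(400)]
    y = [1 if h[0] + 0.3 * rng.gauss(0, 1) > 0 else 0 for h in X]
    probe = LinearProbe(dim=8)
    probe.fit(X[:300], y[:300])
    scores = [probe.predict(h) for h in X[300:]]
    print(f"held-out AUROC: {auroc(scores, y[300:]):.3f}")
    print("first held-out query:", route(probe, X[300]))
```

Because the probe is a single dot product per query, its cost is negligible next to a forward pass, which is the property the abstract emphasizes; the threshold trades answer latency against how many queries are sent to the stronger model.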
