Staff View: Unlocking the Pre-Trained Model as a Dual-Alignment Calibrator for Post-Trained LLMs :: Library Catalog

Bibliographic Details
Main Authors:	Luo, Beier; Wang, Cheng; Wei, Hongxin; Li, Sharon; Du, Xuefeng
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2601.04277

_version_	1866915715359965184
author	Luo, Beier; Wang, Cheng; Wei, Hongxin; Li, Sharon; Du, Xuefeng
author_facet	Luo, Beier; Wang, Cheng; Wei, Hongxin; Li, Sharon; Du, Xuefeng
contents	Post-training improves large language models (LLMs) but often worsens confidence calibration, leading to systematic overconfidence. Recent unsupervised post-hoc methods for post-trained LMs (PoLMs) mitigate this by aligning PoLM confidence to that of well-calibrated pre-trained counterparts. However, framing calibration as static output-distribution matching overlooks the inference-time dynamics introduced by post-training. In particular, we show that calibration errors arise from two regimes: (i) confidence drift, where final confidence inflates despite largely consistent intermediate decision processes, and (ii) process drift, where intermediate inference pathways diverge. Guided by this diagnosis, we propose Dual-Align, an unsupervised post-hoc framework for dual alignment in confidence calibration. Dual-Align performs confidence alignment to correct confidence drift via final-distribution matching, and introduces process alignment to address process drift by locating the layer where trajectories diverge and realigning the stability of subsequent inference. This dual strategy learns a single temperature parameter that corrects both drift types without sacrificing post-training performance gains. Experiments show consistent improvements over baselines, reducing calibration errors and approaching a supervised oracle.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_04277
institution	arXiv
publishDate	2026
record_format	arxiv
title	Unlocking the Pre-Trained Model as a Dual-Alignment Calibrator for Post-Trained LLMs
topic	Machine Learning
url	https://arxiv.org/abs/2601.04277
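
The contents field above describes correcting confidence drift by learning a single temperature through final-distribution matching against the pre-trained model. The following is a minimal sketch of that one idea only, not the Dual-Align method itself: it omits process alignment entirely, and the KL objective, the grid search, the function names (`softmax`, `mean_kl`, `fit_temperature`), and the synthetic logits are all assumptions made for illustration rather than details taken from the record.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Row-wise softmax of logits divided by a temperature."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def mean_kl(p, q, eps=1e-12):
    """Mean KL(p || q) over rows; eps guards against log(0)."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))


def fit_temperature(pre_logits, post_logits, grid=None):
    """Hypothetical helper: choose the temperature that brings the post-trained
    model's softened distribution closest (in KL) to the pre-trained one.
    No labels are used, so the fit is unsupervised."""
    if grid is None:
        grid = np.logspace(-1, 1, 201)  # candidate temperatures from 0.1 to 10 (assumed range)
    target = softmax(pre_logits)
    losses = [mean_kl(target, softmax(post_logits, t)) for t in grid]
    return float(grid[int(np.argmin(losses))])


# Synthetic stand-in data (assumption): the "post-trained" logits are the
# "pre-trained" logits scaled by 3, which inflates confidence.
rng = np.random.default_rng(0)
pre_logits = rng.normal(size=(512, 4))
post_logits = 3.0 * pre_logits

t_hat = fit_temperature(pre_logits, post_logits)
conf_before = softmax(post_logits).max(axis=-1).mean()
conf_after = softmax(post_logits, t_hat).max(axis=-1).mean()
print(f"fitted temperature: {t_hat:.2f}")
print(f"mean confidence before: {conf_before:.3f}, after: {conf_after:.3f}")
```

Because dividing logits by a positive temperature never changes the argmax, a fit of this kind leaves predictions unchanged, which is in line with the abstract's statement that the learned temperature does not sacrifice post-training performance gains.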

Similar Items