Staff View: Towards Calibrating Prompt Tuning of Vision-Language Models :: Library Catalog


Bibliographic Details
Main Authors:	Sharifdeen, Ashshak; Shamshad, Fahad; Munir, Muhammad Akhtar; Basu, Abhishek; Ismithdeen, Mohamed Insaf; Jeyamohan, Jeyapriyan; Silva, Chathurika Sewwandi; Nandakumar, Karthik; Khan, Muhammad Haris
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.19024

_version_	1866910029171392512
author	Sharifdeen, Ashshak; Shamshad, Fahad; Munir, Muhammad Akhtar; Basu, Abhishek; Ismithdeen, Mohamed Insaf; Jeyamohan, Jeyapriyan; Silva, Chathurika Sewwandi; Nandakumar, Karthik; Khan, Muhammad Haris
contents	Prompt tuning of large-scale vision-language models such as CLIP enables efficient task adaptation without updating model weights. However, it often leads to poor confidence calibration and unreliable predictive uncertainty. We address this problem by proposing a calibration framework that enhances predictive reliability while preserving the geometry of the pretrained CLIP embedding space, which is required for robust generalization. Our approach extends the standard cross-entropy loss with two complementary regularizers: (1) a mean-variance margin penalty that stabilizes inter-class logit margins by maximizing their average while minimizing their dispersion, mitigating both underconfidence and overconfidence spikes; and (2) a text moment-matching loss that aligns the first and second moments of tuned text embeddings with those of their frozen CLIP counterparts, preserving the semantic dispersion that is crucial for generalization. Through extensive experiments across 7 prompt-tuning methods and 11 diverse datasets, we demonstrate that our approach significantly reduces the Expected Calibration Error (ECE) compared to competitive calibration techniques on both base and novel classes.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_19024
institution	arXiv
publishDate	2026
record_format	arxiv
title	Towards Calibrating Prompt Tuning of Vision-Language Models
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2602.19024
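The abstract names the training objective (cross-entropy plus a mean-variance margin penalty plus a text moment-matching loss) and the evaluation metric (ECE) but gives no closed-form definitions. The pure-Python sketch below shows one plausible reading, not the authors' implementation. It assumes: the per-sample margin is the true-class logit minus the largest competing logit; the margin penalty is the negated mean margin plus a weighted variance; the moment-matching term is the squared distance between the means plus the squared Frobenius distance between the covariances of tuned and frozen class text embeddings. All function names and the weights `lam_margin`, `lam_var`, and `lam_mm` are hypothetical. The ECE function uses the common equal-width binning over max-softmax confidence with 15 bins.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def cross_entropy(logits: Matrix, labels: Sequence[int]) -> float:
    # Mean negative log-likelihood of the true class, via log-sum-exp.
    total = 0.0
    for z, y in zip(logits, labels):
        m = max(z)
        lse = m + math.log(sum(math.exp(v - m) for v in z))
        total += lse - z[y]
    return total / len(labels)


def mean_variance_margin_penalty(logits: Matrix, labels: Sequence[int],
                                 lam_var: float = 1.0) -> float:
    # ASSUMPTION: margin = true-class logit minus the largest competing logit.
    margins = [z[y] - max(v for k, v in enumerate(z) if k != y)
               for z, y in zip(logits, labels)]
    mu = sum(margins) / len(margins)
    var = sum((m - mu) ** 2 for m in margins) / len(margins)
    # Reward a large average margin and penalize its spread.
    return -mu + lam_var * var


def _mean_and_cov(emb: Matrix) -> Tuple[List[float], Matrix]:
    # Mean vector and (population) covariance matrix across class embeddings.
    n, d = len(emb), len(emb[0])
    mu = [sum(e[j] for e in emb) / n for j in range(d)]
    cov = [[sum((e[i] - mu[i]) * (e[j] - mu[j]) for e in emb) / n
            for j in range(d)] for i in range(d)]
    return mu, cov


def text_moment_matching_loss(tuned: Matrix, frozen: Matrix) -> float:
    # ASSUMPTION: first moment = mean, second moment = covariance.
    mu_t, cov_t = _mean_and_cov(tuned)
    mu_f, cov_f = _mean_and_cov(frozen)
    d = len(mu_t)
    first = sum((a - b) ** 2 for a, b in zip(mu_t, mu_f))
    second = sum((cov_t[i][j] - cov_f[i][j]) ** 2
                 for i in range(d) for j in range(d))
    return first + second


def total_loss(logits: Matrix, labels: Sequence[int], tuned_text: Matrix,
               frozen_text: Matrix, lam_margin: float = 0.1,
               lam_var: float = 1.0, lam_mm: float = 1.0) -> float:
    # Hypothetical weighting of the three terms named in the abstract.
    return (cross_entropy(logits, labels)
            + lam_margin * mean_variance_margin_penalty(logits, labels, lam_var)
            + lam_mm * text_moment_matching_loss(tuned_text, frozen_text))


def expected_calibration_error(logits: Matrix, labels: Sequence[int],
                               n_bins: int = 15) -> float:
    # Equal-width binned ECE: sum_b |B_b|/n * |acc(B_b) - conf(B_b)|.
    bins = [[0, 0.0, 0.0] for _ in range(n_bins)]  # count, sum_conf, sum_correct
    for z, y in zip(logits, labels):
        p = softmax(z)
        conf = max(p)
        pred = p.index(conf)
        b = min(int(conf * n_bins), n_bins - 1)
        bins[b][0] += 1
        bins[b][1] += conf
        bins[b][2] += 1.0 if pred == y else 0.0
    n = len(labels)
    return sum(c / n * abs(s_acc / c - s_conf / c)
               for c, s_conf, s_acc in bins if c > 0)
```

For example, `expected_calibration_error([[0.0, 0.0], [0.0, 0.0]], [0, 1])` is 0.0: both samples have confidence 0.5 and half of them are classified correctly. Note that the negated mean margin is unbounded below on its own, so in this sketch it only makes sense alongside the cross-entropy term and a small `lam_margin`.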

Similar Items