Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Döbler, Mario, Marsden, Robert A., Raichle, Tobias, Yang, Bin
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2405.14977
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866914942758682624
author	Döbler, Mario Marsden, Robert A. Raichle, Tobias Yang, Bin
author_facet	Döbler, Mario Marsden, Robert A. Raichle, Tobias Yang, Bin
contents	In deep learning, maintaining model robustness against distribution shifts is critical. This work explores a broad range of possibilities to adapt vision-language foundation models at test-time, with a particular emphasis on CLIP and its variants. The study systematically examines prompt-based techniques and existing test-time adaptation methods, aiming to improve the robustness under distribution shift in diverse real-world scenarios. Specifically, the investigation covers various prompt engineering strategies, including handcrafted prompts, prompt ensembles, and prompt learning techniques. Additionally, we introduce a vision-text-space ensemble that substantially enhances average performance compared to text-space-only ensembles. Since online test-time adaptation has shown to be effective to mitigate performance drops under distribution shift, the study extends its scope to evaluate the effectiveness of existing test-time adaptation methods that were originally designed for vision-only classification models. Through extensive experimental evaluations conducted across multiple datasets and diverse model architectures, the research demonstrates the effectiveness of these adaptation strategies. Code is available at: https://github.com/mariodoebler/test-time-adaptation
format	Preprint
id	arxiv_https___arxiv_org_abs_2405_14977
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	A Lost Opportunity for Vision-Language Models: A Comparative Study of Online Test-Time Adaptation for Vision-Language Models Döbler, Mario Marsden, Robert A. Raichle, Tobias Yang, Bin Computer Vision and Pattern Recognition In deep learning, maintaining model robustness against distribution shifts is critical. This work explores a broad range of possibilities to adapt vision-language foundation models at test-time, with a particular emphasis on CLIP and its variants. The study systematically examines prompt-based techniques and existing test-time adaptation methods, aiming to improve the robustness under distribution shift in diverse real-world scenarios. Specifically, the investigation covers various prompt engineering strategies, including handcrafted prompts, prompt ensembles, and prompt learning techniques. Additionally, we introduce a vision-text-space ensemble that substantially enhances average performance compared to text-space-only ensembles. Since online test-time adaptation has shown to be effective to mitigate performance drops under distribution shift, the study extends its scope to evaluate the effectiveness of existing test-time adaptation methods that were originally designed for vision-only classification models. Through extensive experimental evaluations conducted across multiple datasets and diverse model architectures, the research demonstrates the effectiveness of these adaptation strategies. Code is available at: https://github.com/mariodoebler/test-time-adaptation
title	A Lost Opportunity for Vision-Language Models: A Comparative Study of Online Test-Time Adaptation for Vision-Language Models
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2405.14977

Similar Items