| Main Authors: | Tsao, Hsi-Ai; Hsiung, Lei; Chen, Pin-Yu; Ho, Tsung-Yi |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Computer Vision and Pattern Recognition; Machine Learning |
| Online Access: | https://arxiv.org/abs/2409.01821 |
| _version_ | 1866909304108351488 |
|---|---|
| author | Tsao, Hsi-Ai; Hsiung, Lei; Chen, Pin-Yu; Ho, Tsung-Yi |
| contents | Adapting pre-trained models to new tasks can be more or less effective depending on the dataset. Visual prompting, a state-of-the-art parameter-efficient transfer learning method, can significantly improve performance on out-of-distribution tasks. On the other hand, linear probing, a standard transfer learning method, is sometimes the best approach. We propose a log-likelihood ratio (LLR) approach to analyze the comparative benefits of visual prompting and linear probing. By combining the LLR score with resource-efficient approximations of visual prompts, our cost-effective measure achieves up to a 100-fold reduction in run time compared to full training while attaining prediction accuracy of up to 91%. The source code is available at https://github.com/IBM/VP-LLR. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2409_01821 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | When Does Visual Prompting Outperform Linear Probing for Vision-Language Models? A Likelihood Perspective |
| topic | Computer Vision and Pattern Recognition; Machine Learning |
| url | https://arxiv.org/abs/2409.01821 |
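The record's abstract names a log-likelihood ratio (LLR) score for comparing visual prompting with linear probing but does not define it. The following is a minimal, generic sketch that assumes the score is the average, over labeled examples, of log p_A(y|x) minus log p_B(y|x) for two probabilistic classifiers A and B. The function name `mean_log_likelihood_ratio`, its parameters, and the example probabilities are hypothetical illustrations. This is not the authors' VP-LLR implementation, which is in the linked repository.

```python
import math
from typing import Sequence


def mean_log_likelihood_ratio(
    probs_a: Sequence[Sequence[float]],
    probs_b: Sequence[Sequence[float]],
    labels: Sequence[int],
    eps: float = 1e-12,
) -> float:
    """Hypothetical sketch: mean of log p_a(y|x) - log p_b(y|x) over examples.

    probs_a, probs_b: per-example class-probability vectors from two
    classifiers (assumed here to be, e.g., a visual-prompted model and a
    linear probe). labels: the true class index for each example.
    A positive result means classifier A assigns higher likelihood to the
    true labels on average; this generic definition is an assumption.
    """
    if not (len(probs_a) == len(probs_b) == len(labels)):
        raise ValueError("inputs must have the same number of examples")
    if len(labels) == 0:
        raise ValueError("need at least one example")
    total = 0.0
    for pa, pb, y in zip(probs_a, probs_b, labels):
        # Clamp probabilities away from zero so log() stays finite.
        total += math.log(max(pa[y], eps)) - math.log(max(pb[y], eps))
    return total / len(labels)


# Toy usage with made-up probabilities for a two-class problem.
probs_vp = [[0.8, 0.2], [0.3, 0.7]]
probs_lp = [[0.6, 0.4], [0.5, 0.5]]
score = mean_log_likelihood_ratio(probs_vp, probs_lp, [0, 1])
print(f"mean LLR (A vs B): {score:.4f}")
```

In this toy case, classifier A puts more probability on the true label for both examples, so the score is positive. Swapping the arguments flips its sign. Averaging per-example log-ratios keeps the score on the same scale regardless of dataset size.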