Staff View: Weighted Multi-Prompt Learning with Description-free Large Language Model Distillation :: Library Catalog

Bibliographic Details
Main Authors:	Lee, Sua; Shin, Kyubum; Park, Jung Ho
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Computation and Language; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2507.07147

_version_	1866909682444009472
author	Lee, Sua; Shin, Kyubum; Park, Jung Ho
author_facet	Lee, Sua; Shin, Kyubum; Park, Jung Ho
contents	Recent advances in pre-trained Vision Language Models (VLM) have shown promising potential for effectively adapting to downstream tasks through prompt learning, without the need for additional annotated paired datasets. To supplement the text information in VLM trained on correlations with vision data, new approaches leveraging Large Language Models (LLM) in prompts have been proposed, enhancing robustness to unseen and diverse data. Existing methods typically extract text-based responses (i.e., descriptions) from LLM to incorporate into prompts; however, this approach suffers from high variability and low reliability. In this work, we propose Description-free Multi-prompt Learning (DeMul), a novel method that eliminates the process of extracting descriptions and instead directly distills knowledge from LLM into prompts. By adopting a description-free approach, prompts can encapsulate richer semantics while still being represented as continuous vectors for optimization, thereby eliminating the need for discrete pre-defined templates. Additionally, in a multi-prompt setting, we empirically demonstrate the potential of prompt weighting in reflecting the importance of different prompts during training. Experimental results show that our approach achieves superior performance across 11 recognition datasets.
format	Preprint
id	arxiv_https___arxiv_org_abs_2507_07147
institution	arXiv
publishDate	2025
record_format	arxiv
title	Weighted Multi-Prompt Learning with Description-free Large Language Model Distillation
topic	Machine Learning; Artificial Intelligence; Computation and Language; Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2507.07147

Similar Items