Staff View: Advancing Textual Prompt Learning with Anchored Attributes :: Library Catalog

Bibliographic Details
Main Authors:	Li, Zheng; Song, Yibing; Cheng, Ming-Ming; Li, Xiang; Yang, Jian
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2412.09442

_version_	1866915399379976192
author	Li, Zheng; Song, Yibing; Cheng, Ming-Ming; Li, Xiang; Yang, Jian
author_facet	Li, Zheng; Song, Yibing; Cheng, Ming-Ming; Li, Xiang; Yang, Jian
contents	Textual-based prompt learning methods primarily employ multiple learnable soft prompts and hard class tokens in a cascading manner as text inputs, aiming to align image and text (category) spaces for downstream tasks. However, current training is restricted to aligning images with predefined known categories and cannot be associated with unknown categories. In this work, we propose utilizing universal attributes as a bridge to enhance the alignment between images and unknown categories. Specifically, we introduce an Attribute-anchored Textual Prompt learning method for vision-language models, named ATPrompt. This approach expands the learning space of soft prompts from the original one-dimensional category level into the multi-dimensional attribute level by incorporating multiple attribute tokens into the learnable soft prompts. Through this modification, we transform the text prompt from a category-centric form to an attribute-category hybrid form. Additionally, we introduce a straightforward differentiable attribute search method to identify representative and suitable attributes for downstream tasks. As an easy-to-use plug-in technique, ATPrompt can seamlessly replace the existing basic prompt format in textual-based methods, providing general improvements at a negligible computational cost. Extensive experiments across 11 datasets validate the effectiveness of our method. Code is publicly available at https://github.com/zhengli97/ATPrompt.
format	Preprint
id	arxiv_https___arxiv_org_abs_2412_09442
institution	arXiv
publishDate	2024
record_format	arxiv
title	Advancing Textual Prompt Learning with Anchored Attributes
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2412.09442
