Staff View :: Library Catalog


Bibliographic Details
Main Authors:	Xu, Yicheng; Chen, Yuxin; Nie, Jiahao; Wang, Yusong; Zhuang, Huiping; Okumura, Manabu
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2406.18868

_version_	1866929635833413632
author	Xu, Yicheng; Chen, Yuxin; Nie, Jiahao; Wang, Yusong; Zhuang, Huiping; Okumura, Manabu
contents	Continual learning (CL) with Vision-Language Models (VLMs) has overcome the constraints of traditional CL, which focuses only on previously encountered classes. During the CL of VLMs, we need not only to prevent catastrophic forgetting of incrementally learned knowledge but also to preserve the zero-shot ability of VLMs. However, existing methods require additional reference datasets to maintain this zero-shot ability and rely on domain-identity hints to classify images across different domains. In this study, we propose Regression-based Analytic Incremental Learning (RAIL), which uses a recursive ridge regression-based adapter to learn from a sequence of domains in a non-forgetting manner and decouples cross-domain correlations by projecting features into a higher-dimensional space. Combined with a training-free fusion module, RAIL fully preserves the VLM's zero-shot ability on unseen domains without any reference data. Additionally, we introduce the Cross-domain Task-Agnostic Incremental Learning (X-TAIL) setting. In this setting, a CL learner must incrementally learn from multiple domains and classify test images from both seen and unseen domains without any domain-identity hint. We theoretically prove RAIL's absolute memorization of incrementally learned domains. Experimental results affirm RAIL's state-of-the-art performance in both X-TAIL and existing Multi-domain Task-Incremental Learning settings. The code is released at https://github.com/linghan1997/Regression-based-Analytic-Incremental-Learning.
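The abstract does not give the adapter's update rule. The sketch below shows the generic technique it names, recursive ridge regression (the Woodbury-identity form of recursive least squares), and why such an update can be "non-forgetting": after learning domain A and then domain B, the weights match a ridge regression fitted on both domains at once, even though domain A's samples are never revisited. Several pieces are assumptions, not taken from the paper. The higher-dimensional projection is modeled as a fixed random matrix followed by ReLU. The synthetic features stand in for frozen VLM image features. All names (`RecursiveRidge`, `fit_increment`, `expand`, `make_domain`, `demo`) are hypothetical, and the fusion module is not sketched. The released repository linked above is the authoritative implementation.

```python
import random

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def identity(n, scale=1.0):
    return [[scale if i == j else 0.0 for j in range(n)] for i in range(n)]

def add(a, b, sign=1.0):
    """Elementwise a + sign * b."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def inverse(a):
    """Gauss-Jordan inversion with partial pivoting (adequate for small, well-conditioned matrices)."""
    n = len(a)
    m = [row[:] + e for row, e in zip(a, identity(n))]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]

def expand(x, proj):
    """Assumed feature expansion: fixed random projection followed by ReLU."""
    return [[max(0.0, v) for v in row] for row in matmul(x, proj)]

class RecursiveRidge:
    """Hypothetical ridge-regression adapter updated one domain at a time, without storing past data."""

    def __init__(self, dim, gamma=1.0):
        self.R = identity(dim, 1.0 / gamma)  # (X^T X + gamma*I)^-1 before any data is seen
        self.W = [[] for _ in range(dim)]    # dim x 0 weights; class columns are appended later

    def fit_increment(self, X, Y):
        # New classes: pad W with zero columns (earlier samples never carry these labels).
        n_new = len(Y[0]) - len(self.W[0])
        self.W = [row + [0.0] * n_new for row in self.W]
        # Woodbury update: R <- R - R X^T (I + X R X^T)^-1 X R
        RXt = matmul(self.R, transpose(X))
        K = add(identity(len(X)), matmul(X, RXt))
        self.R = add(self.R, matmul(matmul(RXt, inverse(K)), transpose(RXt)), sign=-1.0)
        # Weight update using the new R: W <- W + R X^T (Y - X W)
        residual = add(Y, matmul(X, self.W), sign=-1.0)
        self.W = add(self.W, matmul(matmul(self.R, transpose(X)), residual))

def one_hot(labels, num_classes):
    return [[1.0 if j == c else 0.0 for j in range(num_classes)] for c in labels]

def make_domain(rng, class_ids, per_class, raw_dim, proj):
    """Synthetic stand-in for frozen image features from one domain."""
    X, labels = [], []
    for c in class_ids:
        center = [rng.gauss(0.0, 3.0) for _ in range(raw_dim)]
        X += [[m + rng.gauss(0.0, 0.5) for m in center] for _ in range(per_class)]
        labels += [c] * per_class
    return expand(X, proj), labels

def demo(raw_dim=3, hidden_dim=8, gamma=1.0, seed=0):
    rng = random.Random(seed)
    proj = [[rng.gauss(0.0, 1.0) for _ in range(hidden_dim)] for _ in range(raw_dim)]
    XA, yA = make_domain(rng, [0, 1], 3, raw_dim, proj)  # domain A: classes 0-1
    XB, yB = make_domain(rng, [2, 3], 3, raw_dim, proj)  # domain B: classes 2-3

    model = RecursiveRidge(hidden_dim, gamma)
    model.fit_increment(XA, one_hot(yA, 2))
    model.fit_increment(XB, one_hot(yB, 4))  # domain A data is not used in this step

    # Reference: ridge regression fitted on both domains at once.
    X_all, Y_all = XA + XB, one_hot(yA + yB, 4)
    Xt = transpose(X_all)
    W_joint = matmul(matmul(inverse(add(matmul(Xt, X_all), identity(hidden_dim, gamma))), Xt), Y_all)
    gap = max(abs(a - b) for ra, rb in zip(model.W, W_joint) for a, b in zip(ra, rb))
    return model, gap

if __name__ == "__main__":
    model, gap = demo()
    print(f"max |W_recursive - W_joint| = {gap:.2e}")
```

The second `fit_increment` call sees only domain B, yet `gap` stays at floating-point round-off. This is because `R` carries the regularized inverse Gram matrix of everything seen so far. That equivalence to joint training is the property a recursive ridge adapter relies on; whether RAIL's proof uses exactly this form should be checked against the paper.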
format	Preprint
id	arxiv_https___arxiv_org_abs_2406_18868
institution	arXiv
publishDate	2024
record_format	arxiv
title	Advancing Cross-domain Discriminability in Continual Learning of Vision-Language Models
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2406.18868
