Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Yuan, Han, Hong, Chuan
Format:	Preprint
Published:	2024
Subjects:	Machine Learning Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2402.02561
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866911815545389056
author	Yuan, Han Hong, Chuan
author_facet	Yuan, Han Hong, Chuan
contents	Active learning selects the most informative samples from the unlabelled dataset to annotate in the context of a limited annotation budget. While numerous methods have been proposed for subsequent sample selection based on an initialized model, scant attention has been paid to the indispensable phase of active learning: selecting samples for model cold-start initialization. Most of the previous studies resort to random sampling or naive clustering. However, random sampling is prone to fluctuation, and naive clustering suffers from convergence speed, particularly when dealing with high-dimensional data such as imaging data. In this work, we propose to integrate foundation models with clustering methods to select samples for cold-start active learning initialization. Foundation models refer to those trained on massive datasets by the self-supervised paradigm and capable of generating informative and compacted embeddings for various downstream tasks. Leveraging these embeddings to replace raw features such as pixel values, clustering quickly converges and identifies better initial samples. For a comprehensive comparison, we included a classic ImageNet-supervised model to acquire embeddings. Experiments on two clinical tasks of image classification and segmentation demonstrated that foundation model-based clustering efficiently pinpointed informative initial samples, leading to models showcasing enhanced performance than the baseline methods. We envisage that this study provides an effective paradigm for future cold-start active learning.
format	Preprint
id	arxiv_https___arxiv_org_abs_2402_02561
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	Foundation Model Makes Clustering A Better Initialization For Cold-Start Active Learning Yuan, Han Hong, Chuan Machine Learning Computer Vision and Pattern Recognition Active learning selects the most informative samples from the unlabelled dataset to annotate in the context of a limited annotation budget. While numerous methods have been proposed for subsequent sample selection based on an initialized model, scant attention has been paid to the indispensable phase of active learning: selecting samples for model cold-start initialization. Most of the previous studies resort to random sampling or naive clustering. However, random sampling is prone to fluctuation, and naive clustering suffers from convergence speed, particularly when dealing with high-dimensional data such as imaging data. In this work, we propose to integrate foundation models with clustering methods to select samples for cold-start active learning initialization. Foundation models refer to those trained on massive datasets by the self-supervised paradigm and capable of generating informative and compacted embeddings for various downstream tasks. Leveraging these embeddings to replace raw features such as pixel values, clustering quickly converges and identifies better initial samples. For a comprehensive comparison, we included a classic ImageNet-supervised model to acquire embeddings. Experiments on two clinical tasks of image classification and segmentation demonstrated that foundation model-based clustering efficiently pinpointed informative initial samples, leading to models showcasing enhanced performance than the baseline methods. We envisage that this study provides an effective paradigm for future cold-start active learning.
title	Foundation Model Makes Clustering A Better Initialization For Cold-Start Active Learning
topic	Machine Learning Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2402.02561

Similar Items