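Staff View: Variance Alignment Score: A Simple But Tough-to-Beat Data Selection Method for Multimodal Contrastive Learning :: Library Catalog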

Bibliographic Details
Main Authors:	Wang, Yiping; Chen, Yifang; Yan, Wendan; Jamieson, Kevin; Du, Simon Shaolei
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2402.02055

_version_	1866913222391496704
author	Wang, Yiping; Chen, Yifang; Yan, Wendan; Jamieson, Kevin; Du, Simon Shaolei
author_facet	Wang, Yiping; Chen, Yifang; Yan, Wendan; Jamieson, Kevin; Du, Simon Shaolei
contents	In recent years, data selection has emerged as a core issue for large-scale visual-language model pretraining, especially on noisy web-curated datasets. One widely adopted strategy assigns quality scores such as CLIP similarity for each sample and retains the data pairs with the highest scores. However, these approaches are agnostic of data distribution and always fail to select the most informative samples. To solve this problem, we propose a simple yet theoretically principled metric named Variance Alignment Score (VAS), which has the form $\langle \Sigma_{\text{test}}, \Sigma_i\rangle$. Here, $\Sigma_{\text{test}}$ represents the target (cross-)covariance matrix we aim to align, potentially based on prior knowledge, while $\Sigma_i$ denotes the tensor product of single or multi-modal representations for the $i$-th sample. We further design a new data selection method that maximizes the total VAS. We provide theoretical analysis in a simplified setting to demonstrate the theoretical advantage of VAS over random or other existing data selection. Experimentally, applying VAS and CLIP scores together can outperform baselines by a margin of $1.3\%$ average on 38 evaluation sets for noisy dataset DataComp and $2.5\%$ on VTAB for high-quality dataset CC12M. Additionally, our ablation study also shows visual features are better than text for calculating VAS, and the related classical experimental design methods may fail under this context.
format	Preprint
id	arxiv_https___arxiv_org_abs_2402_02055
institution	arXiv
publishDate	2024
record_format	arxiv
title	Variance Alignment Score: A Simple But Tough-to-Beat Data Selection Method for Multimodal Contrastive Learning
topic	Machine Learning; Artificial Intelligence
url	https://arxiv.org/abs/2402.02055
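
The score described in the abstract, $\langle \Sigma_{\text{test}}, \Sigma_i\rangle$, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. It assumes single-modal embeddings, takes $\Sigma_{\text{test}}$ to be the uncentered second-moment matrix of a small target set, takes $\Sigma_i$ to be the outer product of the $i$-th embedding with itself, and keeps the top-k samples by score. All function names are hypothetical, and the combination with CLIP scores mentioned in the abstract is not shown.

```python
def outer(u, v):
    """Outer (tensor) product of two vectors, as a nested list."""
    return [[a * b for b in v] for a in u]


def frobenius_inner(A, B):
    """Frobenius inner product <A, B> = sum over i, j of A[i][j] * B[i][j]."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))


def target_second_moment(target_embeddings):
    """Assumed choice of Sigma_test: (1/m) * sum_j g_j g_j^T (uncentered)."""
    m = len(target_embeddings)
    d = len(target_embeddings[0])
    sigma = [[0.0] * d for _ in range(d)]
    for g in target_embeddings:
        for r, row in enumerate(outer(g, g)):
            for c, val in enumerate(row):
                sigma[r][c] += val / m
    return sigma


def vas_scores(candidate_embeddings, sigma_test):
    """VAS_i = <Sigma_test, f_i f_i^T> for each candidate embedding f_i."""
    return [frobenius_inner(sigma_test, outer(f, f)) for f in candidate_embeddings]


def select_top_k(scores, k):
    """Indices of the k highest-scoring samples, returned in ascending index order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])


# Toy data (made up for illustration): the target set varies only along the first axis.
target = [[1.0, 0.0], [1.0, 0.0]]
candidates = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]

sigma_test = target_second_moment(target)
scores = vas_scores(candidates, sigma_test)
selected = select_top_k(scores, 2)
```

With a fixed $\Sigma_{\text{test}}$, the total score of a subset is the sum of its per-sample scores, so in this simplified setting keeping the k highest-scoring samples maximizes the total. In the toy data, the candidate orthogonal to the target direction scores zero and is dropped.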
