Bibliographic Details
Main Authors: Liu, Lang; Mehta, Ronak; Pal, Soumik; Harchaoui, Zaid
Format: Preprint
Published: 2024
Subjects: Machine Learning; Statistics Theory
Online Access: https://arxiv.org/abs/2408.15065
author Liu, Lang
Mehta, Ronak
Pal, Soumik
Harchaoui, Zaid
contents Data balancing across multiple modalities and sources appears in various forms in foundation models in machine learning and AI, e.g. in CLIP and DINO. We show that data balancing across modalities and sources actually offers an unsuspected benefit: variance reduction. We present a non-asymptotic statistical bound that quantifies this variance reduction effect and relates it to the eigenvalue decay of Markov operators. Furthermore, we describe how various forms of data balancing in contrastive multimodal learning and self-supervised clustering can be better understood, and even improved upon, owing to our variance reduction viewpoint.
format Preprint
id arxiv_https___arxiv_org_abs_2408_15065
institution arXiv
publishDate 2024
record_format arxiv
title The Benefits of Balance: From Information Projections to Variance Reduction
topic Machine Learning
Statistics Theory
url https://arxiv.org/abs/2408.15065