Bibliographic Details
Main Authors:	Tinati, Mohammad; Tu, Stephen
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2603.27631

_version_	1866915897445187584
author	Tinati, Mohammad; Tu, Stephen
author_facet	Tinati, Mohammad; Tu, Stephen
contents	Self-supervised pre-training, where large corpora of unlabeled data are used to learn representations for downstream fine-tuning, has become a cornerstone of modern machine learning. While a growing body of theoretical work has begun to analyze this paradigm, existing bounds leave open the question of how sharp the current rates are, and whether they accurately capture the complex interaction between pre-training and fine-tuning. In this paper, we address this gap by developing an asymptotic theory of pre-training via two-stage M-estimation. A key challenge is that the pre-training estimator is often identifiable only up to a group symmetry, a feature common in representation learning that requires careful treatment. We address this issue using tools from Riemannian geometry to study the intrinsic parameters of the pre-training representation, which we link with the downstream predictor through a notion of orbit-invariance, precisely characterizing the limiting distribution of the downstream test risk. We apply our main result to several case studies, including spectral pre-training, factor models, and Gaussian mixture models, and obtain substantial improvements in problem-specific factors over prior art when applicable.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_27631
institution	arXiv
publishDate	2026
record_format	arxiv
title	On the Asymptotics of Self-Supervised Pre-training: Two-Stage M-Estimation and Representation Symmetry
topic	Machine Learning
url	https://arxiv.org/abs/2603.27631