Enregistré dans:
Détails bibliographiques
Auteur principal: He, Even
Format: Preprint
Publié: 2025
Sujets:
Accès en ligne:https://arxiv.org/abs/2512.17632
Tags: Ajouter un tag
Pas de tags, Soyez le premier à ajouter un tag!
_version_ 1866914233694814208
author He, Even
author_facet He, Even
contents High-dimensional covariance estimation is notoriously sensitive to outliers. While statistically optimal estimators exist for general heavy-tailed distributions, they often rely on computationally expensive techniques like semidefinite programming or iterative M-estimation ($O(d^3)$). In this work, we target the specific regime of \textbf{Sub-Weibull distributions} (characterized by stretched exponential tails $\exp(-t^α)$). We investigate a computationally efficient alternative: the \textbf{Cross-Fitted Norm-Truncated Estimator}. Unlike element-wise truncation, our approach preserves the spectral geometry while requiring $O(Nd^2)$ operations, which represents the theoretical lower bound for constructing a full covariance matrix. Although spherical truncation is geometrically suboptimal for anisotropic data, we prove that within the Sub-Weibull class, the exponential tail decay compensates for this mismatch. Leveraging weighted Hanson-Wright inequalities, we derive non-asymptotic error bounds showing that our estimator recovers the optimal sub-Gaussian rate $\tilde{O}(\sqrt{r(Σ)/N})$ with high probability. This provides a scalable solution for high-dimensional data that exhibits tails heavier than Gaussian but lighter than polynomial decay.
format Preprint
id arxiv_https___arxiv_org_abs_2512_17632
institution arXiv
publishDate 2025
record_format arxiv
spellingShingle Fast and Robust: Computationally Efficient Covariance Estimation for Sub-Weibull Vectors
He, Even
Machine Learning
Computation
High-dimensional covariance estimation is notoriously sensitive to outliers. While statistically optimal estimators exist for general heavy-tailed distributions, they often rely on computationally expensive techniques like semidefinite programming or iterative M-estimation ($O(d^3)$). In this work, we target the specific regime of \textbf{Sub-Weibull distributions} (characterized by stretched exponential tails $\exp(-t^α)$). We investigate a computationally efficient alternative: the \textbf{Cross-Fitted Norm-Truncated Estimator}. Unlike element-wise truncation, our approach preserves the spectral geometry while requiring $O(Nd^2)$ operations, which represents the theoretical lower bound for constructing a full covariance matrix. Although spherical truncation is geometrically suboptimal for anisotropic data, we prove that within the Sub-Weibull class, the exponential tail decay compensates for this mismatch. Leveraging weighted Hanson-Wright inequalities, we derive non-asymptotic error bounds showing that our estimator recovers the optimal sub-Gaussian rate $\tilde{O}(\sqrt{r(Σ)/N})$ with high probability. This provides a scalable solution for high-dimensional data that exhibits tails heavier than Gaussian but lighter than polynomial decay.
title Fast and Robust: Computationally Efficient Covariance Estimation for Sub-Weibull Vectors
topic Machine Learning
Computation
url https://arxiv.org/abs/2512.17632