Saved in:
| Main Authors: | , , |
|---|---|
| Format: | Preprint |
| Published: |
2025
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.20708 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| _version_ | 1866915176488370176 |
|---|---|
| author | Duchi, John Haque, Saminul Kuditipudi, Rohith |
| author_facet | Duchi, John Haque, Saminul Kuditipudi, Rohith |
| contents | Let $\mathcal{Z} = \{Z_1, \dots, Z_n\} \stackrel{\mathrm{i.i.d.}}{\sim} P \subset \mathbb{R}^d$ from a distribution $P$ with mean zero and covariance $Σ$. Given a dataset $\mathcal{X}$ such that $d_{\mathrm{ham}}(\mathcal{X}, \mathcal{Z}) \leq \varepsilon n$, we are interested in finding an efficient estimator $\widehatΣ$ that achieves $\mathrm{err}(\widehatΣ, Σ) := \|Σ^{-\frac{1}{2}}\widehatΣΣ^{-\frac{1}{2}} - I\| _{\mathrm{op}} \leq 1/2$. We focus on the low contamination regime $\varepsilon = o(1/\sqrt{d}$). In this regime, prior work required either $Ω(d^{3/2})$ samples or runtime that is exponential in $d$. We present an algorithm that, for subgaussian data, has near-linear sample complexity $n = \widetildeΩ(d)$ and runtime $O((n+d)^{ω+ \frac{1}{2}})$, where $ω$ is the matrix multiplication exponent. We also show that this algorithm works for heavy-tailed data with near-linear sample complexity, but in a smaller regime of $\varepsilon$. Concurrent to our work, Diakonikolas et al. [2024] give Sum-of-Squares estimators that achieve similar sample complexity but with large polynomial runtime. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2502_20708 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| spellingShingle | A fast and slightly robust covariance estimator Duchi, John Haque, Saminul Kuditipudi, Rohith Data Structures and Algorithms Let $\mathcal{Z} = \{Z_1, \dots, Z_n\} \stackrel{\mathrm{i.i.d.}}{\sim} P \subset \mathbb{R}^d$ from a distribution $P$ with mean zero and covariance $Σ$. Given a dataset $\mathcal{X}$ such that $d_{\mathrm{ham}}(\mathcal{X}, \mathcal{Z}) \leq \varepsilon n$, we are interested in finding an efficient estimator $\widehatΣ$ that achieves $\mathrm{err}(\widehatΣ, Σ) := \|Σ^{-\frac{1}{2}}\widehatΣΣ^{-\frac{1}{2}} - I\| _{\mathrm{op}} \leq 1/2$. We focus on the low contamination regime $\varepsilon = o(1/\sqrt{d}$). In this regime, prior work required either $Ω(d^{3/2})$ samples or runtime that is exponential in $d$. We present an algorithm that, for subgaussian data, has near-linear sample complexity $n = \widetildeΩ(d)$ and runtime $O((n+d)^{ω+ \frac{1}{2}})$, where $ω$ is the matrix multiplication exponent. We also show that this algorithm works for heavy-tailed data with near-linear sample complexity, but in a smaller regime of $\varepsilon$. Concurrent to our work, Diakonikolas et al. [2024] give Sum-of-Squares estimators that achieve similar sample complexity but with large polynomial runtime. |
| title | A fast and slightly robust covariance estimator |
| topic | Data Structures and Algorithms |
| url | https://arxiv.org/abs/2502.20708 |