Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Kumar, Syamantak, Sarkar, Purnamrita, Tian, Kevin, Zhang, Peiyuan
Format:	Preprint
Published:	2026
Subjects:	Machine Learning Data Structures and Algorithms Optimization and Control
Online Access:	https://arxiv.org/abs/2603.02607
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866917308718383104
author	Kumar, Syamantak Sarkar, Purnamrita Tian, Kevin Zhang, Peiyuan
author_facet	Kumar, Syamantak Sarkar, Purnamrita Tian, Kevin Zhang, Peiyuan
contents	Sparse PCA is one of the most well-studied problems in high-dimensional statistics. In this problem, we are given samples from a distribution with covariance $Σ$, whose top eigenvector $v \in R^d$ is $s$-sparse. Existing sparse PCA algorithms can be broadly categorized into (1) combinatorial algorithms (e.g., diagonal or elementwise covariance thresholding) and (2) SDP-based algorithms. While combinatorial algorithms are much simpler, they are typically only analyzed under the spiked identity model (where $Σ= I_d + γvv^\top$ for some $γ> 0$), whereas SDP-based algorithms require no additional assumptions on $Σ$. We demonstrate explicit counterexample covariances $Σ$ against the success of standard combinatorial algorithms for sparse PCA, when moving beyond the spiked identity model. In light of this discrepancy, we give the first combinatorial method for sparse PCA that provably succeeds for general $Σ$ using $s^2 \cdot \mathrm{polylog}(d)$ samples and $d^2 \cdot \mathrm{poly}(s, \log(d))$ time, by providing a global convergence guarantee on a variant of the truncated power method of Yuan and Zhang (2013). We provide a natural generalization of our method to recovering a vector in a sparse leading eigenspace. Finally, we evaluate our method on synthetic and real-world sparse PCA datasets.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_02607
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	Combinatorial Sparse PCA Beyond the Spiked Identity Model Kumar, Syamantak Sarkar, Purnamrita Tian, Kevin Zhang, Peiyuan Machine Learning Data Structures and Algorithms Optimization and Control Sparse PCA is one of the most well-studied problems in high-dimensional statistics. In this problem, we are given samples from a distribution with covariance $Σ$, whose top eigenvector $v \in R^d$ is $s$-sparse. Existing sparse PCA algorithms can be broadly categorized into (1) combinatorial algorithms (e.g., diagonal or elementwise covariance thresholding) and (2) SDP-based algorithms. While combinatorial algorithms are much simpler, they are typically only analyzed under the spiked identity model (where $Σ= I_d + γvv^\top$ for some $γ> 0$), whereas SDP-based algorithms require no additional assumptions on $Σ$. We demonstrate explicit counterexample covariances $Σ$ against the success of standard combinatorial algorithms for sparse PCA, when moving beyond the spiked identity model. In light of this discrepancy, we give the first combinatorial method for sparse PCA that provably succeeds for general $Σ$ using $s^2 \cdot \mathrm{polylog}(d)$ samples and $d^2 \cdot \mathrm{poly}(s, \log(d))$ time, by providing a global convergence guarantee on a variant of the truncated power method of Yuan and Zhang (2013). We provide a natural generalization of our method to recovering a vector in a sparse leading eigenspace. Finally, we evaluate our method on synthetic and real-world sparse PCA datasets.
title	Combinatorial Sparse PCA Beyond the Spiked Identity Model
topic	Machine Learning Data Structures and Algorithms Optimization and Control
url	https://arxiv.org/abs/2603.02607

Similar Items