Saved in:
Bibliographic Details
Main Authors: Kumar, Syamantak, Sarkar, Purnamrita, Tian, Kevin, Zhang, Peiyuan
Format: Preprint
Published: 2026
Subjects:
Online Access:https://arxiv.org/abs/2603.02607
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866917308718383104
author Kumar, Syamantak
Sarkar, Purnamrita
Tian, Kevin
Zhang, Peiyuan
author_facet Kumar, Syamantak
Sarkar, Purnamrita
Tian, Kevin
Zhang, Peiyuan
contents Sparse PCA is one of the most well-studied problems in high-dimensional statistics. In this problem, we are given samples from a distribution with covariance $Σ$, whose top eigenvector $v \in R^d$ is $s$-sparse. Existing sparse PCA algorithms can be broadly categorized into (1) combinatorial algorithms (e.g., diagonal or elementwise covariance thresholding) and (2) SDP-based algorithms. While combinatorial algorithms are much simpler, they are typically only analyzed under the spiked identity model (where $Σ= I_d + γvv^\top$ for some $γ> 0$), whereas SDP-based algorithms require no additional assumptions on $Σ$. We demonstrate explicit counterexample covariances $Σ$ against the success of standard combinatorial algorithms for sparse PCA, when moving beyond the spiked identity model. In light of this discrepancy, we give the first combinatorial method for sparse PCA that provably succeeds for general $Σ$ using $s^2 \cdot \mathrm{polylog}(d)$ samples and $d^2 \cdot \mathrm{poly}(s, \log(d))$ time, by providing a global convergence guarantee on a variant of the truncated power method of Yuan and Zhang (2013). We provide a natural generalization of our method to recovering a vector in a sparse leading eigenspace. Finally, we evaluate our method on synthetic and real-world sparse PCA datasets.
format Preprint
id arxiv_https___arxiv_org_abs_2603_02607
institution arXiv
publishDate 2026
record_format arxiv
spellingShingle Combinatorial Sparse PCA Beyond the Spiked Identity Model
Kumar, Syamantak
Sarkar, Purnamrita
Tian, Kevin
Zhang, Peiyuan
Machine Learning
Data Structures and Algorithms
Optimization and Control
Sparse PCA is one of the most well-studied problems in high-dimensional statistics. In this problem, we are given samples from a distribution with covariance $Σ$, whose top eigenvector $v \in R^d$ is $s$-sparse. Existing sparse PCA algorithms can be broadly categorized into (1) combinatorial algorithms (e.g., diagonal or elementwise covariance thresholding) and (2) SDP-based algorithms. While combinatorial algorithms are much simpler, they are typically only analyzed under the spiked identity model (where $Σ= I_d + γvv^\top$ for some $γ> 0$), whereas SDP-based algorithms require no additional assumptions on $Σ$. We demonstrate explicit counterexample covariances $Σ$ against the success of standard combinatorial algorithms for sparse PCA, when moving beyond the spiked identity model. In light of this discrepancy, we give the first combinatorial method for sparse PCA that provably succeeds for general $Σ$ using $s^2 \cdot \mathrm{polylog}(d)$ samples and $d^2 \cdot \mathrm{poly}(s, \log(d))$ time, by providing a global convergence guarantee on a variant of the truncated power method of Yuan and Zhang (2013). We provide a natural generalization of our method to recovering a vector in a sparse leading eigenspace. Finally, we evaluate our method on synthetic and real-world sparse PCA datasets.
title Combinatorial Sparse PCA Beyond the Spiked Identity Model
topic Machine Learning
Data Structures and Algorithms
Optimization and Control
url https://arxiv.org/abs/2603.02607