Saved in:
Bibliographic Details
Main Authors: Bhaskara, Aditya, Mahabadi, Sepideh, Pittu, Madhusudhan Reddy, Vakilian, Ali, Woodruff, David P.
Format: Preprint
Published: 2025
Subjects:
Online Access:https://arxiv.org/abs/2504.20883
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866908343183867904
author Bhaskara, Aditya
Mahabadi, Sepideh
Pittu, Madhusudhan Reddy
Vakilian, Ali
Woodruff, David P.
author_facet Bhaskara, Aditya
Mahabadi, Sepideh
Pittu, Madhusudhan Reddy
Vakilian, Ali
Woodruff, David P.
contents In this paper we study constrained subspace approximation problem. Given a set of $n$ points $\{a_1,\ldots,a_n\}$ in $\mathbb{R}^d$, the goal of the {\em subspace approximation} problem is to find a $k$ dimensional subspace that best approximates the input points. More precisely, for a given $p\geq 1$, we aim to minimize the $p$th power of the $\ell_p$ norm of the error vector $(\|a_1-\bm{P}a_1\|,\ldots,\|a_n-\bm{P}a_n\|)$, where $\bm{P}$ denotes the projection matrix onto the subspace and the norms are Euclidean. In \emph{constrained} subspace approximation (CSA), we additionally have constraints on the projection matrix $\bm{P}$. In its most general form, we require $\bm{P}$ to belong to a given subset $\mathcal{S}$ that is described explicitly or implicitly. We introduce a general framework for constrained subspace approximation. Our approach, that we term coreset-guess-solve, yields either $(1+\varepsilon)$-multiplicative or $\varepsilon$-additive approximations for a variety of constraints. We show that it provides new algorithms for partition-constrained subspace approximation with applications to {\it fair} subspace approximation, $k$-means clustering, and projected non-negative matrix factorization, among others. Specifically, while we reconstruct the best known bounds for $k$-means clustering in Euclidean spaces, we improve the known results for the remainder of the problems.
format Preprint
id arxiv_https___arxiv_org_abs_2504_20883
institution arXiv
publishDate 2025
record_format arxiv
spellingShingle Guessing Efficiently for Constrained Subspace Approximation
Bhaskara, Aditya
Mahabadi, Sepideh
Pittu, Madhusudhan Reddy
Vakilian, Ali
Woodruff, David P.
Data Structures and Algorithms
Machine Learning
In this paper we study constrained subspace approximation problem. Given a set of $n$ points $\{a_1,\ldots,a_n\}$ in $\mathbb{R}^d$, the goal of the {\em subspace approximation} problem is to find a $k$ dimensional subspace that best approximates the input points. More precisely, for a given $p\geq 1$, we aim to minimize the $p$th power of the $\ell_p$ norm of the error vector $(\|a_1-\bm{P}a_1\|,\ldots,\|a_n-\bm{P}a_n\|)$, where $\bm{P}$ denotes the projection matrix onto the subspace and the norms are Euclidean. In \emph{constrained} subspace approximation (CSA), we additionally have constraints on the projection matrix $\bm{P}$. In its most general form, we require $\bm{P}$ to belong to a given subset $\mathcal{S}$ that is described explicitly or implicitly. We introduce a general framework for constrained subspace approximation. Our approach, that we term coreset-guess-solve, yields either $(1+\varepsilon)$-multiplicative or $\varepsilon$-additive approximations for a variety of constraints. We show that it provides new algorithms for partition-constrained subspace approximation with applications to {\it fair} subspace approximation, $k$-means clustering, and projected non-negative matrix factorization, among others. Specifically, while we reconstruct the best known bounds for $k$-means clustering in Euclidean spaces, we improve the known results for the remainder of the problems.
title Guessing Efficiently for Constrained Subspace Approximation
topic Data Structures and Algorithms
Machine Learning
url https://arxiv.org/abs/2504.20883