Staff View: Classification of high-dimensional data with spiked covariance matrix structure :: Library Catalog


Bibliographic Details
Main Authors:	Chen, Yin-Jen; Tang, Minh
Format:	Preprint
Published:	2021
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2110.01950

_version_	1866918331467956224
author	Chen, Yin-Jen; Tang, Minh
author_facet	Chen, Yin-Jen; Tang, Minh
contents	We study the classification problem for high-dimensional data with $n$ observations on $p$ features where the $p \times p$ covariance matrix $\Sigma$ exhibits a spiked eigenvalue structure and the vector $\zeta$, given by the difference between the whitened mean vectors, is sparse. We analyze an adaptive classifier (adaptive with respect to the sparsity $s$) that first performs dimension reduction on the feature vectors prior to classification in the dimensionally reduced space, i.e., the classifier whitens the data, then screens the features by keeping only those corresponding to the $s$ largest coordinates of $\zeta$, and finally applies Fisher linear discriminant on the selected features. Leveraging recent results on entrywise matrix perturbation bounds for covariance matrices, we show that the resulting classifier is Bayes optimal whenever $n \rightarrow \infty$ and $s \sqrt{n^{-1} \ln p} \rightarrow 0$. Notably, our theory also guarantees Bayes optimality for the corresponding quadratic discriminant analysis (QDA). Experimental results on real and synthetic data further indicate that the proposed approach is competitive with state-of-the-art methods while operating on a substantially lower-dimensional representation.
format	Preprint
id	arxiv_https___arxiv_org_abs_2110_01950
institution	arXiv
publishDate	2021
record_format	arxiv
title	Classification of high-dimensional data with spiked covariance matrix structure
topic	Machine Learning
url	https://arxiv.org/abs/2110.01950
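
Illustration (not part of the catalog record): the abstract describes a three-step pipeline: whiten the data, keep the $s$ coordinates of $\zeta$ that are largest, then apply Fisher linear discriminant on the selected features. The Python sketch below shows one way those steps could fit together. It is a simplified sketch under stated assumptions, not the authors' implementation: it whitens with a plain inverse square root of the pooled sample covariance (so it assumes more observations than features) rather than an estimator that exploits the spiked structure, it ranks coordinates by absolute value, and the function names `fit_whiten_screen_lda` and `predict_whiten_screen_lda` are hypothetical.

```python
import numpy as np


def fit_whiten_screen_lda(X0, X1, s, eps=1e-8):
    """Fit a whiten -> screen -> Fisher LDA rule on two classes (illustrative)."""
    n0, n1 = len(X0), len(X1)
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)

    # Pooled sample covariance of the two classes.
    centered = np.vstack([X0 - mu0, X1 - mu1])
    sigma = centered.T @ centered / (n0 + n1 - 2)

    # Whitening matrix: symmetric inverse square root of sigma.
    # Assumption: eigenvalues are floored at eps for numerical stability.
    evals, evecs = np.linalg.eigh(sigma)
    evals = np.maximum(evals, eps)
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T

    # zeta: difference between the whitened mean vectors.
    zeta = W @ (mu1 - mu0)

    # Screening: indices of the s coordinates of zeta largest in absolute value.
    keep = np.argsort(np.abs(zeta))[-s:]

    midpoint = (mu0 + mu1) / 2.0
    threshold = np.log(n0 / n1)  # prior odds estimated from class sizes
    return W, midpoint, zeta, keep, threshold


def predict_whiten_screen_lda(model, X):
    """Return 0/1 labels; in whitened space the LDA direction is zeta itself."""
    W, midpoint, zeta, keep, threshold = model
    Z = (X - midpoint) @ W  # W is symmetric, so each row is W @ (x - midpoint)
    scores = Z[:, keep] @ zeta[keep]
    return (scores > threshold).astype(int)
```

A call such as `model = fit_whiten_screen_lda(X0, X1, s=2)` followed by `predict_whiten_screen_lda(model, X_new)` would classify new rows using only the two retained whitened coordinates. How $s$ is chosen adaptively is not covered by this sketch.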

Similar Items