Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Wang, Changhu, Chen, Zihao, Xi, Ruibin
Format: Preprint
Veröffentlicht: 2023
Schlagworte:
Online-Zugang:https://arxiv.org/abs/2306.12671
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
_version_ 1866929231337881600
author Wang, Changhu
Chen, Zihao
Xi, Ruibin
author_facet Wang, Changhu
Chen, Zihao
Xi, Ruibin
contents In this paper, we consider feature screening for ultrahigh dimensional clustering analyses. Based on the observation that the marginal distribution of any given feature is a mixture of its conditional distributions in different clusters, we propose to screen clustering features by independently evaluating the homogeneity of each feature's mixture distribution. Important cluster-relevant features have heterogeneous components in their mixture distributions and unimportant features have homogeneous components. The well-known EM-test statistic is used to evaluate the homogeneity. Under general parametric settings, we establish the tail probability bounds of the EM-test statistic for the homogeneous and heterogeneous features, and further show that the proposed screening procedure can achieve the sure independent screening and even the consistency in selection properties. Limiting distribution of the EM-test statistic is also obtained for general parametric distributions. The proposed method is computationally efficient, can accurately screen for important cluster-relevant features and help to significantly improve clustering, as demonstrated in our extensive simulation and real data analyses.
format Preprint
id arxiv_https___arxiv_org_abs_2306_12671
institution arXiv
publishDate 2023
record_format arxiv
spellingShingle Feature screening for clustering analysis
Wang, Changhu
Chen, Zihao
Xi, Ruibin
Methodology
In this paper, we consider feature screening for ultrahigh dimensional clustering analyses. Based on the observation that the marginal distribution of any given feature is a mixture of its conditional distributions in different clusters, we propose to screen clustering features by independently evaluating the homogeneity of each feature's mixture distribution. Important cluster-relevant features have heterogeneous components in their mixture distributions and unimportant features have homogeneous components. The well-known EM-test statistic is used to evaluate the homogeneity. Under general parametric settings, we establish the tail probability bounds of the EM-test statistic for the homogeneous and heterogeneous features, and further show that the proposed screening procedure can achieve the sure independent screening and even the consistency in selection properties. Limiting distribution of the EM-test statistic is also obtained for general parametric distributions. The proposed method is computationally efficient, can accurately screen for important cluster-relevant features and help to significantly improve clustering, as demonstrated in our extensive simulation and real data analyses.
title Feature screening for clustering analysis
topic Methodology
url https://arxiv.org/abs/2306.12671