| Main Authors: | Singh, Vikas; Verma, Nishchal K. |
|---|---|
| Format: | Preprint |
| Published: | 2019 |
| Subjects: | Machine Learning |
| Online Access: | https://arxiv.org/abs/1912.11209 |
| _version_ | 1866913688484577280 |
|---|---|
| author | Singh, Vikas; Verma, Nishchal K. |
| author_facet | Singh, Vikas; Verma, Nishchal K. |
| contents | This paper presents a new fuzzy k-means algorithm for clustering high-dimensional data in various subspaces. In high-dimensional data, some features may be irrelevant, while relevant features may differ in their significance to the clustering process. For better clustering, it is crucial to incorporate the contribution of these features. To do so, we propose a novel fuzzy k-means clustering algorithm that modifies the objective function of fuzzy k-means with two entropy terms. The first entropy term helps minimize the within-cluster dispersion and maximize the negative entropy, which determines how clusters contribute to the association of data points. The second entropy term controls the feature weights, since different features contribute differently during clustering, to obtain a better partition. The performance of the proposed approach is reported using several clustering measures (AR, RI, and NMI) on multiple datasets and compared with six other state-of-the-art methods. Impact Statement: In real-world applications, cluster-dependent feature weights help partition a data set into more meaningful clusters. Features may be relevant, irrelevant, or redundant, and each contributes differently during the clustering process. In this paper, a cluster-dependent feature-weighting approach based on fuzzy k-means is presented that assigns higher weights to relevant features and lower weights to irrelevant features during clustering. The method is validated using both supervised and unsupervised performance measures on real-world and synthetic datasets to demonstrate its effectiveness compared to state-of-the-art methods. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_1912_11209 |
| institution | arXiv |
| publishDate | 2019 |
| record_format | arxiv |
| title | Variable feature weighted fuzzy k-means algorithm for high dimensional data |
| topic | Machine Learning |
| url | https://arxiv.org/abs/1912.11209 |
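
The abstract names two entropy terms, one acting on the cluster memberships and one on the feature weights, but this record does not give the objective function itself. As a rough illustration only, the sketch below assumes a generic entropy-regularized formulation: memberships and per-cluster feature weights are both updated as softmax functions of weighted dispersion. It is not the authors' exact method. The function name and the coefficients `gamma` and `lam` are hypothetical, chosen here for illustration.

```python
import numpy as np


def entropy_weighted_fuzzy_kmeans(X, n_clusters, gamma=1.0, lam=1.0,
                                  n_iter=50, seed=0):
    """Illustrative fuzzy k-means with cluster-dependent feature weights.

    Assumed objective (not taken from the paper):
      sum_ik u_ik * sum_j w_kj * (x_ij - c_kj)^2
      + gamma * sum_ik u_ik * log(u_ik)
      + lam * sum_kj w_kj * log(w_kj)
    with each row of U and each row of W summing to 1.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Initialize centers from random data points and use uniform feature weights.
    centers = X[rng.choice(n, size=n_clusters, replace=False)].copy()
    W = np.full((n_clusters, d), 1.0 / d)

    for _ in range(n_iter):
        # Squared per-feature deviations, shape (n, k, d).
        sq = (X[:, None, :] - centers[None, :, :]) ** 2
        # Feature-weighted distance of each point to each cluster, shape (n, k).
        dist = np.einsum("nkd,kd->nk", sq, W)

        # Membership update: softmax of -dist / gamma across clusters.
        logits = -dist / gamma
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        U = np.exp(logits)
        U /= U.sum(axis=1, keepdims=True)

        # Center update: membership-weighted mean of the data.
        centers = (U.T @ X) / (U.sum(axis=0)[:, None] + 1e-12)

        # Feature-weight update: softmax of -dispersion / lam across features.
        sq = (X[:, None, :] - centers[None, :, :]) ** 2
        disp = np.einsum("nk,nkd->kd", U, sq)
        logw = -disp / lam
        logw -= logw.max(axis=1, keepdims=True)  # numerical stability
        W = np.exp(logw)
        W /= W.sum(axis=1, keepdims=True)

    return U, centers, W


if __name__ == "__main__":
    data = np.random.default_rng(1).normal(size=(40, 3))
    U, centers, W = entropy_weighted_fuzzy_kmeans(data, n_clusters=2)
    print(U.shape, centers.shape, W.shape)
```

Under this assumed formulation, a larger `gamma` yields softer memberships and a larger `lam` pushes the feature weights within each cluster toward uniform. Features with low within-cluster dispersion receive higher weight, which matches the behaviour the abstract describes for relevant features.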