Staff View :: Library Catalog


Bibliographic Details
Main Authors:	Dinh, Duy-Tai; Fujinami, Tsutomu; Huynh, Van-Nam
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2501.15542

_version_	1866915122983731200
author	Dinh, Duy-Tai; Fujinami, Tsutomu; Huynh, Van-Nam
contents	The problem of estimating the number of clusters (say k) is one of the major challenges in partitional clustering. This paper proposes an algorithm named k-SCC to estimate the optimal k in categorical data clustering. For the clustering step, the algorithm uses a kernel density estimation approach to define cluster centers. In addition, it uses an information-theoretic dissimilarity to measure the distance between the center and the objects of each cluster. A silhouette-analysis-based approach is then used to evaluate the quality of the clusterings obtained in the previous step for different values of k and to choose the best k. Comparative experiments were conducted on both synthetic and real datasets to compare the performance of k-SCC with three other algorithms. Experimental results show that k-SCC outperforms the compared algorithms in determining the number of clusters for each dataset.
format	Preprint
id	arxiv_https___arxiv_org_abs_2501_15542
institution	arXiv
publishDate	2025
record_format	arxiv
title	Estimating the Optimal Number of Clusters in Categorical Data Clustering by Silhouette Coefficient
topic	Machine Learning
url	https://arxiv.org/abs/2501.15542
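The abstract only outlines the k-SCC selection loop: cluster the data for each candidate k, score each clustering with the silhouette coefficient s(i) = (b(i) - a(i)) / max(a(i), b(i)), and keep the best-scoring k. The Python sketch below illustrates that loop under stated assumptions and is not the authors' implementation. The simple-matching dissimilarity (`mismatch`) and the mode-based centers (`k_modes`) are swapped-in stand-ins for the paper's information-theoretic dissimilarity and kernel-density-based centers. The function names and the toy dataset are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple

Row = Tuple[str, ...]


def mismatch(a: Row, b: Row) -> float:
    """Simple-matching dissimilarity: share of attributes whose values differ.
    ASSUMPTION: stand-in for k-SCC's information-theoretic dissimilarity."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def k_modes(data: Sequence[Row], k: int, max_iter: int = 20) -> List[int]:
    """Hypothetical k-modes clustering with deterministic farthest-point seeding.
    ASSUMPTION: centers are per-attribute modes, not k-SCC's kernel-density centers."""
    centers: List[Row] = [data[0]]
    while len(centers) < k:
        # Seed each new center at the object farthest from its nearest center.
        centers.append(max(data, key=lambda r: min(mismatch(r, c) for c in centers)))
    labels: List[int] = []
    for _ in range(max_iter):
        # Assign every object to its least dissimilar center (first one on ties).
        new_labels = [min(range(k), key=lambda j: mismatch(r, centers[j])) for r in data]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(data, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous center
                centers[j] = tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
    return labels


def silhouette(data: Sequence[Row], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient s(i) = (b - a) / max(a, b) over all objects."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; treat as the worst score
    n = len(data)
    total = 0.0
    for i in range(n):
        same = [mismatch(data[i], data[j]) for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            continue  # singleton cluster: s(i) = 0 by convention
        a = sum(same) / len(same)  # mean dissimilarity to the object's own cluster
        b = min(  # mean dissimilarity to the nearest other cluster
            sum(mismatch(data[i], data[j]) for j in range(n) if labels[j] == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


def choose_k(data: Sequence[Row], k_range: Iterable[int]) -> Tuple[int, Dict[int, float]]:
    """Cluster once per candidate k and keep the k with the highest mean silhouette."""
    scores = {k: silhouette(data, k_modes(data, k)) for k in k_range}
    return max(scores, key=scores.__getitem__), scores


# Toy categorical data with three well-separated groups (hypothetical).
DATA: List[Row] = [
    ("a", "a", "a", "a"), ("a", "a", "a", "x"), ("a", "a", "y", "a"), ("a", "a", "a", "a"),
    ("b", "b", "b", "b"), ("b", "b", "b", "x"), ("b", "y", "b", "b"), ("b", "b", "b", "b"),
    ("c", "c", "c", "c"), ("c", "c", "c", "x"), ("z", "c", "c", "c"), ("c", "c", "c", "c"),
]

if __name__ == "__main__":
    best_k, scores = choose_k(DATA, range(2, 6))
    print(best_k)  # 3
    print({k: round(s, 3) for k, s in scores.items()})
```

On this toy data the three planted groups give the highest mean silhouette at k = 3. The selection step in `choose_k` does not depend on how the clustering is produced, so replacing `mismatch` and the center update in `k_modes` with the paper's own choices would mainly affect the clustering step rather than the way k is chosen.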

Similar Items