Bibliographic Details
Main Authors:	Kamgar-Parsi, Behzad; Kamgar-Parsi, Behrooz
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Computer Vision and Pattern Recognition; 68; I.5.3
Online Access:	https://arxiv.org/abs/2505.22991

Table of Contents:

Finding the number of meaningful clusters in an unlabeled dataset is important in many applications. The regularized k-means algorithm is a frequently used approach to finding the correct number of distinct clusters in a dataset. The most common formulation of the regularization function is the additive linear term $λk$, where $k$ is the number of clusters and $λ$ is a positive coefficient. Currently, there are no principled guidelines for setting the value of the critical hyperparameter $λ$. In this paper, we derive rigorous bounds for $λ$ assuming clusters are "ideal". Ideal clusters (defined as $d$-dimensional spheres with identical radii) are close proxies for k-means clusters ($d$-dimensional spherically symmetric distributions with identical standard deviations). Experiments show that the k-means algorithm with an additive regularizer often yields multiple solutions. Thus, we also analyze the k-means algorithm with a multiplicative regularizer. The consensus among k-means solutions with additive and multiplicative regularizations reduces the ambiguity of multiple solutions in certain cases. We also present selected experiments that demonstrate the performance of the regularized k-means algorithms as clusters deviate from the ideal assumption.
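
To make the additive formulation concrete, here is a minimal sketch of choosing $k$ by minimizing a regularized cost. It assumes the cost is the usual k-means sum of squared errors plus $λk$; the abstract does not spell out the base cost, so that is an assumption. The function names `kmeans_sse` and `select_k_additive`, the plain Lloyd's iteration, the synthetic data, and the value $λ = 50$ are all illustrative choices, not the paper's method or its derived bounds.

```python
import numpy as np

def kmeans_sse(X, k, n_init=10, n_iter=100, seed=0):
    """Lowest sum of squared errors found by Lloyd's algorithm over n_init restarts."""
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(n_init):
        # Initialize centers at k distinct data points.
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            # Squared distance from every point to every center.
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Move each center to the mean of its points; keep it if it has none.
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        best = min(best, d2.min(axis=1).sum())
    return best

def select_k_additive(X, lam, k_max=8):
    """Return the k in 1..k_max minimizing SSE(k) + lam * k, plus all costs."""
    costs = {k: kmeans_sse(X, k) + lam * k for k in range(1, k_max + 1)}
    return min(costs, key=costs.get), costs

# Synthetic data (assumption): three well-separated spherical clusters in 2-D
# with identical standard deviations, the setting k-means is suited to.
rng = np.random.default_rng(42)
true_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((50, 2)) for c in true_centers])

best_k, costs = select_k_additive(X, lam=50.0)  # lam chosen by hand for this demo
print("selected k:", best_k)
```

The selected $k$ depends strongly on $λ$: a very large $λ$ drives the minimizer toward $k = 1$, while a very small one favors the largest $k$ allowed, which is why guidance on setting $λ$ matters.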
