Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	de Mathelin, Antoine, Cecchi, Nicolas Enrique, Deheeger, François, Mougeot, Mathilde, Vayatis, Nicolas
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2501.19285
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866916592518955008
author	de Mathelin, Antoine Cecchi, Nicolas Enrique Deheeger, François Mougeot, Mathilde Vayatis, Nicolas
author_facet	de Mathelin, Antoine Cecchi, Nicolas Enrique Deheeger, François Mougeot, Mathilde Vayatis, Nicolas
contents	This paper proposes a novel k-medoids approximation algorithm to handle large-scale datasets with reasonable computational time and memory complexity. We develop a local-search algorithm that iteratively improves the medoid selection based on the estimation of the k-medoids objective. A single batch of size m << n provides the estimation, which reduces the required memory size and the number of pairwise dissimilarities computations to O(mn), instead of O(n^2) compared to most k-medoids baselines. We obtain theoretical results highlighting that a batch of size m = O(log(n)) is sufficient to guarantee, with strong probability, the same performance as the original local-search algorithm. Multiple experiments conducted on real datasets of various sizes and dimensions show that our algorithm provides similar performances as state-of-the-art methods such as FasterPAM and BanditPAM++ with a drastically reduced running time.
format	Preprint
id	arxiv_https___arxiv_org_abs_2501_19285
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	OneBatchPAM: A Fast and Frugal K-Medoids Algorithm de Mathelin, Antoine Cecchi, Nicolas Enrique Deheeger, François Mougeot, Mathilde Vayatis, Nicolas Machine Learning This paper proposes a novel k-medoids approximation algorithm to handle large-scale datasets with reasonable computational time and memory complexity. We develop a local-search algorithm that iteratively improves the medoid selection based on the estimation of the k-medoids objective. A single batch of size m << n provides the estimation, which reduces the required memory size and the number of pairwise dissimilarities computations to O(mn), instead of O(n^2) compared to most k-medoids baselines. We obtain theoretical results highlighting that a batch of size m = O(log(n)) is sufficient to guarantee, with strong probability, the same performance as the original local-search algorithm. Multiple experiments conducted on real datasets of various sizes and dimensions show that our algorithm provides similar performances as state-of-the-art methods such as FasterPAM and BanditPAM++ with a drastically reduced running time.
title	OneBatchPAM: A Fast and Frugal K-Medoids Algorithm
topic	Machine Learning
url	https://arxiv.org/abs/2501.19285

Similar Items