Saved in:
| Main Authors: | , , , , |
|---|---|
| Format: | Preprint |
| Published: |
2025
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2501.19285 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| _version_ | 1866916592518955008 |
|---|---|
| author | de Mathelin, Antoine Cecchi, Nicolas Enrique Deheeger, François Mougeot, Mathilde Vayatis, Nicolas |
| author_facet | de Mathelin, Antoine Cecchi, Nicolas Enrique Deheeger, François Mougeot, Mathilde Vayatis, Nicolas |
| contents | This paper proposes a novel k-medoids approximation algorithm to handle large-scale datasets with reasonable computational time and memory complexity. We develop a local-search algorithm that iteratively improves the medoid selection based on the estimation of the k-medoids objective. A single batch of size m << n provides the estimation, which reduces the required memory size and the number of pairwise dissimilarities computations to O(mn), instead of O(n^2) compared to most k-medoids baselines. We obtain theoretical results highlighting that a batch of size m = O(log(n)) is sufficient to guarantee, with strong probability, the same performance as the original local-search algorithm. Multiple experiments conducted on real datasets of various sizes and dimensions show that our algorithm provides similar performances as state-of-the-art methods such as FasterPAM and BanditPAM++ with a drastically reduced running time. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2501_19285 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| spellingShingle | OneBatchPAM: A Fast and Frugal K-Medoids Algorithm de Mathelin, Antoine Cecchi, Nicolas Enrique Deheeger, François Mougeot, Mathilde Vayatis, Nicolas Machine Learning This paper proposes a novel k-medoids approximation algorithm to handle large-scale datasets with reasonable computational time and memory complexity. We develop a local-search algorithm that iteratively improves the medoid selection based on the estimation of the k-medoids objective. A single batch of size m << n provides the estimation, which reduces the required memory size and the number of pairwise dissimilarities computations to O(mn), instead of O(n^2) compared to most k-medoids baselines. We obtain theoretical results highlighting that a batch of size m = O(log(n)) is sufficient to guarantee, with strong probability, the same performance as the original local-search algorithm. Multiple experiments conducted on real datasets of various sizes and dimensions show that our algorithm provides similar performances as state-of-the-art methods such as FasterPAM and BanditPAM++ with a drastically reduced running time. |
| title | OneBatchPAM: A Fast and Frugal K-Medoids Algorithm |
| topic | Machine Learning |
| url | https://arxiv.org/abs/2501.19285 |