Bibliographic Details
Main Authors: Chen, Yewang, Li, Junfeng, Xia, Shuyin, Lai, Qinghong, Gao, Xinbo, Wang, Guoyin, Cheng, Dongdong, Liu, Yi, Wang, Yi
Format: Preprint
Published: 2025
Subjects: Machine Learning, Computer Vision and Pattern Recognition, Information Retrieval
Online Access: https://arxiv.org/abs/2509.23742
_version_ 1866918149878710272
author Chen, Yewang
Li, Junfeng
Xia, Shuyin
Lai, Qinghong
Gao, Xinbo
Wang, Guoyin
Cheng, Dongdong
Liu, Yi
Wang, Yi
contents To handle clustering tasks on large-scale datasets effectively, we propose a novel scalable skeleton clustering algorithm, GBSK, which leverages the granular-ball technique to capture the underlying structure of the data. By sampling the dataset multiple times and constructing multi-grained granular-balls, GBSK progressively uncovers a statistical "skeleton", a spatial abstraction that approximates the essential structure and distribution of the original data. This strategy enables GBSK to dramatically reduce computational overhead while maintaining high clustering accuracy. In addition, we introduce an adaptive version, AGBSK, with simplified parameter settings to enhance usability and facilitate deployment in real-world scenarios. Extensive experiments conducted on standard computing hardware demonstrate that GBSK achieves high efficiency and strong clustering performance on large-scale datasets, including one with up to 100 million instances across 256 dimensions. Our implementation and experimental results are available at: https://github.com/XFastDataLab/GBSK/.
format Preprint
id arxiv_https___arxiv_org_abs_2509_23742
institution arXiv
publishDate 2025
record_format arxiv
title GBSK: Skeleton Clustering via Granular-ball Computing and Multi-Sampling for Large-Scale Data
topic Machine Learning
Computer Vision and Pattern Recognition
Information Retrieval
url https://arxiv.org/abs/2509.23742
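
The abstract describes the approach only at a high level: sample the data several times, cover each sample with granular-balls, and cluster the resulting "skeleton" instead of the full dataset. The Python sketch below illustrates that general idea under stated assumptions. It is not the authors' GBSK implementation (see the repository linked in the record for that). The function names, the random two-seed splitting rule, the distance-threshold linking step, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def split_into_balls(points, max_size, rng):
    """Recursively split points around two random seeds until groups are small.
    (Assumed splitting rule; the paper may construct granular-balls differently.)"""
    if len(points) <= max_size:
        return [points]
    i, j = rng.choice(len(points), size=2, replace=False)
    d_i = np.linalg.norm(points - points[i], axis=1)
    d_j = np.linalg.norm(points - points[j], axis=1)
    mask = d_i <= d_j
    if mask.all():  # degenerate split, e.g. the two seeds are duplicates
        return [points]
    return (split_into_balls(points[mask], max_size, rng)
            + split_into_balls(points[~mask], max_size, rng))

def build_skeleton(data, n_samples=5, sample_size=200, max_ball_size=20, seed=0):
    """Draw several random samples and keep the center of every ball as a skeleton point."""
    rng = np.random.default_rng(seed)
    centers = []
    for _ in range(n_samples):
        idx = rng.choice(len(data), size=min(sample_size, len(data)), replace=False)
        for ball in split_into_balls(data[idx], max_ball_size, rng):
            centers.append(ball.mean(axis=0))
    return np.array(centers)

def cluster_skeleton(centers, link_dist):
    """Label skeleton points by connected components of the graph that links
    centers closer than link_dist (an assumed, simple clustering step)."""
    n = len(centers)
    labels = -np.ones(n, dtype=int)
    dists = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            u = stack.pop()
            for v in np.flatnonzero((dists[u] < link_dist) & (labels == -1)):
                labels[v] = current
                stack.append(v)
        current += 1
    return labels

def assign_to_skeleton(data, centers, center_labels):
    """Give every original point the label of its nearest skeleton point."""
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    return center_labels[dists.argmin(axis=1)]

# Toy data: two well-separated Gaussian blobs of 500 points each.
rng = np.random.default_rng(42)
data = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(500, 2)),
    rng.normal(loc=[10.0, 10.0], scale=0.5, size=(500, 2)),
])
skeleton = build_skeleton(data)
skeleton_labels = cluster_skeleton(skeleton, link_dist=3.0)
point_labels = assign_to_skeleton(data, skeleton, skeleton_labels)
print("skeleton points:", len(skeleton), "clusters:", len(np.unique(skeleton_labels)))
```

In this sketch the expensive pairwise step runs only over the skeleton points, which are far fewer than the original points. That is the kind of saving the abstract attributes to working on a skeleton rather than on the raw data.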