| Main Author: | |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.13339 |
Table of Contents:
- Density-based clustering algorithms like DBSCAN and HDBSCAN are foundational tools for discovering arbitrarily shaped clusters, yet their practical utility is undermined by acute hyperparameter sensitivity: parameters tuned on one dataset frequently fail to transfer to others, requiring expensive re-optimization for each deployment. We introduce AdaBox (Adaptive Density-Based Box Clustering), a grid-based density clustering algorithm designed for robustness across diverse data geometries. AdaBox has a six-parameter design in which parameters capture cluster structure rather than pairwise point relationships. Four parameters are inherently scale-invariant, one self-corrects for sampling bias, and one is adjusted via a density scaling stage, enabling reliable parameter transfer across 30-200x scale factors. AdaBox processes data through five stages: adaptive grid construction, liberal seed initialization, iterative growth with graduation, statistical cluster merging, and Gaussian boundary refinement. Comprehensive evaluation across 111 datasets demonstrates three key findings: (1) AdaBox significantly outperforms DBSCAN and HDBSCAN across five evaluation metrics, achieving the best score on 78% of datasets with p < 0.05; (2) AdaBox uniquely exhibits parameter generalization: under Protocol A (direct transfer to 30-100x larger datasets), AdaBox maintains performance while the baselines collapse; and (3) ablation studies confirm that all five architectural stages are necessary for maintaining robustness.
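
The abstract does not describe AdaBox's internals, so the following is only a generic sketch of grid-based density clustering, not the authors' algorithm. It illustrates why a parameter expressed relative to the data range (here a hypothetical `bins_per_axis`) can transfer across scales, whereas an absolute distance threshold such as DBSCAN's `eps` cannot. The function name `grid_cluster`, both parameters, and the toy data are assumptions made for illustration.

```python
from collections import deque
from itertools import product


def grid_cluster(points, bins_per_axis=8, min_points_per_cell=2):
    """Toy grid-based density clustering (illustrative only, NOT AdaBox).

    Returns one label per point; -1 marks points in sparse cells.
    """
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    # Cell width is derived from the data range, so it grows with the data.
    # A zero range falls back to width 1.0 to avoid division by zero.
    widths = [(highs[d] - lows[d]) / bins_per_axis or 1.0 for d in range(dims)]

    def cell_of(p):
        # Clamp so that points on the upper boundary land in the last cell.
        return tuple(
            min(int((p[d] - lows[d]) / widths[d]), bins_per_axis - 1)
            for d in range(dims)
        )

    # Bucket point indices by grid cell.
    cells = {}
    for i, p in enumerate(points):
        cells.setdefault(cell_of(p), []).append(i)

    # A cell is dense if it holds enough points.
    dense = {c for c, members in cells.items() if len(members) >= min_points_per_cell}

    # Flood-fill over adjacent dense cells (diagonal neighbours included).
    offsets = [o for o in product((-1, 0, 1), repeat=dims) if any(o)]
    labels = [-1] * len(points)
    seen = set()
    next_label = 0
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for i in cells[cell]:
                labels[i] = next_label
            for o in offsets:
                neighbour = tuple(c + x for c, x in zip(cell, o))
                if neighbour in dense and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        next_label += 1
    return labels


# Hypothetical toy data: two groups plus one isolated point.
points = [
    (0, 0), (2, 3), (5, 1), (3, 6), (13, 14), (14, 12), (11, 15),
    (40, 5),
    (80, 80), (78, 75), (72, 79), (66, 68), (69, 66), (63, 61),
]
labels = grid_cluster(points)

# The same data at 100x scale, clustered with the same parameters.
scaled_points = [(100 * x, 100 * y) for x, y in points]
scaled_labels = grid_cluster(scaled_points)
```

In this toy setting the labels are unchanged after rescaling because every quantity the algorithm compares is a ratio to the data range. A fixed-radius neighbourhood query would need its radius retuned by the same factor of 100.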