Saved in:
Bibliographic Details
Main Authors: Zhang, Bingyuan, Terada, Yoshikazu
Format: Preprint
Published: 2025
Subjects:
Online Access:https://arxiv.org/abs/2503.24012
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866915219259785216
author Zhang, Bingyuan
Terada, Yoshikazu
author_facet Zhang, Bingyuan
Terada, Yoshikazu
contents Convex clustering is a modern clustering framework that guarantees globally optimal solutions and performs comparably to other advanced clustering methods. However, obtaining a complete dendrogram (clusterpath) for large-scale datasets remains computationally challenging due to the extensive costs associated with iterative optimization approaches. To address this limitation, we develop a novel convex clustering algorithm called Tree-Guided $L_1$-Convex Clustering (TGCC). We first focus on the fact that the loss function of $L_1$-convex clustering with tree-structured weights can be efficiently optimized using a dynamic programming approach. We then develop an efficient cluster fusion algorithm that utilizes the tree structure of the weights to accelerate the optimization process and eliminate the issue of cluster splits commonly observed in convex clustering. By combining the dynamic programming approach with the cluster fusion algorithm, the TGCC algorithm achieves superior computational efficiency without sacrificing clustering performance. Remarkably, our TGCC algorithm can construct a complete clusterpath for $10^6$ points in $\mathbb{R}^2$ within 15 seconds on a standard laptop without the need for parallel or distributed computing frameworks. Moreover, we extend the TGCC algorithm to develop biclustering and sparse convex clustering algorithms.
format Preprint
id arxiv_https___arxiv_org_abs_2503_24012
institution arXiv
publishDate 2025
record_format arxiv
spellingShingle Tree-Guided $L_1$-Convex Clustering
Zhang, Bingyuan
Terada, Yoshikazu
Machine Learning
Computation
Convex clustering is a modern clustering framework that guarantees globally optimal solutions and performs comparably to other advanced clustering methods. However, obtaining a complete dendrogram (clusterpath) for large-scale datasets remains computationally challenging due to the extensive costs associated with iterative optimization approaches. To address this limitation, we develop a novel convex clustering algorithm called Tree-Guided $L_1$-Convex Clustering (TGCC). We first focus on the fact that the loss function of $L_1$-convex clustering with tree-structured weights can be efficiently optimized using a dynamic programming approach. We then develop an efficient cluster fusion algorithm that utilizes the tree structure of the weights to accelerate the optimization process and eliminate the issue of cluster splits commonly observed in convex clustering. By combining the dynamic programming approach with the cluster fusion algorithm, the TGCC algorithm achieves superior computational efficiency without sacrificing clustering performance. Remarkably, our TGCC algorithm can construct a complete clusterpath for $10^6$ points in $\mathbb{R}^2$ within 15 seconds on a standard laptop without the need for parallel or distributed computing frameworks. Moreover, we extend the TGCC algorithm to develop biclustering and sparse convex clustering algorithms.
title Tree-Guided $L_1$-Convex Clustering
topic Machine Learning
Computation
url https://arxiv.org/abs/2503.24012