Zapisane w:
Opis bibliograficzny
Główni autorzy: Jourdan, Ben, Schwartzman, Gregory
Format: Preprint
Wydane: 2024
Hasła przedmiotowe:
Dostęp online:https://arxiv.org/abs/2410.05902
Etykiety: Dodaj etykietę
Nie ma etykietki, Dołącz pierwszą etykiete!
Spis treści:
  • We present the first mini-batch kernel $k$-means algorithm, offering an order of magnitude improvement in running time compared to the full batch algorithm. A single iteration of our algorithm takes $\widetilde{O}(kb^2)$ time, significantly faster than the $O(n^2)$ time required by the full batch kernel $k$-means, where $n$ is the dataset size and $b$ is the batch size. Extensive experiments demonstrate that our algorithm consistently achieves a 10-100x speedup with minimal loss in quality, addressing the slow runtime that has limited kernel $k$-means adoption in practice. We further complement these results with a theoretical analysis under an early stopping condition, proving that with a batch size of $\widetildeΩ(\max \{γ^{4}, γ^{2}\} \cdot ε^{-2})$, the algorithm terminates in $O(γ^2/ε)$ iterations with high probability, where $γ$ bounds the norm of points in feature space and $ε$ is a termination threshold. Our analysis holds for any reasonable center initialization, and when using $k$-means++ initialization, the algorithm achieves an approximation ratio of $O(\log k)$ in expectation. For normalized kernels, such as Gaussian or Laplacian it holds that $γ=1$. Taking $ε= O(1)$ and $b=Θ(\log n)$, the algorithm terminates in $O(1)$ iterations, with each iteration running in $\widetilde{O}(k)$ time.