Bibliographic Details
Main Authors: Sokoloski, Sacha, Berens, Philipp
Format: Preprint
Published: 2022
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2206.04841
author Sokoloski, Sacha
Berens, Philipp
contents We introduce hierarchical mixtures of Gaussians (HMoGs), which unify dimensionality reduction and clustering into a single probabilistic model. HMoGs provide closed-form expressions for the model likelihood, exact inference over latent states and cluster membership, and exact algorithms for maximum-likelihood optimization. The novel exponential family parameterization of HMoGs greatly reduces their computational complexity relative to similar model-based methods, allowing them to efficiently model hundreds of latent dimensions, and thereby capture additional structure in high-dimensional data. We demonstrate HMoGs on synthetic experiments and MNIST, and show how joint optimization of dimensionality reduction and clustering facilitates increased model performance. We also explore how sparsity-constrained dimensionality reduction can further improve clustering performance while encouraging interpretability. By bridging classical statistical modelling with the scale of modern data and compute, HMoGs offer a practical approach to high-dimensional clustering that preserves statistical rigour, interpretability, and uncertainty quantification that is often missing from embedding-based, variational, and self-supervised methods.
format Preprint
id arxiv_https___arxiv_org_abs_2206_04841
institution arXiv
publishDate 2022
record_format arxiv
title Hierarchical mixtures of Gaussians for combined dimensionality reduction and clustering
topic Machine Learning
url https://arxiv.org/abs/2206.04841