Bibliographic Details
Main Authors: DeWolfe, Ryan, Andrews, Jeffery L.
Format: Preprint
Published: 2023
Subjects:
Online Access: https://arxiv.org/abs/2312.10270
author DeWolfe, Ryan
Andrews, Jeffery L.
author_facet DeWolfe, Ryan
Andrews, Jeffery L.
contents The Adjusted Rand Index (ARI) is a widely used method for comparing hard clusterings, but requires a choice of random model that is often left implicit. Several recent works have extended the Rand Index to fuzzy clusterings, but the assumptions of the most common random model are difficult to justify in fuzzy settings. We propose a single framework for computing the ARI with three random models that are intuitive and explainable for both hard and fuzzy clusterings, along with the benefit of lower computational complexity. The theory and assumptions of the proposed models are contrasted with the existing permutation model. Computations on synthetic and benchmark data show that each model has distinct behaviour, meaning that accurate model selection is important for the reliability of results.
format Preprint
id arxiv_https___arxiv_org_abs_2312_10270
institution arXiv
publishDate 2023
record_format arxiv
title Random Models for Fuzzy Clustering Similarity Measures
topic Machine Learning
G.3
url https://arxiv.org/abs/2312.10270