Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Wang, Zeya; Ye, Chenglong
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2403.14830

_version_	1866929285844959232
author	Wang, Zeya; Ye, Chenglong
author_facet	Wang, Zeya; Ye, Chenglong
contents	Deep clustering, a method for partitioning complex, high-dimensional data using deep neural networks, presents unique evaluation challenges. Traditional clustering validation measures, designed for low-dimensional spaces, are problematic for deep clustering, which projects data into lower-dimensional embeddings before partitioning. Two key issues are identified: 1) the curse of dimensionality when these measures are applied to raw data, and 2) the unreliable comparison of clustering results across different embedding spaces, which stems from variations in training procedures and parameter settings among clustering models. This paper addresses these challenges in evaluating clustering quality in deep learning. We present a theoretical framework that highlights the ineffectiveness of internal validation measures applied to raw and embedded data, and we propose a systematic approach to applying clustering validity indices in deep clustering contexts. Experiments show that this framework aligns better with external validation measures, reducing the misleading guidance that results from improper use of clustering validity indices in deep learning.
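The first issue, the curse of dimensionality, can be illustrated with a minimal sketch that is not taken from the paper. It computes the silhouette coefficient, one common internal validation measure, for two well-separated Gaussian clusters and then appends pure-noise dimensions while keeping the same correct labels. The data generator `two_blobs`, the cluster separation, and the noise dimension counts are hypothetical choices for illustration only.

```python
import math
import random


def silhouette(points, labels):
    """Mean silhouette coefficient with Euclidean distance.

    For point i: a = mean distance to the other members of its own cluster,
    b = smallest mean distance to any other cluster, s = (b - a) / max(a, b).
    Points in singleton clusters get s = 0 by convention.
    """
    scores = []
    for i, p in enumerate(points):
        # Group distances from point i to every other point by cluster label.
        by_label = {}
        for j, q in enumerate(points):
            if j != i:
                by_label.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_label.pop(labels[i], [])
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_label.values())
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def two_blobs(noise_dims, n_per_cluster=20, separation=4.0, seed=0):
    """Hypothetical data: two unit-variance Gaussian clusters that differ only
    along the first coordinate, padded with `noise_dims` pure-noise coordinates."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, center in enumerate((0.0, separation)):
        for _ in range(n_per_cluster):
            informative = [rng.gauss(center, 1.0), rng.gauss(0.0, 1.0)]
            noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dims)]
            points.append(informative + noise)
            labels.append(label)
    return points, labels


if __name__ == "__main__":
    # The ground-truth labels never change; only the ambient dimension grows.
    for noise_dims in (0, 10, 100, 500):
        pts, lbl = two_blobs(noise_dims)
        print(f"{2 + noise_dims:>4} dims: silhouette = {silhouette(pts, lbl):.3f}")
```

As noise dimensions are added, pairwise distances concentrate and the score for the unchanged, correct partition drifts toward 0. This sketch covers only the raw-data issue; it does not model the second issue of comparing scores across different embedding spaces.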
format	Preprint
id	arxiv_https___arxiv_org_abs_2403_14830
institution	arXiv
publishDate	2024
record_format	arxiv
title	Deep Clustering Evaluation: How to Validate Internal Clustering Validation Measures
topic	Machine Learning
url	https://arxiv.org/abs/2403.14830