Bibliographic Details
Main Authors: Vysala, Anupriya; Gomes, Joseph
Format: Preprint
Published: 2020
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2007.08034
_version_ 1866913488666886144
author Vysala, Anupriya
Gomes, Joseph
author_facet Vysala, Anupriya
Gomes, Joseph
contents Clustering partitions data according to their characteristics, so that data that are similar in nature belong to the same cluster [1]. Clustering quality can be evaluated in two ways: external evaluation, in which the true labels of the data set are known in advance, and internal evaluation, which uses only the data set itself, without true labels. In this paper, both external and internal evaluation are performed on the clustering results for the IRIS dataset. For external evaluation, Homogeneity, Correctness and V-measure scores are calculated. For internal evaluation, the Silhouette Index and the Sum of Squared Errors are used. These internal measures, together with the dendrogram (a graphical tool from hierarchical clustering), are first used to validate the number of clusters. Finally, as a statistical tool, the frequency distribution method is used to compare, and visually represent, the distribution of observations within a clustering result and in the original data.
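The external measures named in the abstract can be illustrated with a minimal sketch. It uses the standard entropy-based definitions of homogeneity and completeness, with the V-measure as their equally weighted harmonic mean. Treating the abstract's "Correctness" as completeness is an assumption here, not something the record confirms. The function names and the toy label lists are hypothetical, and the data are not from the IRIS dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a sequence of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given), estimated from two paired label sequences."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum(
        (c / n) * math.log(c / given_counts[g]) for (g, _), c in joint.items()
    )


def v_measure_scores(true_labels, cluster_labels):
    """Return (homogeneity, completeness, v_measure), equally weighted."""
    h_true = entropy(true_labels)
    h_clust = entropy(cluster_labels)
    # Homogeneity: each cluster contains members of a single class only.
    homogeneity = (
        1.0 if h_true == 0
        else 1.0 - conditional_entropy(true_labels, cluster_labels) / h_true
    )
    # Completeness: all members of a class fall into the same cluster.
    completeness = (
        1.0 if h_clust == 0
        else 1.0 - conditional_entropy(cluster_labels, true_labels) / h_clust
    )
    total = homogeneity + completeness
    v = 0.0 if total == 0 else 2.0 * homogeneity * completeness / total
    return homogeneity, completeness, v


# Hypothetical toy labels: the clustering matches the classes up to renaming.
print(v_measure_scores([0, 0, 1, 1], [1, 1, 0, 0]))  # (1.0, 1.0, 1.0)

# Hypothetical over-split clustering: pure clusters, but each class is divided.
print(v_measure_scores([0, 0, 1, 1], [0, 1, 2, 3]))
```

In the second call every cluster is pure, so homogeneity stays at 1, while completeness drops because each class is spread over two clusters. This is why the two scores are reported together with their harmonic mean.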
format Preprint
id arxiv_https___arxiv_org_abs_2007_08034
institution arXiv
publishDate 2020
record_format arxiv
title Evaluating and Validating Cluster Results
topic Machine Learning
url https://arxiv.org/abs/2007.08034