Staff View: Evaluating and Validating Cluster Results :: Library Catalog

Bibliographic Details
Main Authors:	Vysala, Anupriya; Gomes, Joseph
Format:	Preprint
Published:	2020
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2007.08034

_version_	1866913488666886144
author	Vysala, Anupriya; Gomes, Joseph
author_facet	Vysala, Anupriya; Gomes, Joseph
contents	Clustering is the technique of partitioning data according to their characteristics. Data that are similar in nature belong to the same cluster [1]. There are two types of methods for evaluating clustering quality. One is external evaluation, where the true labels in the data sets are known in advance; the other is internal evaluation, in which the evaluation is done with the data set itself, without true labels. In this paper, both external and internal evaluation are performed on the clustering results for the IRIS dataset. For external evaluation, Homogeneity, Correctness and V-measure scores are calculated for the dataset. For internal evaluation, the Silhouette Index and Sum of Squared Errors are used. These internal performance measures, along with the dendrogram (a graphical tool from hierarchical clustering), are first used to validate the number of clusters. Finally, as a statistical tool, we used the frequency distribution method to compare and visualize the distribution of observations within a clustering result and in the original data.
format	Preprint
id	arxiv_https___arxiv_org_abs_2007_08034
institution	arXiv
publishDate	2020
record_format	arxiv
title	Evaluating and Validating Cluster Results
topic	Machine Learning
url	https://arxiv.org/abs/2007.08034

Similar Items