Evaluation of cluster analysis and latent class analysis in clustering
Date
2019
Authors
Murisa, Tatenda
Abstract
The study compares the performance of latent class, K-means and hierarchical clustering on data with different degrees of cluster overlap. It also assesses how various standardisation methods affect the results of hierarchical and K-means clustering, and evaluates several distance and agglomeration methods to see how their performance depends on cluster overlap. Three artificial datasets were simulated with poorly, moderately and well separated clusters. These, together with the seeds dataset, were analysed with each of the three clustering methods, and several external validity indices were calculated for each cluster solution. The adjusted Rand index was used for comparison in the discussion because it is not affected by the number of clusters.
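Because the comparison rests on the adjusted Rand index (ARI), a small sketch of how it is computed from the contingency table of true and predicted labels may help. The function name `adjusted_rand_index` is illustrative and not taken from the report. Unlike the raw Rand index, the ARI subtracts the agreement expected by chance, so random labellings score around 0 and identical partitions score 1 regardless of how the clusters are numbered.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index (Hubert and Arabie form) between two partitions."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    total = comb(n, 2)
    if total == 0:
        # Fewer than two points: no pairs to compare, treat as perfect agreement.
        return 1.0
    # Contingency-table cells n_ij, row sums a_i and column sums b_j.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)
    index = sum(comb(c, 2) for c in cells.values())
    sum_a = sum(comb(c, 2) for c in rows.values())
    sum_b = sum(comb(c, 2) for c in cols.values())
    # Expected index under random labelling with the same cluster sizes.
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case (e.g. both partitions trivial): perfect agreement.
        return 1.0
    return (index - expected) / (max_index - expected)


# Relabelled but identical partitions give 1.0.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
# A partial match with a different number of clusters: about 0.2424 (8/33).
print(adjusted_rand_index([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))
```

The same function can score the K-means, hierarchical and latent class solutions against the known simulated labels, so all three methods are compared on one chance-corrected scale.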
Results showed that, in hierarchical clustering, Ward’s method outperformed all other agglomeration methods and the Manhattan distance performed best across the different cluster types. Latent class clustering performed better than the other methods for poorly and well separated clusters. When the variances of the variables were comparable, K-means clustering without standardisation performed well. Standardisation by the maximum value and by z-score gave the best cluster recovery when the variances of the variables were large.
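The standardisation and distance choices referred to above can be illustrated with a minimal sketch. It assumes that "standardisation by the maximum value" means dividing each variable by its column maximum (x / max(x), which presumes positive-valued variables) and that the z-score uses the sample standard deviation, as R's `scale()` does; the report may have used slightly different definitions. The helper names `standardise`, `manhattan` and `euclidean` are hypothetical.

```python
from statistics import mean, stdev


def standardise(data, method="none"):
    """Column-wise scaling of a list of equal-length numeric rows."""
    columns = list(zip(*data))
    scaled = []
    for col in columns:
        if method == "none":
            scaled.append(list(col))
        elif method == "zscore":
            # Centre on the mean and divide by the sample SD (n - 1 divisor).
            m, s = mean(col), stdev(col)
            scaled.append([(x - m) / s for x in col])
        elif method == "max":
            # Assumed definition: divide by the column maximum (positive data).
            top = max(col)
            scaled.append([x / top for x in col])
        else:
            raise ValueError(f"unknown method: {method}")
    # Transpose back to rows.
    return [list(row) for row in zip(*scaled)]


def manhattan(p, q):
    """City-block (L1) distance."""
    return sum(abs(a - b) for a, b in zip(p, q))


def euclidean(p, q):
    """Straight-line (L2) distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


# Second variable has a much larger variance than the first.
data = [[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]]
for method in ("none", "zscore", "max"):
    rows = standardise(data, method)
    print(method, manhattan(rows[0], rows[1]), euclidean(rows[0], rows[1]))
```

Without standardisation the distances are driven almost entirely by the large-variance variable; after z-score or maximum scaling both variables contribute on a comparable scale, which is consistent with the finding that these two methods helped most when variances were large.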
Description
A research report submitted in partial fulfilment of the requirements for the degree of Master of Science to the Faculty of Science, School of Statistics and Actuarial Science, University of the Witwatersrand, Johannesburg, 2019
Citation
Murisa, Tatenda Kenneth. (2019). Evaluation of cluster analysis and latent class analysis in clustering. University of the Witwatersrand, https://hdl.handle.net/10539/29564