Murisa, Tatenda
2020-09-08
2019
Murisa, Tatenda Kenneth. (2019). Evaluation of cluster analysis and latent class analysis in clustering. University of the Witwatersrand, https://hdl.handle.net/10539/29564
A research report submitted in partial fulfilment of the requirements for the degree of Master of Science to the Faculty of Science, School of Statistics and Actuarial Science, University of the Witwatersrand, Johannesburg, 2019
The study compares the performance of latent class, K-means and hierarchical clustering on data with different degrees of cluster overlap. It also assesses how various standardisation methods affect the results of hierarchical and K-means clustering. Several distance measures and agglomeration methods are evaluated to observe how their performance depends on cluster overlap. Three artificial datasets were simulated, with clusters that were poorly, moderately and well separated. These, along with the seeds data, were run through the three clustering methods. Several external validity indices were calculated for each cluster solution. The adjusted Rand index was used for comparison in the discussion because it is not affected by the number of clusters. Results showed that, in hierarchical clustering, Ward’s method outperformed all other agglomeration methods and the Manhattan distance performed better across the different cluster types. Latent class clustering performed better for poorly and well separated clusters. When the variances of the variables were comparable, K-means clustering with no standardisation performed well. Standardisation by the maximum value and by z-score had the best cluster recovery when the variances of the variables were large.
Online resource (x, 134 pages)
en
Sampling (Statistics)
Estimation theory
Error analysis (Mathematics)
Evaluation of cluster analysis and latent class analysis in clustering
Thesis