School of Statistics and Actuarial Science (ETDs)

Permanent URI for this community: https://hdl.handle.net/10539/38022

  • Optimising Visual Clarity using Clustering Techniques for Overcrowded Biplots
    (University of the Witwatersrand, Johannesburg, 2025-06) Balisa, Yamkela; Ganey, Raeesa
    The increasing use of data across industries has driven the need for effective data analysis and visualisation. Data visualisation is a key methodology for extracting insights from data. One powerful visualisation technique based on dimensionality reduction is the biplot. Biplots are multivariate scatterplots that facilitate the visualisation of high-dimensional data by projecting it onto a lower-dimensional space, usually of two or three dimensions. For continuous data, this reduction in dimensionality is achieved using techniques such as Principal Component Analysis (PCA). A biplot simultaneously represents both samples and variables within the same visualisation. However, biplots become difficult to read when the data contain a very large number of variables: the variables overcrowd the biplot, making it difficult to obtain meaningful insights. To address this issue, this study explores the integration of unsupervised learning techniques, specifically clustering, into the biplot framework. Unsupervised learning refers to a type of machine learning approach in which the algorithm learns patterns and relationships in the data without prior knowledge of the expected output. Clustering, a fundamental unsupervised learning technique, involves grouping similar data points into clusters, enabling the identification of underlying structures and relationships. By applying clustering, specifically the k-means algorithm, this study aims to group similar variables into distinct clusters within the biplot. Similar variables are determined by the proximity of their endpoints and the angles they form within the biplot. Ultimately, the refined biplot displays only a representative cluster of vectors, thus enhancing clarity and interpretability.
  • Clustering and Classification Techniques in the Presence of Outliers: An Application to the Johannesburg Stock Exchange Stocks
    (University of the Witwatersrand, Johannesburg, 2024) Maphalla, Retsebile; Chipoyera, HW
    This study explores the impact of outliers on clustering with the K-means algorithm. A high prevalence of outliers was observed to seriously compromise clustering results. A novel algorithm called Clustering-quality-aided outlier detection (CQAOD) is proposed. Its novelty is that, apart from identifying outliers, it achieves good-quality clustering and simultaneously proffers the “optimal” number of clusters for K-means clustering of multivariate Gaussian data. For the Johannesburg Stock Exchange (JSE) data, the efficacy of the following clustering techniques was compared with the aim of constructing a diversified stock portfolio: hierarchical clustering, spectral clustering, Clustering Large Applications (Clara), and Density-based spatial clustering of applications with noise (DBSCAN). The study found that hierarchical clustering is the best algorithm for clustering the shares on the JSE.
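
The variable-clustering idea described in the first abstract above (Balisa and Ganey) can be illustrated with a minimal sketch. This is not the thesis's procedure, which the abstract does not spell out. It assumes a PCA biplot computed by singular value decomposition, clusters the variable vectors with k-means on their endpoint coordinates only (the angle criterion is not modelled separately), and keeps the variable nearest each cluster centre as the representative. The function names, the synthetic data, and the choice of k = 2 are all hypothetical.

```python
import numpy as np

def pca_biplot_coords(X, n_components=2):
    """Return sample scores and variable loadings for a PCA biplot."""
    Xc = X - X.mean(axis=0)                      # centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # sample points
    loadings = Vt[:n_components].T               # variable vector endpoints
    return scores, loadings

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns cluster labels and centres."""
    rng = np.random.default_rng(seed)
    centres = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centre, then nearest centre.
        d = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centres; an empty cluster keeps its previous centre.
        new_centres = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels, centres

def representative_variables(loadings, labels, centres):
    """Pick, per cluster, the variable whose endpoint is nearest the centre."""
    reps = []
    for j in range(len(centres)):
        members = np.flatnonzero(labels == j)
        if members.size == 0:
            continue
        dist = np.linalg.norm(loadings[members] - centres[j], axis=1)
        reps.append(int(members[dist.argmin()]))
    return sorted(reps)

# Hypothetical data: two latent factors, each driving five noisy variables.
rng = np.random.default_rng(42)
n = 200
f1 = 3.0 * rng.normal(size=n)
f2 = 1.5 * rng.normal(size=n)
X = np.column_stack(
    [f1 + 0.1 * rng.normal(size=n) for _ in range(5)]
    + [f2 + 0.1 * rng.normal(size=n) for _ in range(5)]
)

scores, loadings = pca_biplot_coords(X)
labels, centres = kmeans(loadings, k=2)
reps = representative_variables(loadings, labels, centres)
print("Variables kept in the decluttered biplot:", reps)
```

Only the variables listed in `reps` would be drawn as vectors in the decluttered biplot, while all sample points in `scores` are kept. Clustering on endpoints means vectors that differ in length or in direction are separated; clustering on unit-length vectors instead would group by angle alone.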