School of Computer Science and Applied Mathematics (ETDs)

Permanent URI for this community

https://hdl.handle.net/10539/38004


A Continuous Reinforcement Learning Approach to Self-Adaptive Particle Swarm Optimisation
(University of the Witwatersrand, Johannesburg, 2023-08) Tilley, Duncan; Cleghorn, Christopher
Particle Swarm Optimisation (PSO) is a popular black-box optimisation technique due to its simple implementation and surprising ability to perform well on various problems. Unfortunately, PSO is fairly sensitive to the choice of hyper-parameters. For this reason, many self-adaptive techniques have been proposed that attempt to both simplify hyper-parameter selection and improve the performance of PSO. Surveys, however, show that many self-adaptive techniques are still outperformed by time-varying techniques where the values of the coefficients are simply increased or decreased over time. More recent works have shown the successful application of Reinforcement Learning (RL) to learn self-adaptive control policies for optimisers such as differential evolution, genetic algorithms, and PSO. However, many of these applications were limited to only discrete state and action spaces, which severely limits the choices available to a control policy, given that the PSO coefficients are continuous variables. This dissertation therefore investigates the application of continuous RL techniques to learn a self-adaptive control policy that can make full use of the continuous nature of the PSO coefficients. The dissertation first introduces the RL framework used to learn a continuous control policy by defining the environment, action-space, state-space, and a number of possible reward functions. An effective learning environment that is able to overcome the difficulties of continuous RL is then derived through a series of experiments, culminating in a successfully learned continuous control policy. The policy is then shown to perform well on the benchmark problems used during training when compared to other self-adaptive PSO algorithms. Further testing on benchmark problems not seen during training suggests that the learned policy may not generalise well to other functions, although this is shown to also be a problem in other PSO algorithms.
Finally, the dissertation performs a number of experiments to provide insights into the behaviours learned by the continuous control policy.
A fully-decentralised general-sum approach for multi-agent reinforcement learning using minimal modelling
(University of the Witwatersrand, Johannesburg, 2023-08) Kruger, Marcel Matthew Anthony; Rosman, Benjamin; James, Steven; Shipton, Jarrod
Multi-agent reinforcement learning is a prominent area of research in machine learning, extending reinforcement learning to scenarios where multiple agents concurrently learn and interact within the same environment. Most existing methods rely on centralisation during training, while others employ agent modelling. In contrast, we propose a novel method that adapts the role of entropy to assist in fully-decentralised training without explicitly modelling other agents using additional information to which most centralised methods assume access. We augment entropy to encourage more deterministic agents, and instead, we let the non-stationarity inherent in MARL serve as a mode for exploration. We empirically evaluate the performance of our method across five distinct environments, each representing unique challenges. Our assessment encompasses both cooperative and competitive cases. Our findings indicate that the approach of penalising entropy, rather than rewarding it, enables agents to perform at least as well as the prevailing standard of entropy maximisation. Moreover, our alternative approach achieves several of the original objectives of entropy regularisation in reinforcement learning, such as increased sample efficiency and potentially better final rewards. Whilst entropy has a significant role, our results in the competitive case indicate that position bias is still a considerable challenge.
Analyzing the performance and generalisability of incorporating SimCLR into Proximal Policy Optimization in procedurally generated environments
(University of the Witwatersrand, Johannesburg, 2024) Gilbert, Nikhil; Rosman, Benjamin
Multiple approaches to state representation learning have been shown to improve the performance of reinforcement learning agents substantially. When used in reinforcement learning, a known challenge in state representation learning is enabling an agent to represent environment states with similar characteristics in a manner that would allow said agent to comprehend them as such. We propose a novel algorithm that combines contrastive learning with reinforcement learning so that agents learn to group states by common physical characteristics and action preferences during training. We subsequently generalise these learnings to previously encountered environment obstacles. To enable a reinforcement learning agent to use contrastive learning within its environment interaction loop, we propose a state representation learning model that employs contrastive learning to group states using observations coupled with the action the agent chose within its current state. Our approach uses a combination of two algorithms that we augment to demonstrate the effectiveness of combining contrastive learning with reinforcement learning. The state representation model for contrastive learning is the Simple Framework for Contrastive Learning of Visual Representations (SimCLR) by Chen et al. [2020], which we amend to include action values from the chosen reinforcement learning environment. Proximal Policy Optimization (PPO), a policy gradient algorithm, is our chosen reinforcement learning approach for policy learning, which we combine with SimCLR to form our novel algorithm, Action Contrastive Policy Optimization (ACPO). When combining these augmented algorithms for contrastive reinforcement learning, our results show significant improvement in training performance and generalisation to unseen environment obstacles of similar structure (physical layout of interactive objects) and mechanics (the rules of physics and transition probabilities).
Applying Machine Learning to Model South Africa’s Equity Market Index Price Performance
(University of the Witwatersrand, Johannesburg, 2023-07) Nokeri, Tshepo Chris; Mulaudzi, Rudzani; Ajoodha, Ritesh
Policymakers typically use statistical multivariate forecasting models to forecast the reaction of stock market returns to changing economic activities. However, these models frequently result in subpar performance due to inflexibility and an inability to model non-linear relationships. Emerging research suggests that machine learning models can better handle data from non-linear dynamic systems and yield outstanding model performance. This research compared the performance of machine learning models to the performance of the benchmark model (the vector autoregressive model) when forecasting the reaction of stock market returns to changing economic activities in South Africa. The vector autoregressive model was used to forecast the reaction of stock market returns. It achieved a mean absolute percentage error (MAPE) value of 0.0084. Machine learning models were used to forecast the reaction of stock market returns. The lowest MAPE value was 0.0051. The machine learning model trained on low economic data dimensions performed 65% better than the benchmark model. Machine learning models also identified key economic activities when forecasting the reaction of stock market returns. Most prior research used whole feature sets and few models for comparison, and barely examined how different feature subsets and reduced dimensionality change model performance, a limitation this research addresses through the number of experiments it considers. This research considered various experiments, i.e., different feature subsets and data dimensions, to determine whether machine learning models perform better than the benchmark model when forecasting the reaction of stock market returns to changing economic activities in South Africa.
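As a small illustration of the metric in the abstract above, the sketch below computes MAPE and compares the two reported values (0.0084 and 0.0051). The abstract does not say how the 65% figure was derived; one reading consistent with the reported numbers is that the benchmark's error is about 65% larger than the best model's. That reading, the function names, and the toy series are assumptions made for illustration only.

```python
def mape(actual, forecast):
    """Mean absolute percentage error as a fraction (0.01 means 1%)."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def error_excess(benchmark_error, model_error):
    """How much larger the benchmark error is, relative to the model error."""
    return (benchmark_error - model_error) / model_error


def error_reduction(benchmark_error, model_error):
    """How much smaller the model error is, relative to the benchmark error."""
    return (benchmark_error - model_error) / benchmark_error


# MAPE values reported in the abstract.
var_mape = 0.0084   # vector autoregressive benchmark
ml_mape = 0.0051    # best machine learning model

excess = error_excess(var_mape, ml_mape)        # about 0.647
reduction = error_reduction(var_mape, ml_mape)  # about 0.393

# Toy series (hypothetical data) to show the metric itself.
toy_mape = mape([100.0, 200.0], [110.0, 190.0])
```

The two ratios differ because they use different denominators, which is why it helps to state which one a percentage improvement refers to.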
A comparative analysis of classic Geometrical methods and sparse regression methods for linearly unmixing hyperspectral image data
(2019) Nicolae, Aurel
This research report presents an across-the-board comparative analysis on algorithms for linearly unmixing hyperspectral image data cubes. Convex geometry based endmember extraction algorithms (EEAs) such as the pixel purity index (PPI) algorithm and N-FINDR have been commonly used to derive the material spectral signatures called endmembers from the hyperspectral images. The estimation of their corresponding fractional abundances is done by solving the related inverse problem in a least squares sense. Semi-supervised sparse regression algorithms such as orthogonal matching pursuit (OMP) and sparse unmixing algorithm via variable splitting and augmented Lagrangian (SUnSAL) bypass the endmember extraction process by employing widely available spectral libraries a priori, automatically returning the fractional abundances and sparsity estimates. The main contribution of this work is to serve as a rich resource on hyperspectral image unmixing, providing end-to-end evaluation of a wide variety of algorithms using different artificial data sets.
Comparing the effectiveness of LSTM, ARIMA, and GRU algorithms for forecasting customer charging behavior in the electric mobility industry in Europe
(University of the Witwatersrand, Johannesburg, 2023) Pelwan, Robyne Chimere
Forecasting, a powerful technique for unveiling potential future events, relies on historical data and methodological approaches to provide valuable insights. This dissertation delves into the domain of electric mobility, investigating the effectiveness of three distinct algorithms—Long Short-term Memory (LSTM), Autoregressive Integrated Moving Average (ARIMA), and Gated Recurrent Unit (GRU)—for predicting customer charging behavior. Specifically, it focuses on forecasting the number of charges over a 7-day period using time-series data from European electric mobility customers. In this study, we scrutinize the interplay between algorithmic performance and the intricacies of the dataset. Root mean squared error (RMSE) serves as a metric for gauging predictive accuracy. The findings highlight the supremacy of the ARIMA model in single-variable analysis, surpassing the predictive capabilities of both LSTM and GRU models. Even when additional features are introduced to enhance LSTM and GRU predictions, the superiority of ARIMA remains pronounced. Notably, this research underscores that ARIMA is particularly well-suited for time series data of this nature due to its tailored design. It contributes valuable insights for both researchers and practitioners in the electric mobility industry, aiding in the strategic selection of forecasting methodologies.
Deep learning models for defect detection in electroluminescence images of solar PV modules
(University of the Witwatersrand, 2024-05-29) Pratt, Lawrence; Klein, Richard
This thesis introduces multi-class solar cell defect detection (SCDD) in electroluminescence (EL) images of PV modules using semantic segmentation. The research is based on experimental results from training and testing existing deep-learning models on a novel dataset developed specifically for this thesis. The dataset consists of EL images and corresponding segmentation masks for defect detection and quantification in EL images of solar PV cells from mono crystalline and multi crystalline silicon wafer-based modules. While many papers have already been published on defect detection and classification in EL images, semantic segmentation is new to this field. The prior art was largely focused on methods to improve EL image quality, classify cells into normal or defective categories, statistical methods and machine learning models for classification, object detection, and some binary segmentation of cracks specifically. This research shows that multi-class semantic segmentation models have the potential to provide accurate defect detection and quantification in both high-quality lab-based EL images and lower-quality field-based EL images of PV modules. While most EL images are collected in factory and lab settings, advancements in imaging technology will lead to an increasing number of EL images taken in the field. Thus, effective methods for SCDD must be robust to various images taken in the labs and the real world, in the same way that deep-learning models for autonomous vehicles that navigate the city streets in some parts of the world today must be robust to real-world environments. The semantic segmentation of EL images, as opposed to image classification, yields statistical data that can then be correlated to the power output for large batches of PV modules. This research evaluates the effectiveness of semantic segmentation to provide a quantitative analysis of PV module quality based on qualitative EL images. 
The raw EL image is translated into tabular datasets for further downstream analysis. First, we developed a dataset that included 29 classes in the ground truth masks in which each pixel was coloured according to the class. The classes were grouped into intrinsic “features” of solar cells and extrinsic “defects.” Next, a fully-supervised U-Net trained on the small dataset showed that SCDD using semantic segmentation was a viable approach. Next, additional fully-supervised deep-learning models (U-Net, PSPNet, DeepLabV3, DeepLabV3+) were trained using equal, inverse, and custom class weights to identify the best model for SCDD. A benchmark dataset was published along with benchmark performance metrics. The model performance was measured using mean recall, mean precision, and the mean intersection over union (mIoU) for a subset of the most common defects (cracks, inactive areas, and gridline defects) and features (ribbon interconnects and cell spacing) in the dataset. This work focused on developing a deep-learning method for SCDD independent of the imaging equipment, PV module design, and image quality that would be broadly applicable to EL images from any source. The initial experiment showed that semantic segmentation was a viable method for SCDD. The U-Net trained on the initial dataset with 108 images in the training dataset produced good representations of the features common to most of the cells and good representations of the defects with a reasonable sample size. Other defects with only a few examples in the training dataset were not effectively detected in this model. The U-Net results also showed that the mIoU measured higher for the features compared to the defects across all models, which correlated with the size of the large features compared to the small defects that each class occupies in the images.
The next set of experiments showed that the DeepLabV3+ trained with custom class weights scored the highest in terms of mIoU for the selected defects and features when compared to the alternative fully-supervised models. While the mIoU for cracks was still low (25%), the recall was high (86%). While the custom class weights increased the recall substantially, the many long, narrow defects (e.g. cracks and gridlines) and features (e.g. ribbon interconnects and spacing) in the dataset were challenging to segment, especially at the borders. The custom class weights also tended to dilate the long, narrow features, which led to low precision. However, the resulting representations reliably located these defects in the complex images with both large and small objects, and the dilation proved effective at visually highlighting the long, narrow defects when the cell-level images were combined into module-level images. Therefore, the model proved useful in the context of detecting critical defects and quantifying the relative size of the defects in EL images of PV cells and modules despite the relatively low mIoU. The dataset was also published along with this paper. The final set of experiments focused on semi-supervised and self-supervised models. The results suggested that supervised training on a large out-of-domain (OOD) dataset (COCO), self-supervised pretraining on a large OOD dataset (ImageNet), and semi-supervised pretraining (CCT) were statistically equivalent as measured by the mIoU on a subset of critical defects and features. A new state-of-the-art (SOTA) for SCDD was achieved, exceeding the mIoU from the DeepLabV3+ with custom weights. The experiments also demonstrated that certain pretraining schemes resulted in the ability to detect and quantify underrepresented classes, such as the round ring defect. The unique contributions from this work include two benchmark datasets for multi-class semantic segmentation in EL images of solar PV cells.
The smaller dataset consists of 765 images with corresponding ground truth masks. The larger dataset consists of more than 20,000 unlabelled EL images. The thesis also documents the performance metrics from various deep learning models based on fully-supervised, semi-supervised, and self-supervised architectures.
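The abstract above reports a low IoU for cracks alongside a high recall, and attributes low precision to dilation of long, narrow classes. The sketch below is a toy illustration of how those three metrics can diverge for a thin defect; the 5x5 masks, class ids, and function names are hypothetical and are not taken from the thesis or its dataset.

```python
def class_counts(pred, truth, cls):
    """Count true positives, false positives and false negatives for one class."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == cls and t == cls:
                tp += 1
            elif p == cls:
                fp += 1
            elif t == cls:
                fn += 1
    return tp, fp, fn


def iou(tp, fp, fn):
    """Intersection over union for one class."""
    return tp / (tp + fp + fn)


def recall(tp, fn):
    return tp / (tp + fn)


def precision(tp, fp):
    return tp / (tp + fp)


# Hypothetical masks: class 1 is a one-pixel-wide vertical crack, class 0 is background.
truth = [[0, 0, 1, 0, 0] for _ in range(5)]
# The prediction finds the whole crack but dilates it to three pixels wide.
pred = [[0, 1, 1, 1, 0] for _ in range(5)]

crack = class_counts(pred, truth, 1)
background = class_counts(pred, truth, 0)

crack_recall = recall(crack[0], crack[2])        # every crack pixel is found
crack_precision = precision(crack[0], crack[1])  # two thirds of predicted pixels are wrong
crack_iou = iou(*crack)
mean_iou = (crack_iou + iou(*background)) / 2    # mIoU over the two classes
```

In this toy case the dilated prediction recovers every crack pixel yet scores an IoU of one third, which mirrors the qualitative pattern the abstract describes for narrow defects.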
Estimating skills in discrete pursuit-evasion games
(University of the Witwatersrand, Johannesburg, 2023) Gomes, Byron John; Rosman, Benjamin
Game Theory is a well-established field in mathematics, economics, and computer science, with a rich history of studying n-person, zero-sum games. Researchers have utilized the best computational power of their time to create computational players that are able to beat the best human players at complex two-player, zero-sum games such as Chess and Go. In the field of Reinforcement Learning and Robotics, these types of games are considered useful environments to conduct experiments about agent behavior and learning. In this research report we explore a subset of discrete skill-dependent pursuit-evasion games upon which we build a framework to estimate player skills. In this game environment a player’s skill determines the actions available to them in each state and the transition dynamics resulting from the chosen action. The game offers a simplified representation of more complex games which often have vast state and action spaces, making it difficult to model and analyze player behavior. In this game environment we find that players with incorrect assumptions about an opponent’s skill perform sub-optimally at winning games. Given that knowledge of an opponent’s skill impacts on player performance, we demonstrate that players can use Bayesian inference to estimate their opponent’s skill, based on the action outcomes of an opponent. We also demonstrate that skill estimation is a valuable exercise for players to undertake and show that the performance of players that estimate their opponent’s skill converges to the performance of players given perfect knowledge of their opponent’s skill. This research contributes to our understanding of Bayesian skill estimation in skill-dependent pursuit-evasion games which may be useful in the fields of Multi-agent Reinforcement Learning and Robotics.
Evaluating Pre-training Mechanisms in Deep Learning Enabled Tuberculosis Diagnosis
(University of the Witwatersrand, Johannesburg, 2024) Zaranyika, Zororo; Klein, Richard
Tuberculosis (TB) is an infectious disease caused by the bacterium Mycobacterium tuberculosis. In 2021, 10.6 million people fell ill because of TB and about 1.5 million lives are lost from TB each year even though TB is a preventable and curable disease. The latest global trends in TB death cases are shown in 1.1. To ensure a higher survival rate and prevent further transmissions, it is important to carry out early diagnosis. One of the critical methods of TB diagnosis and detection is the use of posterior-anterior chest radiographs (CXR). The diagnosis of Tuberculosis and other chest-affecting diseases like Pneumoconiosis is time-consuming, challenging and requires experts to read and interpret chest X-ray images, especially in under-resourced areas. Various attempts have been made to perform the diagnosis using deep learning methods such as Convolutional Neural Networks (CNN) using labelled CXR images. Due to the nature of CXR images in maintaining a consistent structure and overlapping visual appearances across different chest-affecting diseases, it is reasonable to believe that visual features learned in one disease or geographic location may transfer to a new TB classification model. This would allow us to leverage large volumes of labelled CXR images available online, hence decreasing the data required to build a local model. This work will explore to what extent such pre-training and transfer learning is useful and whether it may help decrease the data required for a locally trained classifier. In this research, we investigated various pre-training regimes using selected online datasets to understand whether the performance of such models can be generalised towards building a TB computer-aided diagnosis system and also inform us on the nature and size of CXR datasets we should be collecting.
Our experimental results indicated that neither supervised nor self-supervised pre-training between the CXR datasets significantly improves the overall performance metrics of a TB classifier. We noted that pre-training on the ChestX-ray14, CheXpert, and MIMIC-CXR datasets resulted in recall values of over 70% and specificity scores of at least 90%. There was a general decline in performance in our experiments when we pre-trained on one dataset and fine-tuned on a different dataset, hence our results were lower than baseline experiment results. We noted that ImageNet weights initialisation yields superior results over random weights initialisation on all experiment configurations. In the case of self-supervised pre-training, the model reached acceptable metrics with a minimum number of labels as low as 5% when we fine-tuned on the TBX11k dataset, although slightly lower in performance compared to the supervised pre-trained models and the baseline results. The best-performing self-supervised pre-trained model with the least number of training labels was the MoCo-ResNet-50 model pre-trained on the VinDr-CXR and PadChest datasets. These model configurations achieved a recall of 81.90% and a specificity of 81.99% with the VinDr-CXR pre-trained weights, while the PadChest weights scored a recall of 70.29% and a specificity of 70.22%. The other self-supervised pre-trained models failed to reach scores of at least 50% on either recall or specificity with the same number of labels.
Federated learning in the detection of Covid-19 in patient CT-Scans: A practical evaluation of external generalisation
(University of the Witwatersrand, Johannesburg, 2023-08) Wapenaar, Korstiaan; Ranchod, Pravesh
This research explores the practical utility of using convolutional neural networks in a federated learning architecture for COVID-19 diagnostics using chest CT-scans, and whether federated learning models can generalise to data from healthcare facilities that did not participate in training. A model that can generalise to these healthcare facilities could provide lower-resourced or over-utilised facilities with access to supplementary diagnostic services. Eleven models are trained using a modified VGG-16. The models are trained using data from five ‘sites’: four sites are single healthcare facilities and the fifth site is a composite of data from a variety of healthcare facilities. Eleven models are trained, evaluated and compared: five ‘independent models’ are each trained with data from a single site; three ‘global models’ are trained using centrally pooled data from a variety of sites; three ‘federated models’ are trained using a federated averaging approach. The site with composite data is held-out and never included in training the federated and global models. With the exception of this composite site, all models achieve a test accuracy of at least 0.93 when evaluated using test data from the sites used in training these models. All models are then evaluated using data from the composite site. The global and federated models achieve a 0.5 to 0.6 accuracy for the composite site, indicating that the model and training regime is unable to achieve useful accuracies for sites non-participant in training. The federated models are therefore not accurate enough to motivate a healthcare facility decision maker to use the federated models as an alternative or supplementary diagnostic tool to radiographers, or to developing their own independent model. Evaluation of the results suggests that high-quality and consistent image pre-processing may be a necessary precondition for the task.
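The abstract above mentions a federated averaging approach without giving details. As a minimal sketch, the standard form of federated averaging combines per-site model parameters as a mean weighted by each site's number of training samples. The flat parameter lists, site sizes, and function name below are hypothetical, and the thesis may use a different weighting or aggregation schedule.

```python
def federated_average(site_weights, site_sizes):
    """Sample-weighted mean of per-site parameter vectors.

    site_weights: one flat list of floats per site, all of equal length.
    site_sizes:   number of training samples held by each site.
    """
    if len(site_weights) != len(site_sizes) or not site_weights:
        raise ValueError("need one sample count per site")
    total = sum(site_sizes)
    n_params = len(site_weights[0])
    return [
        sum(weights[i] * size for weights, size in zip(site_weights, site_sizes)) / total
        for i in range(n_params)
    ]


# Two hypothetical sites with two parameters each; the second site has three times the data.
local_models = [[1.0, 2.0], [3.0, 4.0]]
samples_per_site = [100, 300]

global_model = federated_average(local_models, samples_per_site)
```

In a full training loop the averaged parameters would be sent back to every site for another round of local training, so that raw patient data never leaves the facility that holds it.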
Generating Rich Image Descriptions from Localized Attention
(University of the Witwatersrand, Johannesburg, 2023-08) Poulton, David; Klein, Richard
The field of image captioning is constantly growing with swathes of new methodologies, performance leaps, datasets, and challenges. One new challenge is the task of long-text image description. While the vast majority of research has focused on short captions for images with only short phrases or sentences, new research and the recently released Localized Narratives dataset have pushed this to rich, paragraph length descriptions. In this work we perform additional research to grow the sub-field of long-text image descriptions and determine the viability of our new methods. We experiment with a variety of progressively more complex LSTM and Transformer-based approaches, utilising human-generated localised attention traces and image data to generate suitable captions, and evaluate these methods on a suite of common language evaluation metrics. We find that LSTM-based approaches are not well suited to the task, and under-perform Transformer-based implementations on our metric suite while also proving substantially more demanding to train. On the other hand, we find that our Transformer-based methods are well capable of generating captions with rich focus over all regions of the image and in a grammatically sound manner, with our most complex model outperforming existing approaches on our metric suite.
Generative Model Based Adversarial Defenses for Deepfake Detectors
(University of the Witwatersrand, Johannesburg, 2023-08) Kavilan Dhavan, Nair; Klein, Richard
Deepfake videos present a serious threat to society as they can be used to spread misinformation through social media. Convolutional Neural Networks (CNNs) have been effective in detecting deepfake videos, but they are vulnerable to adversarial attacks that can compromise their accuracy. This vulnerability can be exploited by deepfake creators to evade detection. In this study, we evaluate the effectiveness of two generative adversarial defense mechanisms, APE-GAN and MagNet, in the context of deepfake detection. We use the FaceForensics++ dataset and a CNN victim model based on the XceptionNet architecture, which we attack using the iterative fast gradient sign method at two different levels of ε, ε = 0.0001 and ε = 0.01. We find that both APE-GAN and MagNet can purify the adversarial images and restore the performance of the victim model to within 10% of the model’s accuracy on benign fake inputs. However, these methods were less effective at restoring accuracy for adversarial real examples and were not able to significantly restore accuracy when the adversarial attack was aggressive (ε = 0.01). We recommend that an adversarial defense method be used in conjunction with a deepfake detector to improve the accuracy of predictions. APE-GAN and MagNet are effective methods in the deepfake context, but their effectiveness is limited when the adversarial attack is aggressive.
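To make the attack named in the abstract above concrete, here is a minimal sketch of the iterative fast gradient sign method. A tiny logistic-regression classifier stands in for the XceptionNet victim model, and the weights, input, step count, and step size (ε divided by the number of steps) are all assumptions chosen for illustration; only the ε value of 0.01 comes from the abstract.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, w, b):
    """Probability of the positive class under a toy logistic model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def cross_entropy(x, y, w, b):
    p = predict(x, w, b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def loss_gradient(x, y, w, b):
    """Gradient of the cross-entropy loss with respect to the input x."""
    p = predict(x, w, b)
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def iterative_fgsm(x, y, w, b, eps, steps=10):
    """Take small signed-gradient steps, staying within eps of x and inside [0, 1]."""
    alpha = eps / steps
    x_adv = list(x)
    for _ in range(steps):
        grad = loss_gradient(x_adv, y, w, b)
        stepped = [xa + alpha * sign(g) for xa, g in zip(x_adv, grad)]
        # Project back into the eps-ball around the original input and the valid pixel range.
        x_adv = [
            min(max(s, max(x0 - eps, 0.0)), min(x0 + eps, 1.0))
            for s, x0 in zip(stepped, x)
        ]
    return x_adv


# Hypothetical two-pixel input and model parameters.
x_clean, label = [0.2, 0.8], 1
weights, bias = [2.0, -1.0], 0.0

x_attacked = iterative_fgsm(x_clean, label, weights, bias, eps=0.01)
loss_before = cross_entropy(x_clean, label, weights, bias)
loss_after = cross_entropy(x_attacked, label, weights, bias)
```

The perturbation never exceeds ε per input value, yet the loss on the true label increases, which is the property a purifying defense tries to undo before the input reaches the detector.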
Improving audio-driven visual dubbing solutions using self-supervised generative adversarial networks
(University of the Witwatersrand, Johannesburg, 2023-09) Ranchod, Mayur; Klein, Richard
Audio-driven visual dubbing (ADVD) is the process of accepting a talking-face video, along with a dubbing audio segment, as inputs and producing a dubbed video such that the speaker appears to be uttering the dubbing audio. ADVD aims to address the language barrier inherent in the consumption of video-based content caused by the various languages in which videos may be presented. Specifically, a video may only be consumed by the audience that is familiar with the spoken language. Traditional solutions, such as subtitles and audio-dubbing, hinder the viewer’s experience by either obstructing the on-screen content or introducing an unpleasant discrepancy between the speaker’s mouth movements and the input dubbing audio, respectively. In contrast, ADVD strives to achieve a natural viewing experience by synchronizing the speaker’s mouth movements with the dubbing audio. A comprehensive survey of several ADVD solutions revealed that most existing solutions achieve satisfactory visual quality and lip-sync accuracy but are limited to low-resolution videos with frontal or near frontal faces. Since this is in sharp contrast to real-world videos, which are high-resolution and contain arbitrary head poses, we present one of the first ADVD solutions trained with high-resolution data and also introduce the first pose-invariant ADVD solution. Our results show that the presented solution achieves superior visual quality while also achieving high measures of lip-sync accuracy, consequently enabling the solution to achieve significantly improved results when applied to real-world videos.
Improving Semi-Supervised Learning Generative Adversarial Networks
(University of the Witwatersrand, Johannesburg, 2023-08) Moolla, Faheem; Bau, Hairong; Van Zyl, Terence
Generative Adversarial Networks (GANs) have shown remarkable potential in generating high-quality images, with semi-supervised GANs providing a high classification accuracy. In this study, an enhanced semi-supervised GAN model is proposed wherein the generator of the GAN is replaced by a pre-trained decoder from a Variational Autoencoder. The model presented outperforms regular GAN and semi-supervised GAN models during the early stages of training, as it produces higher quality images. Our model demonstrated significant improvements in image quality across three datasets - namely the MNIST, Fashion MNIST, and CIFAR-10 datasets - as evidenced by higher accuracies obtained from a Convolutional Neural Network (CNN) trained on generated images, as well as superior inception scores. Additionally, our model prevented mode collapse and exhibited smaller oscillations in the discriminator and generator loss graphs compared to baseline models. The presented model also provided remarkably high levels of classification accuracy, by obtaining 99.32% on the MNIST dataset, 92.78% on the Fashion MNIST dataset, and 83.22% on the CIFAR-10 dataset. These scores are notably robust as they improved on some of the classification accuracies obtained by two state-of-the-art models, indicating that the presented model is a significantly improved semi-supervised GAN model. However, despite the high classification accuracy for the CIFAR-10 dataset, a considerable drop in accuracy was observed when comparing generated images to real images for this dataset. This suggests that the quality of those generated images can be improved and that the presented model performs better with less complex datasets. Future work could explore techniques to enhance our model’s performance with more intricate datasets, ultimately expanding its applicability across various domains.
Learning to adapt: domain adaptation with cycle-consistent generative adversarial networks
(University of the Witwatersrand, Johannesburg, 2023) Burke, Pierce William; Klein, Richard
Domain adaptation is a critical part of modern-day machine learning as many practitioners do not have the means to collect and label all the data they require reliably. Instead, they often turn to large online datasets to meet their data needs. However, this can often lead to a mismatch between the online dataset and the data they will encounter in their own problem. This is known as domain shift and plagues many different avenues of machine learning. It can arise from differences in data sources, changes in the underlying processes generating the data, or new unseen environments the models have yet to encounter. All these issues can lead to performance degradation. Building on the success of Cycle-consistent Generative Adversarial Networks (CycleGAN) in learning unpaired image-to-image mappings, we propose a new method to help alleviate the issues caused by domain shifts in images. The proposed model incorporates an adversarial loss to encourage realistic-looking images in the target domain, a cycle-consistency loss to learn an unpaired image-to-image mapping, and a semantic loss from a task network to improve the generator’s performance. The task network is concurrently trained with the generators on the generated images to improve downstream task performance on adapted images. By utilizing the power of CycleGAN, we can learn to classify images in the target domain without any target domain labels. In this research, we show that our model is successful on various unsupervised domain adaptation (UDA) datasets and can alleviate domain shifts for different adaptation tasks, like classification or semantic segmentation. In our experiments on standard classification, we were able to bring the model’s performance to near oracle level accuracy on a variety of different classification datasets. The semantic segmentation experiments showed that our model could improve the performance on the target domain, but there is still room for further improvements.
We also further analyze where our model performs well and where improvements can be made.
Modelling Cohort Specific Metabolic Syndrome and Cardiovascular Disease Risk using Supervised Machine Learning
(University of the Witwatersrand, Johannesburg, 2023-08) Ngcayiya, Paulina Genet; Ranchod, Pravesh
Cardiovascular Disease (CVD) is the leading cause of death worldwide, with Coronary Heart Disease (CHD) being the most common type of CVD. The consequences of the presence of CVD risk factors often manifest as Metabolic Syndrome (MetS). In this study, a dataset from the Framingham Heart Study (FHS) was used to develop two different kinds of CHD risk prediction models. These models were developed using Random Forests (RF) and AutoPrognosis. The performance of the Framingham Risk Score model (AUC-ROC: 0.633) on the FHS dataset was used as the benchmark. The RF model with optimized hyperparameters (AUC-ROC: 0.728) produced the best results, narrowly outperforming the AutoPrognosis model with an ensemble pipeline (AUC-ROC: 0.714). The performance of RF against AutoPrognosis when predicting the existence of MetS was evaluated using a dataset from the National Health and Nutrition Examination Survey (NHANES). The RF model with optimized hyperparameters (AUC-ROC: 0.851) again produced the best results, by a small margin over the AutoPrognosis model with an ensemble pipeline (AUC-ROC: 0.851). Datasets varying in size from 100 to 4900 were used to test the performance of RF against AutoPrognosis. The RF model with optimized hyperparameters had the best performance results.
Multi-View Ranking: Tasking Transformers to Generate and Validate Solutions to Math Word Problems
(University of the Witwatersrand, Johannesburg, 2023-11) Mzimba, Rifumo; Klein, Richard; Rosman, Benjamin
The recent developments and success of the Transformer model have resulted in the creation of massive language models that have led to significant improvements in the comprehension of natural language. When fine-tuned for downstream natural language processing tasks with limited data, they achieve state-of-the-art performance. However, these robust models lack the ability to reason mathematically. It has been demonstrated that, when fine-tuned on the small-scale Math Word Problems (MWPs) benchmark datasets, these models are not able to generalize. Therefore, to overcome this limitation, this study proposes to augment the generative objective used in the MWP task with complementary objectives that can assist the model in reasoning more deeply about the MWP task. Specifically, we propose a multi-view generation objective that allows the model to understand the generative task as an abstract syntax tree traversal beyond the sequential generation task. In addition, we propose a complementary verification objective to enable the model to develop heuristics that can distinguish between correct and incorrect solutions. These two objectives comprise our multi-view ranking (MVR) framework, in which the model is tasked to generate the prefix, infix, and postfix traversals for a given MWP, and then use the verification task to rank the generated expressions. Our experiments show that the verification objective is more effective at choosing the best expression than the widely used beam search. We further show that when our two objectives are used in conjunction, they can effectively guide our model to learn robust heuristics for the MWP task. In particular, we achieve absolute improvements of 9.7% and 5.3% over our baseline and the state-of-the-art models, respectively, on the SVAMP datasets. Our source code can be found at https://github.com/ProxJ/msc-final.
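As an illustration of the three "views" mentioned in the abstract above, the following minimal Python sketch produces the prefix, infix, and postfix traversals of a small expression tree. It is not taken from the thesis or its repository: the `Node` class, the function names, and the example expression (3 + 5) * 2 are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """A node of a binary expression tree (hypothetical helper type)."""
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def prefix_tokens(node: Optional[Node]) -> List[str]:
    """Operator first, then the left and right subtrees."""
    if node is None:
        return []
    return [node.value] + prefix_tokens(node.left) + prefix_tokens(node.right)


def infix_tokens(node: Optional[Node]) -> List[str]:
    """Left subtree, operator, right subtree; parenthesised to stay unambiguous."""
    if node is None:
        return []
    if node.left is None and node.right is None:
        return [node.value]  # a leaf is an operand and needs no parentheses
    return ["("] + infix_tokens(node.left) + [node.value] + infix_tokens(node.right) + [")"]


def postfix_tokens(node: Optional[Node]) -> List[str]:
    """Left and right subtrees first, then the operator."""
    if node is None:
        return []
    return postfix_tokens(node.left) + postfix_tokens(node.right) + [node.value]


# Illustrative expression (3 + 5) * 2; not an example from the thesis.
tree = Node("*", Node("+", Node("3"), Node("5")), Node("2"))

print(" ".join(prefix_tokens(tree)))
print(" ".join(infix_tokens(tree)))
print(" ".join(postfix_tokens(tree)))
```

All three token sequences encode the same expression, which is what would allow a single solution to be generated and checked in more than one form.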
Overlapping multidomain paired quasilinearization methods for solving boundary layer flow problems
(University of the Witwatersrand, Johannesburg, 2024) Nefale, Mpho Mendy; Otegbeye, Olumuyiwa; Oloniiju, Shina Daniel
There is a constant need to refine current numerical approaches used to solve non-linear differential equations, which are employed to model real-world problems that often do not have analytical solutions. Spectral-based techniques have proven to be among the most efficient numerical techniques for finding solutions of differential equations. Numerous spectral-based linearization techniques have been developed, such as the spectral relaxation (SRM), the spectral local linearization (SLLM), the spectral quasilinearization (SQLM), and the paired quasilinearization (PQLM) methods, among others. Previous research suggests that the PQLM is an efficient approach for solving complex non-linear systems of ordinary (ODEs) and partial differential equations (PDEs). However, it has been observed that this method requires further enhancement when utilized for problems described over a large domain, be it temporal or spatial. This research aims to address this limitation by proposing a modified version of the PQLM, called the overlapping multi-domain paired quasilinearization method (OMD-PQLM), that enhances the accuracy and convergence speed of the original approach. The new approach entails solving a system by decoupling it into pairs of equations and partitioning the large domain into smaller overlapping sub-domains. A comparison between the OMD-PQLM and the PQLM is conducted by solving systems of ODEs and PDEs. The proposed numerical approach is evaluated based on the norms of the residual and convergence errors, computational time, and the influence of the number of grid points and sub-domains on the convergence speed of the iterative scheme and the accuracy of the solutions. The findings demonstrate that the OMD-PQLM remarkably improves the accuracy of the solution compared to the PQLM, suggesting that partitioning the problem domain into multiple overlapping sub-domains optimizes the performance of the PQLM.
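The idea of partitioning a large domain into smaller overlapping sub-domains, as described in the abstract above, can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `overlapping_subdomains`, the use of equal-length sub-intervals, and the fixed overlap width are hypothetical choices and are not necessarily the partitioning rule used in the thesis.

```python
from typing import List, Tuple


def overlapping_subdomains(a: float, b: float, p: int, overlap: float) -> List[Tuple[float, float]]:
    """Split [a, b] into p equal-length sub-intervals whose neighbours share
    a region of width `overlap` (an assumed, illustrative partitioning rule)."""
    if b <= a:
        raise ValueError("the interval must satisfy a < b")
    if p < 1:
        raise ValueError("at least one sub-domain is required")
    if overlap < 0 or (p > 1 and overlap >= b - a):
        raise ValueError("overlap must be non-negative and smaller than the domain")
    # p sub-intervals of equal length cover [a, b] with (p - 1) overlaps:
    # p * length - (p - 1) * overlap = b - a
    length = (b - a + (p - 1) * overlap) / p
    step = length - overlap  # distance between consecutive left endpoints
    return [(a + k * step, a + k * step + length) for k in range(p)]


# Illustrative values only: the domain [0, 10] split into 4 sub-domains.
for left, right in overlapping_subdomains(0.0, 10.0, 4, 0.5):
    print(f"[{left:.3f}, {right:.3f}]")
```

Each sub-interval would then be discretised and solved separately, with the shared regions tying neighbouring solutions together. How that coupling is done is specific to the method and is not shown here.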
Pipeline for the 3D Reconstruction of Rigid, Handheld Objects through the Use of Static Cameras
(University of the Witwatersrand, Johannesburg, 2023-04) Kambadkone, Saatwik Ramakrishna; Klein, Richard
In this paper, we develop a pipeline for the 3D reconstruction of handheld objects using a single, static RGB-D camera. We also create a general pipeline to describe the process of handheld object reconstruction. This general pipeline suggests the deconstruction of this task into three main constituents: input, where we decide our main method of data capture; segmentation and tracking, where we identify and track the relevant parts of our captured data; and reconstruction, where we develop a method for reconstructing our previous information into 3D models. We successfully create a handheld object reconstruction method using a depth sensor as our input; hand tracking, depth segmentation and optical flow to retrieve relevant information; and reconstruction through the use of ICP and TSDF maps. During this process, we also evaluate other possible variations of this successful method. In one of these variations, we test the effect of using depth estimation to generate data as the input to our pipeline. While this experimentation helps us quantify our method’s robustness to noise in the input data, we conclude that current depth estimation techniques do not provide adequate detail for the reconstruction of handheld objects.
Play-style Identification and Player Modelling for Generating Tailored Advice in Video Games
(University of the Witwatersrand, Johannesburg, 2023-09) Ingram, Branden Corwin; Rosman, Benjamin; Van Alten, Clint; Klein, Richard
Recent advances in fields such as machine learning have enabled the development of systems that are able to achieve super-human performance in a number of domains, specifically in complex games such as Go and StarCraft. Based on these successes, it is reasonable to ask if these learned behaviours could be utilised to improve the performance of humans on the same tasks. However, the types of models used in these systems are typically not easily interpretable, and cannot be directly used to improve the performance of a human. Additionally, humans tend to develop stylistic traits based on preference which aid in solving problems or competing at high levels. This thesis looks to address these difficulties by developing an end-to-end pipeline that can provide beneficial advice tailored to a player’s style in a video game setting. Towards this end, we firstly demonstrate the ability to cluster variable-length multi-dimensional gameplay trajectories with respect to play-style in an unsupervised fashion. Secondly, we demonstrate the ability to learn to model an individual player’s actions during gameplay. Thirdly, we demonstrate the ability to learn policies representative of all the play-styles identified within an environment. Finally, we demonstrate how the utilisation of these components can generate advice which is tailored to the individual’s style. This system would be particularly useful for improving tutorial systems, which quickly become redundant when they lack any personalisation. Additionally, this pipeline serves as a way for developers to garner insights on their player base which can be utilised for more informed decision-making on future feature releases and updates. Players gain a useful tool which can be utilised to learn how to play better, as well as to identify the characteristics of their own gameplay and that of their opponents. Furthermore, we contend that our approach has the potential to be employed in a broad range of learning domains.
