School of Computer Science and Applied Mathematics (ETDs)

Permanent URI for this community

https://hdl.handle.net/10539/38004

A fully-decentralised general-sum approach for multi-agent reinforcement learning using minimal modelling
(University of the Witwatersrand, Johannesburg, 2023-08) Kruger, Marcel Matthew Anthony; Rosman, Benjamin; James, Steven; Shipton, Jarrod
Multi-agent reinforcement learning (MARL) is a prominent area of research in machine learning, extending reinforcement learning to scenarios where multiple agents concurrently learn and interact within the same environment. Most existing methods rely on centralisation during training, while others employ agent modelling. In contrast, we propose a novel method that adapts the role of entropy to assist in fully-decentralised training without explicitly modelling other agents using additional information to which most centralised methods assume access. We augment the entropy term to encourage more deterministic agents and instead let the non-stationarity inherent in MARL serve as a mode of exploration. We empirically evaluate the performance of our method across five distinct environments, each representing unique challenges. Our assessment encompasses both cooperative and competitive cases. Our findings indicate that the approach of penalising entropy, rather than rewarding it, enables agents to perform at least as well as the prevailing standard of entropy maximisation. Moreover, our alternative approach achieves several of the original objectives of entropy regularisation in reinforcement learning, such as increased sample efficiency and potentially better final rewards. Whilst entropy has a significant role, our results in the competitive case indicate that position bias is still a considerable challenge.
Applying Machine Learning to Model South Africa’s Equity Market Index Price Performance
(University of the Witwatersrand, Johannesburg, 2023-07) Nokeri, Tshepo Chris; Mulaudzi, Rudzani; Ajoodha, Ritesh
Policymakers typically use statistical multivariate forecasting models to forecast the reaction of stock market returns to changing economic activities. However, these models frequently perform poorly because they are inflexible and ill-suited to modelling non-linear relationships. Emerging research suggests that machine learning models can better handle data from non-linear dynamic systems and yield outstanding model performance. This research compared the performance of machine learning models to the performance of the benchmark model (the vector autoregressive model) when forecasting the reaction of stock market returns to changing economic activities in South Africa. The vector autoregressive model achieved a mean absolute percentage error (MAPE) of 0.0084, while the lowest MAPE achieved by the machine learning models was 0.0051. The machine learning model trained on low economic data dimensions performed 65% better than the benchmark model. Machine learning models also identified key economic activities when forecasting the reaction of stock market returns. Most prior research used whole feature sets and few models for comparison, and paid little attention to how different feature subsets and reduced dimensionality change model performance, a limitation this research addresses through the number of experiments it considers. This research considered various experiments, i.e., different feature subsets and data dimensions, to determine whether machine learning models perform better than the benchmark model when forecasting the reaction of stock market returns to changing economic activities in South Africa.
Generating Rich Image Descriptions from Localized Attention
(University of the Witwatersrand, Johannesburg, 2023-08) Poulton, David; Klein, Richard
The field of image captioning is constantly growing with swathes of new methodologies, performance leaps, datasets, and challenges. One new challenge is the task of long-text image description. While the vast majority of research has focused on short captions for images with only short phrases or sentences, new research and the recently released Localized Narratives dataset have pushed this to rich, paragraph-length descriptions. In this work we perform additional research to grow the sub-field of long-text image descriptions and determine the viability of our new methods. We experiment with a variety of progressively more complex LSTM and Transformer-based approaches, utilising human-generated localised attention traces and image data to generate suitable captions, and evaluate these methods on a suite of common language evaluation metrics. We find that LSTM-based approaches are not well suited to the task, and under-perform Transformer-based implementations on our metric suite while also proving substantially more demanding to train. On the other hand, we find that our Transformer-based methods are capable of generating grammatically sound captions with rich focus over all regions of the image, with our most complex model outperforming existing approaches on our metric suite.
Generative Model Based Adversarial Defenses for Deepfake Detectors
(University of the Witwatersrand, Johannesburg, 2023-08) Kavilan Dhavan, Nair; Klein, Richard
Deepfake videos present a serious threat to society as they can be used to spread misinformation through social media. Convolutional Neural Networks (CNNs) have been effective in detecting deepfake videos, but they are vulnerable to adversarial attacks that can compromise their accuracy. This vulnerability can be exploited by deepfake creators to evade detection. In this study, we evaluate the effectiveness of two generative adversarial defense mechanisms, APE-GAN and MagNet, in the context of deepfake detection. We use the FaceForensics++ dataset and a CNN victim model based on the XceptionNet architecture, which we attack using the iterative fast gradient sign method at two different levels of ε, ε = 0.0001 and ε = 0.01. We find that both APE-GAN and MagNet can purify the adversarial images and restore the performance of the victim model to within 10% of the model’s accuracy on benign fake inputs. However, these methods were less effective at restoring accuracy for adversarial real examples and were not able to significantly restore accuracy when the adversarial attack was aggressive (ε = 0.01). We recommend that an adversarial defense method be used in conjunction with a deepfake detector to improve the accuracy of predictions. APE-GAN and MagNet are effective methods in the deepfake context, but their effectiveness is limited when the adversarial attack is aggressive.
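As an illustration of the attack named in the abstract above, the following is a minimal sketch of the iterative fast gradient sign method. It assumes a toy logistic classifier as a stand-in for the XceptionNet victim model; the weights, the input, the step count, and the choice of step size (ε divided by the number of steps) are all hypothetical and are not taken from the thesis.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(x: List[float], w: List[float], b: float) -> float:
    """Probability of the positive class under a toy logistic model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def loss_grad_wrt_input(x: List[float], y: int, w: List[float], b: float) -> List[float]:
    """Gradient of the binary cross-entropy loss with respect to the input x."""
    p = predict(x, w, b)
    return [(p - y) * wi for wi in w]


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def iterative_fgsm(x: List[float], y: int, w: List[float], b: float,
                   eps: float, steps: int = 10) -> List[float]:
    """Take `steps` signed-gradient steps, staying inside the eps-ball around x."""
    alpha = eps / steps  # assumed step size, not from the source
    x_adv = list(x)
    for _ in range(steps):
        grad = loss_grad_wrt_input(x_adv, y, w, b)
        # Step in the direction that increases the loss.
        x_adv = [xa + alpha * sign(g) for xa, g in zip(x_adv, grad)]
        # Project back into the eps-ball and the valid pixel range [0, 1].
        x_adv = [min(max(xa, xi - eps, 0.0), xi + eps, 1.0)
                 for xa, xi in zip(x_adv, x)]
    return x_adv


# Hypothetical model and input (pixel values in [0, 1]).
w, b = [1.5, -2.0, 0.5], 0.1
x, y = [0.2, 0.7, 0.5], 1

for eps in (0.0001, 0.01):  # the two attack strengths named in the abstract
    x_adv = iterative_fgsm(x, y, w, b, eps)
    print(eps, predict(x, w, b), predict(x_adv, w, b))
```

The projection step is what keeps every perturbed value within ε of the original input, which is why a larger ε corresponds to a more aggressive attack.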
Improving audio-driven visual dubbing solutions using self-supervised generative adversarial networks
(University of the Witwatersrand, Johannesburg, 2023-09) Ranchod, Mayur; Klein, Richard
Audio-driven visual dubbing (ADVD) is the process of accepting a talking-face video, along with a dubbing audio segment, as inputs and producing a dubbed video such that the speaker appears to be uttering the dubbing audio. ADVD aims to address the language barrier inherent in the consumption of video-based content caused by the various languages in which videos may be presented. Specifically, a video may only be consumed by the audience that is familiar with the spoken language. Traditional solutions, such as subtitles and audio-dubbing, hinder the viewer’s experience by either obstructing the on-screen content or introducing an unpleasant discrepancy between the speaker’s mouth movements and the input dubbing audio, respectively. In contrast, ADVD strives to achieve a natural viewing experience by synchronizing the speaker’s mouth movements with the dubbing audio. A comprehensive survey of several ADVD solutions revealed that most existing solutions achieve satisfactory visual quality and lip-sync accuracy but are limited to low-resolution videos with frontal or near frontal faces. Since this is in sharp contrast to real-world videos, which are high-resolution and contain arbitrary head poses, we present one of the first ADVD solutions trained with high-resolution data and also introduce the first pose-invariant ADVD solution. Our results show that the presented solution achieves superior visual quality while also achieving high measures of lip-sync accuracy, consequently enabling the solution to achieve significantly improved results when applied to real-world videos.
Improving Semi-Supervised Learning Generative Adversarial Networks
(University of the Witwatersrand, Johannesburg, 2023-08) Moolla, Faheem; Bau, Hairong; Van Zyl, Terence
Generative Adversarial Networks (GANs) have shown remarkable potential in generating high-quality images, with semi-supervised GANs providing a high classification accuracy. In this study, an enhanced semi-supervised GAN model is proposed wherein the generator of the GAN is replaced by a pre-trained decoder from a Variational Autoencoder. The model presented outperforms regular GAN and semi-supervised GAN models during the early stages of training, as it produces higher quality images. Our model demonstrated significant improvements in image quality across three datasets (MNIST, Fashion MNIST, and CIFAR-10), as evidenced by higher accuracies obtained from a Convolutional Neural Network (CNN) trained on generated images, as well as superior inception scores. Additionally, our model prevented mode collapse and exhibited smaller oscillations in the discriminator and generator loss graphs compared to baseline models. The presented model also provided remarkably high levels of classification accuracy, obtaining 99.32% on the MNIST dataset, 92.78% on the Fashion MNIST dataset, and 83.22% on the CIFAR-10 dataset. These scores are notably robust as they improved on some of the classification accuracies obtained by two state-of-the-art models, indicating that the presented model is a significantly improved semi-supervised GAN model. However, despite the high classification accuracy for the CIFAR-10 dataset, a considerable drop in accuracy was observed when comparing generated images to real images for this dataset. This suggests that the quality of those generated images can be improved and that the presented model performs better with less complex datasets. Future work could explore techniques to enhance our model’s performance with more intricate datasets, ultimately expanding its applicability across various domains.
Learning to adapt: domain adaptation with cycle-consistent generative adversarial networks
(University of the Witwatersrand, Johannesburg, 2023) Burke, Pierce William; Klein, Richard
Domain adaptation is a critical part of modern-day machine learning as many practitioners do not have the means to collect and label all the data they require reliably. Instead, they often turn to large online datasets to meet their data needs. However, this can often lead to a mismatch between the online dataset and the data they will encounter in their own problem. This is known as domain shift and plagues many different avenues of machine learning. It can arise from differences in data sources, changes in the underlying processes generating the data, or new unseen environments the models have yet to encounter. All these issues can lead to performance degradation. Building on the success of Cycle-consistent Generative Adversarial Networks (CycleGAN) in learning unpaired image-to-image mappings, we propose a new method to help alleviate the issues caused by domain shifts in images. The proposed model incorporates an adversarial loss to encourage realistic-looking images in the target domain, a cycle-consistency loss to learn an unpaired image-to-image mapping, and a semantic loss from a task network to improve the generator’s performance. The task network is concurrently trained with the generators on the generated images to improve downstream task performance on adapted images. By utilizing the power of CycleGAN, we can learn to classify images in the target domain without any target domain labels. In this research, we show that our model is successful on various unsupervised domain adaptation (UDA) datasets and can alleviate domain shifts for different adaptation tasks, like classification or semantic segmentation. In our experiments on standard classification, we were able to bring the model’s performance to near oracle-level accuracy on a variety of different classification datasets. The semantic segmentation experiments showed that our model could improve the performance on the target domain, but there is still room for further improvements. We also further analyze where our model performs well and where improvements can be made.
Multi-View Ranking: Tasking Transformers to Generate and Validate Solutions to Math Word Problems
(University of the Witwatersrand, Johannesburg, 2023-11) Mzimba, Rifumo; Klein, Richard; Rosman, Benjamin
The recent developments and success of the Transformer model have resulted in the creation of massive language models that have led to significant improvements in the comprehension of natural language. When fine-tuned for downstream natural language processing tasks with limited data, they achieve state-of-the-art performance. However, these robust models lack the ability to reason mathematically. It has been demonstrated that, when fine-tuned on the small-scale Math Word Problems (MWPs) benchmark datasets, these models are not able to generalize. Therefore, to overcome this limitation, this study proposes to augment the generative objective used in the MWP task with complementary objectives that can assist the model in reasoning more deeply about the MWP task. Specifically, we propose a multi-view generation objective that allows the model to understand the generative task as an abstract syntax tree traversal beyond the sequential generation task. In addition, we propose a complementary verification objective to enable the model to develop heuristics that can distinguish between correct and incorrect solutions. These two objectives comprise our multi-view ranking (MVR) framework, in which the model is tasked to generate the prefix, infix, and postfix traversals for a given MWP, and then use the verification task to rank the generated expressions. Our experiments show that the verification objective is more effective at choosing the best expression than the widely used beam search. We further show that when our two objectives are used in conjunction, they can effectively guide our model to learn robust heuristics for the MWP task. In particular, we achieve absolute percentage improvements of 9.7% and 5.3% over our baseline and the state-of-the-art models, respectively, on the SVAMP datasets. Our source code can be found at https://github.com/ProxJ/msc-final.
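The three "views" named in the abstract above (prefix, infix, and postfix traversals of an expression tree) can be illustrated with a small sketch. The word problem, the `Node` class, and the helper names below are hypothetical and are not taken from the thesis or its repository; the sketch only shows that the three traversals encode the same expression.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """A node of a binary expression tree: an operator or a numeric operand."""
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def prefix(node: Optional[Node]) -> List[str]:
    """Operator first, then the left and right subtrees."""
    if node is None:
        return []
    return [node.value] + prefix(node.left) + prefix(node.right)


def infix(node: Optional[Node]) -> List[str]:
    """Left subtree, operator, right subtree; parenthesised to stay unambiguous."""
    if node is None:
        return []
    if node.left is None and node.right is None:
        return [node.value]
    return ["("] + infix(node.left) + [node.value] + infix(node.right) + [")"]


def postfix(node: Optional[Node]) -> List[str]:
    """Left and right subtrees first, then the operator."""
    if node is None:
        return []
    return postfix(node.left) + postfix(node.right) + [node.value]


def eval_postfix(tokens: List[str]) -> float:
    """Evaluate a postfix token sequence with a stack."""
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
           "*": lambda a, b: a * b, "/": lambda a, b: a / b}
    stack: List[float] = []
    for tok in tokens:
        if tok in ops:
            right, left = stack.pop(), stack.pop()
            stack.append(ops[tok](left, right))
        else:
            stack.append(float(tok))
    return stack[0]


# Hypothetical problem: "Tom has 3 bags of 4 apples and eats 2. How many remain?"
tree = Node("-", Node("*", Node("3"), Node("4")), Node("2"))

print(" ".join(prefix(tree)))
print(" ".join(infix(tree)))
print(" ".join(postfix(tree)))
print(eval_postfix(postfix(tree)))
```

A model trained to produce all three sequences for one problem sees the same tree from three orderings, which is the sense in which the abstract describes generation as a tree traversal rather than a single left-to-right sequence.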
Pipeline for the 3D Reconstruction of Rigid, Handheld Objects through the Use of Static Cameras
(University of the Witwatersrand, Johannesburg, 2023-04) Kambadkone, Saatwik Ramakrishna; Klein, Richard
In this paper, we develop a pipeline for the 3D reconstruction of handheld objects using a single, static RGB-D camera. We also create a general pipeline to describe the process of handheld object reconstruction. This general pipeline suggests the deconstruction of this task into three main constituents: input, where we decide our main method of data capture; segmentation and tracking, where we identify and track the relevant parts of our captured data; and reconstruction, where we develop a method for reconstructing our previous information into 3D models. We successfully create a handheld object reconstruction method using a depth sensor as our input; hand tracking, depth segmentation and optical flow to retrieve relevant information; and reconstruction through the use of ICP and TSDF maps. During this process, we also evaluate other possible variations of this successful method. In one of these variations, we test the effect of using depth estimation to generate data as the input to our pipeline. While this experimentation helps us quantify our method’s robustness to noise in the input data, we do conclude that current depth estimation techniques do not provide adequate detail for the reconstruction of handheld objects.
Rationalization of Deep Neural Networks in Credit Scoring
(University of the Witwatersrand, Johannesburg, 2023-07) Dastile, Xolani Collen; Celik, Turgay
Machine learning and deep learning, which are subfields of artificial intelligence, are undoubtedly pervasive and ubiquitous technologies of the 21st century. This is attributed to the enhanced processing power of computers, the exponential growth of datasets, and the ability to store these growing datasets. Many companies are now starting to view their data as an asset, whereas previously, they viewed it as a by-product of business processes. In particular, banks have started to harness the power of deep learning techniques in their day-to-day operations; for example, chatbots that handle questions and answers about different products can be found on banks’ websites. One area that is key in the banking sector is the credit risk department. Credit risk is the risk of lending money to applicants and is measured using credit scoring techniques that profile applicants according to their risk. Deep learning techniques have the potential to identify and separate applicants based on their lending risk profiles. Nevertheless, a limitation arises when employing deep learning techniques in credit risk, stemming from the fact that these techniques lack the ability to provide explanations for their decisions or predictions. Hence, deep learning techniques are regarded as non-transparent models. This thesis focuses on tackling the lack of transparency inherent in deep learning and machine learning techniques to render them suitable for adoption within the banking sector. Different statistical, classic machine learning, and deep learning models’ performances were compared qualitatively and quantitatively. The results showed that deep learning techniques outperform traditional machine learning models and statistical models. The predictions from deep learning techniques were explained using state-of-the-art explanation techniques. A novel model-agnostic explanation technique was also devised, and credit-scoring experts assessed its validity. This thesis has shown that different explanation techniques can be relied upon to explain predictions from deep learning and machine learning techniques.
Self Supervised Salient Object Detection using Pseudo-labels
(University of the Witwatersrand, Johannesburg, 2023-08) Bachan, Kidhar; Wang, Hairong
Deep Convolutional Neural Networks have dominated salient object detection methods in recent history. A determining factor for salient object detection network performance is the quality and quantity of pixel-wise annotated labels. This annotation is performed manually, making it expensive (time-consuming, tedious), while limiting the training data to the available annotated datasets. Alternatively, unsupervised models are able to learn from unlabelled datasets or datasets in the wild. In this work, an existing algorithm [Li et al. 2020] is used to refine the generated pseudo-labels before training. This research focuses on the changes made to the pseudo-label refinement algorithm and their effect on performance for unsupervised salient object detection tasks. We show that using this novel approach leads to statistically negligible performance improvements and discuss the reasons why this is the case.
Using Machine Learning to Estimate the Photometric Redshift of Galaxies
(University of the Witwatersrand, Johannesburg, 2023-08) Salim, Shayaan; Bau, Hairong; Komin, Nukri
Machine learning has emerged as a crucial tool in the field of cosmology and astrophysics, leading to extensive research in this area. This research study aims to utilize machine learning models to estimate the redshift of galaxies, with a primary focus on utilizing photometric data to obtain accurate results. Five machine learning algorithms (XGBoost, Random Forests, K-nearest neighbors, Artificial Neural Networks, and Polynomial Regression) are employed to estimate the redshifts, trained on photometric data derived from the Sloan Digital Sky Survey (SDSS) Data Release 17 database. Furthermore, various input parameters from the SDSS database are explored to achieve the most accurate redshift values. The research incorporates a comparative analysis, utilizing different evaluation metrics and statistical tests to determine the best-performing algorithm. The results indicate that the XGBoost algorithm achieves the highest accuracy, with an R² value of 0.94, a Root Mean Square Error (RMSE) of 0.03, and a Mean Absolute Percentage Error (MAPE) of 12.04% when trained on the optimal feature subset. In comparison, the base model achieved an R² of 0.84, an RMSE of 0.05, and a MAPE of 20.89%. The study contributes to the existing literature by utilizing photometric data during model training and comparing different high-performing algorithms from the literature.
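For readers unfamiliar with the three evaluation metrics reported in the abstract above, the following is a minimal sketch of their standard definitions in plain Python. The redshift values are made up for illustration and are not SDSS data or results from the thesis; the thesis may also have computed these metrics with a library rather than by hand.

```python
import math
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error, in the same units as the target."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent; undefined if any true value is 0."""
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n


# Hypothetical true and predicted redshifts.
z_true = [0.10, 0.20, 0.40, 0.50]
z_pred = [0.12, 0.18, 0.40, 0.55]

print("R^2 :", r2_score(z_true, z_pred))
print("RMSE:", rmse(z_true, z_pred))
print("MAPE:", mape(z_true, z_pred), "%")
```

Because MAPE divides each error by the true value, small redshifts inflate it, which is one reason a model can show a low RMSE alongside a comparatively large MAPE.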

Browsing School of Computer Science and Applied Mathematics (ETDs) by SDG "SDG-9: Industry, innovation and infrastructure"
