Electronic Theses and Dissertations (Masters)

Permanent URI for this collection: https://hdl.handle.net/10539/38006

Now showing 1 - 8 of 8
  • Item
    Counting Reward Automata: Exploiting Structure in Reward Functions Expressible in Decidable Formal Languages
    (University of the Witwatersrand, Johannesburg, 2024-07) Bester, Tristan; Rosman, Benjamin; James, Steven; Tasse, Geraud Nangue
    In general, reinforcement learning agents are restricted from directly accessing the environment model. This restricts the agent’s access to the environmental dynamics and reward models, which are only accessible through repeated environmental interactions. As reinforcement learning is well suited for use in complex environments, which are challenging to model, the general assumption that the transition probabilities associated with the environment are unknown is justified. However, as agents cannot discern rewards directly from the environment, reward functions must be designed and implemented for both simulated and real-world environments. As a result, the assumption that the reward model must remain hidden from the agent is unnecessary and detrimental to learning. Previously, methods have been developed that utilise the structure of the reward function to enable more sample-efficient learning. These methods employ a finite state machine variant to facilitate reward specification in a manner that exposes the internal structure of the reward function. This approach is particularly effective when solving long-horizon tasks as it enables the use of counterfactual reasoning with off-policy learning which significantly improves sample efficiency. However, as these approaches are dependent on finite-state machines, they are only able to express a small number of reward functions. This severely limits the applicability of these approaches as they cannot model simple tasks such as “fetch a coffee for each person in the office” which involves counting – one of the numerous properties finite state machines cannot model. This work addresses the limited expressiveness of current state machine-based approaches to reward modelling. 
    Specifically, we introduce a novel approach compatible with any reward function which can be expressed as a well-defined algorithm. We present the counting reward automaton, an abstract machine capable of modelling reward functions expressible in any decidable formal language. Unlike previous approaches to state machine-based reward modelling, which are limited to the expression of tasks as regular languages, our framework allows for tasks described by decidable formal languages. It follows that our framework is an extremely general approach to reward modelling, compatible with any task specification expressible as a well-defined algorithm. This is a significant contribution as it greatly extends the class of problems which can benefit from the improved learning techniques facilitated by state machine-based reward modelling. We prove that an agent equipped with such an abstract machine is able to solve an extended set of tasks. We show that this increase in expressive power does not come at the cost of increased automaton complexity. This is followed by the introduction of several learning algorithms designed to increase sample efficiency through the exploitation of automaton structure. These algorithms are based on counterfactual reasoning with off-policy reinforcement learning (RL) and use techniques from the fields of hierarchical reinforcement learning (HRL) and reward shaping. Finally, we evaluate our approach in several domains requiring long-horizon plans. Empirical results demonstrate that our method outperforms competing approaches in terms of automaton complexity, sample efficiency, and task completion.
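To make the counting example in this abstract concrete, the following is a minimal sketch of a reward machine with a single unbounded counter for the task "fetch a coffee for each person in the office". It is not the formal counting reward automaton defined in the dissertation: the class name `CountingRewardSketch`, the event names `person_arrives` and `coffee_delivered`, and the reward of 1.0 on completion are all assumptions made for illustration. A finite state machine would need a separate state for every possible number of outstanding coffees, whereas the counter handles any number with one control state.

```python
from dataclasses import dataclass


@dataclass
class CountingRewardSketch:
    """Hypothetical one-counter reward machine (illustration only)."""

    outstanding: int      # coffees still owed; this is the counter
    done: bool = False    # set once every person has been served

    def step(self, event: str) -> float:
        """Consume one high-level event and return the reward it earns."""
        if self.done:
            return 0.0
        if event == "person_arrives":
            # The counter can grow without bound, which a fixed
            # set of finite states cannot track.
            self.outstanding += 1
        elif event == "coffee_delivered" and self.outstanding > 0:
            self.outstanding -= 1
            if self.outstanding == 0:
                self.done = True
                return 1.0  # assumed reward for completing the task
        return 0.0


def total_reward(n_people: int, events: list[str]) -> float:
    """Run the sketch machine over a sequence of events."""
    machine = CountingRewardSketch(outstanding=n_people)
    return sum(machine.step(event) for event in events)
```

Because the machine's state (the counter value and the `done` flag) is visible to the learner, the same experience can in principle be replayed against other counter values, which is the kind of counterfactual reuse the abstract refers to.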
  • Item
    Pricing Interest Rate Derivatives Using The Forward Market Model
    (University of the Witwatersrand, Johannesburg, 2024-10) Konaite, Tshana Tumelo; Mudavanhu, Blessing
    The interbank offered rates (IBORs) are due to be discontinued, and overnight rates have been chosen as their replacements. This change in the risk-free rate raises the questions of how the new rates will be modelled and how products referencing them will be priced. In this dissertation, we explore the classical short-rate models and the new generalized Forward Market Model proposed by Andrei Lyashenko and Fabio Mercurio in 2019. We seek to use this model to price interest rate derivatives such as caps and swaptions.
  • Item
    Envisioning the Future of Fashion: The Creation And Application Of Diverse Body Pose Datasets for Real-World Virtual Try-On
    (University of the Witwatersrand, Johannesburg, 2024-08) Molefe, Molefe Reabetsoe-Phenyo; Klein, Richard
    Fashion presents an opportunity for research methods to unite machine learning concepts with e-commerce to meet the growing demands of consumers. A recent development in intelligent fashion research envisions how individuals might appear in different clothes based on their selection, a process known as “virtual try-on”. Our research introduces a novel dataset that ensures multi-view consistency, facilitating the effective warping and synthesis of clothing onto individuals from any given perspective or pose. This addresses a significant shortfall in existing datasets, which struggle to recognise various views, thus limiting the versatility of virtual try-on. By fine-tuning state-of-the-art architectures on our dataset, we expand the utility of virtual try-on, making it more adaptable and robust across a diverse range of scenarios. A noteworthy additional advantage of our dataset is its capacity to facilitate 3D scene reconstruction. This capability arises from utilising a sparse collection of images captured from multiple angles, which, while primarily aimed at enriching 2D virtual try-on, inadvertently supports the simulation of 3D environments. This enhancement not only broadens the practical applications of virtual try-on in the real world but also advances the field by demonstrating a novel application of deep learning within the fashion industry, enabling more realistic and comprehensive virtual try-on experiences. Our work therefore contributes a novel dataset and approach for virtually synthesising clothing in an accessible way for real-world scenarios.
  • Item
    Double-diffusive convection in rotating fluids under gravity modulation
    (University of the Witwatersrand, Johannesburg, 2024-09) Mathunyane, Alfred Ntobeng; Duba, C. Thama; Mason, D.P.
    This study employs the method of normal modes and linear stability analysis to investigate double-diffusive convection in a horizontally layered, rotating fluid, specifically focusing on its application to oceanic dynamics. Double-diffusive convection arises when opposing gradients of salinity and temperature interact within a fluid, a phenomenon known as thermohaline convection, and it is crucial for the understanding of ocean circulation and its role in climate change. With the increasing mass of water due to glaciers melting, fluid pressure variations occur, leading to slight fluctuations in gravity. We conduct both stationary and oscillatory stability analyses to determine the onset of double-diffusive convection under gravity modulation. Our analysis reveals that time-dependent periodic modulation of gravitational fields can stabilize or destabilize thermohaline convection for both stationary and oscillatory convection, with the modulation amplitude stabilizing and the modulation frequency destabilizing. The wavenumber in the y-direction also affects convection in the equatorial regions. This wavenumber exhibits destabilizing effects for large values and stabilizing effects for small values for both stationary and oscillatory convection. Rotation along with gravity modulation tends to destabilize the system for both stationary and oscillatory convection. The key difference between stationary and oscillatory convection is that oscillatory convection exhibits large values of the Rayleigh number and is thus susceptible to overstability, while stationary convection tends to have relatively smaller Rayleigh numbers and is thus more stable. This research provides insights into the complex interplay between gravity modulation and thermohaline convection, contributing to our understanding of ocean dynamics and their implications for climate change.
  • Item
    BiCoRec: Bias-Mitigated Context-Aware Sequential Recommendation Model
    (University of the Witwatersrand, Johannesburg, 2024-09) Muthivhi, Mufhumudzi; van Zyl, Terence; Bau, Hairong
    Sequential recommendation models aim to learn from users’ evolving preferences. However, current state-of-the-art models suffer from an inherent popularity bias. This study developed a novel framework, BiCoRec, that adaptively accommodates users’ changing preferences for popular and niche items. Our approach leverages a co-attention mechanism to obtain a popularity-weighted user sequence representation, facilitating more accurate predictions. We then present a new training scheme that learns from future preferences using a consistency loss function. The analysis of the experimental results shows that our approach is 7% more capable of uncovering the most relevant items.
  • Item
    Developing a Bayesian Network Model to Predict Students’ Performance Based on the Analysis of their Higher Education Trajectory
    (University of the Witwatersrand, Johannesburg, 2024-08) Ramaano, Thabo Victor; Jadhav, Ashwini; Ajoodha, Ritesh
    The Admission Point Score (APS) metric, used to decide the admission of prospective students to an academic course, may appear effective in determining student success. In reality, almost 50% of students admitted to a science programme in a higher education institution between 2008 and 2015 failed to meet all the requirements necessary to complete the programme. This had a direct impact on the overall graduation throughput. The focus of this research was therefore the adoption of a probabilistic graphical approach as a viable alternative to the APS metric when determining student success trajectories at a higher education level. The purpose of this approach was to provide higher education institutions with a system to monitor students’ academic performance en route to graduation from a probabilistic and graphical point of view. This research employed a probability distribution distance metric to ascertain how close the learned models were to the true model for varying sample sizes. The significance of these results addressed the need for knowledge discovery of dependencies that existed between the students’ module results in a higher education trajectory that spans three years.
  • Item
    A Continuous Reinforcement Learning Approach to Self-Adaptive Particle Swarm Optimisation
    (University of the Witwatersrand, Johannesburg, 2023-08) Tilley, Duncan; Cleghorn, Christopher
    Particle Swarm Optimisation (PSO) is a popular black-box optimisation technique due to its simple implementation and surprising ability to perform well on various problems. Unfortunately, PSO is fairly sensitive to the choice of hyper-parameters. For this reason, many self-adaptive techniques have been proposed that attempt to both simplify hyper-parameter selection and improve the performance of PSO. Surveys, however, show that many self-adaptive techniques are still outperformed by time-varying techniques where the values of coefficients are simply increased or decreased over time. More recent works have shown the successful application of Reinforcement Learning (RL) to learn self-adaptive control policies for optimisers such as differential evolution, genetic algorithms, and PSO. However, many of these applications were limited to discrete state and action spaces, which severely limits the choices available to a control policy, given that the PSO coefficients are continuous variables. This dissertation therefore investigates the application of continuous RL techniques to learn a self-adaptive control policy that can make full use of the continuous nature of the PSO coefficients. The dissertation first introduces the RL framework used to learn a continuous control policy by defining the environment, action-space, state-space, and a number of possible reward functions. An effective learning environment that is able to overcome the difficulties of continuous RL is then derived through a series of experiments, culminating in a successfully learned continuous control policy. The policy is then shown to perform well on the benchmark problems used during training when compared to other self-adaptive PSO algorithms. Further testing on benchmark problems not seen during training suggests that the learned policy may not generalise well to other functions, but this is shown to also be a problem in other PSO algorithms.
Finally, the dissertation performs a number of experiments to provide insights into the behaviours learned by the continuous control policy.
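As a point of reference for the time-varying baselines this abstract mentions, the sketch below runs a textbook PSO whose inertia weight is decreased linearly over time. It is not the dissertation's learned policy. The function names, the 0.9 to 0.4 schedule, the acceleration coefficients of 1.5, and the sphere test function are all assumptions chosen for illustration. A self-adaptive controller, learned or otherwise, would replace the call to `linear_inertia` with its own choice of coefficients at each step.

```python
import random


def linear_inertia(t: int, steps: int, w_start: float = 0.9, w_end: float = 0.4) -> float:
    """Inertia weight decreased linearly from w_start (t = 0) to w_end (t = steps - 1)."""
    if steps <= 1:
        return w_end
    return w_start + (w_end - w_start) * t / (steps - 1)


def sphere(x):
    """Simple benchmark function with its minimum of 0 at the origin."""
    return sum(xi * xi for xi in x)


def pso(f, dim=2, particles=10, steps=50, c1=1.5, c2=1.5, seed=0):
    """Minimise f and return (best value at initialisation, best value found)."""
    rng = random.Random(seed)
    xs = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(particles)]
    vs = [[0.0] * dim for _ in range(particles)]
    pbest = [x[:] for x in xs]
    pbest_val = [f(x) for x in xs]
    g = min(range(particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    initial_best = gbest_val
    for t in range(steps):
        # Time-varying coefficient; an adaptive policy would plug in here.
        w = linear_inertia(t, steps)
        for i in range(particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (pbest[i][d] - xs[i][d])
                            + c2 * r2 * (gbest[d] - xs[i][d]))
                xs[i][d] += vs[i][d]
            val = f(xs[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = xs[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = xs[i][:], val
    return initial_best, gbest_val
```

Calling `pso(sphere)` returns the best value at initialisation and the best value found, so the effect of a different coefficient schedule can be compared on the same seed.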
  • Item
    Evaluating Pre-training Mechanisms in Deep Learning Enabled Tuberculosis Diagnosis
    (University of the Witwatersrand, Johannesburg, 2024) Zaranyika, Zororo; Klein, Richard
    Tuberculosis (TB) is an infectious disease caused by a bacterium called Mycobacterium tuberculosis. In 2021, 10.6 million people fell ill because of TB and about 1.5 million lives are lost from TB each year even though TB is a preventable and curable disease. The latest global trends in TB death cases are shown in 1.1. To ensure a higher survival rate and prevent further transmissions, it is important to carry out early diagnosis. One of the critical methods of TB diagnosis and detection is the use of posterior-anterior chest radiographs (CXR). The diagnosis of Tuberculosis and other chest-affecting diseases like Pneumoconiosis is time-consuming and challenging, and requires experts to read and interpret chest X-ray images, especially in under-resourced areas. Various attempts have been made to perform the diagnosis using deep learning methods such as Convolutional Neural Networks (CNN) using labelled CXR images. Because CXR images maintain a consistent structure and show overlapping visual appearances across different chest-affecting diseases, it is reasonable to believe that visual features learned in one disease or geographic location may transfer to a new TB classification model. This would allow us to leverage large volumes of labelled CXR images available online, hence decreasing the data required to build a local model. This work will explore to what extent such pre-training and transfer learning is useful and whether it may help decrease the data required for a locally trained classifier. In this research, we investigated various pre-training regimes using selected online datasets to understand whether the performance of such models can be generalised towards building a TB computer-aided diagnosis system and also inform us on the nature and size of CXR datasets we should be collecting.
    Our experiment results indicated that both supervised and self-supervised pre-training between the CXR datasets cannot significantly improve the overall performance metrics of a TB classification model. We noted that pre-training on the ChestX-ray14, CheXpert, and MIMIC-CXR datasets resulted in recall values of over 70% and specificity scores of at least 90%. There was a general decline in performance in our experiments when we pre-trained on one dataset and fine-tuned on a different dataset, hence our results were lower than baseline experiment results. We noted that ImageNet weights initialisation yields superior results over random weights initialisation on all experiment configurations. In the case of self-supervised pre-training, the model reached acceptable metrics with as few as 5% of the labels when we fine-tuned on the TBX11k dataset, although slightly lower in performance compared to the supervised pre-trained models and the baseline results. The best-performing self-supervised pre-trained model with the fewest training labels was the MoCo-ResNet-50 model pre-trained on the VinDr-CXR and PadChest datasets. These model configurations achieved a recall score of 81.90% and a specificity score of 81.99% on VinDr-CXR pre-trained weights, while the PadChest weights scored a recall of 70.29% and a specificity of 70.22%. The other self-supervised pre-trained models failed to reach scores of at least 50% on either recall or specificity with the same number of labels.
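For readers unfamiliar with the two metrics reported in this abstract, the short sketch below shows how recall and specificity are conventionally computed for a binary TB classifier. It uses the standard definitions and is not taken from the dissertation. The function names and the label encoding (1 for TB, 0 for not TB) are assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fn, tn, fp) for binary labels where 1 = TB and 0 = not TB."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, tn, fp


def recall(tp: int, fn: int) -> float:
    """Share of actual TB cases that the model flags (also called sensitivity)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of non-TB cases that the model correctly clears."""
    return tn / (tn + fp)


# Made-up labels for illustration only; these are not results from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
tp, fn, tn, fp = confusion_counts(y_true, y_pred)
example_recall = recall(tp, fn)
example_specificity = specificity(tn, fp)
```

Reporting both metrics matters in screening settings because a model can score highly on one by sacrificing the other.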