School of Computer Science and Applied Mathematics (ETDs)

Permanent URI for this community: https://hdl.handle.net/10539/38004


Towards Lifelong Reinforcement Learning through Temporal Logics and Zero-Shot Composition
(2024-10) Tasse, Geraud Nangue; Rosman, Benjamin; James, Steven
This thesis addresses the fundamental challenge of creating agents capable of solving a wide range of tasks in their environments, akin to human capabilities. For such agents to be truly useful and be capable of assisting humans in our day-to-day lives, we identify three key abilities that general purpose agents should have: Flexibility, Instructability, and Reliability (FIRe). Flexibility refers to the ability of agents to adapt to various tasks with minimal learning; instructability involves the capacity for agents to understand and execute task specifications provided by humans in a comprehensible manner; and reliability entails agents’ ability to solve tasks safely and effectively with theoretical guarantees on their behavior. To build such agents, reinforcement learning (RL) is the framework of choice given that it is the only one that models the agent-environment interaction. It is also particularly promising since it has shown remarkable success in recent years in various domains, including gaming, scientific research, and robotic control. However, prevailing RL methods often fall short of the FIRe desiderata. They typically exhibit poor sample efficiency, demanding millions of environment interactions to learn optimal behaviors. Task specification relies heavily on hand-designed reward functions, posing challenges for non-experts in defining tasks. Moreover, these methods tend to specialize in single tasks, lacking guarantees on the broader adaptability and behavior robustness desired for lifelong agents that need to solve multiple tasks. Clearly, the regular RL framework is not enough, and does not capture important aspects of what makes humans so general, such as the use of language to specify and understand tasks.
To address these shortcomings, we propose a principled framework for the logical composition of arbitrary tasks in an environment, and introduce a novel knowledge representation called World Value Functions (WVFs) that will enable agents to solve arbitrary tasks specified using language. The use of logical composition is inspired by the fact that all formal languages are built upon the rules of propositional logics. Hence, if we want agents that understand tasks specified in any formal language, we must define what it means to apply the usual logic operators (conjunction, disjunction, and negation) over tasks. The introduction of WVFs is inspired by the fact that humans seem to always seek general knowledge about how to achieve a variety of goals in their environment, irrespective of the specific task they are learning. Our main contributions include: (i) Instructable agents: We formalize the logical composition of arbitrary tasks in potentially stochastic environments, and ensure that task compositions lead to rewards minimising undesired behaviors. (ii) Flexible agents: We introduce WVFs as a new objective for RL agents, enabling them to solve a variety of tasks in their environment. Additionally, we demonstrate zero-shot skill composition and lifelong sample efficiency. (iii) Reliable agents: We develop methods for agents to understand and execute both natural and formal language instructions, ensuring correctness and safety in task execution, particularly in real-world scenarios. By addressing these challenges, our framework represents a significant step towards achieving the FIRe desiderata in AI agents, thereby enhancing their utility and safety in a lifelong learning setting like the real world.
Applications of Recurrent Neural Networks in Modeling the COVID-19 Pandemic
(University of the Witwatersrand, Johannesburg, 2024-03) Hayashi, Kentaro; Mellado, Bruce
This study introduced moving averages and a feature selection method into the forecasting model, with the aim of reducing the fluctuating values and unstable accuracy of the risk index developed by the University of the Witwatersrand and iThemba LABS and used by the Gauteng Department of Health. The introduction of moving averages was confirmed to reduce the fluctuation of the values, with the seven-day moving average being the most effective. For feature selection, Correlation-based Feature Selection (CFS), the simplest of the filter methods and one with low computational complexity, was introduced because daily operations must deliver information in a timely manner and leave little time for computation. The introduction of CFS was found to enable efficient feature selection.
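As a rough illustration of the smoothing step described in this abstract, the sketch below computes a trailing seven-day moving average. The function name, the choice of a trailing (rather than centred) window, and the sample values are assumptions made for illustration; they are not taken from the thesis.

```python
from collections import deque


def trailing_moving_average(values, window=7):
    """Average each value with up to `window - 1` values before it.

    Early outputs average only the values seen so far, so the result
    has the same length as the input. (Hypothetical helper.)
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque()      # values currently inside the window
    running_sum = 0.0
    smoothed = []
    for value in values:
        recent.append(value)
        running_sum += value
        if len(recent) > window:
            running_sum -= recent.popleft()   # drop the oldest value
        smoothed.append(running_sum / len(recent))
    return smoothed


# Made-up daily index values that swing between 10 and 30.
daily_index = [10, 30, 10, 30, 10, 30, 10, 30]
smoothed_index = trailing_moving_average(daily_index, window=7)
```

Once the window is full, the smoothed series moves far less from day to day than the raw series, which is the effect the abstract reports.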
Counting Reward Automata: Exploiting Structure in Reward Functions Expressible in Decidable Formal Languages
(University of the Witwatersrand, Johannesburg, 2024-07) Bester, Tristan; Rosman, Benjamin; James, Steven; Tasse, Geraud Nangue
In general, reinforcement learning agents are restricted from directly accessing the environment model. This restricts the agent’s access to the environmental dynamics and reward models, which are only accessible through repeated environmental interactions. As reinforcement learning is well suited for use in complex environments, which are challenging to model, the general assumption that the transition probabilities associated with the environment are unknown is justified. However, as agents cannot discern rewards directly from the environment, reward functions must be designed and implemented for both simulated and real-world environments. As a result, the assumption that the reward model must remain hidden from the agent is unnecessary and detrimental to learning. Previously, methods have been developed that utilise the structure of the reward function to enable more sample-efficient learning. These methods employ a finite state machine variant to facilitate reward specification in a manner that exposes the internal structure of the reward function. This approach is particularly effective when solving long-horizon tasks as it enables the use of counterfactual reasoning with off-policy learning which significantly improves sample efficiency. However, as these approaches are dependent on finite-state machines, they are only able to express a small number of reward functions. This severely limits the applicability of these approaches as they cannot model simple tasks such as “fetch a coffee for each person in the office” which involves counting – one of the numerous properties finite state machines cannot model. This work addresses the limited expressiveness of current state machine-based approaches to reward modelling. 
Specifically, we introduce a novel approach compatible with any reward function which can be expressed as a well-defined algorithm. We present the counting reward automaton, an abstract machine capable of modelling reward functions expressible in any decidable formal language. Unlike previous approaches to state machine-based reward modelling, which are limited to the expression of tasks as regular languages, our framework allows for tasks described by decidable formal languages. It follows that our framework is an extremely general approach to reward modelling, compatible with any task specification expressible as a well-defined algorithm. This is a significant contribution as it greatly extends the class of problems which can benefit from the improved learning techniques facilitated by state machine-based reward modelling. We prove that an agent equipped with such an abstract machine is able to solve an extended set of tasks. We show that this increase in expressive power does not come at the cost of increased automaton complexity. This is followed by the introduction of several learning algorithms designed to increase sample efficiency through the exploitation of automaton structure. These algorithms are based on counterfactual reasoning with off-policy RL and use techniques from the fields of HRL and reward shaping. Finally, we evaluate our approach in several domains requiring long-horizon plans. Empirical results demonstrate that our method outperforms competing approaches in terms of automaton complexity, sample efficiency, and task completion.
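To make the counting limitation concrete, the following is a minimal sketch of a state machine extended with one unbounded counter for the "fetch a coffee for each person in the office" task. It is not the thesis's formal definition of a counting reward automaton: the class name, the event labels ("person", "coffee"), and the reward values are all hypothetical.

```python
class CountingRewardMachine:
    """Hypothetical one-counter machine for 'one coffee per person'."""

    def __init__(self):
        self.state = "counting"   # finite control state
        self.counter = 0          # unbounded counter: coffees still owed

    def step(self, event):
        """Consume one high-level event and return the reward it yields."""
        if self.state == "counting" and event == "person":
            self.counter += 1     # one more person to serve
            return 0.0
        if (self.state in ("counting", "delivering")
                and event == "coffee" and self.counter > 0):
            self.state = "delivering"
            self.counter -= 1
            if self.counter == 0:  # everyone has been served
                self.state = "done"
                return 1.0
            return 0.0
        return 0.0                # all other events leave the machine unchanged


machine = CountingRewardMachine()
events = ["person", "person", "person", "coffee", "coffee", "coffee"]
rewards = [machine.step(event) for event in events]
```

Because the number of people is not bounded in advance, no fixed set of states can track how many coffees remain; the counter carries that information instead.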
Pricing Interest Rate Derivatives Using The Forward Market Model
(University of the Witwatersrand, Johannesburg, 2024-10) Konaite, Tshana Tumelo; Mudavanhu, Blessing
The interbank offered rates (IBORs) are due to be discontinued, and overnight rates have been chosen as their replacements. This change in the risk-free rate raises the challenges of how the new rates will be modelled and how products will be priced. In this dissertation, we explore the classical short-rate models and the new generalized Forward Market Model proposed by Andrei Lyashenko and Fabio Mercurio in 2019. We seek to utilize this model in pricing interest rate derivatives such as caps and swaptions.
Flood Susceptibility Modeling in the uMhlatuzana River Catchment using Computer Vision-Based Deep Learning Techniques
(University of the Witwatersrand, Johannesburg, 2024-10) Chirindza, Jonas; Ajoodha, Ritesh; Knight, Jasper
In this study, convolutional neural network (CNN) models are employed for flood susceptibility modeling in the uMhlatuzana River catchment in KwaZulu-Natal, South Africa. The CNN models, including 1D-CNN, 2D-CNN, and 3D-CNN, provide a detailed assessment of flood vulnerability in the region. The models use diverse spatial information, such as topography, land use, and hydrological features, to estimate the likelihood of flooding in different areas of the catchment. The flood susceptibility maps within the uMhlatuzana River catchment, classified into five risk zones, namely ‘very low’, ‘low’, ‘moderate’, ‘high’ and ‘very high’ susceptibility zones, serve as proactive instruments for risk mitigation and disaster management. The 1D-CNN model displays strong overall performance in flood susceptibility modeling, evident in key metrics such as accuracy, precision, recall, area under curve (AUC) score, and F1-score. The results suggest that the model effectively captures patterns in the input data, emphasizing its potential for flood susceptibility modeling. Moreover, the 2D-CNN model outperforms the 1D-CNN, achieving higher values when evaluated using various performance metrics. Finally, the 3D-CNN model outperformed both the 1D-CNN and 2D-CNN, emphasizing its predictive abilities in flood susceptibility modelling. The flood susceptibility maps produced by the 1D-CNN model show that most of the study area exhibits very low flood susceptibility (96.4%), with localized areas of higher susceptibility, particularly in the very high-risk category (2.53%). The 2D-CNN model demonstrates a more diverse risk distribution, with a substantial portion having very low susceptibility (74.19%) and significant areas of higher risk, notably in the very high-risk category (10.93%). The 3D-CNN model emphasizes a spatial pattern where a large portion has very low susceptibility (84.10%), but with a concentration of high and very high-risk areas, comprising 12.34% of the total area.
Finally, the consistent identification of higher risk susceptibility areas enhances the robustness of the assessments. The models’ high accuracy and detailed risk assessments provide valuable tools for decision-makers, urban planners, and emergency response teams in the uMhlatuzana River catchment. The precision of the models facilitates informed strategies for flood risk management, including targeted interventions such as improved drainage systems and early warning systems.
Envisioning the Future of Fashion: The Creation And Application Of Diverse Body Pose Datasets for Real-World Virtual Try-On
(University of the Witwatersrand, Johannesburg, 2024-08) Molefe, Molefe Reabetsoe-Phenyo; Klein, Richard
Fashion presents an opportunity for research methods to unite machine learning concepts with e-commerce to meet the growing demands of consumers. A recent development in intelligent fashion research envisions how individuals might appear in different clothes based on their selection, a process known as “virtual try-on”. Our research introduces a novel dataset that ensures multi-view consistency, facilitating the effective warping and synthesis of clothing onto individuals from any given perspective or pose. This addresses a significant shortfall in existing datasets, which struggle to recognise various views, thus limiting the versatility of virtual try-on. By fine-tuning state-of-the-art architectures on our dataset, we expand the utility of virtual try-on systems, making them more adaptable and robust across a diverse range of scenarios. A noteworthy additional advantage of our dataset is its capacity to facilitate 3D scene reconstruction. This capability arises from utilising a sparse collection of images captured from multiple angles, which, while primarily aimed at enriching 2D virtual try-on, inadvertently supports the simulation of 3D environments. This enhancement not only broadens the practical applications of virtual try-on in the real world but also advances the field by demonstrating a novel application of deep learning within the fashion industry, enabling more realistic and comprehensive virtual try-on experiences. Therefore, our work heralds a novel dataset and approach for virtually synthesising clothing in an accessible way for real-world scenarios.
Double-diffusive convection in rotating fluids under gravity modulation
(University of the Witwatersrand, Johannesburg, 2024-09) Mathunyane, Alfred Ntobeng; Duba, C. Thama; Mason, D.P.
This study employs the method of normal modes and linear stability analysis to investigate double-diffusive convection in a horizontally layered, rotating fluid, specifically focusing on its application to oceanic dynamics. Double-diffusive convection arises when opposing gradients of salinity and temperature interact within a fluid, a phenomenon known as thermohaline convection, and it is crucial for the understanding of ocean circulation and its role in climate change. With the increasing mass of water due to glaciers melting, fluid pressure variations occur, leading to slight fluctuations in gravity. We conduct both stationary and oscillatory stability analyses to determine the onset of double-diffusive convection under gravity modulation. Our analysis reveals that time-dependent periodic modulation of gravitational fields can stabilize or destabilize thermohaline convection for both stationary and oscillatory convection, with amplitude stabilizing and frequency destabilizing. The wavenumber in the y-direction also affects convection in the equatorial regions. This wavenumber exhibits destabilizing effects for large values and stabilizing effects for small values for both stationary and oscillatory convection. Rotation along with gravity modulation tends to destabilize the system for both stationary and oscillatory convection. The key difference between stationary and oscillatory convection is that oscillatory convection exhibits large values of the Rayleigh number and is thus susceptible to overstability, while stationary convection tends to have relatively smaller Rayleigh numbers and is thus more stable. This research provides insights into the complex interplay between gravity modulation and thermohaline convection, contributing to our understanding of ocean dynamics and their implications for climate change.
BiCoRec: Bias-Mitigated Context-Aware Sequential Recommendation Model
(University of the Witwatersrand, Johannesburg, 2024-09) Muthivhi, Mufhumudzi; van Zyl, Terence; Bau, Hairong
Sequential recommendation models aim to learn from users’ evolving preferences. However, current state-of-the-art models suffer from an inherent popularity bias. This study developed a novel framework, BiCoRec, that adaptively accommodates users’ changing preferences for popular and niche items. Our approach leverages a co-attention mechanism to obtain a popularity-weighted user sequence representation, facilitating more accurate predictions. We then present a new training scheme that learns from future preferences using a consistency loss function. The analysis of the experimental results shows that our approach is 7% more capable of uncovering the most relevant items.
Developing a Bayesian Network Model to Predict Students’ Performance Based on the Analysis of their Higher Education Trajectory
(University of the Witwatersrand, Johannesburg, 2024-08) Ramaano, Thabo Victor; Jadhav, Ashwini; Ajoodha, Ritesh
The Admission Point Score (APS) metric, utilised as a response to admit prospective students for an academic course, may appear effective in determining student success. In reality, almost 50% of students admitted to a science programme in a higher education institution failed to meet all the requirements necessary to complete the programme between 2008 and 2015. This had a direct impact on the overall graduation throughput. Thus, the focus of this research was geared towards the adoption of a probabilistic graphical approach to advocate its mechanism as a viable alternative to the APS metric when determining student success trajectories at a higher education level. The purpose of this approach was to provide higher education institutions with a system to monitor students’ academic performance en route to graduation from a probabilistic and graphical point of view. This research employed a probability distribution distance metric to ascertain how close the learned models were to the true model for varying sample sizes. The significance of these results addressed the need for knowledge discovery of dependencies that existed between the students’ module results in a higher education trajectory that spans three years.
3D Human pose estimation using geometric self-supervision with temporal methods
(University of the Witwatersrand, Johannesburg, 2024-09) Bau, Nandi; Klein, Richard
This dissertation explores the enhancement of 3D human pose estimation (HPE) through self-supervised learning methods that reduce reliance on heavily annotated datasets. Recognising the limitations of data acquired in controlled lab settings, the research investigates the potential of geometric self-supervision combined with temporal information to improve model performance in real-world scenarios. A Temporal Dilated Convolutional Network (TDCN) model, employing Kalman filter post-processing, is proposed and evaluated on both ground-truth and in-the-wild data from the Human3.6M dataset. The results demonstrate a competitive Mean Per Joint Position Error (MPJPE) of 62.09 mm on unseen data, indicating a promising direction for self-supervised learning in 3D HPE and suggesting a viable pathway towards reducing the gap with fully supervised methods. This study underscores the value of self-supervised temporal dynamics in advancing pose estimation techniques, potentially making them more accessible and broadly applicable in real-world applications.
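For readers unfamiliar with the metric quoted in this abstract, MPJPE is commonly computed as the average Euclidean distance between predicted and ground-truth joint positions. The sketch below shows that calculation for a single pose; the function name and the two-joint example are illustrative assumptions, and any alignment or averaging protocol used in the dissertation is not reproduced here.

```python
import math


def mpjpe(predicted, target):
    """Mean Euclidean distance between corresponding 3D joints.

    Both arguments are equal-length sequences of (x, y, z) points;
    the result is in the same units as the inputs (for example mm).
    """
    if len(predicted) != len(target) or not predicted:
        raise ValueError("poses must be non-empty and have the same number of joints")
    distances = [math.dist(p, t) for p, t in zip(predicted, target)]
    return sum(distances) / len(distances)


# Made-up two-joint pose: one joint is exact, the other is 5 units off.
predicted_pose = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
target_pose = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
example_error = mpjpe(predicted_pose, target_pose)
```

For a video sequence, the same per-pose value would typically be averaged again over all frames.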