Electronic Theses and Dissertations (Masters)

Permanent URI for this collection: https://hdl.handle.net/10539/38006


Search Results

Now showing 1 - 10 of 49
  • Item
    Causal Inference in Water Distribution Networks to Quantify the Effects of Network Damage
    (University of the Witwatersrand, Johannesburg, 2025-05) Rammutloa, Katlego Lucas; Mulaudzi, Rudzani; Ajoodha, Ritesh
    Water Distribution Networks (WDNs) are engineered systems of interconnected pipes, pumps, and reservoirs that deliver potable water from treatment plants to consumers. These networks are critical to public health but are highly vulnerable to structural damage (e.g., leaks, pipe corrosion), which disrupts water flow and complicates impact prediction. Current methods for assessing damage, such as hydraulic simulations and machine learning, rely on statistical correlations or optimisation and fail to model causal relationships. This limits their ability to predict cascading effects or guide repairs under uncertainty. This study addresses these limitations by applying a causal inference framework to analyse WDNs. The framework leverages graphical causal models to represent the network’s structure and quantifies the impact of damage on water flow predictions. Using Average Treatment Effect (ATE) and Mean Squared Error (MSE) metrics, we analyse how structural damage affects prediction accuracy across different network regions. The framework focuses on three critical areas: source nodes (reservoirs and entry points), mid-network nodes (junction points and main distribution pipes), and consumer nodes (end-user connection points). Experiments on a simulated WDN reveal that damage affecting 40% or more of the network significantly compromises predictive accuracy. Mid-network and consumer nodes prove particularly vulnerable, with damage to these locations causing the greatest disruption to flow predictions. In contrast, source nodes demonstrate greater resilience due to built-in redundancies. Additionally, the study finds that treatment locations closer to outcome variables maintain predictive accuracy longer under damage conditions. By integrating causal inference into WDN analysis, this research provides network operators with a robust methodology for evaluating damage impacts and offers actionable insights for improving network resilience. The findings contribute to both infrastructure management practices and the broader application of causal inference to complex systems analysis.
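    As a rough illustration of the two metrics this abstract names (not the thesis's actual pipeline), the sketch below estimates an Average Treatment Effect as the difference in mean predicted flow between damaged and undamaged simulation runs, alongside the accompanying MSE; all values and variable names are hypothetical.

```python
import numpy as np

def average_treatment_effect(flow_treated, flow_control):
    """ATE: mean difference in flow with vs. without the damage 'treatment'."""
    return np.mean(flow_treated) - np.mean(flow_control)

def mse(predicted, observed):
    """Mean squared error between predicted and observed flows."""
    return np.mean((np.asarray(predicted) - np.asarray(observed)) ** 2)

# Hypothetical flows (litres/s) at a consumer node from simulated runs.
baseline = np.array([12.1, 11.8, 12.4, 12.0])  # undamaged network
damaged = np.array([9.3, 8.9, 9.6, 9.1])       # 40% of pipes damaged

print(average_treatment_effect(damaged, baseline))  # negative: flow drops
print(mse(damaged, baseline))  # squared deviation from the undamaged flows
```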
  • Item
    Cross-domain few-shot classification for remote sensing imagery
    (University of the Witwatersrand, Johannesburg, 2025-04) Pillay, Christopher Wayne; Bau, Hairong
    Deep learning has proven highly effective for scene classification tasks when substantial quantities of labelled data are accessible. However, performance decreases when applied to domains such as remote sensing, which typically possess a limited quantity of labelled data across available datasets. Few-shot learning has been developed as a promising solution to this problem: it can recognise new categories from minimal labelled examples, but it assumes that the training and testing data exhibit identical feature distributions. This assumption is unrealistic in real-world contexts, where data can originate from different domains, and poses a challenge when a significant domain shift exists between the training and testing data. This dissertation aims to address these limitations by proposing the Cross-Domain Attention Network (CDAN), a network designed specifically for settings where labelled data is limited and a significant domain shift exists between the training and testing data. The proposed network consists of a prototypical network as the base and three additions that contribute to the accurate scene classification of remote sensing imagery. Firstly, a cross-domain data augmentation technique is proposed with few-shot learning to reduce domain shift. The cross-domain data augmentation technique facilitates enhanced knowledge transfer between domains and increases the adaptation ability of the network, whereas few-shot learning reduces the network’s reliance on large labelled datasets. Secondly, a dynamic and focused attention module is proposed to improve the discriminative capacity of the network by increasing its focus on important channels and spatial regions within images during training. Thirdly, an adaptive task-aware loss is proposed to further enhance the network’s adaptive capacity by leveraging information in few-shot training tasks. Extensive experiments are carried out on different remote sensing classification datasets (RSICB, AID and NWPU-RESISC45) to demonstrate that the proposed network alleviates these limitations in a cross-domain few-shot (CDFS) classification setting.
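    A minimal sketch of the prototypical-network base the CDAN builds on (the augmentation, attention module, and task-aware loss are not attempted here): class prototypes are mean support-set embeddings, and queries are assigned to the nearest prototype. All shapes and names are illustrative.

```python
import torch

def prototypes(support_emb, support_lbl, n_way):
    """Compute one prototype per class: the mean support embedding."""
    return torch.stack(
        [support_emb[support_lbl == c].mean(0) for c in range(n_way)])

def classify(query_emb, protos):
    """Label each query by its nearest prototype (squared Euclidean distance)."""
    return (torch.cdist(query_emb, protos) ** 2).argmin(dim=1)

# Toy 3-way, 2-shot episode with 8-dimensional embeddings
# (stand-ins for the output of a trained feature encoder).
support = torch.randn(6, 8)
labels = torch.tensor([0, 0, 1, 1, 2, 2])
queries = torch.randn(4, 8)
print(classify(queries, prototypes(support, labels, n_way=3)))
```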
  • Item
    A Bayesian Approach to Maximise Photovoltaic System Output
    (University of the Witwatersrand, Johannesburg, 2025-05) Noel, Keanu; Ajoodha, Ritesh
    This thesis addresses the critical issue of optimising photovoltaic (PV) system output, an essential objective in the pursuit of efficient and scalable renewable energy solutions. As global energy demands rise and concerns over climate change intensify, solar power has emerged as a leading solution for sustainable electricity generation. However, the performance of PV systems is highly sensitive to environmental factors, which can vary significantly across seasons and geographical locations. These fluctuations create a complex optimisation problem: determining the most effective system configuration that can dynamically adapt to seasonal and regional variations in solar potential. Traditional approaches often rely on fixed or rule-based models that do not adequately account for these variations, leading to suboptimal energy yields and inefficient use of solar infrastructure. In this research, a Bayesian Network model is developed to learn the conditional dependencies between meteorological variables (such as solar irradiance, temperature, and wind speed) and PV system configuration parameters (tilt angle, orientation, inverter properties). By using a Bayesian approach, the developed model accommodates uncertainty and dynamically adjusts PV system tilt configurations in response to weather variations, aiming to maximise PV output. Score-based methods are employed to construct the network structure, and Maximum Likelihood Estimation (MLE) is used to determine the Conditional Probability Distributions (CPDs) of the network. Additionally, Maximum a Posteriori (MAP) estimation is applied to identify the optimal seasonal PV system tilt configurations in light of specific weather conditions. Key findings demonstrate the effectiveness of the model in optimising PV output by offering adaptive configuration strategies that respond to local seasonal meteorological patterns. These include the superior performance of the Hill Climb Search algorithm compared to Simulated Annealing for structure learning, the use of MAP estimation to identify optimal PV system tilt configurations under varying meteorological conditions, and the statistically significant advantage of dynamic configurations over fixed installations for enhancing PV system output. These results underscore the potential of Bayesian approaches for data-driven optimisation in renewable energy systems. This research provides a robust framework that enhances PV system performance and contributes to the growing body of knowledge on renewable energy optimisation through probabilistic modelling. Ultimately, it presents a novel, data-driven methodology that informs the design and operation of more efficient PV systems, answering both the “so what” and the “now what” of sustainable energy advancement.
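    A hedged sketch of the workflow described, using the pgmpy library (Hill Climb structure search scored with BIC, MLE-fitted CPDs, then a MAP query); the variables, states, and data here are invented for illustration, and the thesis's actual features and scoring choices may differ.

```python
import pandas as pd
from pgmpy.estimators import BicScore, HillClimbSearch, MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination
from pgmpy.models import BayesianNetwork

# Hypothetical discretised weather and configuration observations.
data = pd.DataFrame({
    "irradiance":  ["high", "low", "high", "low"] * 50,
    "temperature": ["hot", "mild", "mild", "hot"] * 50,
    "tilt":        ["steep", "flat", "steep", "flat"] * 50,
    "output":      ["high", "low", "high", "low"] * 50,
})

# Score-based structure learning, then Maximum Likelihood Estimation of CPDs.
dag = HillClimbSearch(data).estimate(scoring_method=BicScore(data))
model = BayesianNetwork(dag.edges())
model.add_nodes_from(data.columns)  # keep any variable left unconnected
model.fit(data, estimator=MaximumLikelihoodEstimator)

# MAP estimate: most probable tilt configuration given observed weather.
best = VariableElimination(model).map_query(
    variables=["tilt"], evidence={"irradiance": "high"})
print(best)  # e.g. {'tilt': 'steep'}
```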
  • Item
    The Effects of Node Removal on Bayesian Network Resilience for ATM Network Transaction Vulnerabilities
    (University of the Witwatersrand, Johannesburg, 2025-05) Matafeni, Gcobisile; Ajoodha, Ritesh; Olukanmi, Seun
    We investigate the evaluation of influence relationships in probabilistic graphical models, focusing on the impact of node removal (mutilation) within Bayesian networks. The central problem addressed is understanding how the joint probability distribution and influence structure among interconnected variables evolve when a subset of nodes is removed, an issue relevant to various real-world systems experiencing disruptions. We model these dynamics using Bayesian learning to provide insights into network resilience and dependencies. To explore these effects, we generate synthetic Bayesian network structures that are tree-like, sparse, and dense, each representing different real-world configurations found in machine learning, sensor networks, and financial modeling. Conditional Probability Distributions (CPDs) are assigned to nodes based on the Bernoulli distribution. The Kullback-Leibler (KL) divergence quantifies the deviations in influence structure after removal, and structure recovery is evaluated using an exact inference technique. Our findings indicate that each network type exhibits distinct responses to node removal: tree-like structures stabilize quickly with increased data, sparse structures show higher sensitivity but recover efficiently, and dense structures offer robustness through redundancy, though they demand larger datasets. These findings have significant implications for optimizing complex systems, particularly those requiring resilient network architectures. As a real-world application, we model ATM transaction networks to analyze how the removal of ATMs (due to vandalism, load shedding, or maintenance) impacts transaction flows. Our results show that high-traffic ATMs serve as critical nodes, significantly influencing neighboring ATMs when removed. By applying Bayesian structure learning, we demonstrate that optimal ATM network configurations can be identified to minimize disruption and improve financial service resilience. This study contributes to the growing field of probabilistic graphical models by introducing a novel approach to understanding influence dynamics in mutilated networks. It provides practical insights and lays a foundation for further research into complex systems where node integrity and network stability are critical for decision-making and operational efficiency.
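    The KL divergence used to quantify post-removal deviation has a direct implementation; below is a minimal sketch comparing a network's joint distribution before and after mutilation, flattened to probability vectors with hypothetical values.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) for discrete distributions given as probability vectors."""
    p = np.asarray(p, dtype=float); p /= p.sum()
    q = np.asarray(q, dtype=float); q /= q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Hypothetical joint distributions over the surviving variables,
# before and after removing (mutilating) a node.
p_intact    = [0.40, 0.10, 0.30, 0.20]
p_mutilated = [0.25, 0.25, 0.25, 0.25]
print(kl_divergence(p_intact, p_mutilated))  # larger => greater deviation
```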
  • Item
    A Computational Review of The Maximum Clique Problem: Classical vs. Quantum
    (University of the Witwatersrand, Johannesburg, 2025-01) Kassim, Shawal; Ali, Montaz; Nape, Isaac
    The Maximum Clique Problem (MCP) is a fundamental combinatorial optimization problem with wide-ranging applications. Addressing the MCP involves identifying the largest complete subset of vertices in a network graph, a task known to be NP-hard and computationally intensive. With the emergence of quantum computing (QC), there is growing interest in leveraging quantum algorithms (QAs) to tackle combinatorial problems more efficiently. This research report aims to compare classical methods, such as Branch and Bound (B&B), Dynamic Programming (DP) and Nuclear Norm Minimization (NNM), against quantum methods, including the Variational Quantum Eigensolver (VQE), the Quantum Approximate Optimization Algorithm (QAOA) and Grover’s Algorithm (GA), for solving the MCP. Specifically, we investigate whether quantum methods exhibit a computational speedup or offer improvements in solution quality compared to their classical counterparts. Our analysis utilizes IBM’s noisy quantum computers to implement and evaluate the performance of the quantum algorithms. By conducting a comparative study, we seek to gain insights into the possible computational advantages of quantum computers in addressing combinatorial problems such as the MCP.
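    For a concrete sense of the classical side, a compact exact baseline via clique enumeration is shown below (a standard classical approach, though not necessarily the exact B&B variant the report benchmarks); its worst-case exponential runtime is precisely what motivates the quantum comparison.

```python
import networkx as nx

def max_clique(G):
    """Exact maximum clique by enumerating maximal cliques (exponential worst case)."""
    return max(nx.find_cliques(G), key=len)

# Toy graph: the triangle {0, 1, 2} is the maximum clique.
G = nx.Graph([(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)])
print(max_clique(G))  # e.g. [0, 1, 2]
```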
  • Item
    Learning Operators with NEAT for Boolean Composition in Reinforcement Learning
    (University of the Witwatersrand, Johannesburg, 2025-06) Esterhuysen, Amir; Rosman, Benjamin; James, Steven; Tasse, Geraud Nangue
    The idea of skill composition has been gaining traction within reinforcement learning research. This compositional approach promotes efficient use of knowledge and represents a realistic, human-like style of learning. Existing work has demonstrated how simple skills can be composed using Boolean operators to solve new, unseen tasks without the need for further learning. However, this approach assumes that the learned value functions for each atomic skill are optimal, an assumption that is violated in most practical cases. We thus propose a method that instead learns operators for composition using evolutionary strategies. Our approach is empirically verified first in a tabular setting and then in a high-dimensional function approximation environment. Results demonstrate that our method outperforms existing composition methods when faced with learned, suboptimal behaviours, while also promoting the development of robust agents and allowing for fluid transfer between domains.
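    For context, a sketch of the fixed Boolean operators from the prior work this dissertation replaces with learned ones: under the optimality assumption the abstract mentions, disjunction (OR) is approximated by a pointwise max over Q-value tables and conjunction (AND) by a pointwise min. Names and values are illustrative.

```python
import numpy as np

def q_or(q_a, q_b):
    """Task disjunction: pointwise maximum of the two Q-value tables."""
    return np.maximum(q_a, q_b)

def q_and(q_a, q_b):
    """Task conjunction: pointwise minimum (exact only for optimal value functions)."""
    return np.minimum(q_a, q_b)

# Toy tabular Q-values with shape (n_states, n_actions), assumed optimal.
q_red  = np.array([[1.0, 0.2], [0.1, 0.9]])
q_blue = np.array([[0.3, 0.8], [0.7, 0.4]])

# Greedy policy for the composed task "red AND blue".
print(q_and(q_red, q_blue).argmax(axis=1))  # one action per state
```

    When the learned Q-values are only approximate, these fixed min/max rules degrade, which is the gap the learned operators target.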
  • Item
    Generalized Task Learning for Robots: Unifying Task Hierarchies through Contrastive Learning
    (University of the Witwatersrand, Johannesburg, 2025-06) Alexander, Ryan Austin; James, Steven; Klein, Richard
    This dissertation addresses the challenge of enabling robots to generalize across unseen household tasks by learning abstract task structures from demonstration data. We develop a three-stage pipeline that translates natural language instructions and demonstrations into hierarchical task representations using large language models, clustering, and parameterized generalization. Our approach is evaluated on the ALFRED benchmark [Shridhar et al. 2020], a standardized benchmark for training and evaluating models that comprehend and follow natural language instructions, using first-person visual input to carry out sequences of actions for various household tasks. While this approach does not represent the state of the art, it establishes a foundation for future research to build upon.
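    As a loose illustration of the clustering stage alone (the LLM translation and parameterized-generalization stages are out of scope here, and TF-IDF merely stands in for learned embeddings), instructions that share task structure can be grouped as follows; everything named is an assumption, not the dissertation's implementation.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

instructions = [
    "put the mug in the microwave",
    "place the cup inside the microwave",
    "slice the apple on the counter",
    "cut the tomato on the counter",
]

# Embed the instructions, then cluster into candidate abstract task types.
X = TfidfVectorizer().fit_transform(instructions)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # e.g. [0 0 1 1]: a heat-object cluster and a slice-object cluster
```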
  • Item
    Parameter-Efficient Fine-Tuning of Pre-trained Large Language Models for Financial Text Analysis
    (University of the Witwatersrand, Johannesburg, 2024-07) Langa, Kelly Kiba; Bau, Hairong; Okuboyejo, Olaperi
    The recent advancements in natural language processing (NLP) have been largely fueled by the emergence of large language models (LLMs), which excel in capturing the complex semantic and syntactic structures of natural language. These models have revolutionized NLP tasks by leveraging transfer learning, where pre-trained LLMs are fine-tuned on domain-specific datasets. Financial sentiment analysis poses unique challenges due to the intricate nature of financial language, often necessitating more sophisticated approaches than traditional sentiment analysis methods offer. Fine-tuning LLMs holds potential for improving modeling performance within the financial domain, but the computational expense of standard full fine-tuning poses a challenge. This study investigates the efficacy of Parameter-Efficient Fine-Tuning (PEFT) methods for adapting LLMs to specific tasks, with a focus on sentiment analysis in the financial domain. Through extensive analysis of PEFT methods, including Low-Rank Adaptation (LoRA), prompt tuning, prefix tuning, and adapters, several critical insights emerged. The results demonstrate that PEFT methods can match or surpass the performance of full fine-tuning. In particular, adapting the Open Pre-trained Transformers (OPT) model with LoRA achieved the highest modeling performance, with an accuracy of 89%, while utilizing only 0.19% of the model’s total parameters. This highlights the high modularity of PEFT methods, which require minimal storage for trainable parameters, ranging from 0.1 MB to 7 MB for the OPT model. Despite slower convergence rates than full fine-tuning, PEFT methods yielded substantial reductions in Graphics Processing Unit (GPU) memory consumption, with savings of up to 80%. Small-scale fine-tuned LLMs outperformed large-scale general-purpose LLMs such as ChatGPT, emphasizing the importance of domain-specific fine-tuning. Fine-tuning only the model head fell short of PEFT methods, suggesting additional benefits from training more layers. Compared with a state-of-the-art non-LLM deep learning model, the Long Short-Term Memory (LSTM) network, LLMs achieved a 17% increase in accuracy, justifying their higher implementation costs.
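    A hedged sketch of the LoRA configuration described, using Hugging Face's peft library; the specific OPT checkpoint, rank, and three-class sentiment head are assumptions for illustration rather than the study's exact setup.

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSequenceClassification

# Hypothetical setup: 3-class financial sentiment (negative/neutral/positive).
base = AutoModelForSequenceClassification.from_pretrained(
    "facebook/opt-350m", num_labels=3)

# LoRA freezes the base weights and trains small low-rank update matrices.
config = LoraConfig(task_type=TaskType.SEQ_CLS, r=8,
                    lora_alpha=16, lora_dropout=0.1)
model = get_peft_model(base, config)

# Typically well under 1% of parameters end up trainable,
# echoing the 0.19% figure reported for OPT with LoRA.
model.print_trainable_parameters()
```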
  • Item
    The Application of Attribution Methods to Explain an End-To-End Model For Self-Driving Cars
    (University of the Witwatersrand, Johannesburg, 2024-09) Chalom, Jason Max; Klein, Richard
    There has been significant development in producing autonomous vehicles, but a growing concern is understanding how these systems work and why certain decisions were made. This has a direct impact on the safety of people who may come into contact with these systems. This research reproduced the experimental setup for an end-to-end system by Bojarski et al. [2016b]; the source code is available at https://github.com/TRex22/masters-gradsum. End-to-end self-driving systems are built on top of black-box machine learning techniques. An allure of end-to-end structures is that they need very little human input once trained on large datasets and therefore have a much lower barrier to entry, but as a result they are also harder to understand and interpret. Bojarski et al. [2016b] defined a reduced problem space for a self-driving vehicle: the task takes only RGB images from a forward-facing camera as input and produces only the vehicle’s steering angle as output. This work expanded the setup from the single model used by Bojarski et al. [2016b] to six CNN model architectures in order to compare the behaviours, outputs and performance of the varying architectures. There have been recent developments in applying attribution methods to deep neural networks in order to understand the relationship between the features present in the input data and the output. GradCAM is an example of an attribution technique which has been validated by Adebayo et al. [2018]. We devised an attribution analysis scheme called GradSUM, which is applied to the models throughout their training and evaluation phases in order to explain which features of the input data are being extracted by the models. The scheme applies GradCAM and uses segmentation maps to correlate input semantic information with the resulting gradient maps. This produces a model profile for an epoch, which can then be used to analyse that epoch. Six models were trained and their performance compared using their MSE loss. An autonomy metric (MoA) common in the literature was also used; it tracks where a human has to take over to prevent a dangerous situation. The models produced good loss results. Two model architectures were constructed to be simple in order to compare against the more complex models. These performed well on the loss and MoA metrics for the training data subset but performed poorly on other data. They were used as a control to make sure that the proposed GradSUM scheme would adequately help compare emergent behaviours between architectures. Using GradSUM on the architectures, we found that three of the six models were able to learn meaningful contextual information; the other three did not learn anything meaningful. The two trained simple models’ overall activation percentages were also close to zero, indicating that these simple architectures either did not learn enough useful information or became over-trained on features not useful for safely driving a vehicle.
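    A minimal PyTorch sketch of the GradCAM building block underlying such a profile (the segmentation-map correlation that GradSUM adds is omitted): feature maps from a chosen convolutional layer are weighted by their spatially pooled gradients and summed. The model and layer handle are assumptions.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, conv_layer, image, output_index=0):
    """Grad-CAM heat map for one output unit (e.g. a steering-angle head)."""
    feats, grads = {}, {}
    h1 = conv_layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
    h2 = conv_layer.register_full_backward_hook(
        lambda m, gi, go: grads.update(a=go[0]))
    model(image)[0, output_index].backward()
    h1.remove(); h2.remove()
    weights = grads["a"].mean(dim=(2, 3), keepdim=True)  # pooled gradients
    cam = F.relu((weights * feats["a"]).sum(dim=1))      # weighted feature sum
    return cam / (cam.max() + 1e-8)                      # normalised heat map

# Usage (hypothetical model and layer): cam = grad_cam(net, net.conv4, image)
```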
  • Item
    Addressing Ambiguity in Human Robot Interaction using Compositional Reinforcement Learning and User Preference
    (University of the Witwatersrand, Johannesburg, 2024-09) Rajab, Jenalea Norma; Rosman, Benjamin; James, Steven
    The ability of social robots to integrate naturally into human lives has many advantages in industry and assistive services. For effective Human Robot Interaction (HRI), social robots require communication abilities to understand an instruction from the user and perform tasks accordingly. Verbal communication is an intuitive, natural interaction for non-expert users, but it can also be a source of ambiguity, especially when there is also ambiguity in the environment (e.g. similar objects to be retrieved). Addressing ambiguity for task inference in HRI is an unsolved problem. Current approaches implemented in collaborative robots include asking the user for clarification. Related research shows promising results from using user preference in HRI, but no work has been found where user preference is employed specifically to address ambiguity in conjunction with clarifying questions. Additionally, these methods do not leverage knowledge learned from previous interactions with the environment or the life-long learning capabilities of Reinforcement Learning (RL) agents. Based on the related work and its shortfalls, we propose a framework to address ambiguity in HRI (resulting from natural language instructions) that leverages the compositionality of learned object-attribute base-tasks in conjunction with user preference and clarifying questions for adaptive task inference. Evaluating our method in the BabyAI domain, we extensively test all components of our system and determine that our framework provides a viable solution to the problem of ambiguity in HRI. We experimentally show that our method improves user experience by decreasing the number of clarifying questions asked, while maintaining a high level of accuracy.
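    One way to picture the "ask only when necessary" behaviour is a hand-rolled belief-update loop (a sketch under invented assumptions, not the dissertation's framework): a preference prior over candidate referents is refined by answers to clarifying questions only while the belief stays below a confidence threshold.

```python
import numpy as np

def resolve(candidates, preference_prior, answer_likelihoods, threshold=0.8):
    """Pick a referent, asking clarifying questions only while uncertain."""
    belief = np.array(preference_prior, dtype=float)
    belief /= belief.sum()
    questions_asked = 0
    for likelihood in answer_likelihoods:  # one entry per possible question
        if belief.max() >= threshold:      # confident enough: stop asking
            break
        belief *= np.array(likelihood)     # fold in the user's answer
        belief /= belief.sum()
        questions_asked += 1
    return candidates[int(belief.argmax())], questions_asked

# "Fetch the cup" with two similar cups; the user historically prefers red.
objects = ["red cup", "blue cup"]
prior = [0.7, 0.3]                  # learned user preference
answers = [[0.9, 0.1]]              # likelihoods from "Do you mean the red one?"
print(resolve(objects, prior, answers))  # ('red cup', 1): one question needed
```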