School of Computer Science and Applied Mathematics (ETDs)

Permanent URI for this community: https://hdl.handle.net/10539/38004

Counting Reward Automata: Exploiting Structure in Reward Functions Expressible in Decidable Formal Languages
(University of the Witwatersrand, Johannesburg, 2024-07) Bester, Tristan; Rosman, Benjamin; James, Steven; Tasse, Geraud Nangue
In general, reinforcement learning agents are restricted from directly accessing the environment model. This restricts the agent’s access to the environmental dynamics and reward models, which are only accessible through repeated environmental interactions. As reinforcement learning is well suited for use in complex environments, which are challenging to model, the general assumption that the transition probabilities associated with the environment are unknown is justified. However, as agents cannot discern rewards directly from the environment, reward functions must be designed and implemented for both simulated and real-world environments. As a result, the assumption that the reward model must remain hidden from the agent is unnecessary and detrimental to learning. Previously, methods have been developed that utilise the structure of the reward function to enable more sample-efficient learning. These methods employ a finite state machine variant to facilitate reward specification in a manner that exposes the internal structure of the reward function. This approach is particularly effective when solving long-horizon tasks as it enables the use of counterfactual reasoning with off-policy learning which significantly improves sample efficiency. However, as these approaches are dependent on finite-state machines, they are only able to express a small number of reward functions. This severely limits the applicability of these approaches as they cannot model simple tasks such as “fetch a coffee for each person in the office” which involves counting – one of the numerous properties finite state machines cannot model. This work addresses the limited expressiveness of current state machine-based approaches to reward modelling. 
Specifically, we introduce a novel approach compatible with any reward function which can be expressed as a well-defined algorithm. We present the counting reward automaton, an abstract machine capable of modelling reward functions expressible in any decidable formal language. Unlike previous approaches to state machine-based reward modelling, which are limited to the expression of tasks as regular languages, our framework allows for tasks described by decidable formal languages. It follows that our framework is an extremely general approach to reward modelling, compatible with any task specification expressible as a well-defined algorithm. This is a significant contribution as it greatly extends the class of problems which can benefit from the improved learning techniques facilitated by state machine-based reward modelling. We prove that an agent equipped with such an abstract machine is able to solve an extended set of tasks. We show that this increase in expressive power does not come at the cost of increased automaton complexity. This is followed by the introduction of several learning algorithms designed to increase sample efficiency through the exploitation of automaton structure. These algorithms are based on counterfactual reasoning with off-policy reinforcement learning (RL) and use techniques from the fields of hierarchical reinforcement learning (HRL) and reward shaping. Finally, we evaluate our approach in several domains requiring long-horizon plans. Empirical results demonstrate that our method outperforms competing approaches in terms of automaton complexity, sample efficiency, and task completion.
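The coffee task mentioned in this abstract shows why a counter is needed: the number of deliveries depends on how many people were seen, which no fixed set of states can track. The following is a minimal sketch of that idea, assuming a single counter and two hypothetical high-level events, `person` and `coffee`. The class name `CoffeeCounterMachine`, the helper `total_reward`, and the reward scheme are illustrative assumptions and are not the counting reward automaton formalism defined in the thesis.

```python
class CoffeeCounterMachine:
    """Toy reward machine with one counter (illustrative, not the thesis's definition)."""

    def __init__(self):
        self.state = "counting"  # counting -> delivering -> done
        self.counter = 0         # people seen who still need a coffee

    def step(self, event):
        """Consume one high-level event and return the reward it earns."""
        if self.state == "counting":
            if event == "person":
                self.counter += 1
            elif event == "coffee" and self.counter > 0:
                self.state = "delivering"
                return self._deliver()
        elif self.state == "delivering" and event == "coffee":
            return self._deliver()
        return 0.0  # every other event is ignored

    def _deliver(self):
        """Hand over one coffee; pay out only when nobody is left waiting."""
        self.counter -= 1
        if self.counter == 0:
            self.state = "done"
            return 1.0
        return 0.0


def total_reward(events):
    """Run a fresh machine over a sequence of events and sum the rewards."""
    machine = CoffeeCounterMachine()
    return sum(machine.step(event) for event in events)
```

In this sketch the same three machine states handle an office of any size, because the unbounded part of the task lives in the counter rather than in the state set. A plain finite state machine would need a separate chain of states for every possible head count.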
Flood Susceptibility Modeling in the uMhlatuzana River Catchment using Computer Vision-Based Deep Learning Techniques
(University of the Witwatersrand, Johannesburg, 2024-10) Chirindza, Jonas; Ajoodha, Ritesh; Knight, Jasper
In this study, convolutional neural network (CNN) models are employed for flood susceptibility modeling in the uMhlatuzana River catchment in KwaZulu-Natal, South Africa. The CNN models, including 1D-CNN, 2D-CNN, and 3D-CNN, provide a detailed assessment of flood vulnerability in the region. The models use diverse spatial information, such as topography, land use, and hydrological features, to estimate the likelihood of flooding in different areas of the catchment. The flood susceptibility maps within the uMhlatuzana River catchment, classified into five risk zones, namely ‘very low’, ‘low’, ‘moderate’, ‘high’ and ‘very high’ susceptibility, serve as proactive instruments for risk mitigation and disaster management. The 1D-CNN model displays strong overall performance in flood susceptibility modeling, evident in key metrics such as accuracy, precision, recall, area under curve (AUC) score, and F1-score. The results suggest that the model effectively captures patterns in the input data, emphasizing its potential for flood susceptibility modeling. Moreover, the 2D-CNN model outperforms the 1D-CNN, achieving higher values across the performance metrics. Finally, the 3D-CNN model outperforms both the 1D-CNN and 2D-CNN, emphasizing its predictive abilities in flood susceptibility modeling. The flood susceptibility maps produced by the 1D-CNN model show that most of the study area exhibits very low flood susceptibility (96.4%), with localized areas of higher susceptibility, particularly in the very high-risk category (2.53%). The 2D-CNN model demonstrates a more diverse risk distribution, with a substantial portion having very low susceptibility (74.19%) and significant areas of higher risk, notably in the very high-risk category (10.93%). The 3D-CNN model emphasizes a spatial pattern where a large portion has very low susceptibility (84.10%), but with a concentration of high and very high-risk areas, comprising 12.34% of the total area.
Finally, the consistent identification of higher-susceptibility areas across the models enhances the robustness of the assessments. The models’ high accuracy and detailed risk assessments provide valuable tools for decision-makers, urban planners, and emergency response teams in the uMhlatuzana River catchment. The precision of the models facilitates informed strategies for flood risk management, including targeted interventions such as improved drainage systems and early warning systems.
Envisioning the Future of Fashion: The Creation And Application Of Diverse Body Pose Datasets for Real-World Virtual Try-On
(University of the Witwatersrand, Johannesburg, 2024-08) Molefe, Molefe Reabetsoe-Phenyo; Klein, Richard
Fashion presents an opportunity for research methods to unite machine learning concepts with e-commerce to meet the growing demands of consumers. A recent development in intelligent fashion research envisions how individuals might appear in different clothes based on their selection, a process known as “virtual try-on”. Our research introduces a novel dataset that ensures multi-view consistency, facilitating the effective warping and synthesis of clothing onto individuals from any given perspective or pose. This addresses a significant shortfall in existing datasets, which struggle to recognise various views, thus limiting the versatility of virtual try-on. By fine-tuning state-of-the-art architectures on our dataset, we expand the utility of virtual try-on systems, making them more adaptable and robust across a diverse range of scenarios. A noteworthy additional advantage of our dataset is its capacity to facilitate 3D scene reconstruction. This capability arises from utilising a sparse collection of images captured from multiple angles, which, while primarily aimed at enriching 2D virtual try-on, inadvertently supports the simulation of 3D environments. This enhancement not only broadens the practical applications of virtual try-on in the real world but also advances the field by demonstrating a novel application of deep learning within the fashion industry, enabling more realistic and comprehensive virtual try-on experiences. Therefore, our work contributes a novel dataset and approach for virtually synthesising clothing in an accessible way for real-world scenarios.
BiCoRec: Bias-Mitigated Context-Aware Sequential Recommendation Model
(University of the Witwatersrand, Johannesburg, 2024-09) Muthivhi, Mufhumudzi; van Zyl, Terence; Bau, Hairong
Sequential recommendation models aim to learn from users’ evolving preferences. However, current state-of-the-art models suffer from an inherent popularity bias. This study developed a novel framework, BiCoRec, that adaptively accommodates users’ changing preferences for popular and niche items. Our approach leverages a co-attention mechanism to obtain a popularity-weighted user sequence representation, facilitating more accurate predictions. We then present a new training scheme that learns from future preferences using a consistency loss function. The analysis of the experimental results shows that our approach is 7% more capable of uncovering the most relevant items.
3D Human pose estimation using geometric self-supervision with temporal methods
(University of the Witwatersrand, Johannesburg, 2024-09) Bau, Nandi; Klein, Richard
This dissertation explores the enhancement of 3D human pose estimation (HPE) through self-supervised learning methods that reduce reliance on heavily annotated datasets. Recognising the limitations of data acquired in controlled lab settings, the research investigates the potential of geometric self-supervision combined with temporal information to improve model performance in real-world scenarios. A Temporal Dilated Convolutional Network (TDCN) model, employing Kalman filter post-processing, is proposed and evaluated on both ground-truth and in-the-wild data from the Human3.6M dataset. The results demonstrate a competitive Mean Per Joint Position Error (MPJPE) of 62.09 mm on unseen data, indicating a promising direction for self-supervised learning in 3D HPE and suggesting a viable pathway towards reducing the gap with fully supervised methods. This study underscores the value of self-supervised temporal dynamics in advancing pose estimation techniques, potentially making them more accessible and broadly applicable in real-world settings.
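For readers unfamiliar with the metric reported in this abstract, MPJPE is the average Euclidean distance between each predicted joint and its ground-truth position. The snippet below is a minimal single-frame sketch, assuming each pose is a list of (x, y, z) tuples in millimetres. The function name `mpjpe` is illustrative, and any root alignment or averaging over frames that the dissertation's evaluation protocol applies is not shown.

```python
import math


def mpjpe(predicted, target):
    """Mean Euclidean distance between matching joints, in the units of the inputs."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("poses must be non-empty and have the same number of joints")
    total = 0.0
    for pred_joint, true_joint in zip(predicted, target):
        total += math.dist(pred_joint, true_joint)  # per-joint position error
    return total / len(predicted)


# Two-joint toy pose: the first joint is exact, the second is 5 mm off.
error_mm = mpjpe([(0, 0, 0), (3, 4, 0)], [(0, 0, 0), (0, 0, 0)])  # 2.5
```

A lower value means the predicted skeleton lies closer to the ground truth on average.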
Creating an adaptive collaborative playstyle-aware companion agent
(University of the Witwatersrand, Johannesburg, 2023-09) Arendse, Lindsay John; Rosman, Benjamin
Companion characters in video games play a unique part in enriching player experience. Companion agents support the player as an ally or sidekick and would typically help the player by providing hints or resources, or even by fighting alongside the human player. Players often adopt a certain approach or strategy, referred to as a playstyle, whilst playing video games. Players not only approach challenges in games differently, but also play games differently based on what they find rewarding. Companion agent characters thus have an important role to play by assisting the player in a way which aligns with their playstyle. Existing companion agent approaches fall short and adversely affect the collaborative experience when the companion agent is not able to assist the human player in a manner consistent with their playstyle. Furthermore, if the companion agent cannot assist in real time, player engagement is lowered since the player will need to wait for the agent to compute its action, leading to a frustrating player experience. We therefore present a framework for creating companion agents that are adaptive such that they respond in real time with actions that align with the player’s playstyle. Companion agents able to do so are what we refer to as playstyle-aware. Creating a playstyle-aware adaptive agent firstly requires a mechanism for correctly classifying or identifying the playstyle, before attempting to assist the player with a given task. We present a method which enables the real-time in-game playstyle classification of players. We contribute a hybrid probabilistic supervised learning framework, using Bayesian inference informed by a K-Nearest Neighbours-based likelihood, that is able to classify players in real time at every step within a given game level using only the latest player action or state observation. We empirically evaluate our hybrid classifier against existing work using MiniDungeons, a common benchmark game domain.
We further evaluate our approach using real player data from the game Super Mario Bros. Our approach outperforms the methods in our comparative study, and our results highlight the success of our framework in identifying playstyles in a complex human player setting. The second problem we explore is that of assisting the identified playstyle with a suitable action. We formally define this as the ‘Learning to Assist’ problem, where given a set of companion agent policies, we aim to determine the policy which best complements the observed playstyle. An action is complementary if it aligns with the goal of the playstyle. We extend MiniDungeons into a two-player game called Collaborative MiniDungeons, which we use to evaluate our companion agent against several comparative baselines. The results from this experiment highlight that companion agents which are able to adapt and assist different playstyles on average bring about a greater player experience when using a playstyle-specific reward function as a proxy for what the players find rewarding. In this way we present an approach for creating adaptive companion agents which are playstyle-aware and able to collaborate with players in real time.
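The step-by-step classification described in this abstract, Bayesian inference driven by a K-Nearest Neighbours likelihood, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the thesis's implementation: the likelihood is approximated by the smoothed share of each style's labelled examples that fall among the k nearest neighbours of the latest observation, and the function names, the two-dimensional feature vectors, and the style labels `runner` and `collector` are all hypothetical.

```python
import math
from collections import Counter


def knn_likelihood(observation, examples, k=3, smoothing=1.0):
    """Estimate P(observation | style) from the k nearest labelled examples.

    `examples` is a list of (feature_vector, style) pairs. The estimate is the
    smoothed fraction of each style's examples that land among the k nearest.
    """
    nearest = sorted(examples, key=lambda ex: math.dist(observation, ex[0]))[:k]
    votes = Counter(style for _, style in nearest)
    sizes = Counter(style for _, style in examples)
    return {
        style: (votes[style] + smoothing) / (sizes[style] + smoothing * len(sizes))
        for style in sizes
    }


def update_belief(belief, likelihood):
    """One Bayes step: posterior is proportional to prior times likelihood."""
    unnormalised = {style: belief[style] * likelihood[style] for style in belief}
    evidence = sum(unnormalised.values())
    return {style: value / evidence for style, value in unnormalised.items()}


# Hypothetical labelled data: two examples per playstyle.
examples = [((0, 0), "runner"), ((0, 1), "runner"),
            ((5, 5), "collector"), ((5, 6), "collector")]
belief = {"runner": 0.5, "collector": 0.5}  # uniform prior over styles
belief = update_belief(belief, knn_likelihood((0, 0.5), examples, k=2))
```

Calling `update_belief` again after each new observation keeps a running posterior over playstyles, which is what allows a classification to be read off at every step of a level rather than only at the end.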
Procedural Content Generation for video game levels with human advice
(University of the Witwatersrand, Johannesburg, 2023-07) Raal, Nicholas Oliver; James, Steven
Video gaming is an extremely popular form of entertainment around the world and new video game releases are constantly being showcased. One issue with the video gaming industry is that game developers require a large amount of time to develop new content. A research field that can help with this is procedural content generation (PCG), which allows for an infinite number of video game levels to be generated based on the parameters provided. Many of the methods found in the literature can reliably generate content that adheres to quantifiable characteristics such as playability, solvability and difficulty. These methods do not, however, take into account the aesthetics of the level, which is the property that makes levels appear more reasonable to human players. In order to address this issue, we propose a method of incorporating high-level human advice into the PCG loop. The method uses pairwise comparisons to assign a score to a level based on its aesthetics. Using the score along with a feature vector describing each level, a support vector regression (SVR) model is trained that allows a score to be assigned to unseen video game levels. This predicted score is used as an additional fitness function of a multi-objective genetic algorithm (GA) and can be optimised as a standard fitness function would. We test the proposed method on two 2D platformer video games, Maze and Super Mario Bros (SMB), and our results show that the proposed method can successfully be used to generate levels with a bias towards the aesthetic features preferred by humans, whilst still adhering to standard video game characteristics such as solvability. We further investigate incorporating multiple inputs from a human at different stages of the PCG life cycle and find that it does improve the proposed method, but further testing is still required.
We hope that the findings of this research will assist in using PCG in the video game space to create levels that are more aesthetically pleasing to a human player.
Play-style Identification and Player Modelling for Generating Tailored Advice in Video Games
(University of the Witwatersrand, Johannesburg, 2023-09) Ingram, Branden Corwin; Rosman, Benjamin; Van Alten, Clint; Klein, Richard
Recent advances in fields such as machine learning have enabled the development of systems that are able to achieve super-human performance in a number of domains, specifically in complex games such as Go and StarCraft. Based on these successes, it is reasonable to ask if these learned behaviours could be utilised to improve the performance of humans on the same tasks. However, the types of models used in these systems are typically not easily interpretable, and cannot be directly used to improve the performance of a human. Additionally, humans tend to develop stylistic traits based on preference which aid in solving problems or competing at high levels. This thesis looks to address these difficulties by developing an end-to-end pipeline that can provide beneficial advice tailored to a player’s style in a video game setting. Towards this end, we firstly demonstrate the ability to cluster variable-length multi-dimensional gameplay trajectories with respect to play-style in an unsupervised fashion. Secondly, we demonstrate the ability to learn to model an individual player’s actions during gameplay. Thirdly, we demonstrate the ability to learn policies representative of all the play-styles identified within an environment. Finally, we demonstrate how the utilisation of these components can generate advice which is tailored to the individual’s style. This system would be particularly useful for improving tutorial systems, which quickly become redundant when they lack any personalisation. Additionally, this pipeline serves as a way for developers to garner insights into their player base, which can be utilised for more informed decision-making on future feature releases and updates. Players gain a useful tool which can be utilised to learn how to play better, as well as to identify the characteristics of their own gameplay and that of their opponents. Furthermore, we contend that our approach has the potential to be employed in a broad range of learning domains.
Self Supervised Salient Object Detection using Pseudo-labels
(University of the Witwatersrand, Johannesburg, 2023-08) Bachan, Kidhar; Wang, Hairong
Deep Convolutional Neural Networks have dominated salient object detection methods in recent history. A determining factor for salient object detection network performance is the quality and quantity of pixel-wise annotated labels. This annotation is performed manually, making it expensive (time-consuming, tedious), while limiting the training data to the available annotated datasets. Alternatively, unsupervised models are able to learn from unlabelled datasets or datasets in the wild. In this work, an existing algorithm [Li et al. 2020] is used to refine the generated pseudo-labels before training. This research focuses on the changes made to the pseudo-label refinement algorithm and their effect on performance for unsupervised salient object detection tasks. We show that using this novel approach leads to statistically negligible performance improvements and discuss the reasons why this is the case.
Rationalization of Deep Neural Networks in Credit Scoring
(University of the Witwatersrand, Johannesburg, 2023-07) Dastile, Xolani Collen; Celik, Turgay
Machine learning and deep learning, which are subfields of artificial intelligence, are undoubtedly pervasive and ubiquitous technologies of the 21st century. This is attributed to the enhanced processing power of computers, the exponential growth of datasets, and the ability to store the increasing datasets. Many companies are now starting to view their data as an asset, whereas previously, they viewed it as a by-product of business processes. In particular, banks have started to harness the power of deep learning techniques in their day-to-day operations; for example, chatbots that handle questions and answers about different products can be found on banks’ websites. One area that is key in the banking sector is the credit risk department. Credit risk is the risk of lending money to applicants and is measured using credit scoring techniques that profile applicants according to their risk. Deep learning techniques have the potential to identify and separate applicants based on their lending risk profiles. Nevertheless, a limitation arises when employing deep learning techniques in credit risk, stemming from the fact that these techniques lack the ability to provide explanations for their decisions or predictions. Hence, deep learning techniques are regarded as non-transparent models. This thesis focuses on tackling the lack of transparency inherent in deep learning and machine learning techniques to render them suitable for adoption within the banking sector. The performances of different statistical, classic machine learning, and deep learning models were compared qualitatively and quantitatively. The results showed that deep learning techniques outperform traditional machine learning models and statistical models. The predictions from deep learning techniques were explained using state-of-the-art explanation techniques. A novel model-agnostic explanation technique was also devised, and credit-scoring experts assessed its validity.
This thesis has shown that different explanation techniques can be relied upon to explain predictions from deep learning and machine learning techniques.
