Electronic Theses and Dissertations (PhDs)

Permanent URI for this collectionhttps://hdl.handle.net/10539/38005

Browse

Search Results

Now showing 1 - 8 of 8

Towards Lifelong Reinforcement Learning through Temporal Logics and Zero-Shot Composition
(2024-10) Tasse, Geraud Nangue; Rosman, Benjamin; James, Steven
This thesis addresses the fundamental challenge of creating agents capable of solving a wide range of tasks in their environments, akin to human capabilities. For such agents to be truly useful and be capable of assisting humans in our day-to-day lives, we identify three key abilities that general purpose agents should have: Flexibility, Instructability, and Reliability (FIRe). Flexibility refers to the ability of agents to adapt to various tasks with minimal learning; instructability involves the capacity for agents to understand and execute task specifications provided by humans in a comprehensible manner; and reliability entails agents’ ability to solve tasks safely and effectively with theoretical guarantees on their behavior. To build such agents, reinforcement learning (RL) is the framework of choice given that it is the only one that models the agent-environment interaction. It is also particularly promising since it has shown remarkable success in recent years in various domains—including gaming, scientific research, and robotic control. However, prevailing RL methods often fall short of the FIRe desiderata. They typically exhibit poor sample efficiency, demanding millions of environment interactions to learn optimal behaviors. Task specification relies heavily on hand-designed reward functions, posing challenges for non-experts in defining tasks. Moreover, these methods tend to specialize in single tasks, lacking guarantees on the broader adaptability and behavior robustness desired for lifelong agents that need solve multiple tasks. Clearly, the regular RL framework is not enough, and does not capture important aspects of what makes humans so general—such as the use of language to specify and understand tasks. To address these shortcomings, we propose a principled framework for the logical composition of arbitrary tasks in an environment, and introduce a novel knowledge representation called World Value Functions (WVFs) that will enable agents to solve arbitrary tasks specified using language. The use of logical composition is inspired by the fact that all formal languages are built upon the rules of propositional logics. Hence, if we want agents that understand tasks specified in any formal language, we must define what it means to apply the usual logic operators (conjunction, disjunction, and negation) over tasks. The introduction of WVFs is inspired by the fact that humans seem to always seek general knowledge about how to achieve a variety of goals in their environment, irrespective of the specific task they are learning. Our main contributions include: (i) Instructable agents: We formalize the logical composition of arbitrary tasks in potentially stochastic environments, and ensure that task compositions lead to rewards minimising undesired behaviors. (ii) Flexible agents: We introduce WVFs as a new objective for RL agents, enabling them to solve a variety of tasks in their environment. Additionally, we demonstrate zero-shot skill composition and lifelong sample efficiency. (iii) Reliable agents: We develop methods for agents to understand and execute both natural and formal language instructions, ensuring correctness and safety in task execution, particularly in real-world scenarios. By addressing these challenges, our framework represents a significant step towards achieving the FIRe desiderata in AI agents, thereby enhancing their utility and safety in a lifelong learning setting like the real world.
3D Human pose estimation using geometric self-supervision with temporal methods
(University of the Witwatersrand, Johannesburg, 2024-09) Bau, Nandi; Klein, Richard
This dissertation explores the enhancement of 3D human pose estimation (HPE) through self-supervised learning methods that reduce reliance on heavily annotated datasets. Recognising the limitations of data acquired in controlled lab settings, the research investigates the potential of geometric self-supervision combined with temporal information to improve model performance in real-world scenarios. A Temporal Dilated Convolutional Network (TDCN) model, employing Kalman filter post-processing, is proposed and evaluated on both ground-truth and in-the-wild data from the Human3.6M dataset. The results demonstrate a competitive Mean Per Joint Position Error (MPJPE) of 62.09mm on unseen data, indicating a promising direction for self-supervised learning in 3D HPE and suggesting a viable pathway towards reducing the gap with fully supervised methods. This study underscores the value of self-supervised temporal dynamics in advancing pose estimation techniques, potentially making them more accessible and broadly applicable in real-world applications.
Symmetry reductions and approximate solutions for heat transfer in slabs and extended surfaces
(University of the Witwatersrand, Johannesburg, 2023-06) Nkwanazana, Daniel Mpho; Moitsheki, Raseelo Joel
In this study we analyse heat transfer models prescribed by reaction-diffusion equations. The focus and interest throughout the work is on models for heat transfer in solid slabs (hot bodies) and extended surface. Different phenomena of interest are heat transfer in slabs and through fins of different shapes and profiles. Furthermore, thermal conductivity and heat transfer coefficients are temperature dependent. As a result, the energy balance equations that are produced are nonlinear. Using the theory of Lie symmetry analysis of differential equations, we endeavor to construct exact solutions for these nonlinear models. We will employ a number of symmetry techniques such as the classical Lie point symmetry methods, the nonclassical symmetry, nonlocal and nonclassical potential symmetry approach to construct the group-invariant solutions. In order to identify the forms of the heat source term that appear in the considered equation for which the principal Lie algebra (PLA) is extended by one element, we first perform preliminary group classification of the transient state problem. Also, we consider the direct group classification method. Invariant solutions are constructed after some reductions have been performed. One-dimensional Differential Transform Method (1D DTM) will be used when it is impossible to determine an exact solution. The 1D DTM has been benchmarked using some exact solutions. To solve the transient/unsteady problem, we use the two-dimensional Differential Transform Method (2D DTM). Effects of parameters appearing in the equations on the temperature distribution will be studied.
Play-style Identification and Player Modelling for Generating Tailored Advice in Video Games
(University of the Witwatersrand, Johannesburg, 2023-09) Ingram, Branden Corwin; Rosman, Benjamin; Van Alten, Clint; Klein, Richard
Recent advances in fields such as machine learning have enabled the development of systems that are able to achieve super-human performance on a number of domains, specifically in complex games such as Go and StarCraft. Based on these successes, it is reasonable to ask if these learned behaviours could be utilised to improve the performance of humans on the same tasks. However, the types of models used in these systems are typically not easily interpretable, and can not be directly used to improve the performance of a human. Additionally, humans tend to develop stylistic traits based on preference which aid in solving problems or competing at high levels. This thesis looks to address these difficulties by developing an end-to-end pipeline that can provide beneficial advice tailored to a player’s style in a video game setting. Towards this end, we demonstrate the ability to firstly cluster variable length multi-dimensional gameplay trajectories with respect to play-style in an unsupervised fashion. Secondly, we demonstrate the ability to learn to model an individual player’s actions during gameplay. Thirdly we demonstrate the ability to learn policies representative of all the play-styles identified with an environment. Finally, we demonstrate how the utilisation of these components can generate advice which is tailored to the individual’s style. This system would be particularly useful for improving tutorial systems that quickly become redundant lacking any personalisation. Additionally, this pipeline serves as a way for developers to garner insights on their player base which can be utilised for more informed decision-making on future feature releases and updates. For players, they gain a useful tool which can be utilised to learn how to play better as well identify as the characteristics of their gameplay as well as opponents. Furthermore, we contend that our approach has the potential to be employed in a broad range of learning domains.
Two-dimensional turbulent classical and momentumless thermal wakes
(University of the Witwatersrand, Johannesburg, 2023-07) Mubai, Erick; Mason, David Paul
The two-dimensional classical turbulent thermal wake and the two-dimensional momentumless turbulent thermal wake are studied. The governing partial differential equations result from Reynolds averaging the Navier-Stokes, the continuity and energy balance equations. The averaged Navier-Stokes and energy balance equations are closed using the Boussinesq hypothesis and an analogy of Fourier’s law of heat conduction. They are further simplified using the boundary layer approximation. This leads to one momentum equation with the continuity equation for an incompressible fluid and one thermal energy equation. The partial differential equations are written in terms of a stream function for the mean velocity deficit that identically satisfies the continuity equation and the mean temperature difference which vanishes on the boundary of the wake. The mixing length model and a model that assumes that the eddy viscosity and eddy thermal conductivity depend on spatial variables only are analysed. We extend the von Kármán similarity hypothesis to thermal wakes and derive a new thermal mixing length. It is shown that the kinematic viscosity and thermal conductivity play an important role in the mathematical analysis of turbulent thermal wakes. We obtain and use conservation laws and associated Lie point symmetries to reduce the governing partial differential equations to ordinary differential equations. As a result we find new analytical solutions for the two-dimensional turbulent thermal classical wake and momentumless wake. When the ordinary differential equations cannot be solved analytically we use a numerical shooting method that uses the two conserved quantities as the targets.
Rationalization of Deep Neural Networks in Credit Scoring
(University of the Witwatersrand, Johannesburg, 2023-07) Dastile, Xolani Collen; Celik, Turgay
Machine learning and deep learning, which are subfields of artificial intelligence, are undoubtedly pervasive and ubiquitous technologies of the 21st century. This is attributed to the enhanced processing power of computers, the exponential growth of datasets, and the ability to store the increasing datasets. Many companies are now starting to view their data as an asset, whereas previously, they viewed it as a by-product of business processes. In particular, banks have started to harness the power of deep learning techniques in their day-to-day operations; for example, chatbots that handle questions and answers about different products can be found on banks’ websites. One area that is key in the banking sector is the credit risk department. Credit risk is the risk of lending money to applicants and is measured using credit scoring techniques that profile applicants according to their risk. Deep learning techniques have the potential to identify and separate applicants based on their lending risk profiles. Nevertheless, a limitation arises when employing deep learning techniques in credit risk, stemming from the fact that these techniques lack the ability to provide explanations for their decisions or predictions. Hence, deep learning techniques are coined as non-transparent models. This thesis focuses on tackling the lack of transparency inherent in deep learning and machine learning techniques to render them suitable for adoption within the banking sector. Different statistical, classic machine learning, and deep learning models’ performances were compared qualitatively and quantitatively. The results showed that deep learning techniques outperform traditional machine learning models and statistical models. The predictions from deep learning techniques were explained using state-of-the-art explanation techniques. A novel model-agnostic explanation technique was also devised, and credit-scoring experts assessed its validity. This thesis has shown that different explanation techniques can be relied upon to explain predictions from deep learning and machine learning techniques.
Deep learning models for defect detection in electroluminescence images of solar PV modules
(University of the Witwatersrand, 2024-05-29) Pratt, Lawrence; Klein, Richard
This thesis introduces multi-class solar cell defect detection (SCDD) in electroluminescence (EL) images of PV modules using semantic segmentation. The research is based on experimental results from training and testing existing deep-learning models on a novel dataset developed specifically for this thesis. The dataset consists of EL images and corresponding segmentation masks for defect detection and quantification in EL images of solar PV cells from mono crystalline and multi crystalline silicon wafer-based modules. While many papers have already been published on defect detection and classification in EL images, semantic segmentation is new to this field. The prior art was largely focused on methods to improve EL image quality, classify cells into normal or defective categories, statistical methods and machine learning models for classification, object detection, and some binary segmentation of cracks specifically. This research shows that multi-class semantic segmentation models have the potential to provide accurate defect detection and quantification in both high-quality lab-based EL images and lower-quality field-based EL images of PV modules. While most EL images are collected in factory and lab settings, advancements in imaging technology will lead to an increasing number of EL images taken in the field. Thus, effective methods for SCDD must be robust to various images taken in the labs and the real world, in the same way that deep-learning models for autonomous vehicles that navigate the city streets in some parts of the world today must be robust to real-world environments. The semantic segmentation of EL images, as opposed to image classification, yields statistical data that can then be correlated to the power output for large batches of PV modules. This research evaluates the effectiveness of semantic segmentation to provide a quantitative analysis of PV module quality based on qualitative EL images. The raw EL image is translated into tabular datasets for further downstream analysis. First, we developed a dataset that included 29 classes in the ground truth masks in which each pixel was coloured according to the class. The classes were grouped into intrinsic “features” of solar cells and extrinsic “defects.” Next, a fully-supervised U-Net trained on the small dataset showed that SCDD using semantic segmentation was a viable approach. Next, additional fully-supervised deep-learning models(U-Net, PSPNet, DeepLabV3, DeepLabV3+) were trained using equal, inverse, and custom class weights to identify the best model for SCDD. A benchmark dataset was published along with benchmark performance metrics. The model performance was measured using mean recall, mean precision, and the mean intersection over union (mIoU) for a subset of the most common defects (cracks, inactive areas, and gridline defects) and features (ribbon interconnects and cell spacing) in the dataset. This work focused on developing a deep-learning method for SCDD independent of the imaging equipment, PV module design, and image quality that would be broadly applicable to EL images from any source. The initial experiment showed that semantic segmentation was a viable method for SCDD. The U-Net trained on the initial dataset with 108 images in the training dataset produced good representations of the features common to most of the cells and good representations of the defects with a reasonable sample size. Other defects with only a few examples in the training dataset were not effectively detected in this model. The U-Net results also showed that themIoU measured higher for the features compared to the defects across all models, which correlated with the size of the large features compared to the small defects that each class occupies in the images. The next set of experiments showed that the DeepLabv3+ trained with custom class weights scored the highest in terms of mIoU for the selected defects and features when compared to the alternative fully-supervised models. While the mIoU for cracks was still low (25%), the recall was high (86%). While increasing the recall substantially, the many long, narrow defects (e.g. cracks and gridlines) and features (e.g. ribbon interconnects and spacing) in the dataset were challenging to segment, especially at the borders. The custom class weights also tended to dilate the long, narrow features, which led to low precision. However, the resulting representations reliably located these defects in the complex images with both large and small objects, and the dilation proved effective at visually highlighting the long-narrow defects when the cell-level images were combined into module-level images. Therefore, the model prove useful in the context of detecting critical defects and quantifying the relative size of the defects in EL images of PV cells and modules despite the relatively low mIoU. The dataset was also published along with this paper. The final set of experiments focused on semi-supervised and self-supervised models. The results suggested that supervised training on a large out of-domain (OOD) dataset (COCO), self supervised pretraining on a large OOD dataset (ImageNet), and semi-supervised pretraining (CCT) were statistically equivalent as measured by the mIoU on a subset of critical defects and features. A new state-of-the-art (SOTA) for SCDD was achieved, exceeding the mIoU from the DeeplabV3+ with custom weights. The experiments also demonstrated that certain pretraining schemes resulted in the ability to detect and quantify underrepresented classes, such as the round ring defect. The unique contributions from this work include two benchmark datasets for multi-class semantic segmentation in EL images of solar PV cells. The smaller dataset consists of 765 images with corresponding ground truth masks. The larger dataset consists of more than 20,000 unlabelled EL images. The thesis also documents the performance metrics from various deep learning models based on fully-supervised, semi-supervised, and self-supervised architectures
Regularized Deep Neural Network for Post-Authorship Attribution
(University of the Witwatersrand, Johannesburg, 2024) Modupe, Abiodun; Celik, Turgay; Marivate, Vukosi
Post-authorship attribution is the computational process of determining the legitimate author of an online text snippet, such as an email, blog, forum post, or chat log, by employing stylometric features. The process consists of analysing various linguistic and writing patterns, such as vocabulary, sentence structure, punctuation usage, and even the use of specific words or phrases. By comparing these features to a known set of writing pieces from potential authors, investigators can make educated hypotheses about the true authorship of a text snippet. Additionally, post-authorship attribution has applications in fields like forensic linguistics and cybersecurity, where determining the source of a text can be crucial for investigations or identifying potential threats. Furthermore, in a verification procedure to proactively uncover misogynistic, misandrist, xenophobic, and abusive posts on the internet or social networks, finding a suitable text representation to adequately symbolise and capture an author’s distinctive writing from a computational linguistics perspective is typically known as a stylometric analysis. Additionally, most of the posts on social media or online are generally rife with ambiguous terminologies that could potentially compromise and influence the precision of the early proposed authorship attribution model. The majority of extracted stylistic elements in words are idioms, onomatopoeias, homophones, phonemes, synonyms, acronyms, anaphora, and polysemy, which are fundamentally difficult to interpret by most existing natural language processing (NLP) systems. These difficulties make it difficult to correctly identify the true author of a given text. Therefore, further advancements in NLP systems are necessary to effectively handle these complex linguistic elements and improve the accuracy of authorship attribution models. In this thesis, we introduce a regularised deep neural network (RDNN) model to solve the challenges that come with figuring out who wrote what after the fact. The proposed method utilises a convolutional neural network, a bidirectional long short-term memory encoder, and a distributed highway network to effectively address the post-authorship attribution problem. The neural network was utilised to generate lexical stylometric features, which were then fed into the bidirectional encoder to produce a syntactic feature vector representation. The feature vector was then fed into the distributed high-speed networks for regularisation to reduce network generalisation errors. The regularised feature vector was then given to the bidirectional decoder to learn the author’s writing style. The feature classification layer is made up of a fully connected network and a SoftMax function for prediction. The RDNN method outperformed the existing state-of-the-art methods in terms of accuracy, precision, and recall on the majority of the benchmark datasets. These results highlight the potential of the proposed method to significantly improve classification performance in various domains. Again, the introduction of an interactive system to visualise the performance of the proposed method would further enhance its usability and effectiveness in quantifying the contribution of the author’s writing characteristics in both online text snippets and literary documents. It is useful in processing the evidence that is needed to support claims or draw conclusions about the author’s writing style or intent during the pre-trial investigation by the law enforcement agent in the court of law. The incorporation of this method into the pretrial stage greatly strengthens the credibility and validity of the findings presented in court and has the potential to revolutionise the field of authorship attribution and enhance the accuracy of forensic investigations. Furthermore, it ensures a fair and just legal process for all parties involved by providing concrete evidence to support or challenge claims. We are also aware of the limitations of the proposed methods and recognise the need for additional research to overcome these constraints and improve the overall reliability and applicability of post-authorship attribution of online text snippets and literary documents for forensic investigations. Even though the proposed methods have revealed some unusual differences in author writing style, such as how influential writers, regular people, and suspected authors use language, the evidence from the results with the features extracted from the texts has shown promise for identifying authorship patterns and aiding in forensic analyses. However, much work remains to be done to validate the methodologies’ usefulness and dependability as effective authorship attribution procedures. Further research is needed to determine the extent to which external factors, such as the context in which the text was written or the author’s emotional state, may impact the identified authorship patterns. Additionally, it is crucial to establish a comprehensive dataset that includes a diverse range of authors and writing styles to ensure the generalizability of the findings and enhance the reliability of forensic analyses. Furthermore, the dataset used in this thesis does not include a diverse variety of authors and writing styles, such as impostors attempting to impersonate another author, which limits the generalizability of the conclusions and undermines the credibility of forensic analysis. More studies can be conducted to broaden the proposed strategy for detecting and distinguishing impostors’ writing styles from those of authentic authors when committing crimes on both online and literary documents. It is conceivable for numerous criminals to collaborate to perpetrate a crime, which could aid in improving the proposed methods for detecting the existence of multiple impostors or the contribution of each criminal writing style based on the person or individual they are attempting to mimic. The likelihood of numerous offenders working together complicates the investigation and necessitates advanced procedures for identifying their individual contributions, as well as both authentic and manufactured impostor contents within the text. This is especially difficult on social media, where fake accounts and anonymous profiles can make it difficult to determine the true identity of those involved, which can come from a variety of sources, including text, WhatsApps, chat images, videos, and so on, and can lead to the spread of misinformation and manipulation. As a result, promoting a hybrid approach that goes beyond text as evidence could help address some of the concerns raised above. For example, integrating audio and visual data may provide a more complete perspective of the scenario. As a result, such an approach exacerbates the restrictions indicated in the distribution of data and may necessitate more storage and analytical resources. However, it can also lead to a more accurate and nuanced analysis of the situation

Electronic Theses and Dissertations (PhDs)

Browse

Filters

Settings

Sort By

Results per page

Search Results