Logistic regression methods versus machine learning techniques in status and severity prediction of South African Covid-19 laboratory test data

Date

2024

Abstract

The Covid-19 pandemic severely affected the lives of individuals around the world. Even now, as the number of vaccinations has increased and there are fewer cases of Covid-19, knowledge of one's Covid-19 status remains important because it affects the lives of family, friends, co-workers and the general public. Therefore, having tools such as logistic regression and machine learning modelling techniques, in conjunction with the Reverse Transcriptase Polymerase Chain Reaction (RT-PCR), antigen and rapid Covid-19 tests, can only help people to be better informed about their Covid-19 infection status. The aim of this study is to predict the Covid-19 status and severity of an individual using machine learning techniques and logistic regression methods on South African laboratory test data, and to determine the performance of each method. The data used in this study was supplied by the National Health Laboratory Service (NHLS) and underwent cleaning and preparation phases, after which the data was split into four different datasets. The datasets underwent confounding variable analysis, Principal Component Analysis (PCA) and Factor Analysis (FA) before two methods of variable selection were used to arrive at the final four datasets. Each dataset was then used to create five models: Random Forest (RF), Self-normalising Neural Network (SNN), Multinomial Logistic Regression (MLR), Ordinal Logistic Regression (OLR) and Baseline-category Logistic Regression (BLR). These models were then used to predict the response variable given a test set of data, and the performance of each model was reviewed and discussed.
The best set of results produced for Dataset 1 was an Area Under the Curve (AUC) of 75.43% by the BLR model, an accuracy of 79.93% by the RF model, and a Kappa score of 0.3385 and a mean balanced accuracy of 60.85%, both achieved by the SNN. For Dataset 2, the SNN produced the best AUC, Kappa score and mean balanced accuracy, with values of 62.48%, 0.1960 and 54.66% respectively, while the best accuracy was achieved by the RF model (78.1%). Dataset 3 and Dataset 4 followed the same pattern as each other. The RF model produced the best AUC and accuracy: 71.58% and 74.5% for Dataset 3, and 63.04% and 75.51% for Dataset 4. However, the SNN produced the best Kappa scores and mean balanced accuracy values for both datasets: 0.3719 and 62.31% for Dataset 3, and 0.2576 and 57.56% for Dataset 4. The results of the study show that the machine learning techniques outperform the logistic regression methods in status and severity prediction of South African Covid-19 laboratory test data, and that the best performing machine learning technique was the self-normalising neural network. Overall, the models and networks performed best when using Dataset 3. The results provide evidence that the machine learning techniques can be used as an indicative tool for Covid-19 status and severity prediction rather than as a confirmation tool.
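
For readers unfamiliar with the evaluation measures named in the abstract, the following is a minimal Python sketch of how accuracy, Cohen's Kappa and a mean balanced accuracy can be computed from observed and predicted class labels. It is an illustration only and not the code used in the study. The class labels ("negative", "mild", "severe") and the toy data are hypothetical, and the sketch assumes that mean balanced accuracy is the average over classes of the one-vs-rest value (sensitivity + specificity) / 2, which may differ from the definition used in the report. AUC is omitted because it requires predicted probabilities rather than predicted labels.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Proportion of predictions that match the observed class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Agreement between observed and predicted labels, corrected for chance."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    classes = set(true_counts) | set(pred_counts)
    # Agreement expected if predictions were independent of the truth.
    p_expected = sum(true_counts[c] * pred_counts[c] for c in classes) / n**2
    return (p_observed - p_expected) / (1 - p_expected)


def mean_balanced_accuracy(y_true, y_pred):
    """Average over classes of (sensitivity + specificity) / 2, one-vs-rest.

    Assumed definition; needs at least two observed classes.
    """
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        tn = sum(t != c and p != c for t, p in pairs)
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        scores.append((sensitivity + specificity) / 2)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical labels for ten test-set individuals.
    y_true = ["negative"] * 4 + ["mild"] * 3 + ["severe"] * 3
    y_pred = ["negative", "negative", "negative", "mild",
              "mild", "mild", "negative",
              "severe", "severe", "mild"]
    print("accuracy:", accuracy(y_true, y_pred))
    print("kappa:", cohen_kappa(y_true, y_pred))
    print("mean balanced accuracy:", mean_balanced_accuracy(y_true, y_pred))
```

Accuracy can look high on imbalanced data simply because the most common class is predicted often, whereas Kappa and balanced accuracy correct for this. That is one plausible reading of why the abstract reports the RF model leading on accuracy while the SNN leads on Kappa and mean balanced accuracy.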

Description

A research report submitted in fulfilment of the requirements for the degree of Master of Science to the Faculty of Science, School of Statistics and Actuarial Science, University of the Witwatersrand, Johannesburg, 2023

Keywords

Covid-19, Laboratory test data, South Africa
