Logistic regression methods versus machine learning techniques in status and severity prediction of South African Covid-19 laboratory test data

dc.contributor.author: Strickett, Mark
dc.date.accessioned: 2024-01-26T09:56:45Z
dc.date.available: 2024-01-26T09:56:45Z
dc.date.issued: 2024
dc.description: A research report submitted in fulfilment of the requirements for the degree of Master of Science to the Faculty of Science, School of Statistics and Actuarial Science, University of the Witwatersrand, Johannesburg, 2023
dc.description.abstract: The Covid-19 pandemic severely affected the lives of individuals around the world. Even now, as the number of vaccinations has increased and there are fewer cases of Covid-19, knowledge of one's Covid-19 status remains important because it affects the lives of family, friends, co-workers and the general public. Therefore, having tools such as logistic regression and machine learning modelling techniques, in conjunction with the Reverse Transcriptase Polymerase Chain Reaction (RT-PCR), antigen and rapid Covid-19 tests, can only help people to be better informed about their Covid-19 infection status. The aim of this study is to predict the Covid-19 status and severity of an individual using machine learning techniques and logistic regression methods on South African laboratory test data, and to determine the performance of each method. The data used in this study was supplied by the National Health Laboratory Service (NHLS) and underwent cleaning and preparation phases, after which it was split into four different datasets. The datasets underwent confounding variable analysis, Principal Component Analysis (PCA) and Factor Analysis (FA) before two methods of variable selection were used to arrive at the final four datasets. Each dataset was then used to create five models: Random Forest (RF), Self-normalising Neural Network (SNN), Multinomial Logistic Regression (MLR), Ordinal Logistic Regression (OLR) and Baseline-category Logistic Regression (BLR). These models were then used to predict the response variable given a test set of data. The performance of each model was then reviewed and discussed. The results show that the machine learning techniques outperformed the logistic regression methods.
The best set of results produced for Dataset 1 was an Area Under the Curve (AUC) of 75.43% by the BLR model, an accuracy of 79.93% by the RF model, and a Kappa score of 0.3385 and a mean balanced accuracy of 60.85%, both achieved by the SNN. For Dataset 2, the SNN produced the best AUC, Kappa score and mean balanced accuracy, with values of 62.48%, 0.1960 and 54.66% respectively, while the best accuracy was achieved by the RF model (78.1%). Dataset 3 and Dataset 4 showed the same pattern. The RF model produced the best AUC and accuracy: 71.58% and 74.5% for Dataset 3, and 63.04% and 75.51% for Dataset 4. However, the SNN produced the best Kappa scores and mean balanced accuracy values for both datasets: 0.3719 and 62.31% for Dataset 3, and 0.2576 and 57.56% for Dataset 4. The results of the study show that the machine learning techniques outperform the logistic regression methods in status and severity prediction of South African Covid-19 laboratory test data, and that the best performing machine learning technique was the self-normalising neural network. Overall, the models and networks performed best when using Dataset 3. The results provide evidence that the machine learning techniques can be used as an indicative tool for Covid-19 status and severity prediction rather than a confirmation tool.
dc.description.librarian: TL (2024)
dc.faculty: Faculty of Science
dc.identifier.uri: https://hdl.handle.net/10539/37447
dc.language.iso: en
dc.school: Statistics and Actuarial Science
dc.subject: Covid-19
dc.subject: Laboratory test data
dc.subject: South Africa
dc.title: Logistic regression methods versus machine learning techniques in status and severity prediction of South African Covid-19 laboratory test data
dc.type: Dissertation
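
Note on the metrics reported in the abstract: the abstract reports accuracy, Kappa and mean balanced accuracy side by side, and the model with the best accuracy (RF) is not always the one with the best Kappa or mean balanced accuracy (SNN). The sketch below is not taken from the dissertation; it is a minimal illustration, on made-up three-class labels, of how these metrics can disagree when classes are imbalanced. The class proportions, the two hypothetical classifiers, and the definition of mean balanced accuracy used here (the per-class average of sensitivity and specificity, averaged over classes) are all assumptions for illustration.

```python
# Illustrative only: hypothetical labels, not the NHLS data or the thesis code.

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    m = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m

def accuracy(m):
    total = sum(sum(row) for row in m)
    return sum(m[i][i] for i in range(len(m))) / total

def cohen_kappa(m):
    """Cohen's Kappa: agreement beyond what the marginals give by chance."""
    k = len(m)
    n = sum(sum(row) for row in m)
    rows = [sum(m[i]) for i in range(k)]
    cols = [sum(m[i][j] for i in range(k)) for j in range(k)]
    p_observed = sum(m[i][i] for i in range(k)) / n
    p_expected = sum(rows[i] * cols[i] for i in range(k)) / (n * n)
    if p_expected == 1.0:
        return 0.0  # degenerate case: only one class present and predicted
    return (p_observed - p_expected) / (1.0 - p_expected)

def mean_balanced_accuracy(m):
    """Assumed definition: mean over classes of (sensitivity + specificity) / 2."""
    k = len(m)
    n = sum(sum(row) for row in m)
    scores = []
    for c in range(k):
        tp = m[c][c]
        fn = sum(m[c]) - tp
        fp = sum(m[i][c] for i in range(k)) - tp
        tn = n - tp - fn - fp
        sens = tp / (tp + fn) if (tp + fn) else 0.0
        spec = tn / (tn + fp) if (tn + fp) else 0.0
        scores.append((sens + spec) / 2)
    return sum(scores) / k

# Hypothetical imbalanced test set: 80 of class 0, 15 of class 1, 5 of class 2.
y_true = [0] * 80 + [1] * 15 + [2] * 5

# Classifier A always predicts the majority class.
pred_a = [0] * 100

# Classifier B makes more errors overall but recovers the minority classes.
pred_b = ([0] * 60 + [1] * 15 + [2] * 5   # true class 0
          + [1] * 10 + [0] * 5            # true class 1
          + [2] * 3 + [0] * 2)            # true class 2

m_a = confusion_matrix(y_true, pred_a, 3)
m_b = confusion_matrix(y_true, pred_b, 3)

for name, m in (("A", m_a), ("B", m_b)):
    print(name, round(accuracy(m), 4), round(cohen_kappa(m), 4),
          round(mean_balanced_accuracy(m), 4))
```

In this toy setting, classifier A has the higher accuracy while classifier B has the higher Kappa and mean balanced accuracy, which is one way a split like the RF versus SNN results in the abstract can arise.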

Files

Original bundle

Name: 1849246 Mark Strickett - MSc Dissertation Final.pdf
Size: 6.44 MB
Format: Adobe Portable Document Format
