Statistical and deep learning methods in causal inference

dc.contributor.author: Whata, Albert
dc.date.accessioned: 2022-07-29T07:56:33Z
dc.date.available: 2022-07-29T07:56:33Z
dc.date.issued: 2021
dc.description: A thesis submitted to the Faculty of Science, University of the Witwatersrand, in fulfilment of the requirements for the degree of Doctor of Philosophy, 2021 (en_ZA)
dc.description.abstract: Machine learning (ML) algorithms excel at predicting outcomes but not at explaining causality. Likewise, deep learning algorithms such as deep neural networks (DNNs) are especially good at uncovering hidden patterns in large data sets, but they struggle to make even simple causal inferences. Causal inference is a statistical tool that can be used by machine learning and deep learning to measure the causal effects of multiple variables. This research was carried out to show researchers the importance of incorporating causal inference into machine learning systems rather than focusing only on predicting outcomes. A propensity scores-potential outcomes framework was used to evaluate machine learning and statistical causal inference. Using this framework, it was demonstrated that a deep learning algorithm such as a DNN can be adapted and used for classification tasks. In addition, the results in this thesis show that a DNN can be used to estimate propensity scores and to reduce the absolute bias in the treatment effects that are estimated using these propensity scores. A hybrid model consisting of a long short-term memory autoencoder (LSTMAE) and the kernel quantile estimator (KQE) algorithm was also developed to detect change-points. Additionally, a multivariate regression discontinuity design (MRDD) was employed to evaluate the statistical causal effect using two assignment variables. The study also demonstrated the importance of accompanying every conventional or multivariate regression discontinuity design with supplementary analyses to give more credibility to the causal estimates.
A hybrid deep learning algorithm that combines a convolutional neural network (CNN) with a bidirectional long short-term memory (Bi-LSTM) neural network was developed for the classification of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) among coronaviruses. The model achieved impressive results on metrics such as classification accuracy, area under the receiver operating characteristic curve (AUC ROC), and Cohen’s kappa. The results show that deep learning algorithms can be used as alternative avenues to detect SARS-CoV-2 among coronaviruses. (en_ZA)
dc.description.librarian: CK2022 (en_ZA)
dc.faculty: Faculty of Science (en_ZA)
dc.identifier.uri: https://hdl.handle.net/10539/33077
dc.language.iso: en (en_ZA)
dc.phd.title: PhD (en_ZA)
dc.school: School of Statistics and Actuarial Science (en_ZA)
dc.title: Statistical and deep learning methods in causal inference (en_ZA)
dc.type: Thesis (en_ZA)
Files
Original bundle
Name: PHD Thesis_ver__5_1_A_Whata.pdf
Size: 2.4 MB
Format: Adobe Portable Document Format