Statistical and deep learning methods in causal inference

Date
2021
Authors
Whata, Albert
Abstract
Machine learning (ML) algorithms excel at predicting outcomes but not at explaining causality. Likewise, deep learning algorithms such as deep neural networks (DNNs) are especially good at uncovering hidden patterns in large data sets, but they struggle to make even simple causal inferences. Causal inference is a statistical tool that machine learning and deep learning can use to measure the causal effects of multiple variables. This research was carried out to show the importance of incorporating causal inference into machine learning systems rather than focusing only on predicting outcomes. A propensity scores-potential outcomes framework was used to evaluate machine learning and statistical causal inference. Using this framework, it was demonstrated that a deep learning algorithm such as a DNN can be adapted and used for classification tasks. In addition, the results in this thesis show that a DNN can be used to estimate propensity scores and to reduce the absolute bias in the treatment effects estimated from those propensity scores. A hybrid model consisting of a long short-term memory autoencoder (LSTMAE) and the kernel quantile estimator (KQE) algorithm was also developed to detect change-points. Additionally, a multivariate regression discontinuity design (MRDD) was employed to evaluate the statistical causal effect using two assignment variables. The study also demonstrated the importance of accompanying every conventional or multivariate regression discontinuity design with supplementary analyses that lend more credibility to the causal estimates.
A hybrid deep learning algorithm that combines a convolutional neural network (CNN) with a bidirectional long short-term memory (Bi-LSTM) neural network was developed to classify severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) among coronaviruses. The model achieved impressive results on metrics such as classification accuracy, area under the receiver operating characteristic curve (AUC ROC), and Cohen's Kappa. The results show that deep learning algorithms can serve as an alternative avenue for detecting SARS-CoV-2 among coronaviruses.
Description
A thesis submitted to the Faculty of Science, University of the Witwatersrand, in fulfilment of the requirements for the degree of Doctor of Philosophy, 2021