An empirical analysis and application of the expectation-maximization and matrix completion algorithms for varying degrees of missing data
dc.contributor.author | Thulare, Evans Molahlegi | |
dc.date.accessioned | 2023-08-21T13:43:56Z | |
dc.date.available | 2023-08-21T13:43:56Z | |
dc.date.issued | 2020 | |
dc.description | A dissertation submitted in partial fulfilment of the requirements for the degree of Master of Science, School of Computer Science and Applied Mathematics, University of the Witwatersrand, Johannesburg, 2020 | |
dc.description.abstract | Incomplete data sets are a problem in most studies; however, few studies have recognised imputation as a solution to this problem. Incomplete data can have a significant effect on the conclusions drawn and the decisions made. To address incomplete data, one should use techniques that recover the missing values, chosen according to factors such as how much of the data is missing, how large the data set is, and how the data went missing. In this report, we aimed to compare the performance of the Expectation-Maximization (EM) algorithm and matrix completion when imputing missing values for varying degrees of missing data. Kullback-Leibler (KL) divergence was used as the evaluation metric to measure the performance of the EM algorithm and matrix completion when estimating missing values relative to the ground-truth distribution. The findings of this research show that the EM algorithm outperformed matrix completion in both the theoretical models (simulated scenarios of learning from varying degrees of missing data) and the application models (application of the theoretical model to real-world data on credit card fraud). A few similarities between the algorithms were observed when recovering missing values, such as the increasing trend in error as the proportion of missing values increases and the impact of increasing the number of variables in a data set. Matrix completion performed better only when more than approximately 77% of the values were missing. Therefore, from our findings, we conclude that when less than 50% of the data is missing, the EM algorithm produces accurate predictions. The EM algorithm performed better than matrix completion because it first learned the data itself and used maximum likelihood procedures to estimate the parameters of the model, whereas matrix completion analysed the existing pattern in the rows and columns and imputed the missing values using the pattern learned from the data. | |
dc.description.librarian | NG (2023) | |
dc.faculty | Faculty of Science | |
dc.format.extent | Online resource (42 leaves) | |
dc.identifier.citation | Thulare, Evans Molahlegi (2020) An empirical analysis and application of the expectation-maximization and matrix completion algorithms for varying degrees of missing data, University of the Witwatersrand, Johannesburg, <http://hdl.handle.net/10539/35819> | |
dc.identifier.uri | https://hdl.handle.net/10539/35819 | |
dc.language.iso | en | |
dc.school | School of Computer Science and Applied Mathematics | |
dc.subject.lcsh | Expectation-maximization algorithms | |
dc.subject.lcsh | Estimation theory | |
dc.title | An empirical analysis and application of the expectation-maximization and matrix completion algorithms for varying degrees of missing data | |
dc.type | Dissertation |