3. Electronic Theses and Dissertations (ETDs) - All submissions

Permanent URI for this community: https://wiredspace.wits.ac.za/handle/10539/45

  • An empirical analysis and application of the expectation-maximization and matrix completion algorithms for varying degrees of missing data
    (2020) Thulare, Evans Molahlegi
    Incomplete data sets are a problem in most studies, yet few studies recognise imputation as a solution to this problem. Incomplete data can have a significant effect on the conclusions drawn and the decisions made. To solve the problem of incomplete data, one should use techniques that recover the missing values, chosen according to how much data is missing, how large the data set is, how the data went missing, etc. In this report, we aimed to compare the performance of the Expectation-Maximization (EM) algorithm and matrix completion when imputing missing values for varying degrees of missing data. Kullback-Leibler (KL) divergence was used as the evaluation metric to assess how well the EM algorithm and matrix completion estimate missing values relative to the ground-truth distribution. The findings of this research show that the EM algorithm outperformed matrix completion in both the theoretical models (simulated scenarios of learning from varying degrees of missing data) and the application models (the theoretical model applied to real-world data on credit card fraud). A few similarities between the algorithms were observed when recovering missing values, such as the increasing trend of error as the proportion of missing values increases and the impact of an increasing number of variables in a data set. Matrix completion performed better only when more than approximately 77% of the values were missing. From our findings, we therefore conclude that the EM algorithm produces accurate predictions when less than 50% of the data is missing. The EM algorithm performed better than matrix completion because it first learned the data itself and used maximum likelihood procedures to estimate the parameters of the model, whereas matrix completion analysed the existing pattern across rows and columns and imputed the missing values using the pattern learned from the data.
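The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation. Everything in it is an assumption made for illustration: the data are simulated as multivariate Gaussian, values are removed completely at random, matrix completion is stood in for by an iterative rank-truncated SVD, and the KL divergence is computed in closed form between Gaussians fitted to the ground-truth and imputed data. The function names (`em_impute`, `svd_impute`, `gaussian_kl`, `fit_gaussian`) and all parameter values are hypothetical.

```python
import numpy as np

def em_impute(X, n_iter=30, ridge=1e-6):
    """Impute NaNs by EM, ASSUMING rows are i.i.d. multivariate Gaussian."""
    X = np.array(X, dtype=float)
    n, d = X.shape
    miss = np.isnan(X)
    mu = np.nanmean(X, axis=0)              # start from observed column means
    X_hat = np.where(miss, mu, X)
    sigma = np.cov(X_hat, rowvar=False) + ridge * np.eye(d)
    for _ in range(n_iter):
        C = np.zeros((d, d))                # summed conditional covariances
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():                 # whole row missing: use the prior
                X_hat[i] = mu
                C += sigma
                continue
            # E-step: conditional mean/covariance of missing given observed
            S_oo = sigma[np.ix_(o, o)]
            S_mo = sigma[np.ix_(m, o)]
            K = np.linalg.solve(S_oo, S_mo.T).T      # S_mo @ inv(S_oo)
            X_hat[i, m] = mu[m] + K @ (X[i, o] - mu[o])
            C[np.ix_(m, m)] += sigma[np.ix_(m, m)] - K @ S_mo.T
        # M-step: maximum likelihood update of mean and covariance
        mu = X_hat.mean(axis=0)
        diff = X_hat - mu
        sigma = (diff.T @ diff + C) / n + ridge * np.eye(d)
    return X_hat

def svd_impute(X, rank=2, n_iter=100):
    """Simple stand-in for matrix completion: iterative rank-truncated SVD."""
    X = np.array(X, dtype=float)
    miss = np.isnan(X)
    X_hat = np.where(miss, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X_hat, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        X_hat = np.where(miss, low_rank, X)  # observed entries stay fixed
    return X_hat

def fit_gaussian(X):
    """Sample mean and covariance of the rows of X."""
    return X.mean(axis=0), np.cov(X, rowvar=False)

def gaussian_kl(mu0, S0, mu1, S1):
    """Closed-form KL( N(mu0, S0) || N(mu1, S1) )."""
    d = mu0.shape[0]
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(S0)
    _, logdet1 = np.linalg.slogdet(S1)
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff
                  - d + logdet1 - logdet0)

# Simulated ground truth (hypothetical parameters, not the thesis data).
rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.8, 0.5],
                     [0.8, 1.0, 0.6],
                     [0.5, 0.6, 1.0]])
X_true = rng.multivariate_normal(np.zeros(3), true_cov, size=300)

# Remove about 30% of the entries completely at random.
mask = rng.random(X_true.shape) < 0.3
X_miss = np.where(mask, np.nan, X_true)

X_em = em_impute(X_miss)
X_mc = svd_impute(X_miss)

mu_t, S_t = fit_gaussian(X_true)
kl_em = gaussian_kl(mu_t, S_t, *fit_gaussian(X_em))
kl_mc = gaussian_kl(mu_t, S_t, *fit_gaussian(X_mc))
print(f"KL(truth || EM-imputed):  {kl_em:.4f}")
print(f"KL(truth || SVD-imputed): {kl_mc:.4f}")
```

Repeating the run while changing the `0.3` threshold would give a rough analogue of the "varying degrees of missing data" experiment. The numbers it prints depend on the simulated data and on the choices above, so they should not be read as reproducing the reported results.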
Copyright Ownership Is Guided By The University's Intellectual Property policy

Students submitting a Thesis or Dissertation must be aware of current copyright issues. To protect both your original work and the copyrighted work of others, you should follow all current copyright law.