Categorical data imputation using non-parametric or semi-parametric imputation methods

dc.contributor.author: Khosa, Floyd Vukosi
dc.date.accessioned: 2016-05-11T06:26:06Z
dc.date.available: 2016-05-11T06:26:06Z
dc.date.issued: 2016-05-11
dc.description: A research report submitted to the Faculty of Science, University of the Witwatersrand, for the degree of Master of Science by Coursework and Research Report. [en_ZA]
dc.description.abstract: Researchers and data analysts often encounter problems when analysing data with missing values. Methods for imputing continuous data are well developed in the literature. However, methods for imputing categorical data are not well established. This research report focuses on categorical data imputation using non-parametric and semi-parametric methods. The aims of the study are to compare different imputation methods for categorical data and to assess the quality of the imputation. Three imputation methods are compared, namely multiple imputation, hot deck imputation and random forest imputation. Missing data are created on a complete data set using the missing completely at random mechanism. The imputed data sets are compared with the original complete data set, and the imputed values that match the values in the original data set are counted. The analysis revealed that the hot deck imputation method is more precise than the random forest and multiple imputation methods. A logistic regression model is fitted to each of the imputed data sets and to the original data set, and the resulting models are compared. The analysis shows that the multiple imputation method negatively affects the fit of the logistic regression model. [en_ZA]
dc.identifier.uri: http://hdl.handle.net/10539/20380
dc.language.iso: en [en_ZA]
dc.subject.lcsh: Data integrity.
dc.subject.lcsh: Quality control.
dc.subject.lcsh: Nonparametric statistics.
dc.subject.lcsh: Mathematical statistics.
dc.title: Categorical data imputation using non-parametric or semi-parametric imputation methods [en_ZA]
dc.type: Thesis [en_ZA]
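
The evaluation procedure summarised in the abstract (delete values completely at random, impute them, then count how many imputed values match the originals) can be sketched roughly as follows. This is only an illustrative sketch, not the report's implementation: the function names, the 20% missingness rate, the toy three-category variable, and the choice of a simple random hot deck (donors drawn from all observed values) are assumptions made for the example.

```python
import random


def make_mcar(values, rate, rng):
    """Copy `values`, setting each entry to None independently with probability `rate` (MCAR)."""
    return [None if rng.random() < rate else v for v in values]


def random_hot_deck(values, rng):
    """Fill each missing entry with a value drawn at random from the observed entries (donors)."""
    donors = [v for v in values if v is not None]
    if not donors:
        raise ValueError("no observed values available as donors")
    return [rng.choice(donors) if v is None else v for v in values]


def imputation_accuracy(original, masked, imputed):
    """Proportion of masked entries whose imputed value equals the original value."""
    missing = [i for i, v in enumerate(masked) if v is None]
    if not missing:
        return None  # nothing was masked, so accuracy is undefined
    hits = sum(imputed[i] == original[i] for i in missing)
    return hits / len(missing)


# Hypothetical toy data: one categorical variable with three levels.
rng = random.Random(0)
original = [rng.choice(["a", "b", "c"]) for _ in range(200)]
masked = make_mcar(original, rate=0.2, rng=rng)
imputed = random_hot_deck(masked, rng)
accuracy = imputation_accuracy(original, masked, imputed)
print("share of imputed values matching the original:", accuracy)
```

The same `imputation_accuracy` count could be applied to the output of any other imputation method, which is what makes the comparison across methods possible.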

Files

Original bundle

Research_Report_ Final_Final.pdf (1.06 MB, Adobe Portable Document Format)
Declaration form MCWRR.pdf (79.96 KB, Adobe Portable Document Format)
Final submission - ETD form 15 Feb.pdf (1.59 MB, Adobe Portable Document Format)
List of corrections.pdf (1.07 MB, Adobe Portable Document Format)

License bundle

license.txt (1.71 KB, Item-specific license agreed upon to submission)
