The impact of missing data imputation on HIV classification

Hlalele, Nthabiseng Unathi
Missing data are a part of research and data analysis that often cannot be ignored. Although a number of methods have been developed for handling and imputing missing data, the problem remains largely unsolved, and many researchers still struggle with it. Owing to the availability of software and advances in computational power, maximum likelihood and multiple imputation, as well as auto-associative neural networks combined with genetic algorithms (AANN-GA), have been introduced as solutions to the missing data problem. Although these methods have produced considerable results in this domain, the impact that missing data and missing data imputation have on decision making had, until recently, not been assessed. This dissertation contributes to this knowledge by first introducing a new computational intelligence model that integrates Neuro-Fuzzy (N-F) modeling, Principal Component Analysis and genetic algorithms to impute missing data. The performance of this model is then compared with that of the AANN-GA and with that of the N-F architecture used on its own. To determine whether the data are predictable, and to assist in processing the data for training, an analysis of the HIV sero-prevalence data is performed. Two classification decision-making frameworks are then presented in order to assess the effect of missing data. These frameworks are trained to classify between two conditions when presented with a set of data variables. The first is a Bayesian neural network, which is statistical in nature, and the second is based on the fuzzy ARTMAP (FAM) classifier, which has incremental learning abilities. The two methods are used and compared in order to assess the degree to which missing data, and the imputation thereof, affect decision making. The effect of missing data differs for the two frameworks; while the Bayesian neural network fails in the presence of missing data, the FAM classifier still attempts to classify, but with decreased accuracy.
This work has shown that although missing data and the imputation thereof have an effect on decision making, the degree of that effect depends on the decision-making framework and on the model used for data imputation.