3. Electronic Theses and Dissertations (ETDs) - All submissions

Permanent URI for this community: https://wiredspace.wits.ac.za/handle/10539/45

  • Item
    Knowledge extraction in population health datasets: an exploratory data mining approach
    (2018) Khangamwa, Gift
    There is a growing trend in the utilization of machine learning and data mining techniques for knowledge extraction in health datasets. In this study, we used machine learning methods for data exploration and model building, and we built classifier models for anemia. Anemia is recognized as a crucial public health challenge that leads to poor health for mothers and infants, and one of its main causes is malaria. We used a dataset from Malawi, where the prevalence of these two health challenges, malaria and anemia, remains high. We employed machine learning algorithms for the task of knowledge extraction on these demographic and health datasets for Malawi for the survey years 2004 and 2010. We followed the cross-industry standard process for data mining (CRISP-DM) methodology to guide our study. The dataset was obtained, cleaned and prepared for experimentation. Unsupervised machine learning methods were used to understand the nature of the dataset and the natural groupings in it, while supervised machine learning methods were used to build predictive models for anemia. Specifically, we used principal component analysis (PCA) and clustering algorithms in our unsupervised experiments, and support vector machines (SVMs) and decision trees (DTs) in the supervised experiments. Unsupervised methods revealed no significant separation of clusters according to either the malaria or the anemia attributes. However, attributes such as age, economic status, health practices and the number of children a woman has were clustered in significantly different ways, i.e., young and old women went to different clusters. Moreover, PCA results confirmed these findings. Supervised methods, on the other hand, revealed that anemia classifiers could be developed using SVMs and DTs for the dataset. The best-performing model, an SVM with C = 100 and γ = 10⁻¹⁸, attained an accuracy of 86%, ROC area score of 86%, mean absolute error of 0.27 and kappa statistic of 0.78. The best DT model attained an accuracy of 73%, ROC area score of 74%, mean absolute error of 0.36 and kappa statistic of 0.449. In conclusion, we successfully built a good anemia classifier using SVM and also showed the relationship between important attributes in the classification of anemia.
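    The kappa statistic reported alongside accuracy in the abstract corrects the observed agreement for the agreement expected by chance. The following is a minimal sketch of how both metrics can be computed for a binary classifier from a 2x2 confusion matrix. It is not the thesis's code: the function name `accuracy_and_kappa` and the counts used are hypothetical and are not taken from the study.

```python
def accuracy_and_kappa(tp, fp, fn, tn):
    """Return (accuracy, Cohen's kappa) for a 2x2 confusion matrix.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives and true negatives.
    """
    total = tp + fp + fn + tn
    observed = (tp + tn) / total  # observed agreement, i.e. accuracy

    # Chance agreement: for each class, multiply the proportion of
    # actual cases by the proportion of predicted cases, then sum.
    p_pos = ((tp + fn) / total) * ((tp + fp) / total)
    p_neg = ((tn + fp) / total) * ((tn + fn) / total)
    expected = p_pos + p_neg

    # Kappa is undefined when expected == 1 (all cases in one class).
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts, for illustration only (not results from the thesis).
acc, kappa = accuracy_and_kappa(tp=40, fp=10, fn=10, tn=40)
print(f"accuracy={acc:.2f}, kappa={kappa:.2f}")
```

    Because kappa discounts chance agreement, it sits below accuracy whenever the classifier is imperfect, which matches the pattern in the reported figures (for example, 73% accuracy against a kappa of 0.449 for the DT model).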
Copyright Ownership Is Guided By The University's Intellectual Property policy

Students submitting a Thesis or Dissertation must be aware of current copyright issues. To protect both your original work and the copyrighted work of others, you should follow all current copyright law.