Knowledge extraction in population health datasets: an exploratory data mining approach

dc.contributor.authorKhangamwa, Gift
dc.date.accessioned2019-05-14T05:51:05Z
dc.date.available2019-05-14T05:51:05Z
dc.date.issued2018
dc.descriptionA thesis submitted in fulfillment of the requirements for the degree of Master of Computer Science in the School of Computer Science and Applied Mathematics, 2018en_ZA
dc.description.abstractThere is a growing trend in the utilization of machine learning (ML) and data mining techniques for knowledge extraction in health datasets. In this study, we used machine learning methods for data exploration and model building, and we built classifier models for anemia. Anemia is recognized as a crucial public health challenge that leads to poor health for mothers and infants, and one of its main causes is malaria. We used a dataset from Malawi, where the prevalence of these two health challenges, malaria and anemia, remains high. We employed machine learning algorithms for the task of knowledge extraction on the demographic and health data sets for Malawi for the survey years 2004 and 2010. We followed the cross-industry standard process for data mining methodology to guide our study. The dataset was obtained, cleaned and prepared for experimentation. Unsupervised machine learning methods were used to understand the nature of the data set and the natural groupings in it. Supervised machine learning methods, on the other hand, were used to build predictive models for anemia. Specifically, we used principal component analysis (PCA) and clustering algorithms in our unsupervised machine learning experiments. Support vector machines (SVMs) and decision trees (DTs) were used in the supervised machine learning experiments. Unsupervised ML methods revealed that there was no significant separation of clusters according to the malaria and anemia attributes. However, attributes such as age, economic status, health practices, and the number of children a woman has were clustered in significantly different ways, i.e., young and old women went to different clusters. Moreover, PCA results confirmed these findings. Supervised methods, on the other hand, revealed that anemia classifiers could be developed using SVMs and DTs for the dataset. The best-performing model, an SVM with C = 100 and γ = 10^-18, attained an accuracy of 86%, an ROC area score of 86%, a mean absolute error of 0.27, and a kappa statistic of 0.78. The best DT model, by contrast, attained an accuracy of 73%, an ROC area score of 74%, a mean absolute error of 0.36, and a kappa statistic of 0.449. In conclusion, we successfully built a good anemia classifier using an SVM and also showed the relationship between important attributes in the classification of anemia.en_ZA
dc.description.librarianXL2019en_ZA
dc.format.extentOnline resource (xix, 109 leaves)
dc.identifier.citationKhangamwa, Gift (2018) Knowledge extraction in population health datasets: an exploratory data mining approach, University of the Witwatersrand, Johannesburg, https://hdl.handle.net/10539/26890
dc.identifier.urihttps://hdl.handle.net/10539/26890
dc.language.isoenen_ZA
dc.subject.lcshEstimation theory
dc.subject.lcshData mining
dc.subject.lcshDatabase management
dc.subject.lcshMissing observations (Statistics)
dc.titleKnowledge extraction in population health datasets: an exploratory data mining approachen_ZA
dc.typeThesisen_ZA

Files

Original bundle

Name:
MSc_0718976m.pdf
Size:
991.51 KB
Format:
Adobe Portable Document Format

License bundle

Name:
license.txt
Size:
1.71 KB
Format:
Item-specific license agreed upon to submission
