Netshitungulu, Ndikhwine
2025-10-01
2024-10
Netshitungulu, Ndikhwine. (2024). Information and Knowledge Discovery from Undergraduate STEM datasets: Educational Data Mining at a South African University. [Master's dissertation, University of the Witwatersrand, Johannesburg]. WIReDSpace. https://hdl.handle.net/10539/46723
A dissertation submitted in fulfilment of the requirements for the Degree of Master of Science, to the Faculty of Science, School of Computer Science and Applied Mathematics, University of the Witwatersrand, Johannesburg, 2024.
South Africa is experiencing an acute shortage of highly trained and skilled professionals with much-needed qualifications in the disciplines of Science, Technology, Engineering and Mathematics (STEM). There is a need to accelerate the training and development of professionals in these critical, scarce-skill areas. Our universities and other institutions play an important role in supporting and facilitating the development of these professionals. The primary focus of this data-mining-based study was to examine a number of factors related to first-time, first-year BSc students enrolled in critical-skill STEM degree programmes at the University of the Witwatersrand (also known as Wits). These factors included ethnicity, gender, funding status, English language proficiency, prior computer programming experience, mathematical ability and student profiling. Wits and other Higher Education Institutions (HEIs) collect and accumulate large amounts of data about their students. Using different data mining techniques, this data can be processed, analysed and leveraged to obtain meaningful and useful information about these students. This study was motivated by the need to obtain information from the longitudinal data (2015-2019) collected about first-time, first-year students enrolled in STEM degree disciplines at Wits University.
To conduct the study in a structured manner, we followed the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology, which is increasingly being adopted as a project implementation standard by data mining practitioners working in different disciplines, including education. We applied different statistical and data mining techniques to our various datasets. These included Analysis of Variance (ANOVA), classification, feature selection, prediction, clustering and association rule mining techniques. Several important results emerged from our study. The two-way ANOVA experiment showed that the interaction between gender and ethnicity was not significant. However, the ethnicity factor had a statistically significant effect on the Admission Point Score (APS). We also found the interaction between ethnicity and funding status to be significant. Individually, each of these two factors also had a statistically significant relationship with the APS. Regarding English language proficiency, the difference between English Home Language (ENA) students and English First Additional Language (ENB) students was significant. Amongst the National Senior Certificate (NSC) Grade 12 subjects, mathematics had the most significant relationship with the APS. In two first-semester Computer Science I courses, students with prior computer programming experience significantly outperformed their peers who lacked such experience. We also found that, collectively, the minimum admission requirement subjects for admission to the first-year STEM disciplines were suitable predictors of the overall first-year outcome. However, the result we obtained using these subjects as predictors of degree attainment was not encouraging.
This wide-ranging study has shown that data mining techniques can be used effectively in educational settings to leverage the data universities collect on their students, yielding insights and information that can support well-informed decision making. Beyond our specific educational context, other higher education institutions with STEM students in a situation similar to ours also have much to learn from this study and its findings.
en
©2024 University of the Witwatersrand, Johannesburg. All rights reserved. The copyright in this work vests in the University of the Witwatersrand, Johannesburg. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of University of the Witwatersrand, Johannesburg.
Educational Data Mining; Data Mining Techniques; STEM; CRISP-DM; Imbalanced Class Distribution; Performance Evaluation Metrics; Confusion Matrix; Receiver Operating Characteristics; Student Characteristics; English Language Proficiency; Feature Selection; UCTD
Information and Knowledge Discovery from Undergraduate STEM datasets: Educational Data Mining at a South African University
Dissertation
University of the Witwatersrand, Johannesburg
SDG-9: Industry, innovation and infrastructure
SDG-4: Quality education