Educational data mining (EDM) in a South African University: a longitudinal study of factors that affect the academic performance of computer science I students
dc.contributor.author | Mashiloane, Lebogang | |
dc.date.accessioned | 2016-01-22T09:21:08Z | |
dc.date.available | 2016-01-22T09:21:08Z | |
dc.date.issued | 2016-01-22 | |
dc.description | Degree of Master of Science by research only: A Dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Master of Science. Signed on September 10, 2015 in Johannesburg | en_ZA |
dc.description.abstract | The past few years have seen an increase in the number of first year students registering in the School of Computer Science at Wits University. These students come from different backgrounds, both academically and socially. Like many other institutions, Wits University collects and stores vast amounts of data about the students it enrols and teaches. However, this data is not always used after being stored. The area of Educational Data Mining (EDM) focuses on using this stored data to find trends and patterns that could enhance knowledge about students’ behavior, their academic performance and the learning environment. This longitudinal study focuses on the application of EDM techniques to obtain a better understanding of some of the factors that influence the academic performance of first year computer science students at the University of the Witwatersrand. Knowledge obtained using these techniques could assist in increasing the number of students who complete their studies successfully, and in identifying students who are at risk of failing so that early intervention processes can be put into place. A modified version of CRISP-DM (CRoss-Industry Standard Process for Data Mining) was used, with three data mining techniques, namely: Classification, Clustering and Association Rule Mining. Three algorithms were compared for each of the first two techniques, while only one algorithm was used for Association Rule Mining. For the classification technique, the three algorithms compared were the J48 Classifier, Decision Table and Naïve Bayes. The clustering algorithms used were Simple K-means, Expectation Maximization (EM) and Farthest First. Finally, the Predictive Apriori algorithm was selected as the Association Rule Mining technique. Historical Computer Science I data, from 2006 to 2011, was used as the training data. 
This set of data was used to find relationships within the data that could assist with predictive modeling. For each of the selected techniques a model was created using the training data set. These models were incorporated in a tool, the Success or Failure Determiner (SOFD), that was created specifically as part of this research. Thereafter, the test data set was put through the SOFD tool in the testing phase. Test data sets usually contain a variable whose value is predicted using the models built during the training phase. The 2012 Computer Science I data instances were used during the testing phase. The investigations brought forth both expected and interesting results. A good relationship was found between academic performance in Computer Science and three of the factors investigated: Mathematics I, the mid-year mark and the module perceived to be the most difficult in the course. The relationship between Mathematics and Computer Science was expected. However, the other two factors (mid-year mark and most difficult module) are new, and may need to be investigated further in other courses or in future studies. An interesting finding from the Mathematics investigation was that Computer Science performance related more closely to Algebra than to Calculus. Using these three factors to predict Computer Science performance could help improve throughput and retention rates by identifying students at risk of failing before they write their final examinations. The Association Rule Mining technique helped identify the selection of courses that could yield the best overall academic performance in first year. This finding is important, since the information obtained could be used during the registration process to help students make the correct decisions when selecting the courses they would like to do. 
The overall results show that using data mining techniques and historical data collected at Wits University about first year Computer Science (CS-1) students can assist in obtaining meaningful information and knowledge, from which a better understanding of present and future generations of CS-1 students can be derived, and solutions found to some of the academic problems and challenges facing them. Additionally, this can assist in obtaining a better understanding of the students and the factors that influence their academic performance. This study can be extended to include more courses within Wits University and other higher education institutions. Keywords. Educational Data Mining, CRISP-DM, Classification, Clustering, Association Rule Mining, J48 Classifier, Decision Table, Naïve Bayes, Simple K-means, Expectation Maximization, Farthest First, Predictive Apriori | en_ZA |
dc.identifier.uri | http://hdl.handle.net/10539/19374 | |
dc.language.iso | en | en_ZA |
dc.subject.lcsh | Computer science students -- South Africa. | |
dc.subject.lcsh | Academic achievement -- South Africa. | |
dc.subject.lcsh | University of the Witwatersrand -- Students. | |
dc.subject.lcsh | Data mining -- South Africa. | |
dc.title | Educational data mining (EDM) in a South African University: a longitudinal study of factors that affect the academic performance of computer science I students | en_ZA |
dc.type | Thesis | en_ZA |
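The train-then-test workflow described in the abstract (build a model on 2006 to 2011 records, then predict outcomes for the 2012 cohort) can be sketched as follows. This is a minimal illustration under stated assumptions, not the SOFD tool or the software used in the study: it uses a hand-written categorical Naïve Bayes classifier with Laplace smoothing, and every record, feature band (`maths_pass`, `mid_high`, ...) and module name (`module_A`, `module_B`) below is hypothetical.

```python
import math
from collections import Counter, defaultdict

# HYPOTHETICAL records: (year, Mathematics I band, mid-year mark band,
# module perceived as most difficult, final CS-1 outcome).
# Illustrative values only; these are NOT data from the study.
RECORDS = [
    (2006, "maths_pass", "mid_high", "module_A", "pass"),
    (2007, "maths_pass", "mid_high", "module_B", "pass"),
    (2008, "maths_fail", "mid_low", "module_A", "fail"),
    (2009, "maths_fail", "mid_low", "module_B", "fail"),
    (2010, "maths_pass", "mid_low", "module_A", "pass"),
    (2011, "maths_fail", "mid_high", "module_B", "fail"),
    (2012, "maths_pass", "mid_high", "module_A", "pass"),
    (2012, "maths_fail", "mid_low", "module_B", "fail"),
]


def temporal_split(records, last_train_year=2011, test_year=2012):
    """Train on earlier cohorts, test on a later one (no shuffling)."""
    train = [(r[1:-1], r[-1]) for r in records if r[0] <= last_train_year]
    test = [(r[1:-1], r[-1]) for r in records if r[0] == test_year]
    return train, test


def train_naive_bayes(rows):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(label for _, label in rows)
    feature_counts = Counter()  # key: (feature index, value, label)
    values_per_feature = defaultdict(set)
    for features, label in rows:
        for i, value in enumerate(features):
            feature_counts[(i, value, label)] += 1
            values_per_feature[i].add(value)
    return class_counts, feature_counts, values_per_feature


def predict(model, features):
    """Return the label with the highest log prior + smoothed log likelihood."""
    class_counts, feature_counts, values_per_feature = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)
        for i, value in enumerate(features):
            k = len(values_per_feature[i])  # distinct values seen in training
            score += math.log((feature_counts[(i, value, label)] + 1) / (n_label + k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    train, test = temporal_split(RECORDS)
    model = train_naive_bayes(train)
    correct = sum(predict(model, f) == y for f, y in test)
    print(f"accuracy on hypothetical 2012 rows: {correct}/{len(test)}")
```

Splitting by year rather than at random mirrors the abstract's setup, where earlier cohorts are used to anticipate a later one; a random split would let information from the test year leak into training.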