Educational data mining (EDM) in a South African University: a longitudinal study of factors that affect the academic performance of computer science I students
Date
2016-01-22
Authors
Mashiloane, Lebogang
Abstract
The past few years have seen an increase in the number of first year students registering in the School
of Computer Science at Wits University. These students come from different backgrounds both academically
and socially. As do many other institutions, Wits University collects and stores vast amounts of
data about the students they enrol and teach. However, this data is not always used after being stored. The
area of Educational Data Mining (EDM) focuses on using this stored data to find trends and patterns that
could enhance knowledge about students' behavior, their academic performance and the learning
environment.
This longitudinal study focuses on the application of EDM techniques to obtain a better understanding
of some of the factors that influence the academic performance of first year computer science students
at the University of the Witwatersrand. Knowledge obtained using these techniques could assist in increasing
the number of students who complete their studies successfully, and in identifying students who
are at risk of failing so that early intervention processes can be put into place. A modified
version of the CRISP-DM (CRoss-Industry Standard Process for Data Mining) process was used, with three data
mining techniques, namely: Classification, Clustering and Association Rule Mining. Three algorithms
were compared in the first two techniques while only one algorithm was used in the Association Rule
Mining. For the classification technique, the three algorithms that were compared were the J48 Classifier,
Decision Table and Naïve Bayes algorithm. The clustering algorithms used included the Simple
K-means, Expectation Maximization (EM) and the Farthest First algorithm. Finally, the Predictive Apriori
algorithm was selected as the Association Rule Mining technique.
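As a rough illustration of the Association Rule Mining idea named above, the following minimal sketch computes plain support and confidence for rules of the form "course pair implies outcome". It is not the Predictive Apriori algorithm used in the study, which ranks rules by an estimate of predictive accuracy. The records, course names and the `min_support` value are invented for illustration only.

```python
from itertools import combinations

# Hypothetical toy records (invented): the courses a student took plus an outcome label.
records = [
    {"CS1", "Maths1", "Physics1", "pass"},
    {"CS1", "Maths1", "Stats1", "pass"},
    {"CS1", "Maths1", "Physics1", "fail"},
    {"CS1", "Stats1", "Economics1", "fail"},
    {"CS1", "Maths1", "Stats1", "pass"},
]


def support(itemset, records):
    """Fraction of records that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= r for r in records) / len(records)


def confidence(antecedent, consequent, records):
    """Estimate of P(consequent | antecedent) from the records."""
    base = support(antecedent, records)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), records) / base


def rules_for(consequent, records, min_support=0.2):
    """List (course pair, confidence) rules for one outcome, best first."""
    courses = sorted(set().union(*records) - {"pass", "fail"})
    found = []
    for pair in combinations(courses, 2):
        if support(pair, records) >= min_support:
            found.append((pair, confidence(pair, {consequent}, records)))
    return sorted(found, key=lambda rule: rule[1], reverse=True)


best_pair, best_conf = rules_for("pass", records)[0]
print(best_pair, best_conf)
```

On real registration data, the rule list would need a much higher record count per course combination before any ranking could be trusted.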
Historical Computer Science I data, from 2006 to 2011, was used as the training data. This set of data
was used to find relationships within the data that could assist with predictive modeling. For each of the
selected techniques a model was created using the training data set. These models were incorporated in
a tool, the Success or Failure Determiner (SOFD), that was created specifically as part of this research.
Thereafter, the test data set was put through the SOFD tool in the testing phase. Test data sets usually
contain a variable whose value is predicted using the models built during the training phase. The 2012
Computer Science I data instances were used during the testing phase. The investigations brought forth
both expected and interesting results. A good relationship was found between academic performance in
Computer Science and three of the factors investigated: Mathematics I, mid-year mark and the module
perceived to be the most difficult in the course. The relationship between Mathematics and Computer
Science was expected. However, the other two factors (mid-year mark and most difficult module) are
new, and may need to be further investigated in other courses or in future studies. An interesting finding
from the Mathematics investigation was the better relationship between Computer Science and Algebra
rather than Calculus. Using these three factors to predict Computer Science performance could assist
in improving throughput and retention rates by identifying students at risk of failing, before they write
their final examinations. The Association Rule Mining technique assisted in identifying the selection of
courses that could yield the best academic performance overall, in first year. This finding is important,
since the information obtained could be used during the registration process to assist students in making
the correct decisions when selecting the courses they would like to do. The overall results show that using
data mining techniques and historical data collected at Wits University about first year Computer Science
(CS-1) students can assist in obtaining meaningful information and knowledge, from which a better
understanding of present and future generations of CS-1 students can be derived, and solutions found to
some of the academic problems and challenges facing them. Additionally, this can assist in obtaining a
better understanding of the students and factors that influence their academic performance. This study
can be extended to include more courses within Wits University and other higher educational institutions.
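The train-then-test procedure described in the abstract (models built on 2006 to 2011 data, evaluated on 2012 data) can be sketched as follows. This is a minimal stand-in, assuming a single-threshold rule on the mid-year mark rather than the J48, Decision Table or Naïve Bayes models used in the study, and it is not the SOFD tool. All rows, marks and the helper names `accuracy` and `fit_stump` are hypothetical.

```python
# Hypothetical toy rows (invented): (year, mid_year_mark, passed_final).
rows = [
    (2006, 35, False), (2007, 48, False), (2008, 52, True),
    (2009, 61, True), (2010, 44, False), (2011, 70, True),
    (2012, 40, False), (2012, 66, True), (2012, 55, True),
]

# Split by year: historical cohorts for training, the 2012 cohort for testing.
train = [r for r in rows if 2006 <= r[0] <= 2011]
test = [r for r in rows if r[0] == 2012]


def accuracy(threshold, data):
    """Share of rows where 'mark >= threshold' agrees with the pass label."""
    hits = sum((mark >= threshold) == passed for _, mark, passed in data)
    return hits / len(data)


def fit_stump(data):
    """Pick the cut-off among observed marks that maximises training accuracy."""
    candidates = sorted({mark for _, mark, _ in data})
    return max(candidates, key=lambda t: accuracy(t, data))


threshold = fit_stump(train)

# Students in the test cohort whose mid-year mark falls below the learned cut-off.
at_risk = [mark for _, mark, _ in test if mark < threshold]

print(threshold, accuracy(threshold, test), at_risk)
```

Splitting by year rather than at random keeps the evaluation closer to the intended use, where a model built on past cohorts flags at-risk students in the current one before final examinations.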
Keywords. Educational Data Mining, CRISP-DM, Classification, Clustering, Association Rule Mining,
J48 Classifier, Decision Table, Naïve Bayes, Simple K-means, Expectation Maximization, Farthest
First, Predictive Apriori
Description
Degree of Master of Science by research only:
A Dissertation submitted to the Faculty of Science, University of
the Witwatersrand, Johannesburg, in fulfilment of the
requirements for the degree of Master of Science.
Signed on September 10, 2015 in Johannesburg