ETD Collection

Permanent URI for this collection: https://wiredspace.wits.ac.za/handle/10539/104


Please note: Digitised content is made available at the best possible quality range, taking into consideration file size and the condition of the original item. These restrictions may sometimes affect the quality of the final published item. For queries regarding the content of the ETD collection, please contact the IR specialists by email or by telephone: 011 717 4652 / 1954

Follow the link below for important information about Electronic Theses and Dissertations (ETD)

Library Guide about ETD

  • Partially automated grading of short free-text responses in computer science through sentence embedding and clustering
    (2024) Philip, Sheena
    A significant portion of educators’ time is spent marking assessments, time that could be better used for teaching and research to enhance the overall education experience. To assess higher-order thinking, questions that require short text answers are necessary. However, automatically grading these questions is much more complex, since computers need to understand the underlying semantic meaning of the text. Furthermore, the dataset available for grading is limited to a few hundred responses due to the smaller size of lecture classes, which is not sufficient for evaluating most NLP and machine learning methods. To address this, this research aims to partially automate the grading of short free-text responses in computer science by grouping similar responses and manually marking specific submissions that best represent each group. It explores various sentence embedding, clustering, and sampling techniques, and evaluates the Enhancement of Clustering by Iterative Classification (ECIC) algorithm, which is intended to improve cluster quality. The study found that Agglomerative clustering combined with the Universal Sentence Encoder (USE) and a sampling strategy that marks submissions based on their distance to the center of the cluster produced the best results, balancing time saved against the performance criteria. This combination resulted in a 65% reduction in the time it takes to grade a question. However, the ECIC algorithm was not effective on datasets that comprise only a few hundred data points.
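
The cluster-then-sample idea in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation. The bag-of-words `embed` function is a toy stand-in for the Universal Sentence Encoder, the average-linkage merge rule is an assumption (the abstract does not state the linkage used), and every function name here is hypothetical. `grade_fn` stands for the human marker, who grades only the response nearest each cluster's center; that grade is then copied to the rest of the cluster.

```python
import math
from collections import Counter


def embed(text, vocab):
    """Toy stand-in for a sentence encoder: L2-normalised bag-of-words vector."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative(vectors, n_clusters):
    """Naive average-linkage agglomerative clustering (assumed linkage).

    Returns a list of clusters, each a list of indices into `vectors`.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                d = sum(distance(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters


def representative(cluster, vectors):
    """Index of the cluster member closest to the cluster's centroid."""
    members = [vectors[i] for i in cluster]
    center = [sum(col) / len(members) for col in zip(*members)]
    return min(cluster, key=lambda i: distance(vectors[i], center))


def propagate_grades(responses, n_clusters, grade_fn):
    """Grade one representative per cluster and copy its grade to the cluster.

    Returns (grades, marked), where `marked` lists the manually graded indices.
    """
    vocab = sorted({w for r in responses for w in r.lower().split()})
    vectors = [embed(r, vocab) for r in responses]
    grades = [None] * len(responses)
    marked = []
    for cluster in agglomerative(vectors, n_clusters):
        rep = representative(cluster, vectors)
        marked.append(rep)
        grade = grade_fn(responses[rep])  # the only manual marking step
        for i in cluster:
            grades[i] = grade
    return grades, marked
```

In this sketch the marker grades `n_clusters` responses instead of all of them, which is where any time saving would come from. The quality of the propagated grades depends entirely on how well the embedding groups semantically equivalent answers.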