Spatio-temporal reasoning for estimating student’s learning affect: an approach to strengthen e-learning
Date
2024
Abstract
Spatio-temporal reasoning has gained traction in recent decades owing to its use in diverse application domains. It is an aspect of reasoning that involves drawing logical conclusions about changes in objects or entities over space and time. Reasoning technology has potential in various areas of computer science, and the education sector is no exception. In a conventional classroom, a teacher reasons about and gauges a student's comprehension by analysing exhibited cues (body language, speech, eye gaze, and facial emotions) together with text-based feedback questions and answers (Q&A) to estimate the learning affect experienced. However, the limited existing work does not explore combining these modalities to estimate comprehension. To date, researchers have focused on facial cues exhibited by students to estimate learning affect, as existing research emphasises that facial cues hold and convey relevant, meaningful information about a student's experienced learning affect.
We believe that supplementing the above framework with reasoning-based communication through questions and answers has the potential to enhance existing e-learning platforms and assist both teachers and students. We propose a framework that comprises affect analysis through facial emotion and affective state estimation; mapping of these estimates to the learning affect experienced by students; report generation for both students and teachers, outlining the model's analysis of students' comprehension of the lecture session; and further feedback through a natural, text-based question-and-answering module. The proposed framework includes a Convolutional Neural Network-Bidirectional Long Short-Term Memory (CNN-BiLSTM) cascade to extract facial features and classify basic facial emotions and affective states. The CNN-BiLSTM cascade handles and analyses video streams that capture a student's facial movements (temporal in nature) during a lecture session. The emotion estimator, trained and tested on samples from the Extended Denver Intensity of Spontaneous Facial Action (DISFA+) dataset annotated with seven basic emotions (i.e., anger, disgust, fear, happiness, neutral, surprise, and sadness), reported an accuracy of 92% and an average F1 score of 88% on a sample size of 2,274. The mapping module then mapped the estimated emotions onto learning affects (positive, negative, and neutral) based on mappings found in the literature. Along similar lines, the affective state estimator, trained and tested on the Dataset for Affective States in E-Environments (DAiSEE) annotated with four affective states (i.e., boredom, confusion, engagement, and frustration), reported an accuracy of 86% and an average F1 score of 87% on a sample size of 4,305. The affective state estimator was further used to classify the emotion samples from DISFA+ so that the affective-state-to-learning-affect mappings established in the literature could verify and confirm the proposed emotion-to-learning-affect mappings.
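The emotion-to-learning-affect mapping described above can be sketched as a simple lookup followed by a session-level summary. The specific emotion-to-affect assignments below are illustrative assumptions in the spirit of mappings commonly reported in the literature, not the thesis's exact table:

```python
# Illustrative sketch of the emotion -> learning-affect mapping module.
# The specific assignments are assumptions for illustration; the thesis
# derives and verifies its mappings against the literature.
from collections import Counter

EMOTION_TO_AFFECT = {
    "happiness": "positive",
    "surprise": "positive",
    "neutral": "neutral",
    "anger": "negative",
    "disgust": "negative",
    "fear": "negative",
    "sadness": "negative",
}

def map_emotions(frame_emotions):
    """Map per-frame emotion labels onto learning-affect labels."""
    return [EMOTION_TO_AFFECT[e] for e in frame_emotions]

def session_affect(frame_emotions):
    """Summarise a lecture session as its most frequent learning affect."""
    counts = Counter(map_emotions(frame_emotions))
    return counts.most_common(1)[0][0]
```

For example, `session_affect(["happiness", "neutral", "anger", "happiness"])` returns `"positive"`, since positively mapped frames dominate the session.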
The next stage of the proposed framework involves narrative report generation and feedback via questions and answers (Q&A). The narrative report generation module utilised lecture information (i.e., the length of the lecture session and video annotations) together with the estimates (estimated facial emotions, affective states, and learning affect) to generate comprehensive reports. In this study, report generation modules were embedded for both students and teachers (lecture report). An interactive, live environment was further enabled using a Bidirectional Encoder Representations from Transformers (BERT)-based Q&A language model. The BERT-based Q&A model takes in the generated narrative report and a text input (a pre-defined question from an existing database) from either a student or a teacher, and produces a text output (answer) by incorporating basic reasoning. The BERT-based Q&A model helps provide relevant feedback based on the context of the question. Live testing of the framework was carried out with student participants, and survey feedback was used to test the narrative report generation and Q&A modules in a laboratory setting. Of the participants, 72.55% reported that their own experience of the lecture (i.e., comprehension and learning affect experienced) was in line with the model's feedback.
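A minimal, template-based sketch of the narrative report generation step is given below. The field names and report wording are hypothetical; the thesis's module additionally draws on video annotations and produces a separate lecture report for teachers:

```python
# Minimal sketch of a template-based narrative report generator.
# Parameter names and phrasing are hypothetical, for illustration only;
# the full module also incorporates video annotations and teacher reports.

def generate_student_report(name, lecture_minutes, emotion,
                            affective_state, learning_affect):
    """Compose a short narrative report from session-level estimates."""
    return (
        f"During the {lecture_minutes}-minute lecture, {name} predominantly "
        f"exhibited {emotion} (affective state: {affective_state}), "
        f"suggesting an overall {learning_affect} learning affect."
    )
```

The generated text can then serve as the context passage handed to the BERT-based Q&A model, which answers pre-defined questions against it.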
This study found that when students are actively involved in learning, they can exhibit a variety of emotions and affective states that are connected to their learning and that, when analysed, can serve as feedback to both teachers and students. The study drew on recent work on reasoning principles, experiments using machine learning and natural language processing mechanisms, live testing, and questionnaires. Additionally, the research demonstrated the link between emotions, affective states, and learning affect, as well as how these factors can shed light on a student's learning affect. The examination of affect in e-learning through facial-cue estimates, the mapping of estimates onto learning affects, and the generation of narrative reports and reasoning via narrative feedback and Q&A for teachers and students are anticipated to facilitate the evaluation of students' learning. We envisage that the proposed framework and future work in this direction will enhance e-learning platforms and human-machine interaction.
Description
A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy to the Faculty of Science, School of Computer Science and Applied Mathematics, University of the Witwatersrand, Johannesburg, 2023
Keywords
Spatio-temporal, Neural learning, E-learning platforms