Using social context for person re-identification

Mizrahi, Liron
Facial recognition is currently a popular field in computer vision. It has many applications, such as biometrics, Facebook's automatic photo tagging and automatic attendance monitoring. These systems all work in the same manner: a dataset of face images is created and a deep learning model is trained to recognise those faces. This is the standard way in which these systems are implemented, and it is effective. However, there is usually more information in the domain besides just the facial features of subjects. For example, in Facebook's automatic photo tagging system, knowing who a subject of interest usually appears alongside in photos, in addition to knowing the subject's visual appearance, should provide more information with which to recognise them. To demonstrate this idea, a labelled dataset of students in lectures was created. With their informed consent, students were filmed in their lectures over a period of a few months. Because of the large number of students in the classes, an automatic labelling system using Augmented Reality Markers was developed, which allowed extracted face images to be quickly and easily labelled. In addition to the face images, the seat positions of the students were also captured. It has been shown that students tend to sit in preferred seats during lectures. We show that by using the face images in conjunction with the students' seat positions, the system is able to identify students more accurately. Several methods of injecting the seat information into the system were developed and analysed. The facial recognition component was trained using a Convolutional Neural Network (CNN) with Triplet Loss and Quadruplet Loss. Triplet loss is a standard facial recognition framework. Quadruplet loss attempts to improve on triplet loss; however, we show that it has some issues, and the results show that this approach is not worth the extra computational cost.
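The triplet loss mentioned above can be illustrated with a minimal sketch. This is the standard formulation (anchor, positive, negative with a fixed margin), not the dissertation's exact implementation; the function name and margin value are illustrative.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Standard triplet loss on embedding vectors: pull the anchor
    # toward the positive (same identity) and push it away from the
    # negative (different identity) by at least `margin`.
    # Margin value 0.2 is an illustrative choice, not from the dissertation.
    d_pos = np.sum((anchor - positive) ** 2)  # squared distance to positive
    d_neg = np.sum((anchor - negative) ** 2)  # squared distance to negative
    return max(d_pos - d_neg + margin, 0.0)
```

When the anchor already sits much closer to the positive than to the negative, the hinge clips the loss to zero, so well-separated triplets contribute no gradient.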
A series of adaptive margin strategies for potentially improving these loss functions was developed. The network creates low-dimensional representations of the face images instead of performing the classification itself. Once these representations were generated, several classical classification techniques were applied, such as Support Vector Machines, Logistic Regression, boosting methods and mixture models. The representations are constrained to lie on the surface of a hypersphere, which makes the data curved. A mixture model that can fit this spherical data was developed; if the data is sufficiently curved, it outperforms standard mixture models. Finally, we show that the inclusion of the extra metadata can improve recognition accuracy. However, if the metadata is too consistent, then simply classifying on the metadata alone outperforms classifying on the face images. The network trained with standard triplet loss performed the most consistently across several different classifiers.
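The pipeline described above (hypersphere-constrained embeddings fed to a downstream classifier) can be sketched as follows. The L2 normalisation onto the unit hypersphere matches the abstract's description; the nearest-centroid classifier is only a simple stand-in for the classical classifiers mentioned, and all names here are illustrative.

```python
import numpy as np

def normalize(x):
    # Project embeddings onto the unit hypersphere (L2 normalisation),
    # as described for the network's output representations.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def nearest_centroid(query, centroids):
    # Classify a unit-norm embedding by cosine similarity to per-class
    # mean directions. Because all vectors are unit-norm, the dot
    # product equals cosine similarity. This is an assumed stand-in,
    # not the dissertation's classifier.
    sims = centroids @ query
    return int(np.argmax(sims))
```

Because the normalised data lives on a curved manifold rather than in flat Euclidean space, a mixture model with spherical components (as the abstract develops) can fit it better than a standard Gaussian mixture.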
A dissertation submitted for the degree of Master of Science in Computer Science at the University of the Witwatersrand, Faculty of Science, 2020