Galaxy classification using machine learning
Date
2021
Authors
Variawa, Mohamed Zayyan
Abstract
An important area of study is galaxy classification, as the type and formation of galaxies often offer insights into the origin and evolution of the universe. The majority of classifications come from human experts manually inspecting and labelling images of galaxies. Owing to the increased availability of galaxy images, researchers have coupled machine learning with crowd-sourced labels to automate galaxy classification and reduce the time astronomers spend on manual classification. However, it is essential to study how well these crowd-sourced labels generalise to more expert classification systems such as the Hubble tuning fork. Multiple ResNet-50 models are trained on the crowd-sourced Galaxy Zoo 1 and 2 datasets, as well as on the expertly labelled EFIGI catalogue, to classify galaxies according to their Hubble types. To compare how the models trained on crowd-sourced data generalise against the models trained on expert data, the expert-labelled Revised Shapley-Ames catalogue is used as an unseen test set. Deep Metric Learning techniques are used to fine-tune the classification models to improve on current state-of-the-art results for galaxy classification. The results show that Transfer Learning coupled with ResNet-50 outperforms self-defined rules for galaxy classification, indicating the effectiveness of machine learning for this task. The results further demonstrate an improvement on the current state-of-the-art accuracy for both the Galaxy Zoo 2 and EFIGI data, using Transfer Learning with the ResNet-50 model. The mean average precision values of the crowd-sourced and expert models indicate that the models are comparable. However, confusion matrices of actual vs. predicted labels reveal that the models trained on the expert dataset outperform the models trained on the crowd-sourced data. This result highlights the need for caution when using crowd-sourced labels.
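Two models can have similar mean average precision yet distribute their errors very differently, which is what an actual vs. predicted confusion matrix exposes. The following is a minimal sketch, not the author's evaluation code: the coarse class names `E`, `S0`, `Sp` and `Irr` and the sample labels are hypothetical, chosen only to illustrate how rows (actual) and columns (predicted) are counted and how per-class recall is read off the diagonal.

```python
# Hypothetical coarse Hubble classes, used only for illustration.
CLASSES = ["E", "S0", "Sp", "Irr"]


def confusion_matrix(actual, predicted, classes):
    """Count label pairs: rows are actual labels, columns are predicted labels."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def per_class_recall(matrix):
    """Fraction of each actual class that was predicted correctly (diagonal / row sum)."""
    recalls = []
    for i, row in enumerate(matrix):
        total = sum(row)
        recalls.append(row[i] / total if total else 0.0)
    return recalls


# Hypothetical labels for six galaxies.
actual = ["E", "E", "S0", "Sp", "Sp", "Irr"]
predicted = ["E", "S0", "S0", "Sp", "E", "Irr"]

cm = confusion_matrix(actual, predicted, CLASSES)
for name, row in zip(CLASSES, cm):
    print(name, row)
print(per_class_recall(cm))
```

Off-diagonal entries (here, an elliptical predicted as S0 and a spiral predicted as elliptical) show which Hubble types a model confuses, information that a single averaged precision score does not convey.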
The results further show that a model pre-trained on crowd-sourced data using Label Smoothing Cross-Entropy can be fine-tuned using Deep Metric Learning to achieve state-of-the-art performance in galaxy morphology classification. Finally, Transfer Learning from crowd-sourced labelled data to expert-labelled data leads to a significant improvement in classification accuracy.
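Label Smoothing Cross-Entropy replaces the one-hot target with a softened distribution, so the model is not pushed towards fully confident predictions on labels that may be noisy, as crowd-sourced labels can be. The sketch below assumes the common uniform-smoothing form, where the target for class $k$ out of $K$ classes is $(1-\varepsilon)\,\mathbb{1}[k=y] + \varepsilon/K$; the exact variant and the value of `epsilon` used in the report are not stated here, so both are assumptions.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def label_smoothing_cross_entropy(logits, target, epsilon=0.1):
    """Cross-entropy against a smoothed target distribution.

    Assumed form: every class receives epsilon / K of the probability mass,
    and the true class receives an additional 1 - epsilon.
    """
    k = len(logits)
    log_probs = log_softmax(logits)
    smoothed = [epsilon / k] * k
    smoothed[target] += 1.0 - epsilon
    return -sum(q * lp for q, lp in zip(smoothed, log_probs))


# Hypothetical logits for one galaxy over four classes; the true class is index 0.
logits = [3.0, 0.5, 0.2, -1.0]
print(label_smoothing_cross_entropy(logits, target=0, epsilon=0.0))  # plain cross-entropy
print(label_smoothing_cross_entropy(logits, target=0, epsilon=0.1))  # smoothed loss
```

With `epsilon=0.0` the function reduces to ordinary cross-entropy; increasing `epsilon` penalises a model whose probability mass concentrates entirely on one class.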
Description
A research report submitted to the School of Computer Science and Applied Mathematics, Faculty of Science, University of the Witwatersrand, in partial fulfillment of the requirements for the degree of Master of Science, 2021