Galaxy classification using machine learning

Date

2021

Authors

Variawa, Mohamed Zayyan

Abstract

Galaxy classification is an important area of study, as the type and formation of galaxies often offer insights into the origin and evolution of the universe. The majority of classifications come from human experts manually inspecting and labelling images of galaxies. Owing to the increased availability of galaxy images, researchers have coupled machine learning with crowd-sourced labels to automate galaxy classification and save the time astronomers spend on manual classification. However, it is essential to study how well these crowd-sourced labels generalise to more expert classification systems such as the Hubble tuning fork. Multiple ResNet50 models are trained on the crowd-sourced Galaxy Zoo 1 and 2 datasets, as well as the expertly labelled EFIGI catalogue, to classify galaxies according to their Hubble types. To compare the generalisation of the models trained on crowd-sourced data with that of the models trained on expert data, the expert Revised Shapley-Ames catalogue is used as an unseen test set. Deep Metric Learning techniques are used to fine-tune classification models to improve on the current state-of-the-art results for classifying galaxies. The results show that Transfer Learning coupled with ResNet50 outperforms self-defined rules for galaxy classification, indicating the effectiveness of machine learning for this task. The results further demonstrate an improvement on the current state-of-the-art accuracy for both the Galaxy Zoo 2 and EFIGI data, using Transfer Learning with the ResNet50 model. The mean average precision values for the crowd-sourced and expert models indicate that the models are comparable. However, confusion matrices reveal that the models trained on the expert dataset outperform the models trained on the crowd-sourced data in terms of actual vs. predicted labels. This result highlights the need for caution when utilising crowd-sourced labels.
The results further show that a model pre-trained on crowd-sourced data using Label Smoothing Cross-Entropy can be fine-tuned using Deep Metric Learning to achieve state-of-the-art performance in galaxy morphology classification. Finally, Transfer Learning from crowd-sourced labelled data to expert-labelled data leads to a significant improvement in classification accuracy.
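
For readers unfamiliar with the Label Smoothing Cross-Entropy loss mentioned in the abstract, the following is a minimal sketch of one common formulation, in which the one-hot target is mixed with a uniform distribution over the classes. The function names, the smoothing value epsilon = 0.1, and the choice of this particular formulation are illustrative assumptions; the abstract does not state which variant or hyperparameters the report uses.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw class scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def label_smoothing_cross_entropy(logits, target, epsilon=0.1):
    """Cross-entropy against a smoothed target distribution.

    Assumed formulation (not confirmed by the source): the target puts
    (1 - epsilon) on the true class plus epsilon / K on every class,
    where K is the number of classes.
    """
    k = len(logits)
    log_probs = log_softmax(logits)
    nll = -log_probs[target]          # standard cross-entropy term
    uniform = -sum(log_probs) / k     # cross-entropy against the uniform distribution
    return (1.0 - epsilon) * nll + epsilon * uniform


# Hypothetical scores for three morphology classes, true class at index 0.
example_logits = [10.0, 0.0, 0.0]
plain_loss = label_smoothing_cross_entropy(example_logits, 0, epsilon=0.0)
smoothed_loss = label_smoothing_cross_entropy(example_logits, 0, epsilon=0.1)
```

With epsilon set to 0 the function reduces to ordinary cross-entropy. With a positive epsilon, a very confident correct prediction is penalised slightly more, which discourages over-confidence when labels are noisy, as crowd-sourced labels may be.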

Description

A research report submitted to the School of Computer Science and Applied Mathematics, Faculty of Science, University of the Witwatersrand, in partial fulfillment of the requirements for the degree of Master of Science, 2021.
