Exploring the efficacy of popular clustering techniques on gene expression data

Date

2020

Authors

Batista, S. TKS

Abstract

High-throughput technologies have produced a wealth of genomic information, but no gold standard has yet been proposed and tested for the analysis of these data. This raises the question of whether biological function can be inferred solely from the gene expression data of a host in different states. In light of the lack of information that exists on the procedure to follow in a truly exploratory analysis of gene expression data, a robust methodology was implemented. This included a wide array of clustering algorithms, together with numerous validation indices, in an attempt to discover the natural biological classes present in largely unannotated data. Although not the most novel of the machine-learning techniques proposed for such analysis, the k-means algorithm outperformed the other methods when validated using known model validation techniques. Testing the functional biological validity of these results showed that they present a sufficiently accurate picture of the underlying biological functions. These results, while promising, would require further validation via experimental methods to ensure the accuracy of the biological inferences.
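
As a rough illustration of the kind of workflow the abstract describes (cluster the expression profiles, then score each partition with a validation index), the following is a minimal sketch and not the dissertation's actual pipeline. Everything specific in it is an assumption made for illustration: the synthetic data from the hypothetical `toy_expression` helper, the farthest-point initialisation used to keep k-means deterministic, and the choice of the silhouette width as the single validation index (the abstract does not name the indices that were used).

```python
import math
import random


def toy_expression(seed=1):
    """Hypothetical data: 60 'genes' x 4 'conditions', three expression patterns."""
    rng = random.Random(seed)
    patterns = [(0, 0, 5, 5), (5, 5, 0, 0), (0, 5, 0, 5)]
    return [tuple(v + rng.gauss(0, 0.3) for v in pat)
            for pat in patterns for _ in range(20)]


def kmeans(points, k, iters=100):
    """Lloyd's k-means with farthest-point initialisation (an assumption here)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        centroids.append(max(
            points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette width; values near 1 indicate compact, separated clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


data = toy_expression()
# Score each candidate number of clusters and keep the best-scoring one.
scores = {k: silhouette(data, kmeans(data, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

An internal index such as this one measures only geometric cluster quality; whether the resulting clusters correspond to biological function is a separate question, which is why the abstract treats functional validation as its own step.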

Description

A dissertation submitted in fulfilment of the requirements for the degree Master of Science, in the School of Computer Science and Applied Mathematics, Faculty of Science, University of the Witwatersrand, Johannesburg, 2020
