Machine learning and soft computing approaches to microarray differential expression analysis and feature selection.

dc.contributor.author Perez, Meir
dc.date.accessioned 2012-09-12T06:50:40Z
dc.date.available 2012-09-12T06:50:40Z
dc.date.issued 2012-09-12
dc.identifier.uri http://hdl.handle.net/10539/11932
dc.description.abstract Differential expression analysis and feature selection are central to gene expression microarray data analysis. Standard approaches are flawed by the arbitrary assignment of cut-off parameters and an inability to adapt to the particular data set under analysis. This thesis presents three novel approaches to microarray data feature selection and differential expression analysis, based on various machine learning and soft computing paradigms. The first approach uses a Separability Index to select ranked genes, making gene selection less arbitrary and more intrinsic to the data. The second approach is a novel gene ranking system, the Fuzzy Gene Filter, which provides a more holistic and adaptive approach to ranking genes. The third approach is based on a Stochastic Search paradigm and uses the Population Based Incremental Learning algorithm to identify an optimal gene set with maximum inter-class distinction. All three approaches were implemented and tested on a number of data sets, and the results were compared to those of standard approaches. The Separability Index approach attained a K-Nearest Neighbour classification accuracy of 92%, outperforming the standard approach, which attained an accuracy of 89.6%. The gene list identified also displayed significant functional enrichment. The Fuzzy Gene Filter also outperformed standard approaches, attaining significantly higher accuracies for all of the classifiers tested, on both data sets (p < 0.0231 for the prostate data set and p < 0.1888 for the lymphoma data set). Population Based Incremental Learning outperformed the Genetic Algorithm, identifying a maximum Separability Index of 97.04% (as opposed to 96.39%). Future developments include incorporating biological knowledge when ranking genes with the Fuzzy Gene Filter, as well as incorporating a functional enrichment assessment into the fitness function of the Population Based Incremental Learning algorithm. en_ZA
dc.language.iso en en_ZA
dc.title Machine learning and soft computing approaches to microarray differential expression analysis and feature selection. en_ZA
dc.type Thesis en_ZA
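
Illustrative sketch (not taken from the thesis): the abstract's third approach, Population Based Incremental Learning searching for a gene set with a high Separability Index, might look roughly like the Python below. Several things here are assumptions rather than the author's method: the Separability Index is taken to be the nearest-neighbour form (the fraction of samples whose nearest neighbour has the same class label), and the function names, learning rate, population size, and probability clipping bounds are all hypothetical choices made for illustration.

```python
import numpy as np


def separability_index(X, y):
    """Assumed definition: fraction of samples whose nearest neighbour
    (Euclidean distance) carries the same class label."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbour
    nearest = dists.argmin(axis=1)
    return float(np.mean(y[nearest] == y))


def pbil_select(X, y, n_iter=30, pop_size=20, lr=0.1, seed=0):
    """Hypothetical PBIL loop over binary gene masks.

    A probability vector p (one entry per gene) is sampled to produce
    candidate masks; p is then nudged toward the best mask of each
    generation. Parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_genes = X.shape[1]
    p = np.full(n_genes, 0.5)  # start with no preference for any gene
    best_mask, best_fit = None, -1.0
    for _ in range(n_iter):
        pop = rng.random((pop_size, n_genes)) < p
        pop[~pop.any(axis=1), 0] = True  # guard against empty gene sets
        fits = np.array([separability_index(X[:, m], y) for m in pop])
        i = fits.argmax()
        if fits[i] > best_fit:
            best_fit, best_mask = float(fits[i]), pop[i].copy()
        p = (1 - lr) * p + lr * pop[i]  # move p toward the generation's best
        p = np.clip(p, 0.05, 0.95)      # keep some exploration alive
    return best_mask, best_fit


if __name__ == "__main__":
    # Synthetic stand-in for expression data: 20 samples, 8 "genes",
    # with only gene 0 shifted between the two classes.
    rng = np.random.default_rng(1)
    labels = np.repeat([0, 1], 10)
    data = rng.normal(size=(20, 8))
    data[:, 0] += 6 * labels
    mask, fit = pbil_select(data, labels)
    print("selected genes:", np.flatnonzero(mask), "SI:", fit)
```

The fitness function is the only problem-specific part of this sketch; the functional enrichment assessment mentioned as future work would, under this framing, enter as an extra term in that function.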

