Machine learning and soft computing approaches to microarray differential expression analysis and feature selection.

dc.contributor.author: Perez, Meir
dc.date.accessioned: 2012-09-12T06:50:40Z
dc.date.available: 2012-09-12T06:50:40Z
dc.date.issued: 2012-09-12
dc.description.abstract: Differential expression analysis and feature selection are central to gene expression microarray data analysis. Standard approaches suffer from the arbitrary assignment of cut-off parameters and an inability to adapt to the particular data set under analysis. Presented in this thesis are three novel approaches to microarray data feature selection and differential expression analysis based on various machine learning and soft computing paradigms. The first approach uses a Separability Index to select ranked genes, making gene selection less arbitrary and more intrinsic to the data. The second approach is a novel gene ranking system, the Fuzzy Gene Filter, which provides a more holistic and adaptive approach to ranking genes. The third approach is based on a Stochastic Search paradigm and uses the Population Based Incremental Learning algorithm to identify an optimal gene set with maximum inter-class distinction. All three approaches were implemented and tested on a number of data sets, and the results were compared to those of standard approaches. The Separability Index approach attained a K-Nearest Neighbour classification accuracy of 92%, outperforming the standard approach, which attained an accuracy of 89.6%. The gene list identified also displayed significant functional enrichment. The Fuzzy Gene Filter also outperformed standard approaches, attaining higher accuracies for all of the classifiers tested, on both data sets (p < 0.0231 for the prostate data set and p < 0.1888 for the lymphoma data set). Population Based Incremental Learning outperformed a Genetic Algorithm, identifying a maximum Separability Index of 97.04% (as opposed to 96.39%). Future developments include incorporating biological knowledge when ranking genes using the Fuzzy Gene Filter, as well as incorporating a functional enrichment assessment in the fitness function of the Population Based Incremental Learning algorithm. (en_ZA)
dc.identifier.uri: http://hdl.handle.net/10539/11932
dc.language.iso: en (en_ZA)
dc.title: Machine learning and soft computing approaches to microarray differential expression analysis and feature selection. (en_ZA)
dc.type: Thesis (en_ZA)
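
The third approach in the abstract, a Population Based Incremental Learning search for a gene set that maximises a Separability Index, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the Separability Index is taken to be a nearest-neighbour measure (the fraction of samples whose nearest neighbour shares their class label), the function names `separability_index` and `pbil`, the parameter values, and the four-sample toy data set are all hypothetical, and the thesis may define and tune each of these differently.

```python
import random


def separability_index(samples, labels, mask):
    """Assumed definition: fraction of samples whose nearest neighbour,
    measured on the selected features only, has the same class label."""
    selected = [i for i, bit in enumerate(mask) if bit]
    if not selected:
        return 0.0  # an empty gene set separates nothing
    hits = 0
    for i, x in enumerate(samples):
        nearest = min(
            (j for j in range(len(samples)) if j != i),
            key=lambda j: sum((x[k] - samples[j][k]) ** 2 for k in selected),
        )
        hits += labels[nearest] == labels[i]
    return hits / len(samples)


def pbil(fitness, n_bits, pop_size=30, learning_rate=0.1,
         generations=50, seed=0):
    """Generic PBIL: evolve a probability vector over binary masks."""
    rng = random.Random(seed)
    prob = [0.5] * n_bits  # no initial preference for any feature
    best_mask, best_score = None, float("-inf")
    for _ in range(generations):
        # Sample candidate masks from the current probability vector.
        population = [[1 if rng.random() < p else 0 for p in prob]
                      for _ in range(pop_size)]
        gen_best = max(population, key=fitness)
        score = fitness(gen_best)
        if score > best_score:
            best_mask, best_score = gen_best, score
        # Shift the probability vector toward this generation's best mask.
        prob = [(1 - learning_rate) * p + learning_rate * b
                for p, b in zip(prob, gen_best)]
    return best_mask, best_score


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
samples = [[0.0, 5.0], [0.1, 1.0], [1.0, 5.1], [1.1, 0.9]]
labels = [0, 0, 1, 1]

best_mask, best_score = pbil(
    lambda mask: separability_index(samples, labels, mask), n_bits=2)
print(best_mask, best_score)
```

In this sketch each bit of the mask marks one feature (gene) as selected, and the probability vector is nudged toward the best mask of each generation. Variants of PBIL that add mutation or learn away from the worst sample are omitted for brevity.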

Files

Original bundle

Name: Meir Perez PhD Thesis.pdf
Size: 2.81 MB
Format: Adobe Portable Document Format

Name: Meir Perez PhD Thesis Abstract.pdf
Size: 46.39 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission
