Dynamic protein classification: Adaptive models based on incremental learning strategies

dc.contributor.authorMohamed, Shakir
dc.date.accessioned2008-03-18T09:27:41Z
dc.date.available2008-03-18T09:27:41Z
dc.date.issued2008-03-18T09:27:41Z
dc.description.abstractAbstract One of the major problems in computational biology is the inability of existing classification models to incorporate expanding and new domain knowledge. This problem of static classification models is addressed in this thesis by the introduction of incremental learning for problems in bioinformatics. The tools which have been developed are applied to the problem of classifying proteins into a number of primary and putative families. The importance of this type of classification is of particular relevance due to its role in drug discovery programs and the benefit it lends to this process in terms of cost and time saving. As a secondary problem, multi–class classification is also addressed. The standard approach to protein family classification is based on the creation of committees of binary classifiers. This one-vs-all approach is not ideal, and the classification systems presented here consists of classifiers that are able to do all-vs-all classification. Two incremental learning techniques are presented. The first is a novel algorithm based on the fuzzy ARTMAP classifier and an evolutionary strategy. The second technique applies the incremental learning algorithm Learn++. The two systems are tested using three datasets: data from the Structural Classification of Proteins (SCOP) database, G-Protein Coupled Receptors (GPCR) database and Enzymes from the Protein Data Bank. The results show that both techniques are comparable with each other, giving classification abilities which are comparable to that of the single batch trained classifiers, with the added ability of incremental learning. Both the techniques are shown to be useful to the problem of protein family classification, but these techniques are applicable to problems outside this area, with applications in proteomics including the predictions of functions, secondary and tertiary structures, and applications in genomics such as promoter and splice site predictions and classification of gene microarrays.en
dc.format.extent13384281 bytes
dc.format.mimetypeapplication/pdf
dc.identifier.urihttp://hdl.handle.net/10539/4678
dc.language.isoenen
dc.subjectbioinformaticsen
dc.subjectprotein classificationen
dc.subjectneural networksen
dc.subjectfuzzy ARTMAPen
dc.subjectincremental learningen
dc.titleDynamic protein classification: Adaptive models based on incremental learning strategiesen
dc.typeThesisen
Files
Original bundle
Now showing 1 - 1 of 1
No Thumbnail Available
Name:
thesis.pdf
Size:
12.76 MB
Format:
Adobe Portable Document Format
License bundle
Now showing 1 - 1 of 1
No Thumbnail Available
Name:
license.txt
Size:
96 B
Format:
Item-specific license agreed upon to submission
Description:
Collections