ETD Collection

Permanent URI for this collectionhttps://wiredspace.wits.ac.za/handle/10539/104


Please note: Digitised content is made available at the best possible quality range, taking into consideration file size and the condition of the original item. These restrictions may sometimes affect the quality of the final published item. For queries regarding content of ETD collection please contact IR specialists by email : IR specialists or Tel : 011 717 4652 / 1954

Follow the link below for important information about Electronic Theses and Dissertations (ETD)

Library Guide about ETD

Browse

Search Results

Now showing 1 - 1 of 1
  • Item
    Parallelisation of EST clustering
    (2006-03-23) Ranchod, Pravesh
    The field of bioinformatics has been developing steadily, with computational problems related to biology taking on an increased importance as further advances are sought. The large data sets involved in problems within computational biology have dictated a search for good, fast approximations to computationally complex problems. This research aims to improve a method used to discover and understand genes, which are small subsequences of DNA. A difficulty arises because genes contain parts we know to be functional and other parts we assume are non-functional as there functions have not been determined. Isolating the functional parts requires the use of natural biological processes which perform this separation. However, these processes cannot read long sequences, forcing biologists to break a long sequence into a large number of small sequences, then reading these. This creates the computational difficulty of categorizing the short fragments according to gene membership. Expressed Sequence Tag Clustering is a technique used to facilitate the identification of expressed genes by grouping together similar fragments with the assumption that they belong to the same gene. The aim of this research was to investigate the usefulness of distributed memory parallelisation for the Expressed Sequence Tag Clustering problem. This was investigated empirically, with a distributed system tested for speed against a sequential one. It was found that distributed memory parallelisation can be very effective in this domain. The results showed a super-linear speedup for up to 100 processors, with higher numbers not tested, and likely to produce further speedups. The system was able to cluster 500000 ESTs in 641 minutes using 101 processors.