Parallelisation of EST clustering

dc.contributor.authorRanchod, Pravesh
dc.date.accessioned2006-03-23T12:12:56Z
dc.date.available2006-03-23T12:12:56Z
dc.date.issued2006-03-23
dc.descriptionMaster of Science - Scienceen
dc.description.abstractThe field of bioinformatics has been developing steadily, with computational problems related to biology taking on an increased importance as further advances are sought. The large data sets involved in problems within computational biology have dictated a search for good, fast approximations to computationally complex problems. This research aims to improve a method used to discover and understand genes, which are small subsequences of DNA. A difficulty arises because genes contain parts we know to be functional and other parts we assume are non-functional as there functions have not been determined. Isolating the functional parts requires the use of natural biological processes which perform this separation. However, these processes cannot read long sequences, forcing biologists to break a long sequence into a large number of small sequences, then reading these. This creates the computational difficulty of categorizing the short fragments according to gene membership. Expressed Sequence Tag Clustering is a technique used to facilitate the identification of expressed genes by grouping together similar fragments with the assumption that they belong to the same gene. The aim of this research was to investigate the usefulness of distributed memory parallelisation for the Expressed Sequence Tag Clustering problem. This was investigated empirically, with a distributed system tested for speed against a sequential one. It was found that distributed memory parallelisation can be very effective in this domain. The results showed a super-linear speedup for up to 100 processors, with higher numbers not tested, and likely to produce further speedups. The system was able to cluster 500000 ESTs in 641 minutes using 101 processors.en
dc.format.extent325670 bytes
dc.format.mimetypeapplication/pdf
dc.identifier.urihttp://hdl.handle.net/10539/281
dc.language.isoen
dc.subjectclusteringen
dc.subjectesten
dc.subjectParalleisationen
dc.titleParallelisation of EST clusteringen
dc.typeThesisen
Files
Original bundle
Now showing 1 - 1 of 1
No Thumbnail Available
Name:
Thesis.pdf
Size:
318.04 KB
Format:
Adobe Portable Document Format
License bundle
Now showing 1 - 1 of 1
No Thumbnail Available
Name:
license.txt
Size:
1.8 KB
Format:
Item-specific license agreed upon to submission
Description:
Collections