Parallelisation of EST clustering

dc.contributor.author Ranchod, Pravesh
dc.date.accessioned 2006-03-23T12:12:56Z
dc.date.available 2006-03-23T12:12:56Z
dc.date.issued 2006-03-23
dc.identifier.uri http://hdl.handle.net/10539/281
dc.description Master of Science - Science en
dc.description.abstract The field of bioinformatics has been developing steadily, with computational problems related to biology taking on increased importance as further advances are sought. The large data sets involved in problems within computational biology have dictated a search for good, fast approximations to computationally complex problems. This research aims to improve a method used to discover and understand genes, which are small subsequences of DNA. A difficulty arises because genes contain parts we know to be functional and other parts we assume are non-functional, as their functions have not been determined. Isolating the functional parts requires the use of natural biological processes which perform this separation. However, these processes cannot read long sequences, forcing biologists to break a long sequence into a large number of small sequences and then read these. This creates the computational difficulty of categorizing the short fragments according to gene membership. Expressed Sequence Tag Clustering is a technique used to facilitate the identification of expressed genes by grouping together similar fragments, with the assumption that they belong to the same gene. The aim of this research was to investigate the usefulness of distributed memory parallelisation for the Expressed Sequence Tag Clustering problem. This was investigated empirically, with a distributed system tested for speed against a sequential one. It was found that distributed memory parallelisation can be very effective in this domain. The results showed a super-linear speedup for up to 100 processors; higher numbers were not tested, but are likely to produce further speedups. The system was able to cluster 500000 ESTs in 641 minutes using 101 processors. en
dc.format.extent 325670 bytes
dc.format.mimetype application/pdf
dc.language.iso en
dc.subject clustering en
dc.subject est en
dc.subject Parallelisation en
dc.title Parallelisation of EST clustering en
dc.type Thesis en