A tree-structured index algorithm for expressed sequence tags clustering

dc.contributor.author Kumwenda, Benjamin
dc.date.accessioned 2009-02-04T09:40:35Z
dc.date.available 2009-02-04T09:40:35Z
dc.date.issued 2009-02-04T09:40:35Z
dc.identifier.uri http://hdl.handle.net/10539/6001
dc.description.abstract Expressed sequence tags (ESTs) are complementary deoxyribonucleic acid (cDNA) fragments that are reverse transcribed from mature messenger ribonucleic acid (mRNA), a direct gene transcript. ESTs are a readily available, rich source of information on complete expressed gene sequences. They reveal the type and number of genes being expressed in an organism. Joining ESTs into complete gene sequences is computationally expensive because they are numerous, error-prone, redundant and mixed up. In EST clustering, ESTs that originate from the same gene are grouped together. This enables efficient generation of consensus sequences, which reveal the underlying gene sequences and their possible alternative splicings. EST clustering enables efficient discovery of expressed genes, on which several fields rely, such as disease diagnostics, drug discovery, genetic engineering, alternative splicing and many others. Most clustering algorithms developed so far are quadratic, and their running time is prohibitively high. A tree-structured index algorithm has been developed to cluster ESTs efficiently with respect to running time and quality of generated clusters. The algorithm clusters ESTs in a pseudometric space by recursively partitioning a data set of EST windows into two disjoint sets. Performance of the algorithm was tested with respect to running time and quality of generated clusters. Further experiments were performed to investigate the effectiveness of the triangle inequality, which was implemented to reduce distance computations during clustering. Experimental results show that the algorithm has a running time close to linear with 100% specificity, but its sensitivity fluctuates. Implementation of the triangle inequality did not significantly improve the performance of the algorithm. en
dc.language.iso en en
dc.title A tree-structured index algorithm for expressed sequence tags clustering en
dc.type Thesis en
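
The abstract mentions using the triangle inequality to reduce distance computations. The following is a minimal Python sketch of that general idea, not the thesis's implementation. It assumes Hamming distance on equal-length windows as the distance function; the names `hamming` and `neighbours_within`, the single pivot, and the threshold are all hypothetical choices made for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length windows.

    Assumption: the record does not state which distance the thesis uses;
    Hamming distance is chosen here only because it satisfies the
    triangle inequality.
    """
    if len(a) != len(b):
        raise ValueError("windows must have equal length")
    return sum(x != y for x, y in zip(a, b))


def neighbours_within(query, windows, pivot, threshold, dist=hamming):
    """Return (matches, query_calls) for windows within `threshold` of `query`.

    `query_calls` counts only distance computations that involve the query.
    The pivot-to-window distances would be stored in the index ahead of time.
    """
    pivot_to_window = [dist(pivot, w) for w in windows]
    query_to_pivot = dist(query, pivot)
    query_calls = 1
    matches = []
    for w, d_pw in zip(windows, pivot_to_window):
        # Triangle inequality lower bound:
        #   dist(query, w) >= |dist(query, pivot) - dist(pivot, w)|
        # If the bound already exceeds the threshold, skip the real computation.
        if abs(query_to_pivot - d_pw) > threshold:
            continue
        query_calls += 1
        if dist(query, w) <= threshold:
            matches.append(w)
    return matches, query_calls


if __name__ == "__main__":
    windows = ["ACGTACGT", "ACGTACGA", "TTTTTTTT", "GGGGCCCC"]
    found, calls = neighbours_within("ACGTACGG", windows, pivot="ACGTACGT", threshold=1)
    print(found, calls)
```

Pruning of this kind only pays off when the lower bound is tight enough to rule out many candidates, which may be one reason a gain is not guaranteed in practice.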


This item appears in the following Collection(s)

  • ETD Collection
    Thesis (Ph.D.)--University of the Witwatersrand, 1972.
