Robust sequence alignment using evolutionary rates coupled with an amino acid substitution matrix

Date

2014

Authors

Ndhlovu, Andrew

Abstract

Selective pressures at the DNA level shape genes into profiles consisting of patterns of rapidly evolving sites and sites withstanding change. These profiles remain detectable even when protein sequences become extensively diverged. It has been hypothesised that these patterns can be used as gene identifiers. A common task in molecular biology is to infer functional, structural or evolutionary relationships by querying a database using an algorithm. However, problems arise when sequence similarity is low. The problem is that the algorithm produces numerous false positives when highly conserved datasets are aligned. To increase the sensitivity of the algorithm, the evolutionary rate-based approach was reimplemented and coupled with a conventional BLOSUM substitution matrix to produce a new implementation called BLOSUM-FIRE. The two approaches are combined in a dynamic scoring function, which uses the selective pressure to score aligned residues. Analysis of the quality of the alignments produced revealed that the new implementation of the FIRE algorithm performs as well as conventional algorithms. In addition, the Evolutionary rate Database (EvoDB), which is a compilation of evolutionary rate profiles of all the members of the PFAM-A protein domain database, has been developed. The EvoDB database can be queried using FIRE to infer protein domain functions. The utility of this algorithm and database was tested by inferring the domain functions of the Hepatitis B X protein (HBx). Results show that the BLOSUM-FIRE algorithm was able to accurately identify the domain function of HBx as a trans-activation protein using EvoDB. The biological relevance of these results was not validated and requires further interrogation; however, these proteins share vital roles in viral replication.
This study demonstrates the utility of an evolutionary rate-based approach and shows that such an approach is robust when coupled with an amino acid substitution matrix, yielding results comparable to conventional algorithms. EvoDB is a catalogue of the evolutionary rate profiles and provides the corresponding phylogenetic trees, PFAM-A alignments and annotated accession identifier data. The BLOSUM-FIRE software and user manual, including the EvoDB flat file database and release notes, have been made freely available at www.bioinf.wits.ac.za/software/fire. The BLOSUM-FIRE algorithm and EvoDB database present a tier of information untapped by current databases and tools.
A dissertation submitted to the Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements of the degree of Master of Science (Medicine).
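
The abstract describes a scoring function that combines a substitution matrix with per-site evolutionary rates but does not give its form. The following is only a minimal sketch of one way such a combination could look, not the BLOSUM-FIRE implementation: the weighted sum, the `weight` and `gap` parameters, the toy substitution table (a stand-in for a real BLOSUM matrix) and all function names are assumptions made for illustration.

```python
from typing import List


def toy_substitution(a: str, b: str) -> float:
    """Toy stand-in for a BLOSUM lookup; the values are illustrative only."""
    return 4.0 if a == b else -1.0


def combined_score(a: str, b: str, rate_a: float, rate_b: float,
                   weight: float = 0.5) -> float:
    """Hypothetical blend of a substitution score and a rate-similarity term.

    The rate term is 0 when the two sites evolve at the same rate and
    becomes more negative as their rates diverge.
    """
    rate_term = -abs(rate_a - rate_b)
    return (1.0 - weight) * toy_substitution(a, b) + weight * rate_term


def align_score(seq_a: str, rates_a: List[float],
                seq_b: str, rates_b: List[float],
                gap: float = -2.0, weight: float = 0.5) -> float:
    """Global (Needleman-Wunsch style) alignment score with a linear gap cost."""
    n, m = len(seq_a), len(seq_b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = combined_score(seq_a[i - 1], seq_b[j - 1],
                                  rates_a[i - 1], rates_b[j - 1], weight)
            dp[i][j] = max(dp[i - 1][j - 1] + pair,  # align the two sites
                           dp[i - 1][j] + gap,       # gap in seq_b
                           dp[i][j - 1] + gap)       # gap in seq_a
    return dp[n][m]


same = align_score("ACD", [0.1, 0.5, 0.9], "ACD", [0.1, 0.5, 0.9])
shifted = align_score("ACD", [0.1, 0.5, 0.9], "ACD", [0.9, 0.5, 0.1])
print(same, shifted)
```

In this sketch, two identical sequences score lower when their rate profiles disagree, which is the kind of behaviour a rate-aware scoring function is meant to capture; how the actual software weights the two signals is documented in its user manual rather than here.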
