Support vector machine prediction of HIV-1 drug resistance using The Viral Nucleotide patterns
Date
2007-02-23T12:45:30Z
Authors
Araya, Seare Tesfamichael
Abstract
Drug resistance of the HI virus due to its fast replication and error-prone mutation is a key factor
in the failure to combat the HIV epidemic. For this reason, performing pre-therapy drug
resistance testing and administering appropriate drugs or combinations of drugs accordingly is
very useful. There are two approaches to HIV drug resistance testing: phenotypic (clinical)
and genotypic (based on the particular virus’s DNA). Genotyping tests HIV drug resistance by
detecting specific mutations known to confer drug resistance. It is cheaper than phenotyping and can be computerised.
However, it requires knowing or learning which mutations confer drug resistance.
Previous research using pattern recognition techniques has been promising, but the performance
needs to be improved. It is also important to have techniques that can quickly learn new rules when
faced with new mutations or drugs.
A relatively recent addition to these techniques is the Support Vector Machine (SVM).
SVMs have proved very successful in many benchmark applications, such as face recognition and
text recognition, and have also performed well in many computational biology problems where
the number of features is large compared to the number of available samples. This
paper explores the use of SVMs in predicting the drug resistance of an HIV strain extracted
from a patient based on the genetic sequence of those parts of the viral DNA encoding for the
two enzymes, Reverse Transcriptase and Protease, which are critical for the replication of
HIV. In particular, the aim of this research is to design the model without incorporating
the available biological knowledge, so that the resulting classifier can accommodate new drugs and
mutations.
To evaluate the performance of the SVMs, we used cross-validation to obtain an
unbiased performance estimate on 2045 data points. Classification accuracy and the area under the receiver
operating characteristic curve (AUC) were used as performance measures. Furthermore,
to compare the performance of our SVM model, we also developed other prediction models
based on popular classification algorithms, namely neural networks, decision trees and logistic
regression.
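The evaluation described above can be sketched in plain Python as follows. This is a minimal illustration rather than the dissertation's code: the helper names (`accuracy`, `auc`, `k_fold_indices`), the interleaved fold assignment and the 0/1 label convention are all assumptions. AUC is computed here through its rank interpretation, the probability that a randomly chosen resistant sample receives a higher score than a randomly chosen susceptible one.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts, over all (positive, negative) pairs, how often the positive
    sample has the higher score; ties count as half a win.
    Labels are assumed to be 1 (positive) and 0 (negative).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k-fold cross-validation.

    Samples are assigned to folds in an interleaved fashion, so every
    sample appears in exactly one test fold.
    """
    indices = list(range(n_samples))
    for fold in range(k):
        test = indices[fold::k]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.5, 0.1]
    predictions = [1 if s >= 0.5 else 0 for s in scores]
    print("accuracy:", accuracy(labels, predictions))
    print("AUC:", auc(labels, scores))
    for train, test in k_fold_indices(5, 2):
        print("train", train, "test", test)
```

In a cross-validated comparison, each candidate classifier would be trained on the `train` indices of every fold and scored on the corresponding `test` indices, with the per-fold accuracy and AUC then averaged.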
The results show that SVMs are a highly successful classifier and outperform the other techniques,
with accuracy ranging from 94.13% to 96.33% and AUC from 81.26% to 97.49%.
Decision trees ranked second and logistic regression performed the worst.
Description
Student Number: 0213068F
MSc Dissertation
School of Computer Science
Faculty of Science
Keywords
SVM, HIV, mutation, bioinformatics classification