Development of an analysis pipeline for HLA genotyping using Illumina short reads

Bird, James
The human leukocyte antigen (HLA) genes are highly polymorphic loci located on chromosome 6. This region is the most polymorphic in the human genome, which makes genotyping its alleles difficult. Furthermore, the required genotyping resolution depends on the application. For instance, kidney transplants require two-digit resolution and bone marrow transplants a minimum of four-digit resolution, while population-level disease studies often require six-digit resolution. As specialized HLA genotyping tools that use NGS data have been developed, the aim of this study was to compare four of them (BWAkit, xHLA, Kourami and HISAT-Genotype) and to evaluate whether population-specific HLA variability affects their accuracy. The accuracy of the tools was compared against Sanger-sequenced HLA data, in which exons 2 and 3 were sequenced for HLA class I. Because exons 2 and 3 were available as a reference from the Sanger sequencing, the accuracy of an allele call was determined by its similarity to the reference data. At the two- and four-digit resolutions, xHLA was the most accurate, which was attributed to the nucleotide-to-protein alignment step in its algorithm. Kourami was the most accurate at the six-digit resolution, owing to its use of alternate loci in the alignment step. To identify possible error trends, the allele sequences produced by the tools were analyzed. The majority of errors occurred at heterozygous positions that were falsely called as homozygous. With the exception of HISAT-Genotype, each tool was most accurate at HLA-B and least accurate at HLA-C. In the evaluation of population-specific HLA variability, the four super-populations tested (African, Asian, European and South American) did not differ significantly in HLA variability. The individual loci, however, differed significantly from each other. In conclusion, future improvements include varying the parameters when genotyping different loci. Currently, however, a consensus approach using xHLA and Kourami should be utilized.
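The two-, four- and six-digit resolutions mentioned above correspond to the first one, two and three colon-separated fields of an HLA allele name. The following is a minimal Python sketch of how two genotype calls could be compared at a chosen resolution. It is an illustration only, not the method of the dissertation, which judged accuracy by sequence similarity to the Sanger exon 2 and 3 reference. The function names `truncate_allele` and `matched_alleles` are hypothetical, and the sketch assumes allele names written in the colon-delimited form `GENE*field1:field2:...`.

```python
from collections import Counter


def truncate_allele(allele: str, digits: int) -> str:
    """Reduce an allele name such as 'A*02:01:01:01' to 2-, 4- or 6-digit resolution.

    Each pair of digits corresponds to one colon-separated field.
    Names with fewer fields than requested are returned unchanged.
    """
    if digits not in (2, 4, 6):
        raise ValueError("digits must be 2, 4 or 6")
    gene, _, fields = allele.partition("*")
    kept = fields.split(":")[: digits // 2]
    return f"{gene}*{':'.join(kept)}"


def matched_alleles(called, reference, digits: int) -> int:
    """Count how many called alleles agree with the reference at the given resolution.

    A genotype is treated as an unordered collection of alleles, so the
    comparison uses a multiset intersection rather than positional matching.
    """
    called_counts = Counter(truncate_allele(a, digits) for a in called)
    reference_counts = Counter(truncate_allele(a, digits) for a in reference)
    return sum((called_counts & reference_counts).values())


if __name__ == "__main__":
    # Hypothetical example genotypes for one sample at HLA-A.
    called = ("A*02:01:01", "A*24:02:01")
    reference = ("A*24:02:01", "A*02:06:01")
    for digits in (2, 4, 6):
        print(digits, matched_alleles(called, reference, digits))
```

In this made-up example both alleles agree at two-digit resolution, but only one agrees at four- and six-digit resolution, which shows why a tool's measured accuracy can differ between resolutions.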
A dissertation submitted to the Faculty of Science, University of the Witwatersrand, in fulfillment of the requirements for the degree of Master of Science. Johannesburg, March 2019.
Bird, James Andrew (2019) Development of an analysis pipeline for HLA genotyping using Illumina short reads, University of the Witwatersrand, Johannesburg