The effect of ascertainment bias on detecting signatures of selection

dc.contributor.author: Willemse, Marla
dc.date.accessioned: 2020-10-08T13:00:03Z
dc.date.available: 2020-10-08T13:00:03Z
dc.date.issued: 2019
dc.description: A dissertation submitted to the Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Master of Science in Medicine. Johannesburg, 2019 [en_ZA]
dc.description.abstract: Genotyping arrays have been broadly used to identify signatures of selection with genome-wide scans. It has been reported that the markers contained in arrays do not accurately represent the variation in full sequence data, especially in non-European populations, and that this may affect the results of selection studies. The availability of whole genome sequence (WGS) data from various African populations has enabled the analysis of the extent to which ascertainment bias affects the detection of selection signals on this continent. Seven commonly used genotyping arrays were represented by creating in silico single nucleotide polymorphism (SNP) panels from WGS data of the African Genome Variation Project (AGVP) Baganda, Ethiopian and Zulu samples. Four types of selection scans (FST, iHS, XP-EHH and Tajima’s D) were performed on both the array and WGS datasets, and the accuracy of selection signals identified from array data was assessed in relation to the WGS results. It was found that selection scans performed with array data produced a significant proportion of false positive and false negative signals. The EHH-based methods were least affected by ascertainment bias, and arrays with higher marker density generally produced more accurate results. The two arrays ascertained from African populations outperformed a more European-based array of similar size. Variation in marker density across the genome was found to underlie discrepancies between array and WGS selection signals, as genomic regions in array data containing fewer markers were less likely to be detected as selection signals. Of the selection signals identified from WGS but not array data, most were missed due to insufficient SNP density. To investigate the extent to which the selection signals from one Southeastern Bantu-speaking (SEB) group are shared by another SEB group, selection scans were performed on two independent SEB groups, namely the Bt20 and AGVP Zulu samples.
The overlap in selection signals between the samples was found to be limited, consistent with differential KhoeSan gene flow into these groups. It was found that various selection scan methods are differentially affected by ascertainment bias, and additionally, limited concordance was observed between the selection signals identified by different methods. A comparison of selection signals between the three AGVP populations revealed high population specificity of signals. Regions displaying signatures of selection were annotated for gene names and functionality, and both canonical and less well-established selection candidates were identified. These included genes associated with infectious diseases, cancer, metabolism, pigmentation, neuro-motor functions and high altitude adaptation. [en_ZA]
dc.description.librarian: MT 2020 [en_ZA]
dc.faculty: Faculty of Health Sciences [en_ZA]
dc.identifier.uri: https://hdl.handle.net/10539/29792
dc.language.iso: en [en_ZA]
dc.title: The effect of ascertainment bias on detecting signatures of selection [en_ZA]
dc.type: Thesis [en_ZA]
Files
Original bundle
Name: Dissertation_final.pdf
Size: 3.1 MB
Format: Adobe Portable Document Format
Name: appendix_corrected.pdf
Size: 5.84 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission