Correlations between self-reported ethnicity and language with genetic clustering in Kenya

dc.contributor.author: Wolberg, Yonatan Ariel
dc.date.accessioned: 2024-02-22T13:21:48Z
dc.date.available: 2024-02-22T13:21:48Z
dc.date.issued: 2024
dc.description: A research report submitted in partial fulfilment of the requirements for the degree of Master of Science in Medicine (Genomic Medicine) to the Faculty of Health Sciences, University of the Witwatersrand, School of Human and Community Development, Johannesburg, 2023
dc.description.abstract: Kenya is a highly diverse country in which a combination of recent local migrations and admixture has contributed to a complex population structure. This structure complicates the assessment of allele frequencies of disease-associated variants within the country, because different groups show different frequencies. Additionally, ethnic groups in genetic studies are often defined on the basis of self-reported identity, but some individuals may align genetically with another ethnic group. Kenyan diversity must therefore be properly characterized for population-level risk estimation and the implementation of public health approaches. This study aimed to determine how self-reported ethnicity correlates with genetic clustering in a Kenyan cohort. The effect of the discordance between the two on the frequencies of key malaria- and trypanosomiasis-associated variants was then determined. The study leveraged the Kenya AWI-Gen dataset, comprising 1,703 individuals (of the Kikuyu, Kamba, Luhya, Luo, Kisii and Somali ethnic groups) recruited in Nairobi. By combining a bootstrap approach for allele frequency estimation with centroid-based filtering, this study showed that small discordances can significantly affect the allele frequencies of disease-associated variants. More robust approaches to comparing genetic- and ethnicity-based clustering might reveal further differences. Overall, the results indicate that while self-reported identity can provide reasonably reliable categorization for the Kenyan dataset, the inclusion of additional variables, such as language, geographic origin, and both parental and grandparental identity, might be necessary for more accurate estimates.
dc.description.librarian: TL (2024)
dc.description.sponsorship: National Research Foundation (NRF)
dc.faculty: Faculty of Health Sciences
dc.identifier.uri: https://hdl.handle.net/10539/37710
dc.language.iso: en
dc.school: Human and Community Development
dc.subject: Kenya
dc.subject: Ethnic group
dc.subject: Genetic studies
dc.subject: Public health
dc.title: Correlations between self-reported ethnicity and language with genetic clustering in Kenya
dc.type: Dissertation
Files
Original bundle
Name: Kenya AWI-Gen Research Report YoniAriWolberg_Final_1 (1).pdf
Size: 3.28 MB
Format: Adobe Portable Document Format