Genetic variation in Khoisan-speaking populations from 
southern Africa 
 
 
 
BY 
 
 
 
Carina Maria Schlebusch 
 
 
 
 
 
 
A thesis submitted to the Faculty of Health Sciences, University of the Witwatersrand, 
Johannesburg, in fulfillment of the requirements for the degree of Doctor of Philosophy. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Johannesburg, 2010 
 
 
 
 
DECLARATION 
 
I declare that this thesis is my own unaided work. It is being submitted for the Degree of 
Doctor of Philosophy in Human Genetics at the University of the Witwatersrand, 
Johannesburg. It has not been submitted before for any degree or examination at any other 
university. I declare that this work has been approved by the Ethics Committee of the 
University of the Witwatersrand for Research on Human Subjects, and the certificate 
numbers are M050902 and M980553. 
 
 
 
 
 
 
 
________________________________                         ____________________ 
Carina M. Schlebusch                                                      Date 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
ABSTRACT 
 
The San and Khoe people currently represent remnant groups of a much larger and widely 
distributed population of hunter gatherers and cattle herders, respectively, who had 
exclusive occupation of southern Africa before the arrival of Bantu-speaking groups in the 
past 1,200 years and sea-borne immigrants within the last 350 years. This project made 
use of mitochondrial DNA (mtDNA), Y-chromosome DNA and autosomal DNA markers to 
examine the population structure of various San and Khoe groups and to reconstruct their 
prehistory. The groups included in the study consisted of six different Khoe-San groups 
(≠Khomani, Nama, Khwe, !Xun, /Gui + //Gana + Kgalagari and Ju/'hoansi), four different 
Coloured groups and five other population groups that were included in the comparative 
analysis. 
 
For the mtDNA study a minisequencing technique was successfully developed which 
allowed the assignment of mtDNA lineages into the 10 global mtDNA macro-haplogroups. 
Haplogroups were further resolved using control region sequence data obtained from both 
hypervariable regions (HVR I and HVR II). Using this approach 538 individuals (both males 
and females) were screened and their mtDNA types were resolved into 18 haplogroups 
encompassing 245 unique haplotypes. In addition, 353 males were examined for 
Y-chromosome DNA variation using 46 bi-allelic Y-chromosome markers and 12 Y-STR 
markers. The Y-chromosomes in the sample were assigned to 29 haplogroups (using 
bi-allelic variation) following the nomenclature initially recommended by the Y-chromosome 
Consortium and resolved into 268 unique haplotypes (Y-STR variation). To assess the 
level of autosomal variation, 220 genome-wide autosomal SNPs were typed in 352 
individuals. These SNPs were combined in different datasets and analysed using two 
different approaches allowing for genotype and haplotype analyses. Data from these three 
marker systems were analysed using different analytical methods (distance-based 
phylogenetic analysis, network analysis, dating of lineages, principal components analysis, 
phylogeographic analysis, AMOVA, population structure analysis, and population 
genetic summary statistics) to assess the ancestral associations and the genetic affinities of 
the various San, Khoe and Coloured populations. 
 
The most striking observation from this study was the high frequencies of the oldest 
mtDNA haplogroups (L0d and L0k) and Y-chromosome haplogroups (haplogroups A and 
B) found in Khoe-San and Coloured groups. The sub-haplogroups were, however, 
differentially distributed in the different Khoe-San and Coloured groups which suggested 
different demographic histories.  
 
The current distribution of Khoe-San groups comprises a wide geographic region extending 
from southern Angola in the north to the Cape Province (South Africa) in the south. 
Linguistically Khoe-San groups are also divided into northern Khoisan-speaking groups (Ju 
division) and southern Khoisan-speaking groups (Tuu division) with an additional linguistic 
group (Khoe) associated with some Khoe-speaking San groups in Botswana and the Khoe 
herders of South Africa and Namibia (such as the Nama). For all three genetic marker 
systems, northern groups (Ju speaking - !Xun, Ju/'hoansi and Khoe-speaking San - /Gui + 
//Gana) grouped into one cluster and southern groups (historically Tuu speaking - 
≠Khomani and Coloured groups) grouped into a second cluster with the Khoe group 
(Nama) clustering with the southern Khoe-San and Coloured groups.  
 
The Khwe genetic profile was very different from the other Khoe-San groups. Although high 
proportions of Bantu-speaking admixture were identified in the Khwe group, they also 
contained a unique distribution of other mtDNA and Y-chromosome lineages. A previously 
published theory suggested that, based on the presence of a specific E-M35 
Y-chromosome haplotype, the Khwe might be descendants of an east African pastoralist 
group that introduced the pastoralist culture to a region located in present-day northern 
Botswana. This pattern also mirrors what archaeologists have found with respect to the 
introduction of pastoralism to southern Africa. The theory was further supported and 
elaborated on in the present thesis. Considering the frequency and distribution of E-M35, 
the highest frequency (46%) was found in the Khwe with a present-day distribution in 
northern Botswana and southern Angola while a decrease in frequency is observed 
towards the south with low frequencies (<10%) in the Karoo Coloured groups. Conversely, 
none of the mtDNA (female) L0k and L0d lineages observed in the Khwe group were 
observed in the southern Khoe-San and Coloured groups. From these observations a 
theory was proposed that after introduction into the region of northern Botswana, the 
southwards spread of pastoralism was not a clear-cut demic or cultural diffusion. Rather, 
some male individuals integrated with the southern tribes and took with them the pastoralist 
practice and likely also their Khoe language.  
 
Altogether this thesis presented new insights into the multifaceted demographic history that 
shaped the existing genetic landscape of the Khoe-San and Coloured populations of 
southern Africa. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
To: My parents 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
ACKNOWLEDGEMENTS 
 
I am grateful to all subjects who participated in this research project and would like to thank 
them for contributing blood and saliva samples for DNA extraction and subsequent 
analyses. I would also like to thank Professor Trefor Jenkins and colleagues in the Division 
of Human Genetics for assistance with fieldwork and the processing of samples. In 
addition, I am thankful for the support and mediation provided by the South African San 
Council and the Working Group of Indigenous Minorities in Southern Africa (WIMSA). 
 
I am particularly grateful to Prof. Fourie Joubert (Bioinformatics and Computational 
Biology Unit, University of Pretoria (UP)) for accommodating and assisting me in running 
analyses requiring intensive computational time on their cluster computer system at UP. 
 
During my studies I was supported by a National Research Foundation Prestigious 
Doctoral Scholarship. Travel grants from the University of the Witwatersrand and the 
National Research Foundation allowed me to present some of this work at an international 
conference. This research was supported by grants awarded to me by the NHLS Research 
Trust and to Professor Himla Soodyall by the NHLS, University of the Witwatersrand, the 
NRF and the MRC. 
 
My sincere appreciation goes to Professor Himla Soodyall, my supervisor, for her  
assistance and guidance throughout this study, and for reading several drafts of this thesis. 
I also wish to express my gratitude to my colleagues in the HGDDRU (TJ Naidoo, Heeran 
Makkan, Akashnie Maharaj, Raj Mahabeer, Christoff Erasmus and Pareen Patel) for their 
help, friendship and motivation. 
 
Finally, I would like to express my deepest gratitude to my parents, family and friends, for 
their constant support and encouragement. Most of all, my appreciation goes to my 
husband Ronnie, who has always been an incredible source of help, love and 
encouragement throughout the years. 
 
TABLE OF CONTENTS 
 
DECLARATION .................................................................................................................... ii 
ABSTRACT ......................................................................................................................... iii 
ACKNOWLEDGEMENTS................................................................................................... vii 
TABLE OF CONTENTS .................................................................................................... viii 
LIST OF FIGURES ............................................................................................................ xiii 
LIST OF TABLES ............................................................................................................... xx 
LIST OF ABBREVIATIONS ..............................................................................................xxiii 
Note on terminology adopted in thesis ............................................................................ xxvi 
1. INTRODUCTION ..............................................................................................................1 
1.1 Khoe-San today ..........................................................................................................5 
1.1.1 Group classification ..............................................................................................8 
1.1.1.1 The Ju ......................................................................................................12 
1.1.1.1.1 The !Xun ............................................................................................12 
1.1.1.1.2 The Ju/'hoansi ..................................13 
1.1.1.1.3 The ?X?ao//??esi .................................................................................13 
1.1.1.2 Khoe-speaking San groups ......................................................................14 
1.1.1.2.1 The Tshua and Shua of eastern Botswana........................................14 
1.1.1.2.2 The Khwe of northern Botswana and southern Angola......................15 
1.1.1.2.3 The /Gui and //Gana of the central Kalahari.......................................15 
1.1.1.2.4 The Naro............................................................................................17 
1.1.1.2.5 The Hai//om .......................................18 
1.1.1.3 The Kwadi ................................................................................................18 
1.1.1.4 The Khoe..................................................................................................18 
1.1.1.4.1 The Korana ........................................................................................19 
1.1.1.4.2 The Cape Khoe..................................................................................19 
1.1.1.4.3 The Nama..........................................................................................19 
1.1.1.5 The !Xóõ and the ≠Hõã (Tuu division)......................................................20 
1.1.1.6 Remnants and descendants of Khoe and San groups living in South Africa .....21 
1.1.1.6.1 N//ŋ people (“Mountain Bushmen”)....................................................22 
1.1.1.6.2 The //Xegwi........................................................................................22 
1.1.1.6.3 /Xam descendants .............................................................................23 
The Karretjie people......................................................................................24 
1.1.1.6.4 The ≠Khomani ...................................................................................25 
1.1.1.6.5 South African Khoe descendant groups.............................................26 
1.1.1.6.6 The !Xun and Khwe of Platfontein .....................................................26 
1.2 Khoe-San history ......................................................................................................29 
1.2.1 Linguistics, Archaeology and Ethnography.........................................................29 
1.2.1.1 Khoisan Linguistic Family.........................................................................29 
1.2.1.2 Khoe-San History according to Linguistics ...............................................30 
1.2.1.3 Khoe-San History according to Archaeology and Ethnography................32 
1.2.1.4 Khoe-San history according to Physical Anthropology .............................40 
1.2.2 Khoe-San history according to molecular genetic studies ..................................42 
1.2.2.1 Serological studies ...................................................................................42 
1.2.2.1.1 Differences between San and Khoe ..................................................42 
1.2.2.1.2 Differences between Khoe-San subgroups........................................43 
1.2.2.1.3 Commonalities between Hadza, Sandawe and Khoe-San.................48 
1.2.2.1.4 Khoe-San admixture into other population groups.............................49 
1.2.2.2 Mitochondrial DNA studies .......................................................................49 
1.2.2.3 Y-chromosome studies.............................................................................58 
Y-chromosome tree structure ...........................................................................58 
The age of the Y-chromosome tree ..................................................................66 
Y-chromosome studies in the Khoe-San ..........................................................66 
Y-chromosome and mtDNA comparative studies .............................................68 
1.2.2.4 Autosomal DNA studies............................................................................70 
1.3 Aims..........................................................................................................................78 
2. SUBJECTS AND METHODS .........................................................................................83 
2.1 Subjects ....................................................................................................................83 
2.2 Methods ....................................................................................................................87 
2.2.1 DNA extraction ...................................................................................................87 
2.2.2 MtDNA methods .................................................................................................88 
2.2.2.1 MtDNA minisequencing method ...............................................................88 
2.2.2.1.1 PCR-multiplex amplification ...............................................................91 
2.2.2.1.2 Minisequencing reaction ....................................................................92 
2.2.2.2 HVS amplification and sequencing...........................................................96 
2.2.2.3 MtDNA data analysis ................................................................................98 
2.2.3 Y-chromosome methods ..................................................................................102 
2.2.3.1 Y-chromosome RFLP.............................................................................105 
2.2.3.2 Y-chromosome minisequencing .............................................................111 
2.2.3.3 Y-chromosome STR...............................................................................113 
2.2.3.4 Y-chromosome data analysis .................................................................114 
2.2.4 Autosomal SNP methods .................................................................................116 
2.2.4.1 Autosomal SNP data analysis (Genotypic).............................................118 
2.2.4.2 Autosomal SNP data analysis (Haplotypic) ............................................121 
3. MITOCHONDRIAL-DNA STUDIES ..............................................................................124 
3.1 Minisequencing .......................................................................................................125 
3.2 HVS-I and II variation ..............................................................................................130 
3.3 Haplogroup assignment and structure ....................................................................133 
3.3.1 Haplogroup L0d/k .............................................................................................136 
3.3.2 Khoe-San associated haplogroups L0d and L0k - Further analysis.................142 
3.3.3 Discussion of analyses of Khoe-San associated haplogroups L0d and L0k.....151 
L0k..................................................................................................................151 
L0d..................................................................................................................155 
L0d3................................................................................................................157 
L0d1 and L0d2................................................................................................161 
L0d1................................................................................................................161 
L0d1a..............................................................................................................162 
L0d1b..............................................................................................................164 
L0d1c..............................................................................................................166 
L0d2................................................................................................................168 
L0d2a..............................................................................................................168 
L0d2b..............................................................................................................171 
L0d2d..............................................................................................................172 
L0d2c..............................................................................................................172 
L0dx................................................................................................................173 
3.3.4 Summary of haplogroup histories .....................................................................174 
3.3.5 Haplogroup contributions from neighboring population groups ........................175 
3.4 Mitochondrial genetic relationships between different Khoe, San, Coloured and 
neighboring groups .......................................................................................................176 
3.4.1 Summary: Genetic Affinities between the Khoe-San and Coloured groups as 
inferred from mtDNA analysis....................................................................................194 
4. Y-CHROMOSOME STUDIES.......................................................................................197 
4.1 Haplogroup allocation and geographic distribution .................................................197 
4.2 Haplogroup diversity ...............................................................................................200 
4.3 African haplogroup analyses and discussion ..........................................................201 
Haplogroup A - Internal structure ..................................................................201 
Haplogroup A - Discussion .............................................................................206 
Haplogroup B - Internal structure ...................................................................208 
Haplogroup B - Discussion .............................................................................212 
Haplogroup E - Internal structure ...................................................................213 
Haplogroup E-M75..........................................................................................213 
Haplogroup E-M2............................................................................................216 
Haplogroup E-M35..........................................................................................219 
Haplogroup E - Discussion .............................................................................223 
4.4 Eurasian haplogroups .............................................................................................228 
Haplogroup R - Internal structure...................................................................228 
Eurasian haplogroups - Discussion ................................................................228 
4.5 Analyses of Y-chromosome genetic relationships between different Khoe, San, 
Coloured and neighbouring groups...............................................................................231 
4.5.1 Discussion on the genetic affinities between Khoe-San and Coloured populations 
from southern Africa ..................................................................................................242 
5. AUTOSOMAL DNA STUDIES......................................................................................245 
5.1 Results and discussion (Genotypes).......................................................................245 
5.1.1 Heterozygosity..................................................................................................245 
5.1.2 STRUCTURE analyses ....................................................................................248 
5.1.3 Variation across STRUCTURE datasets ..........................................................256 
5.1.4 Distance based analysis of unlinked SNP sets.................................................259 
5.1.5 AMOVA analysis...............................................................................................269 
5.2 Results and discussion (Haplotypes) ......................................................................272 
5.2.1 Inferred haplotypes...........................................................................................272 
5.2.2 Distance analysis..............................................................................................274 
5.3 Summary of autosomal results................................................................................285 
6. GENERAL DISCUSSION .............................................................................................286 
7. CONCLUSION..............................................................................................................293 
8. REFERENCES .............................................................................................................296 
9. APPENDICES ..............................................................................................................309 
Appendix A: Ethics approval .........................................................................................310 
Appendix B: Recipes for reagents and solutions used..................................................313 
Appendix C: Physical distance matrix (in km) between Khoe-San and Coloured groups ....316 
Appendix D: Details of SNP used in autosomal analyses .............................................317 
Appendix E: Haplotype list of HVR I and HVR II variation.............................................322 
Appendix F: Graphs - Physical vs. Genetic distance (L0d/k sequences and L0d 
sequences) ...................................................................................................................338 
Appendix G: Haplotype list of 12 marker Y-STR panel .................................................339 
Appendix H: Bar charts showing haplotype frequencies for 44 inferred short haplotypes ....347 
 
 
 
 
 
 
 
 
 
 
 
LIST OF FIGURES 
 
 
 Page 
 
  
Figure 1.1   Map indicating the current distribution of Khoe-San groups 7 
Figure 1.2   Map representing the historical geographic spread of the Khoe-San according to their language groups 10 
Figure 1.3 A   Cluster analysis of distance matrix data from Jenkins (1986) 46 
Figure 1.3 B   Principal Component Analysis of distance matrix data from Jenkins (1986) 47 
Figure 1.4   Tree showing global mtDNA macro-haplogroups according to the nomenclature of Behar et al. (2008) 51 
Figure 1.5   Haplogroups within the L0 macro-haplogroup according to the nomenclature of Behar et al. (2008) 53 
Figure 1.6   Sub-haplogroups within the L0d haplogroup according to the nomenclature of Behar et al. (2008) 54 
Figure 1.7   Tree showing global Y-chromosome macro-haplogroups according to the nomenclature of Karafet et al. (2008) 59 
Figure 1.8   Sub-haplogroups within haplogroup A according to the nomenclature of Karafet et al. (2008) 60 
Figure 1.9   Sub-haplogroups within haplogroup B according to the nomenclature of Karafet et al. (2008) 61 
Figure 1.10   Sub-haplogroups within haplogroup E according to the nomenclature of Karafet et al. (2008) 65 
Figure 1.11   Distribution of Pygmies according to Cavalli-Sforza (1986) 77 
Figure 2.1   Map indicating the place of origin for the Coloured and Khoe-San individuals who participated in the study 85 
Figure 2.2   Tree showing the 10 mtDNA macro-haplogroups that are distinguished by typing 14 SNPs 90 
Figure 2.3   The Y-chromosome haplogroup tree with nomenclature according to Karafet et al. (2008) indicating the branch-defining mutations screened for by using SNaPshot minisequencing panels and RFLP assays in the HGDDRU laboratory 104 
Figure 2.4   SNP selection strategy illustrated on a chromosome 116 
Figure 2.5   Diagram illustrating how STRUCTURE results for 100 SNP sets were condensed into one consensus run 119 
Figure 3.1   A 2% agarose gel showing the six amplified fragments that result from the multiplex PCR 126 
Figure 3.2   Electropherogram examples showing peak profiles of haplogroups L0, L1, L3 and M 127 
Figure 3.3   Mitochondrial haplogroup tree with nomenclature according to Behar et al. (2008), listing haplogroup frequencies in the different populations in the study group 131 
Figure 3.4   Graphical illustration of percentage mitochondrial haplogroup assignment in the populations used in comparative population analysis 132 
Figure 3.5a   Maximum likelihood tree representing the substructure of L1 to L5 134 
Figure 3.5b   Maximum likelihood tree showing the relationships of the different mtDNA haplotypes within haplogroup L0 135 
Figure 3.6   Median joining network representing L0 substructure in the different populations of the study group 137 
Figure 3.7   L0d structure as published in Behar et al. (2008) with suggested changes 138 
Figure 3.8   Graphical illustration of percentage L0d/k sub-haplogroup assignment in the populations used in comparative population analysis 139 
Figure 3.9   Graphic representation of coalescent times and times of divergence of the mtDNA sub-haplogroups of L0d and L0k 141 
Figure 3.10   Bar-graph indicating the clinal distribution of the L0d/k subgroups 142 
Figure 3.11   Contour plots indicating the frequency distributions of L0d/k subgroups 143 
Figure 3.12   Contour plots of L0d1c split into two subgroups, L0d1c1 and the remaining L0d1c sequences (L0d1c-) 144 
Figure 3.13   Mismatch distributions of L0d/k sub-haplogroups and comparative groups 146 
Figure 3.14   Bayesian Skyline plots of haplogroups showing changes in Ne through time 150 
Figure 3.15   L0d3 branch after adding comparative published sequences 159 
Figure 3.16   Principal component analysis of Fst values between different populations in the study group 178 
Figure 3.17   Cluster analysis tree representing mitochondrial Fst values between different populations in the study group 179 
Figure 3.18   Pairwise comparisons between physical geographic distance (X-axis) and mitochondrial Fst genetic distance (Y-axis) 181 
Figure 3.19   Principal component analysis of L0d/k Fst values between different populations in the study group 183 
Figure 3.20   Cluster analysis tree representing L0d/k Fst values between different populations in the study group 184 
Figure 3.21   Principal component analysis of L0d Fst values between different populations in the study group 186 
Figure 3.22   Cluster analysis tree representing L0d Fst values between different populations in the study group 187 
Figure 3.23   Mismatch distributions of populations in the study group 191 
Figure 4.1   Y-chromosome haplogroup tree with nomenclature according to Karafet et al. (2008), listing haplogroup frequencies in the different populations in the study group 198 
Figure 4.2   Graphical illustration of percentage Y-chromosome haplogroup assignment in the populations used in comparative population analysis 199 
Figure 4.3   Contour plots indicating the frequency distributions of Y-chromosome haplogroups in the Khoe-San and Coloured populations 199 
Figure 4.4   Neighbour Joining tree representing the substructure of Haplogroup A 203 
Figure 4.5   Median joining network representing Haplogroup A substructure in the different populations of the study group 204 
Figure 4.6   MDS plot visualizing the (δμ)² distance matrix for haplogroup A 205 
Figure 4.7   Neighbour Joining tree representing the substructure of Haplogroup B 209 
Figure 4.8   Median joining network representing Haplogroup B substructure in the different populations of the study group 210 
Figure 4.9   MDS plot visualizing the (δμ)² distance matrix for haplogroup B 211 
Figure 4.10   Neighbour Joining tree representing the substructure of Haplogroup E-M75 214 
Figure 4.11   Median joining network representing Haplogroup E-M75 substructure in the different populations of the study group 215 
Figure 4.12   Neighbour Joining tree representing the substructure of Haplogroup E-M2 217 
Figure 4.13   Median joining network representing Haplogroup E-M2 substructure in the different populations of the study group 218 
Figure 4.14   Neighbour Joining tree representing the substructure of Haplogroup E-M35 220 
Figure 4.15   Median joining network representing Haplogroup E-M35 
substructure in the different populations of the study group 
221 
Figure 4.16   MDS plot visualizing the ??2 distance matrix for haplogroup E-
 M35  
222 
Figure 4.17   Neighbour Joining tree representing the substructure of 
Haplogroup R 
229 
Figure 4.18   Median joining network representing Haplogroup R 
substructure in the different populations of the study group 
230 
Figure 4.19   Principal Component Analysis of Y-chromosome Fst values 
between different populations in the study group 
233 
Figure 4.20   Principal Component Analysis of Y-chromosome Rst values 
between different populations in the study group 
234 
Figure 4.21   Cluster analysis tree representing Y-chromosome Fst values 
between different populations in the study group 
235 
Figure 4.22   Cluster analysis tree representing Y-chromosome Rst values 
between different populations in the study group 
237 
Figure 4.23   Pairwise comparisons between physical geographic distance 
(X-axis) and Y-chromosome Fst and Rst genetic distance
(Y-axis) 
238 
Figure 4.24   Graphical illustration of percentage Y-chromosome haplotype 
for Khoe-San associated haplogroups in the Khoe-San and 
Coloured groups 
238 
Figure 4.25    Principal component analysis of Y-chromosome Rst values 
(excluding Eurasian and BS associated haplogroups) between 
Khoe-San and Coloured groups 
239 
Figure 4.26    Cluster analysis tree representing Y-chromosome Rst values 
(excluding Eurasian and BS associated haplogroups) between 
Khoe-San and Coloured groups 
240 
Figure 5.1   Scatter plot of heterozygosities in the 14 populations and the 
total sample set for each of the 100 sample sets 
247 
 
 
 
Figure 5.2   Correlation between heterozygosity and the variation 
observed between the 100 datasets 
247 
Figure 5.3    Averaged results of the Structure runs of the 100 different 
SNP sets 
253 
Figure 5.4   Triangle plot of individual cluster assignment at K=3 with the 
Khoe-San, non-African and BS associated clusters on the 
three different corners of the triangle 
255 
Figure 5.5   Graphical representation of the variation between the 
population cluster assignments across the 100 runs 
258 
Figure 5.6a   The Majority Rule consensus tree constructed from 100 NJ 
trees  
261 
Figure 5.6b The consensus tree constructed from the average of 100 
distance matrices  
262 
Figure 5.7   Principal component analysis of autosomal genotypic 
distances between different populations in the study group  
263 
Figure 5.8   Principal component analysis of the average individual 
distance matrix 
266 
Figure 5.9   Pairwise comparisons between physical geographic distance 
(X-axis) and autosomal genotypic distance (Y-axis) 
269 
Figure 5.10   Bar charts of inferred haplotypes and their frequencies in each 
of the 14 populations 
273 
Figure 5.11   Principal Component Analysis of autosomal haplotype 
distance values between different populations in the study 
group 
275 
Figure 5.12   Principal Component Analysis of autosomal haplotype 
distance values between different individuals in the study 
group 
276 
Figure 5.13   Principal Component Analysis of autosomal representative 
haplotype distance values between different populations in the 
study group 
280 
 
 
 
Figure 5.14   Cluster analysis tree illustrating autosomal representative 
haplotype distance values between different populations in the 
study group 
281 
Figure 5.15   Splits decomposition network showing the different trees that 
explain the relationships between the representative 
composite haplotypes of the different populations 
282 
Figure 5.16   Pairwise comparisons between physical geographic distance 
(X-axis) and autosomal haplotype genetic distance (Y-axis) 
284 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
LIST OF TABLES 
 
 
 
Page 
 
 
 
Table 1.1 Internal classification of southern African Khoisan linguistic 
group  
9 
 
Table 1.2   MtDNA haplogroup frequencies in San populations studied to 
date 
53 
 
Table 1.3 Published mtDNA sub-haplogroup frequencies in San 
populations as fractions of the total number of L0d/k 
haplotypes in the sample group 
55 
 
 
Table 1.4   Y-chromosome haplogroup frequencies (%) of Khoe-San 
populations studied to date 
68 
 
Table 2.1   Number of individuals in which mtDNA, Y-chromosome and 
autosomal variation were examined, their group and
group-code, and place of sampling and origin 
84 
 
 
Table 2.2    Primer sequences, binding sites, amplicon sizes and 
concentrations for multiplex PCR amplification of 6 fragments 
92 
 
Table 2.3    Minisequencing primers used to distinguish haplogroups
L0-L6, M, N and R 
94 
 
Table 2.4    Chromatogram band profile for identifying haplogroups L0-L6, 
M, N and R 
95 
 
Table 2.5    Sequences of primers used to amplify and sequence HVS-I 
and II 
96 
 
Table 2.6    PCR ingredients and cycling conditions for amplification and 
sequencing of HVS-I and II. Final concentrations of 
ingredients are shown 
97 
 
 
Table 2.7    SNPs typed in RFLP assays to determine Y-chromosome 
haplogroup 
106 
 
Table 2.8    Conditions and concentrations used during Y-chromosome 
RFLP typing 
107 
 
 
 
 
Table 2.9   Information on the seven Y-chromosome minisequencing 
panels used to resolve haplogroups according to Figure 2.3 
112 
 
Table 2.10 Y-STR PCR Thermal Cycler Conditions 113 
Table 3.1   Results of the minisequencing screening and classification of 
699 sequences compared to classification based on HVS 
sequences 
129 
 
 
Table 3.2   TMRCA calculated for the L0d/k subgroups. Four different 
mutation rates are applied 
140 
 
Table 3.3   Mismatch distribution statistics (haplogroups) 145 
Table 3.4   Diversity statistics and neutrality tests of L0d/k subgroups and 
comparative haplogroups 
147 
 
Table 3.5   Mitochondrial population pairwise Fst values 177 
Table 3.6   Results from mitochondrial AMOVA analysis using different 
groupings on the first level 
188 
 
Table 3.7   Mismatch distribution statistics (Groups) 192 
Table 3.8    Diversity statistics and neutrality tests for populations in the 
study group 
193 
 
Table 4.1   Pairwise genetic distances between the 15 study groups 
calculated from Y-chromosome data 
232 
 
Table 4.2   Results from Y-chromosome AMOVA analysis using different 
groupings on the first level 
242 
 
Table 5.1   Average proportion of polymorphic loci, heterozygosities and 
gene diversities in each population over the 100 different SNP 
datasets 
246 
 
 
Table 5.2    Averaged population cluster assignments of the STRUCTURE 
runs from the 100 different SNP sets 
254 
 
Table 5.3   Average likelihood and delta-K scores across the 100 runs 255 
Table 5.4   Average population distance matrix of autosomal genotypic 
data 
260 
 
 
 
 
 
 
 
Table 5.5   Results from autosomal genotypic AMOVA analysis using 
different groupings on the first level 
269 
 
Table 5.6   Maximum composite likelihood population distances of 
individual haplotypes 
274 
 
Table 5.7   Maximum composite likelihood population distances of 
population representative haplotypes 
279 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
LIST OF ABBREVIATIONS 
 
AFE AFR + EUR 
AFR Afrikaner 
aLRT approximate likelihood-ratio test 
AMOVA  analysis of molecular variance 
ASD  average square distance 
Ave average 
BP before present  
bp  base pairs 
BS Bantu-speakers  
BSA  bovine serum albumin 
BSP Bayesian skyline plot  
CAC Cape Coloured 
CEPH Centre d'Etude du Polymorphisme Humain  
(Center for the Study of Human Polymorphisms) 
CI  confidence interval 
CKGR Central Kalahari Game Reserve  
CNC Northern Cape Coloured 
COL  Karoo Coloured 
CRS  Cambridge reference sequence 
ddH2O  deionised distilled water 
ddNTP  dideoxyribonucleotide triphosphate 
del deletion 
DNA  deoxyribonucleic acid 
dNTP  deoxyribonucleotide triphosphate 
DRC Manyanga 
DUM Duma San 
EDTA  ethylene-diamine-tetra-acetic acid 
ESA Earlier Stone Age  
EUR European 
F forward 
FNLA Frente Nacional de Libertação de Angola 
(National Front for the Liberation of Angola) 
FPO First Peoples of the Kalahari  
g  gram 
Gd Gene Diversity 
GTR general time-reversible 
GUG /Gui, //Gana and Kgalagari 
Hd Haplogroup Diversity 
HER Herero 
Het  Heterozygosity 
HG  haplogroup 
HGDDRU  Human Genomic Disease and Diversity Research Unit 
HGDP Human Genome Diversity Project 
HPLC high performance liquid chromatography  
Ht haplotype 
HVS-I hypervariable segment I  
HVS-II hypervariable segment II 
IND  Indian 
ins insertion 
JOH Ju\'hoansi 
KAR Karretjie people 
kb  kilobase 
KHO ǂKhomani 
km  kilometre 
KSC Khoe-San + Coloured 
KWE Khwe 
LGM Last Glacial Maximum 
LSA Later Stone Age  
m migration rate 
M  molar 
MALDI-TOF  matrix-assisted laser desorption/ionisation-time of flight mass 
spectrometry 
Mb megabase 
MCMC Markov Chain Monte Carlo 
MDS multidimensional scaling 
mg milligrams 
MgCl2  magnesium chloride 
min  minutes 
ml  millilitre 
mM  millimolar 
MP maximum parsimony 
MRC  Medical Research Council 
MRCA most recent common ancestor  
MSA Middle Stone Age  
mtDNA mitochondrial DNA  
n number 
NaCl sodium chloride 
NAM  Nama 
NAR Naro 
Ne effective population size 
NEAN Neanderthal 
ng nanogram 
NGO non-governmental organization 
NHLS National Health Laboratory Service 
NJ  neighbour joining 
NRF  National Research Foundation 
°C  degrees Centigrade 
OTH Other 
P probability 
PC principal component 
PCA principal component analysis  
PCR  polymerase chain reaction 
qt quartile 
R reverse 
r  correlation co-efficient 
RE  restriction enzyme 
RFLP restriction fragment length polymorphism 
RNA ribonucleic acid 
s seconds 
SA South Africa 
SADF South African Defense Force  
SASC South African San Council 
SDS sodium dodecyl sulfate 
SEB south-eastern Bantu-speakers  
Seq sequence 
SNP single nucleotide polymorphism 
SOT Sotho, Tswana 
SSD sum of squared differences 
STD standard deviation 
STR  short tandem repeat 
subHG  subhaplogroup 
SWB south-western Bantu-speakers  
SWZ Swazi 
T time 
Taq  Thermus aquaticus 
TBE tris borate-EDTA 
TE Tris EDTA 
TMRCA time to most recent common ancestor  
TOT total 
µ mutation rate 
U  units 
µg  microgram 
µl microlitre 
µM micromolar 
UV ultraviolet 
v  version 
WIMSA Working Group of Indigenous Minorities in Southern Africa 
XEG //Xegwi 
XUN !Xun 
YAP  Y Alu polymorphism 
YCC  Y-chromosome Consortium 
ZUX Zulu, Xhosa 
pi Nucleotide Diversity 
τ Tau 
 
 
 
Note on terminology adopted in thesis 
 
The term "Khoisan" was first used by Leonard Schultze in 1928 (Schultze, 1928) and was 
intended to be used as a biological label. It was further popularised by Isaac Schapera in 
the 1930s (Schapera, 1930). The term has a collective meaning for two groups of people, 
the Khoi (old Nama word) or Khoe (modern Nama word), who were traditionally the 
pastoralist groups, and the San, who were hunter-gatherers. This grouping was introduced 
by European scholars who used mode of subsistence to distinguish the two groups. More 
recently this division has been challenged by present-day San and Khoe communities and 
there is still debate as to whether this grouping is a true reflection of subdivision. The 
word "Khoi" or "Khoe" means "person" in Nama. Two surviving pastoralist groups, the 
Nama and Korana, use the word "Khoenkhoen", meaning "people of the people". The word 
"San" is the Khoe word for "foragers" or "bushmen" (Barnard, 1992).  
 
In 2002, at a meeting attended by the Working Group of Indigenous Minorities in Southern 
Africa (WIMSA) and the South African San Council (SASC), the San people decided that 
they wanted to be referred to by their individual community names (!Xun, ǂKhomani, etc.) 
or collectively as San. When collectively referring to the San and the Khoe, the term
Khoe-San was suggested (Crawhall, 2006). 
 
In this thesis individual groups will be referred to by their preferred community names. The 
application and spelling of the community names are in accordance with the usage in the 
book "Voices of the San" (le Roux and White, 2004), a book compiled by young 
representatives from San communities. The collective words "San" and "Khoe" will be 
adopted for the traditional hunter-gatherer and pastoralist groups, respectively, while 
"Khoe-San" will be adopted for the Khoe and the San populations together. When referring to the 
linguistic grouping, the term Khoisan-speaking will be adopted. The use of the word 
Khoisan is in no way meant to be derogatory and is used only for the sake of continuity 
with current linguistic classification. When referring to the sub-grouping of the Khoisan 
linguistic group the nomenclature suggested by Güldemann (Güldemann, In Press) was 
followed. 
1. INTRODUCTION 
 
Finally, we believe that identifying genetic differences between races and ethnic groups, …, is 
scientifically appropriate. What is not scientific is a value system attached to any such findings. 
Great abuse has occurred in the past with notions of "genetic superiority" of one particular group 
over another. The notion of superiority is not scientific, only political, and can only be used for 
political purpose… 
We need to value our diversity rather than fear it. Ignoring our differences, even with the best 
intentions, will ultimately lead to disservice of those who are in the minority. 
 
Neil Risch et al. (2002) 
Genome Biology 3:1-12 
 
The study of genetic differences between individuals has a variety of implications and 
benefits that influence how we see our past and shape our future. We are entering a new 
era in which the field of medicine, involving prevention and treatment, is becoming 
increasingly customizable at an individual level. Methods of identifying individual 
differences in disease susceptibility and individual responses to drug treatment are 
developing rapidly. Various studies have shown that the human population is not 
homogeneous in terms of disease risk and response to treatment (Jorde et al., 2001; Risch 
et al., 2002; Bamshad et al., 2004). For the effective planning of prevention and treatment 
strategies the goal is to characterize risks at both the individual and the population level.  
 
While the medical field is still developing and our knowledge and technology are not yet 
capable of individual-based risk assessments, we are forced to rely on risk assessments 
within population groups. A "race-neutral" approach in the biomedical field would not be 
advantageous to all groups of people. Instead, such an approach may in the end be 
disadvantageous to minority groups (Cavalli-Sforza et al., 1994; Jorde et al., 2001; Risch et 
al., 2002; Bamshad et al., 2004; Jobling et al., 2004a). Various studies have shown that 
there is genetic substructure in the human population and that individuals within a certain 
group are genetically more similar to each other than to individuals within another group 
(Cavalli-Sforza et al., 1994; Rosenberg et al., 2002; Jakobsson et al., 2008; Li et al., 2008; 
Tishkoff et al., 2009).   
 
Sub-structure within the human population is largely a consequence of 
genetic drift and migration of sub-groups of humans, which led to isolation. The isolation 
between sub-groups caused non-random mating, which in turn resulted in genetic 
divergence. The field of human evolutionary genetics studies these genetic differences in 
order to unravel the history of humans. By employing different molecular genetic 
techniques, population subdivision, population expansion dynamics and human migration 
patterns are investigated.  
 
There is only one true history of humankind and scholars have adopted several methods to 
reconstruct this past. In addition to molecular evolutionary genetic approaches, various 
other fields have been, and are still, actively studying human history and evolution. History 
in the form of recorded text goes back only as far as 4 000 years before present (BP). To 
study history older than this, other methods of investigation are required. Historical 
linguistics investigates the history of languages and their relationships to one another. 
Languages spoken by different groups of people retain evidence of their origin and are 
related to other languages in a measurable fashion. Language, however, also has a 
relatively shallow time depth and linguists have suggested that languages do not retain 
evidence of their origin for more than 10 000 years (Jobling et al., 2004a). Archaeology has 
a greater time depth and studies human history captured in physical remains, such as 
bones, stone tools, pottery, waste deposits and dwellings left over by past human groups. 
Palaeontology investigates the very deep ancestors of humans by investigating fossilized 
remains. The use of molecular genetics is a recent addition to the methods of studying 
human history (Cavalli-Sforza et al., 1994; Jobling et al., 2004a). Within the present thesis 
various molecular genetic markers and analysis techniques are utilized to aid in the 
inference of African history. 
 
The first study that illustrated genetic differentiation between groups was a study on the 
ABO blood groups at the beginning of the 20th century (Landsteiner, 1901). The magnitude 
of this genetic variation only became apparent in the 1950s to 1960s when individual 
differences in proteins could be systematically studied (Cavalli-Sforza et al., 1994). The 
study of protein variation was merely the beginning. When analysis methods for the 
hereditary material itself, DNA, became available, genetic variation could be studied 
directly and the field of evolutionary genetics expanded rapidly (Cavalli-Sforza et al., 1994; 
Jobling et al., 2004a). 
 
Until recently, most studies that investigated the origin and dispersal of anatomically 
modern humans concentrated on two haploid compartments of the human genome, 
namely, the mitochondrial DNA and the Y-chromosome (Jobling and Tyler-Smith, 2000; 
Jobling and Tyler-Smith, 2003; Forster, 2004; Torroni et al., 2006; Underhill and Kivisild, 
2007). A few studies did investigate autosomal variation.  These studies, however, were 
usually on particular genes that were under investigation due to their influence on a specific 
phenotypic property or disease risk. The variation therefore would have been subject to 
selection pressures.  Recent advances in the human genome project have allowed us 
access to large amounts of information on neutral genetic variation that would give a more 
complete insight into human evolutionary history (Cavalli-Sforza, 1998; Przeworski et al., 
2000; Garrigan and Hammer, 2006). In this thesis neutral autosomal variation as well as 
the haploid mitochondrial genome and Y-chromosome were used in a three-pronged 
approach to study the evolutionary history of selected groups of southern African 
individuals. 
 
All studies to date provide substantial support for an African origin of modern humans. The 
greatest genetic variation is present within African populations and variation outside of 
Africa is a subset of the African diversity (Jobling and Tyler-Smith, 2003; Garrigan and 
Hammer, 2006; Torroni et al., 2006; Underhill and Kivisild, 2007). Africa has remarkable 
cultural, linguistic and genetic diversity and more than 2 000 distinct ethnic groups and 
languages exist on the continent (Gordon, 2005). Despite the pivotal role that Africa has 
played in the evolution of humankind, and as the main residence of Homo sapiens for most of 
their existence, the history and population dynamics within the continent remain poorly 
understood. The present thesis aims to contribute to the understanding of the history of the 
African continent by using molecular markers in selected groups of aboriginal human 
inhabitants of southern Africa.  
 
The majority of sub-Saharan Africans (>200 million people) speak one of ~500 very closely 
related languages, even though they are distributed over an area of ~500 000 km². These 
languages are collectively referred to as Bantu languages, based on the word meaning 
"people" (Bleek, 1862). The current distribution of these groups is largely a consequence of 
the movement of people (demic diffusion) rather than a diffusion of only language (Ehret 
and Posnansky, 1982; Huffman, 1982). This expansion is commonly referred to as the 
Bantu Expansion (Greenberg, 1963) and is thought to be due to the development and 
spread of agriculture and, possibly, the use of iron (Greenberg, 1972; Phillipson, 1993; 
Newman, 1995). The Bantu Expansion began ~3 000 - 5 000 years BP (Ehret, 1982; 
Vansina, 1990) and originated in the Cross River Valley, in the region of current eastern 
Nigeria and western Cameroon (Johnston, 1913; Greenberg, 1972; Huffman, 1982; Vogel, 
1994). 
 
To a certain extent the expansions of Bantu-speaking groups masked the earlier history of 
non-Bantu-speaking African populations. Groups that existed all over the African continent 
before the Bantu-expansions were replaced and/or assimilated by the Bantu-speaking 
groups. Traces of these pre-Bantu groups might still be found in the genetic variation, 
language and cultural practices of various Bantu-speaking groups into which they have been 
incorporated or assimilated. Very few sub-Saharan African ethnic groups have retained a 
cultural, linguistic and genetic identity that distinguishes them from the Bantu-speaking 
groups. Examples of such groups of people are the Hadza and Sandawe from East Africa, 
the Khoe-San populations from southern Africa and the Pygmy populations from central 
Africa. These populations (excluding the Khoe) did not adopt an agricultural lifestyle but 
instead kept a hunter-gatherer lifestyle. Their cultural practices, lifestyle and language (for 
the Khoe-San, Hadza and Sandawe) distinguish them from Bantu-speakers.  
 
This distinction is also visible in the comparative genetic analysis of these populations in 
relation to the Bantu-speakers. In both Y-chromosome and mitochondrial DNA studies, 
these populations tend to carry unique and older lineages than the lineages associated 
with the Bantu-speaking people. In fact the deepest clades known among modern humans 
for both the Y-chromosome and mitochondria are found commonly and at their highest 
frequencies in the Khoe-San people (Behar et al., 2008; Karafet et al., 2008). Additionally, 
in autosomal studies Khoe-San people group in a cluster distinct from that of
Bantu-speakers (Cavalli-Sforza et al., 1994; Rosenberg et al., 2002; Jakobsson et al., 2008; Li et 
al., 2008; Tishkoff et al., 2009). Thus, these unique relict populations of hunter-gatherers 
who carry genetic variation belonging to the deepest clades known among modern humans 
are crucial links to the past. It is important to extensively study their genetic contribution to 
the human gene pool. This is becoming increasingly difficult as the Khoe-San groups are 
losing their cultural identities, lifestyles and languages and are integrating into surrounding 
groups. In the current thesis the genetic variation from various Khoe-San groups is 
examined and analysed using multiple methodologies. The analyses are used to make 
inferences about the relatedness of the different Khoe-San groups, their affinities to 
neighboring groups and their place in African history.   
 
To fully understand and interpret the genetic relatedness between the different Khoe-San 
groups included in this study it is important to review their present geographical distribution 
and demographics. Furthermore one must consider their relationship to neighbouring 
Khoe-San groups and neighbours from other population groups. Another important factor 
to take into consideration is the classification system used to classify the various Khoe-San 
groups. The following sections will review and summarise these different aspects. 
 
1.1 Khoe-San today  
 
The Khoe-San people of southern Africa consist of a collection of small diverse groups of 
people who share common cultural, linguistic and genetic features. Some of the groups are 
pastoralists, while others are hunter-gatherers or fishermen. Most Khoe-San individuals 
today, however, work as herdsmen or labourers for members of other ethnic groups 
(Barnard, 1992; Smith et al., 2000; le Roux and White, 2004). 
 
Almost all Khoe-San groups are affected by social ills such as economic dependency, 
alcoholism, malnutrition, and societal breakdown. Many of these problems arise because 
policies regarding the Khoe-San were developed without their participation and without 
recognition of their cultural legitimacy. On a continent that was and still is being rapidly 
colonized for its resources, their egalitarian values have left the San groups especially 
vulnerable. With their land and food resources having been taken away by surrounding groups 
and governments, their freedom has been restricted and their cultures and traditions have 
deteriorated. Only now have certain groups begun to reclaim their culture and basic 
human rights. This happened after the international community brought attention to the 
struggle of these marginalized indigenous people. Today, various organisations represent 
Khoe and San groups and concentrate on land rights and ownership, political recognition and 
representation, and cultural rights and development projects
(Broyhill et al., Current). 
 
Different San and Khoe groups are distributed throughout southern Africa where they live 
among and to some extent are admixed with the various Bantu-speaking populations 
surrounding them (See Figure 1.1) (Barnard, 1992; Smith et al., 2000; le Roux and White, 
2004). Today, the greatest proportion and the largest diversity of Khoe-San people can be 
found in Botswana followed by Namibia. Small groups of San people are also found in the 
southern parts of Angola and to a lesser extent southern Zambia and eastern Zimbabwe. 
The San people of South Africa have to a large extent lost their identities and have 
integrated into other populations. The Khoe groups still extant today are mainly found in 
Namibia while descendants of various mixed Khoe and San groups found in South Africa 
are known as the Coloured population (Barnard, 1992; Smith et al., 2000; le Roux and 
White, 2004). 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 1.1  Map indicating the current distribution of Khoe-San groups 
1.1.1 Group classification 
 
To classify Khoe-San groups into their individual ethnic groups is, in many ways, 
problematic. Different words and spellings have been used to refer to the same groups of 
people over the years. Linguistic classification is the method most commonly used to 
identify different groups.  
 
As mentioned previously, historical inference based on language has a shallow time depth 
of at most 10 000 years. This is very short relative to the time depth of historical inferences 
made from investigating genetic lineages such as mitochondrial genomes and Y-chromosomes. 
Relationships between these molecular markers can go back more than 100 000 years. When 
a hierarchical classification of possibly related groups such as the Khoe-San is made using 
a linguistic system, it will not necessarily reflect group classifications that can be made 
based on genetic information. It is one of the aims of this thesis to see if some of the group 
relationships inferred from linguistics can also be observed in the genetic analysis of
Khoe-San groups. It is therefore necessary first to review the linguistic classification of the 
different Khoe-San groups, then to investigate how this classification is used to infer the 
relatedness between the different Khoe-San groups and finally how linguistics is used to 
infer the history of Khoe-San groups.  
 
Table 1.1 shows the linguistic groupings (Güldemann, In Press) and Figure 1.2 the 
historical geographic spread of the Khoe-San groups based on languages and dialects. 
 
The main Khoe-San language families include Ju (Northern Khoisan), Khoe-Kwadi (Central 
Khoisan) and Tuu (Southern Khoisan). The Khoe-Kwadi group includes Kwadi, the extinct 
language of Angola, and the Khoe language branch. The Khoe language branch includes 
the people commonly known today as the Khoe (linguistic branch "KhoeKhoe") as well as 
the San groups that speak languages more closely related to Khoe languages than to other 
San languages (linguistic branch "Kalahari") (Güldemann, In Press) (Table 1.1 and Figure 
1.2).  
 
 
 
 
 
 
 
Table 1.1 Internal classification of southern African Khoisan linguistic group (Güldemann, In preparation) 

Lineages and branches          Languages and dialects                    Remarks

Ju-ǂH?a
     ǂH?a                      Single language                           Newly affiliated to Ju
     Ju (= Northern Khoisan)
             Northwest         !'O!X?u, !X?u
             Southeast         Ju/'hoan, ǂKx'au//'e

Khoe-Kwadi                                                               Possibly related to Sandawe
     Kwadi                     Single language                           Newly affiliated to Khoe
     Khoe (= Central Khoisan)
         KhoeKhoe
             North             Eini, Nama-Damara, Hai//om
             South             !Ora, Cape varieties
         Kalahari
             East
                  Shua         Cara, Deti, /Xaise, Danisi, Tsixa, etc.
                  Tshwa        Kua, Cua, Tsua, etc.
             West
                  Kxoe         Khwe, //Ani, Buga, G/anda, etc.
                  G//ana       G//ana, G/ui, ǂHaba, etc.
                  Naro         Naro, etc.

Tuu (= Southern Khoisan)
     Taa-Lower Nossob
         Taa
             West              N/u//'en, West !X?o
             East              'N/ohan, N/amani, East !X?o, Kakia
         Lower Nossob          /'Auni, /Haasi
     !Ui                       N//ng; ǂUngkue; /Xam; //Xegwi

Bold – Independent lineage;  Underlined – Earlier classification unit 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 1.2  Map representing the historical geographic spread of the Khoe-San according to their 
language groups 
Ju groups are linguistically split into a Northwest and a Southeast division (Table 1.1). The 
Northwest groups include the !Xun groups of Angola and northern Namibia while the 
Southeast groups include the Ju\'hoansi of northern Botswana and northeastern Namibia 
and the ǂX?ao//??esi (Auen) of western Botswana and northeastern Namibia. The ǂH?? are 
a south-eastern Botswana San group that may represent a linguistic intermediary between Ju and 
Tuu speakers (Güldemann, In Press) (Figure 1.2). 
 
The only distinct Khoe group (speaking the KhoeKhoe language grouping) living today is 
the Nama of Namibia. The Korana (!Ora) and Cape Khoe (Cape KhoeKhoe) of South 
Africa represent extinct groupings of Khoe language and culture but their descendants live 
in the Coloured population of South Africa (Figure 1.2). The Hai//om of northern Namibia also 
speak a KhoeKhoe language; however, this group is thought to have originated as a result of 
contact between the Nama and the !Xun of northern Namibia (Barnard, 1992; Smith et al., 
2000; le Roux and White, 2004).  
 
The Khoe-speaking San groups (speaking the Kalahari Khoe language grouping) are the 
most numerous and culturally diverse of the San language groups (Table 1.1). They inhabit 
the central and northern parts of Botswana, including the central Kalahari Desert and 
Okavango swamps, the southern parts of Angola and the Caprivi Strip of Namibia (Figure 
1.2). Groups included in this language group are the Naro of western Botswana, the /Gui, 
//Gana and Deti of central Botswana, the "river Bushmen" of northern Botswana and 
southern Angola (the different Khwe groups), the Tshua and Shua of eastern Botswana 
and the Tyua of western Zimbabwe (Figure 1.2) (Barnard, 1992; Smith et al., 2000; le Roux 
and White, 2004).  
 
The Tuu language branch is divided into three groups, namely, the Taa, Lower Nossob and 
!Ui (Table 1.1). Most of the groups belonging to the Tuu language division have lost their 
language and cultural identity completely but their descendents are found in other 
population groups and in many cases classify themselves as ?Coloured?. The ?Lower 
Nossob? language group is extinct and the only remaining Taa group is the !X??, who live 
in the south central Kalahari of Botswana. Remaining !Ui (!Wi) speakers consist of a few 
remaining San groups that are geographically scattered throughout South Africa. What is 
 12 
presently known about the San peoples of South Africa derived from studies on very few 
remnant populations that survived into the 1700-1800s. These include the //Xegwi who 
lived in the Lake Chrissie area of the now Mpumalanga province of South Africa, the 
“mountain Bushman” or N//? or “People of the Eland” who lived in Lesotho, KwaZulu Natal 
and the Eastern Cape, the ǂKhomani who lived in the northern part of the Northern Cape 
province that borders with Botswana and Namibia (roughly where the Kalahari Gemsbok 
park is located now) and the /Xam who occupied the Karoo area of the Western, Northern 
and Eastern Cape provinces (Barnard, 1992; Smith et al., 2000; le Roux and White, 2004; 
Traill, 1973; Westphal, 1974; Güldemann, In Press). 
 
In the following sections the individual San and Khoe populations within the linguistic 
groupings will be described based on their identity, geographic spread, demography and 
what is known about their history. 
 
1.1.1.1 The Ju  
The three main ethno-linguistic groups of the Ju, namely the !Xun, the Ju/'hoansi (meaning “real 
people”) and the ?X?ao//??esi (also called Auen or linguistic-branch ?Kx?au//?e), correspond to 
indigenously defined dialects that also parallel three different cultural units and geographic 
areas (Figure 1.2 and Table 1.1). The word !Xun (or the different spelling !Kung) has been 
widely used to describe all three of these groups; however, the only groups that use the 
term for self-identification are the !Xun groups of Angola and northern Namibia (!x? is a 
word indicating “person” in !Xun languages). The three groups together are estimated to 
comprise 25 000 to 30 000 individuals (Marshall and Ritchie, 1984; Gordon, 1986). The 
largest group is the central Ju/'hoansi, while the northern !Xun are distributed over a larger 
geographic area (Gordon, 1984; Barnard, 1988). 
 
1.1.1.1.1 The !Xun 
The northern !Xun do not live in the Kalahari like the other two groups but rather in the 
forested areas of southern Angola and northern Namibia. Their self-designation is !o !x? 
which means “forest people” (Bleek, 1928). Two groups found in Angola are known locally 
as Kwankala (Vakwankala) and Sekele (Vasekele) (De Almeida, 1965). In the local Bantu-speaking 
languages these names have derogatory connotations (meaning poor uncivilized 
wanderers) and are not used anymore. The !Xun lived in close association with the local 
Ambo (Ovambo) population for centuries. It is through this association that the !Xun 
learned crop cultivation, herding and fishing with nets and spears. In the 1950s very few 
groups still followed a foraging lifestyle supplemented with assisting Bantu-speakers in the 
winter harvests in exchange for grain. From 1970 to 1980 Angola was a battleground between 
the government and guerrillas. Since then no ethnographic studies have been conducted to 
assess the extent of the damage the war has done to the !Xun way of life (De Almeida, 1965; 
Barnard, 1992). 
 
1.1.1.1.2 The Ju/'hoansi 
The central Ju/'hoansi groups occupy areas with a large supply of water and plant 
resources. The area has over a hundred edible plants, the most important among these being the 
Mongogo nut, a nutritious nut that can be gathered virtually the whole year. Bands of 
people usually camp out near permanent waterholes and Mongogo groves. In the past they 
only camped out during the dry winter and moved away during the wet season to exploit 
other territories.  In Botswana, however, over the past century, groups have increased their 
time camping out. Today most groups have settled at the waterholes, and depend on 
Herero and Tswana residents for their livelihood. Development projects, including schools 
and handicraft tourist shops, were implemented by the Botswana government and 
anthropologists. In Namibia a “homeland” reserve for the Ju/'hoansi (Bushmanland) was 
established and a school and administrative camp were built at Tsumkwe. In 1978 the 
South African Defense Force (SADF) built a military base at Tsumkwe and recruited 
Ju/'hoansi soldiers. Many families lived off the earnings from the military base. Traditional 
subsistence techniques started vanishing because of this and the fact that the reserve was 
too small to support the number of people. Anthropologists were partially successful in 
encouraging them to adopt cattle husbandry in the reserve but met with opposition from 
wildlife officials (Marshall, 1960; Lee, 1979; Guenther, 1986; Barnard, 1992). 
 
1.1.1.1.3 The ?X?ao//??esi 
The ?X?ao//??esi (Auen) occupy a region in Botswana that overlaps with another San 
group, the Khoe-speaking Naro San group. This land is also shared with Bantu-speaking 
Tswana and to a lesser extent Hereros. White ranchers, mainly Afrikaners, own most of the 
land. The white settlers arrived in 1897 from the Cape colony and occupy what is known as 
the Ghanzi farm block of western Botswana. Linguistic and anthropological evidence 
suggest ancient contact between the Khoe-speaking Naro and the ?X?ao//??esi. The 
direction of borrowing seems to be from the ?X?ao//??esi to the Naro. Today, overpopulation 
of the area by humans and livestock prevents traditional hunting and gathering practices. 
Many of the families have settled in towns and on ranches where they are laborers or they 
act as tourist attractions in exchange for the permission to use the land for gathering 
practices. They also earn small salaries and tips from tourists (Marshall, 1960; Barnard, 
1992). 
 
1.1.1.2 Khoe-speaking San groups 
The Khoe-speaking San groups speak languages (linguistic grouping Kalahari Khoe) that 
are more closely related to KhoeKhoe languages than to other San languages. This 
relationship, however, is distant. The Khoe-speaking San groups are distributed over most 
of Botswana and their regions overlap with some of the other San groups. 
 
1.1.1.2.1 The Tshua and Shua of eastern Botswana 
The Tshua (south) and the Shua (north) consist of a number of scattered groups 
distributed over a large area, from the Kweneng district in the southeast of Botswana to the 
Ngamiland district in the northeast. They have lived in close association with Bantu-speaking 
groups including the Tswana, Kgalagari and Kalanga (closely related to the 
Shona of Zimbabwe) for over a century. The groups have various names for self-identification 
(Tshua, Hietshware, Kua, Shua, Ts?ixa, Danisi, Deti) but all speak the eastern 
Khoe dialect (Table 1.1) where the word Tshua or Shua is used to refer to a “person” rather 
than the word Khoe. These eastern Khoe-speaking San are herders and cultivators as well 
as hunters. They also engage in extensive trade activities and “contract work” with 
neighbouring Bantu-speakers. This contract work entails an agreement (known locally as a 
“mafisa” relationship) between a local Bantu-speaking tribe and a San group. In this mafisa 
relationship the San group will look after the cattle of the Bantu-speaking group and in 
return have the right to the milk, the meat in case of an accidental death, the right to use the 
cattle in ploughing and in some cases the right to keep the calves. Due to these relationships 
many of the San groups of eastern Botswana settled at cattle posts (Dornan, 1975; 
Barnard, 1992). 
 
1.1.1.2.2 The Khwe of northern Botswana and southern Angola 
The Khoe-speaking San of northern Botswana, southern Angola and western Zimbabwe 
comprise the various Khwe (linguistic grouping - Kxoe) groups (including the Bugakhwe 
and //Anikhwe) (Table 1.1). They live in the Okavango swamp area and surrounding 
regions. This area is infested by tsetse flies; as a result livestock rearing is not viable. They 
sustain themselves through fishing as well as hunting and gathering. Linguistically, they are 
closer to the central Khoe-speaking San than to the eastern groups. Phenotypically, however, 
they resemble Bantu-speakers and genetic evidence also suggests a genetic makeup 
similar to the Bantu-speaking populations that surround them (Nurse and Jenkins, 1977; 
Cashdan, 1986). They share their territory with various Bantu-speakers including the 
Mbukushu (cultivators), the Yei (fishermen) and to a lesser extent the Tswana, Kgalagari 
and Herero herders. Each group operates in a different ecological niche. The San groups 
are concentrated on the banks of the Okavango River and the delta area as their informal 
name “river Bushmen” implies (Barnard, 1992). It is not clear whether these northern Khoe-speaking 
San groups are Khoe-San groups with extensive Bantu-speaking admixture, 
Bantu-speakers that lost their cattle, another pastoralist population closely related to Bantu-speakers 
who occupied the region before the Bantu expansions or maybe a mixture of 
various refugee groups driven from the grazing grounds into the Okavango swamps 
(Cashdan, 1986). 
 
1.1.1.2.3 The /Gui and //Gana of the central Kalahari 
The /Gui and //Gana groups lived in an area now occupied by the Central Kalahari Game 
Reserve (CKGR) in central Botswana. /Gui has no specific meaning other than the 
reference to the group while //Gana is derived from a word that means “people of the well”. 
The /Gui and //Gana also shared the CKGR territory with the Kgalagari. The Kgalagari are 
the oldest existing Bantu-speaking tribe in Botswana. //Gana individuals all tend to speak 
Kgalagari as well as their own language and it is believed by the //Gana themselves that 
they originated from an intermixing of the /Gui and the Kgalagari. The /Gui occupied the 
region adjacent to the western CKGR as well as the western part of the CKGR and the //Gana 
the central and eastern part as well as the region adjacent to the eastern CKGR. The 
CKGR was established in 1961 and extends over 52 600 square kilometers. Only the 
southern (wooded zone) and central (bushveld) parts have enough vegetation to support 
human occupation. The central part is good hunting territory. From the 1960s to the 1980s 
the population in the CKGR declined from 2 000 to approximately 1 000 individuals. The 
Ghanzi district commissioner George Silberbauer studied the /Gui and //Gana groups 
extensively and constructed a borehole in the south central parts of the CKGR near the 
?Xade pan. Subsequently ?Xade became a settlement with permanent occupation which 
grew from ~200 in the 1960s to ~700 in the late 1970s. In the late 1970s the people of 
?Xade were taught subsistence farming practices but with little available water this was not a 
successful strategy. The introduction of farming led to an increased number of livestock 
such as horses, donkeys and goats, which put further pressure on water supplies. Hunting 
on horseback and donkeys also ensued which caused a decline in large game and 
attracted the attention of wildlife park officials (Silberbauer, 1965; Barnard, 1992). A 
compromise was reached in which the San groups could stay as long as they used only 
traditional means of hunting.  
 
In 1986 the government decided that the CKGR should strictly be a wildlife reserve and 
that residents should be relocated. San groups wished to stay in the reserve and proposed 
to work with park officials to sort out problems. This was declined and the resistance to 
resettlement was met with threats from the government and discontinuation of services. In 
1997 the people of the CKGR were resettled from ?Xade in the Central Kalahari Game 
Reserve to New ?Xade, a large settlement in Ghanzi District, southwest of the reserve, and 
Kaudwane, a large settlement in Kweneng District not far from Khutse Game Reserve. 
Promises of large compensation were made to people who moved soon. In reality very little 
compensation was paid out and people struggled to keep their livelihoods. A San-run NGO, 
First People of the Kalahari (FPK), worked with CKGR residents and took the Botswana 
government to court. In 2005, the government ruled that the CKGR was off limits to people 
even though some residents still lived there. San people trying to access the CKGR were 
shot at by government officials with teargas and rubber bullets; some individuals were 
injured, arrested and detained. In 2006 the final decision of the court was that San groups 
were unlawfully removed.  The government, however, was not required to restore services 
because it was not unlawful for them to have stopped these services. At the end of 2006 
San groups were allowed to return but without any domestic stock. They are only allowed 
to live from hunting and gathering practices. Hunting licenses, however, are still not issued 
and people are living mainly off wild foods from the reserve and food they obtain from 
outside (Broyhill et al., Current). 
 
1.1.1.2.4 The Naro 
The Naro live in the western parts of Botswana with the !X?? to the south and the 
?X?ao//??esi to the north. They are the most numerous of the San groups and are estimated 
to be one fifth of the total San population. In the 1980s they numbered approximately 9 000 
individuals; ~5 000 in Botswana and ~4 000 in Namibia (Barnard, 1992). Since the late 
1800s the Naro have shared a large part of their eastern territory (Ghanzi block) with white and, 
recently, black ranchers. Southwest of the Ghanzi farm block their territory overlaps with 
the Xanagas farm block where ranches are mostly owned by individuals of mixed white-black 
ancestry and also mixed Nama ancestry. Other areas south of the farm blocks and 
in-between the blocks are shared between the Naro and Tswana, Kgalagari and Herero 
subsistence herders. The Kgalagari entered the area in the early 19th century while the 
Tswana and Herero have migrated there since the settlement of the white ranchers. A few 
small San groups (Ts?aokhoe, Qabekhoe, N/haints?e and ?Haba) that are not Naro live in 
the northern parts of the Ghanzi block. They are linguistically grouped with northern and 
central Khoe-speaking San groups with some linguistic similarities to the Naro. There is 
very little information available on these smaller groups (Guenther, 1986; Barnard, 1992; 
Guenther, 1996).  
 
The areas occupied by the Naro have a relatively good water supply. Because of the 
ranches, however, the majority of the Naro have settled permanently at ranch boreholes, 
cattle-herding posts and towns. They supplement their traditional livelihood with herding, 
mafisa relationships and wage labor. They also act as tourist attractions on game farms in 
exchange for permission to use the land for gathering practices. They earn small salaries 
from ranch owners and/or tips from tourists. Some settled on the outskirts of towns like 
Ghanzi and D'Kar and in government settlement schemes such as the settlement at Hanahai. 
Unemployment and alcohol abuse are major problems. The general mood among the Naro is 
one of powerlessness, despair and deprivation. They lost their land and dignity and see 
themselves as weaker and less intelligent than surrounding groups. Names for themselves 
include “voiceless people” and “rubbish people”, but they still take pride in their language 
and traditions such as the trance-dance (Guenther, 1986; Barnard, 1992; Guenther, 1996; 
personal observation). 
 
1.1.1.2.5 The Hai//om 
The Hai//om live in the northern parts of Namibia in the areas around the Etosha pan. Their 
name means “tree” or “bush-sleepers”. They speak a language closely related to Nama 
and have been classified as !Xun who acquired the Nama language. The Hai//om 
themselves, however, maintain that they are a separate group with a separate group identity 
(Barnard, 1992). 
 
1.1.1.3 The Kwadi 
Very little is known about the Kwadi people of Angola. Their language is now extinct and 
the people have to a large extent integrated into surrounding groups. Records of their 
language suggest that they did speak a Khoe-related language (Table 1.1). Their 
language, however, was very distantly related to the extant Khoe languages as well as to 
the Khoe-speaking San languages. The closest language to Kwadi is geographically the 
most distant, namely, one of the eastern Khoe-speaking San group languages spoken by 
the Hietshware. Not much research has been done on the Kwadi but they seem to have 
been a large group of people in the past. They were mentioned in accounts of various 
navigators, historians and adventurers from the 16th to the 19th century. All mentioned the 
group of San people that lived near the mouth of the Curoca River. In the 1930s they were 
reported to be dying out and integrating into surrounding groups. In the 1950s only a few 
families remained. Their current status is unknown (De Almeida, 1965; Estermann, 1976; 
Barnard, 1992). 
 
1.1.1.4 The Khoe 
The Khoe can be divided into three ethnic divisions, namely, the !Ora (or Korana), the 
Cape Khoe and the Nama (Figure 1.2). Early reports also made mention of a fourth 
division, the Einiqua (language - “Eini”) that lived along the Orange River to the east of the 
Korana, but very little is known about this group (Figure 1.2) (Schapera, 1930; Elphick, 
1985; Barnard, 1992; Smith, 1995). 
 
1.1.1.4.1 The Korana 
The Korana (!Ora) were pastoralists that occupied much of the Karoo area of the Cape 
province but their descendants became absorbed / transformed into the Baster, Griqua and 
Coloured population of the area. Their early raiding activities were, however, recorded and 
remnants of their cultural practices survived into the 20th century (Schapera, 1930; 
Engelbrecht, 1936; Barnard, 1992). It is widely assumed that an essential proportion of the 
!Ora group came from the Cape Khoe people (see below) who fled from European 
colonization from the 1600s onwards. These fleeing Cape Khoe met and mixed with other 
people on their way and finally settled at the confluence of the Vaal and Orange Rivers 
where they also had contact with North KhoeKhoe-speaking pastoralists like the Nama of 
the Lower Orange and the Eini to the east (Güldemann, 2006b). 
 
1.1.1.4.2 The Cape Khoe 
The Cape Khoe were the pastoral population encountered by the first white settlers in 1652 
at the Cape of Good Hope. They were spread over the southern parts of the Cape 
Province and three subdivisions were distinguishable, namely, the Eastern, Central and 
Western Cape Khoe. Periods of warfare between the Cape Khoe and the white settlers 
ensued but their final cultural collapse took place shortly after an overwhelming smallpox 
epidemic in 1713. Today their descendants are found among the “Coloured” population of 
the Cape province (Elphick, 1985; Barnard, 1992). 
 
1.1.1.4.3 The Nama 
The Nama are the best-known Khoe group. Today around 90 000 Nama individuals live in 
south and central Namibia, and to a lesser extent in the northern Cape (SA) and eastern 
parts of Botswana. The Nama people most probably came from an area located in the 
current northern parts of the Cape province (SA) and divided into two large subdivisions of 
people, the Great and the Little Nama (Westphal, 1963; Hoernle, 1985; Barnard, 1992).  
 
The Great Nama (Gai-Naman) settled in the great Namakwaland area of Namibia prior to 
European contact. Several tribes existed with certain associated territories. In recorded 
history the Great Nama were divided into seven tribes (the Gai-//haun or Rooi Nasie; the 
!Gami-?n?n or Bondelswarts; the //Haboben or Veldskoendraers; the !Khara-khoen or 
Kopers; the //Khau-/g?an or Swartboois; the //?-gain or Groot Doden; the ?Aonin or 
Topnaars) (Westphal, 1963; Hoernle, 1985; Barnard, 1992). The Nama presently use 
mainly the Afrikaans group delineations (italic).  
 
The Little Nama (?Kham-Naman) only migrated into Namibia in the 19th century in separate 
tribal groups. They were also known collectively as the “incoming groups” and the 
“Oorlams”. The Little Nama tribes were the /H?a-/aran or Afrikaners; the /Khobesin or 
Witboois; the !Aman or Bethaniers; the /Hai-khauan or Bersebaers and the Gai-/khauan or 
Lamberts or Amraals. These Little Nama tribes came from the south in search of better 
grazing but met with the Great Nama and Herero that were already there and conflicts 
developed. The Nama who remained south of the Orange River became incorporated into 
the “Coloured” population of South Africa (Westphal, 1963; Hoernle, 1985; Barnard, 1992).  
 
The Nama lived a nomadic life and were pastoralists. With the incursion of Bantu-speakers 
and Europeans into their territory, their tribal organization shifted from hereditary chiefs to 
military leaders and chiefs. Early forms of tribal organization and social structure quickly 
deteriorated with German colonization in 1890. Additional factors include a severe drought 
and a rinderpest epidemic. The Nama revolt and resultant wars (1904-7) finally broke up 
traditional tribal structure. Although the tribes are dispersed today there are still some 
chiefs that maintain control over their traditional locations (Westphal, 1963; Hoernle, 1985; 
Barnard, 1992). 
 
1.1.1.5 The !X?? and the ?H?? (Tuu division) 
The !X?? belong to the Southern Khoe-San language division (Tuu division, Taa branch). 
They identify themselves by a variety of names, of which !X?? is the most widely used. The !X?? 
consist of widely scattered groups of people that live in the southern parts of Botswana in 
one of the poorest environments of the Kalahari. Game and plant foods are sparse and 
permanent waterholes are few. The people, however, have extensive knowledge of their 
environment and are able to identify and utilize over 200 plant species. Today the eastern 
parts of their territory have ample water supplies due to boreholes associated with the 
trans-Kalahari highway that runs through the area. A development project at Bere (south of 
Takathswaane) involved the construction of a borehole, shop, school and projects with 
guidance in livestock rearing. Many !X?? people moved there and have settled around this 
area (Barnard, 1992). 
 
The ?H?? are distantly related to the !X?? but they live in close proximity to them. Their 
language is thought to be an intermediate between Ju and Tuu. Their region is also shared 
by the Kgalagari herders who have been in the area for centuries and Nama individuals 
from Namibia (refugees from the time of German occupation). 
 
1.1.1.6 Remnants and descendants of Khoe and San groups living in South Africa 
The Khoe-San people of South Africa have to a large extent completely lost their identities 
and have integrated or transformed into other populations. What we presently know of 
Khoe and San peoples of South Africa is derived from studies on very few remnant 
populations that survived into the 1700-1800s.  
 
The South African San groups belonged to the !Ui family of the Tuu (Southern Khoisan) 
language division. In historical times a great diversity of !Ui languages was spoken 
throughout all parts of the interior of South Africa. Their geographic range stretched from 
the Namaqualand in the west through the northern Cape, the Free State and Lesotho to 
KwaZulu-Natal and the south-eastern parts of Mpumalanga (old Transvaal). The best 
known of these languages is /Xam, a language mainly spoken in the Karoo, south of the 
Orange River. There were, however, numerous other !Ui languages more or less related to 
/Xam throughout South Africa. A few of these languages were recorded and still had a few 
active speakers in recent history like //Xegwi in the southeastern Transvaal. Of the other 
!Ui languages very little other than a name is known, like //Kx?au of Kimberley, //Ku //e 
(?Ungkue) of Theunissen in the Free State, Seroa (N//? or N//ng) of the Free State and 
Lesotho and !G? !ne of the eastern Cape area (Traill, 1996). 
 
Very little also remains of the South African Khoe culture, language and traditions. In 1652 
the Khoe pastoralists of the Cape, or the Cape Khoe, spoke either the eastern or the 
western Khoe dialect. The speakers of these dialects, however, rapidly converted their 
language to Afrikaans or Xhosa (on the eastern frontier). The western dialect survived until 
recently in the form of !Ora (Korana) and Xiri (Griqua) among groups of Cape Khoe who 
migrated from the Cape to the Orange River area. The descendants of the Korana and 
Griqua adopted Afrikaans as their mother tongue and today South African Khoe languages 
are virtually extinct outside a few scattered individuals who retained some knowledge of the 
languages. One such individual lived near Colesberg. He spoke a dialect of !Ora that was 
largely unintelligible to Nama speakers, illustrating the differences between these two Khoe 
languages (Traill, 1996). 
 
The next few sections describe the little knowledge we have about the history of these 
South African Khoe-San groups. 
 
1.1.1.6.1 N//? people (“Mountain Bushmen”) 
The N//? people or “People of the Eland” were groups of San people that inhabited the 
mountainous regions of Lesotho, Natal, Griqualand East and the former Transkei (hence 
their name “Mountain Bushmen”). Archaeological evidence indicates that the 
mountain regions were occupied by San groups only with the influx of Bantu-speaking 
agriculturists into the regions of the KwaZulu-Natal midlands (Mazel, 1996). They were 
encountered by travelers and administrators of the 19th century but were already declining 
in numbers by then. At that stage, the available land was owned by Nguni and Sotho 
herders, and the San people lived by raiding the livestock of these herders. With the 
incoming white settlers the few remaining groups finally dwindled in numbers and they 
died out and/or were absorbed into the Bantu-speaking groups (Wright, 1971; 
Vinnicombe, 1976; Barnard, 1992).  
 
1.1.1.6.2 The //Xegwi 
The //Xegwi were a group of San people that lived in the eastern Transvaal (now 
Mpumalanga) near Lake Chrissie. In the 1950s only 66 individuals were left (Potgieter, 
1955; Ziervogel, 1955; Barnard, 1992). Today only single individuals who still recognize 
their San ancestry remain; however, no one speaks the language or knows of the cultural 
practices anymore (personal observation). The last //Xegwi speaker, who died in 1988, 
spoke their own language and Southern Sotho (Potgieter, 1955; Ziervogel, 1955; Barnard, 
1992). The San of Lake Chrissie are believed to have been a collection of remnants from 
the original Transvaal San, such as those that inhabited the Honingklip shelter (Korsman 
and Plug, 1992) and also scattered refugee groups from the Orange Free State (Potgieter, 
1955) and the Natal Drakensberg/Lesotho (Prins, Unknown). These groups fled from the 
incoming Boer and English settlers and the turmoil that resulted from clashes between 
settlers and the Bantu-speakers. Various historical documents recorded a large group of 
San individuals migrating from the central Natal Drakensberg to the southern Transvaal 
highveld (Prins, Unknown). It is believed that these fleeing Drakensberg San composed a 
large part of the more recent San groups from Lake Chrissie. This is corroborated by the 
fact that the //Xegwi language was very similar to the languages of the “Mountain 
Bushman” and that their second language was Southern Sotho, a language spoken by 
Sotho people from Lesotho and surrounding areas (Potgieter, 1955; Prins, Unknown).  
 
1.1.1.6.3 /Xam descendants 
The /Xam inhabited a region of the Cape Province known as the great Karoo. The great 
Karoo area of South Africa is an arid scrubland with dispersed hills that stretches over an 
area of 400 000 square kilometers of the Northern, Eastern and Western Cape provinces. This area 
was inhabited by both San and Khoe groups up until the late 1800s. The San group was 
the /Xam and the pastoralist Khoe group was part of the Korana group. The /Xam had 
subgroups (“Ss?wa ka” or “Plain bushmen”, “/nussa” or “Grass bushmen”, “!Kaoken ss?o” or 
“Mountain bushmen” and “Brinkkop bushmen”) but they all spoke the /Xam language with 
minor dialect differences (Traill, 1996).  
 
The western world has learnt about the /Xam through the pioneering work of Wilhelm 
Bleek, a 19th-century linguist who moved from Germany to the Cape Province. Bleek, his 
sister-in-law Lucy Lloyd and his daughter recorded the cultural practices, language and 
religion of the /Xam people while providing shelter to various /Xam individuals 
(www.lloydbleekcollection.uct.ac.za) (Deacon, 1996).  
 
There are many reasons for the apparent disappearance of the /Xam; the principal factor 
probably is the advance of Bantu-speaking herders from the north and white colonists from 
the south, which led to the occupation and conquest of the great Karoo in the 18th century. 
Colonist hunters and farmers moved in and occupied all the remaining hunting ground 
previously used by the /Xam. The occupation of their resources was not the only reason for 
the disappearance of the /Xam; they were also physically hunted by colonists and bounties 
were placed on their heads. Hunting parties were organized to hunt “Bushmen”. Males that 
were not killed by hunters fled into the hilltops or were sent off to prisons. Females and 
children were relocated to farms to serve as farmhands, the so-called tame-bushmen. In 
the same way Khoe farmers living in the area were in competition with colonists for grazing 
ground. The Khoe people, however, claimed right to certain lands and had cattle to trade. 
They therefore generally received more respect from colonists than the San people 
(Barnard, 1992; Penn, 1996; Traill, 1996; Bennun, 2004). 
 
The descendants of the /Xam females and children who were relocated to farms still 
live on some of the farms today but became admixed with the local Xhosa (Bantu-speaking) 
population. Older farm owners still call some of their labourers “Bushmen” or recall that 
parents or grandparents of their workers were “Bushmen”. Many farmers, however, tell the 
tale of “Bushmen” that couldn't settle in one place and had “wanderlust”. These people 
became the “Karretjie” people that had their donkey carts as mobile units and moved from 
place to place to do different periodic jobs (De Jongh, 2002).  
 
The Karretjie people 
The word “Karretjie” is an Afrikaans word for “donkey cart”, alluding to their mobile lifestyle 
on donkey carts. Throughout the great Karoo there exist small bands of people living this 
mobile lifestyle but due to recent changes in economic factors, this way of living is quickly 
disappearing. The Karretjie people phenotypically resemble Khoe and San people. Oral 
and archaeological records also suggest Khoe and San ancestry but the group completely 
lost their original language and culture. They identify themselves as “Coloured” and speak 
Afrikaans. Most of the “Karretjie” people are sheep shearers and fencers. Typically they 
have a home base or, as they call it, “uitspan” or outspan where they keep their cart in 
between jobs. These outspans are usually on a neutral piece of land such as the section of 
land between a road and a farm fence. They would stay in this space until their skills in 
shearing or fencing were required by a farmer. When this happened, they would pack their 
donkey cart and the whole family and living unit would move to the farm until the work was 
completed, after which they would move back to the same outspan (De Jongh, 2002).  
 
1.1.1.6.4 The ǂKhomani 
The ǂKhomani together with the /?Auni tribe and several other now extinct groups lived in 
the far northern parts of the northern Cape (north of Upington), the southern part of 
Botswana and the southern parts of Namibia, roughly where the Kalahari Gemsbok Park 
is located today. They all spoke branches and dialects of the Taa-Lower Nossob branch of 
the Tuu family of Khoisan languages. In 1980 there were only a few individuals left who 
remembered a lifestyle of active hunting and gathering in this area. They self-identified as 
N/amani and !gabani but by then only spoke Nama (only one woman could speak the N/u 
language, but remembered only words). The individuals said that in the past the San of the 
Gemsbok park area used to live in small scattered groups in the summer and aggregated 
in the area of the Nossob River (southern Botswana) in the winter. There they traded 
goods (ostrich eggshell beads and animal skins) with Tswana groups. Their main food 
sources were gemsbok and small game as well as tsama melons and other wild food 
(Steyn, 1984; Barnard, 1992). 
 
The Khoe-San people presently living in this area, which spans the borders of northern South
Africa, southeast Namibia and southern Botswana, are from several different tribes that have
lost their individual tribal identities and speak either Afrikaans or Nama. Before the Nama
colonization, the southern parts of Namibia were home to many San groups from the Taa
language family. Today, however, all their descendants speak Nama (Güldemann, 2006a). The
South African descendants of these San groups mostly classify themselves as Coloured.
The following passage from Steyn illustrates how most of South Africa's Khoe and San
have been reclassified as Coloured individuals.
 
 
 
 
 
"Regarding their present 'ethnic' status, Regopstaan and his wife, as well as Axerob
and G/okos, said that they were classified as 'coloureds' and, with the exception of
G/okos who was too young, are all 'coloured' pensioners. Although the others
seemed to take some pride in what they apparently saw as an improvement on
'Bushman' status, Regopstaan took exception to this. He told the registering officer
that he was no 'coloured', but a Bushman, a category that does not exist in the South
African population classification system. In a sense he had his way; although not
classified as a 'Bushman', he proudly showed me his identity card on which the
bearer's name was registered as R. Boesman!" (Steyn, 1984)
 
 
A group of South African descendants of these scattered southern Kalahari tribes now
collectively call themselves ǂKhomani. They have recently rediscovered their identity;
they won a land claim and organized themselves into a community governed by a council.
Only a very few old individuals, from the Northern Cape (SA) and Botswana, however, still
speak the N/u language. The term ǂKhomani was not known to the N/u speakers; it was
introduced to San descendants of the northern Cape by representatives of the South
African San Institute (SASI). Other than N/u, the only other extant Tuu language is !Xóõ, of
southern Botswana. Unlike N/u, however, !Xóõ is still an active language and is being
taught to children (Crawhall, 2003; Sands et al., 2007).
 
1.1.1.5.5 South African Khoe descendant groups 
The Khoe groups of South Africa included the Cape Khoe of the southern parts of the Cape
Province, the Korana, who occupied large parts of central South Africa extending over the
Northern Cape into the Free State, and the Nama of the North Western Cape region in the
Richtersveld area extending into Namibia. Although the Cape Khoe and Korana no longer
exist as specific populations, their descendants were incorporated into 'mixed
culture' groups like the Griqua, Baster and Coloured groups with their associated cultures.
Certain aspects of Khoe culture can still be recognized in rural areas where livestock
rearing is the prime economic goal. In a way, Khoe culture formed the base of the
Griqua, Baster and Coloured cultures that developed (Barnard, 1992).
 
1.1.1.5.6 The !Xun and Khwe of Platfontein 
Although not originally from South Africa, the !Xun and Khwe of Platfontein have now made
South Africa their permanent home. They originally came from Angola and were employed
by the South African Defence Force (SADF) before they were relocated to SA. Five
hundred veterans of the SADF together with 3500 dependants were relocated in 1990 from
Namibia to South Africa (Sharp and Douglas, 1996). They currently live in Platfontein, near
Kimberley.
 
The people of Platfontein are two different San groups with separate identities. One third of
the people are known as Khwe (also called Barakwena) and two thirds are !Xun (also
known as Vasekele). They speak different languages and differ in phenotypic
appearance. The groups have remained separate and have insisted on being settled in
different parts of the camp. The !Xun group has retained a much more cohesive nature and
clings to its San identity. The !Xun have not mixed with outsiders beyond the camp and have
remained a much more unified group than their Khwe counterparts. The Khwe have been
more ambivalent about their group identity and have established relationships with
surrounding South African groups (Sharp and Douglas, 1996).
 
Although the people of Platfontein have separated themselves into these two groups,
the members of each group did not all come from the same area or even
know one another. The !Xun came from a wide region in central Angola around Serpa
Pinto (currently Menongue), where many of them lived as stock farmers or cultivators
alongside Bantu-speaking groups. !Xun men from different regions were recruited into the
Portuguese colonial military in the late 1960s. When the Portuguese moved out, the !Xun
affiliated with a liberation force, the FNLA, in the Serpa Pinto region. The FNLA had links with
the SADF, and when the FNLA collapsed the !Xun were recruited by the SADF and brought to the
Omega military base in the Caprivi Strip of the then South West Africa (Namibia)
(Guenther, 1986; Sharp and Douglas, 1996).
 
The Khwe, on the other hand, originally came from southeast Angola, where they lived
along the river systems as cultivators and cattle keepers. They also came
from a widespread region of southeast Angola and were recruited into a different unit by
the Portuguese army. When the Portuguese moved out of Angola, the Khwe fled into
neighboring areas such as southwest Zambia, northwest Botswana and the Caprivi Strip of
South West Africa, where there were other Khwe people amongst whom many of the Khwe
soldiers had kin. From there they were recruited into the SADF (Guenther, 1986; Sharp
and Douglas, 1996).
 
This difference in recruiting background underlies the differences in the attitude that the
two groups had towards the SADF. The !Xun had a favorable opinion of the SADF because
the SADF saved them from Angola when the FNLA collapsed. Also, there was no resident
!Xun population in the Caprivi and they were dependent on the SADF. The Khwe, on the other
hand, were much more skeptical about the army and what it had to offer them.
This is because the Angolan Khwe blended into the local Khwe population and only joined
the army at Omega base as a source of employment (Guenther, 1986; Sharp and Douglas,
1996).
 
Many of the !Xun were later (late 1970s) relocated to the second 'Bushman battalion' in
Tsumkwe. At Tsumkwe they were meant to join up with the Ju/'hoansi of Nyae Nyae, but
the Ju/'hoansi saw the !Xun as invaders and the !Xun had to be kept in isolated bases in
western Bushmanland. Thus, in 1990 a large number of !Xun opted to come to South
Africa while many of the Khwe stayed in the Caprivi, where they had local contacts
(Guenther, 1986; Sharp and Douglas, 1996).
 
Both these groups were relocated in 1990 to the Schmidtsdrift military base. The South
African government was reluctant to allocate land or commit funds to secure the future of
the San groups. The SADF saw these two groups as 'former mercenaries who have
outlived their usefulness' (Guenther, 1986; Sharp and Douglas, 1996). The !Xun and Khwe
Trust was established in 1993 to look after the interests of the groups. They remained in
tented camps near the Schmidtsdrift military base for several years until, recently, the new
South African government allocated land to them in Platfontein near Kimberley, where they
settled (Guenther, 1986; Sharp and Douglas, 1996).
 
 
 
 
 
1.2 Khoe-San history 
 
1.2.1 Linguistics, Archaeology and Ethnography 
 
1.2.1.1 Khoisan Linguistic Family 
The languages of Africa are divided into four super language families, namely, Afro-Asiatic,
Niger-Kordofanian, Nilo-Saharan and Khoisan. It was long believed that
Khoisan is a single linguistic family with a common ancestor giving rise to all Khoisan
languages (Greenberg, 1963). Recently, however, linguists studying Khoisan languages
have argued that not all of the Khoisan languages are necessarily genealogically related
(Westphal, 1971; Güldemann, Forthcoming-a) and that the similarities between some of the
main branches of Khoisan may be due to areal language contact. These main branches
might be genealogically related and have very deep roots, but even the best linguistic
methods cannot distinguish between chance, inheritance and contact at time depths of more
than 10 000 years. Thus, current linguistic methods do not have the resolution to prove that
all of the main Khoisan branches are genealogically related (Güldemann, 2007; Güldemann,
Forthcoming-a).
 
Current understanding indicates that the Hadza, Sandawe, Khoe-Kwadi, Ju and Tuu
language families and possibly the ǂHõã language are linguistically independent lineages
within the Khoisan language group (see Table 1.1). They represent separate genealogical
groups, which have not yet been proved to be linguistically related to each other or to any
other language in the world (Güldemann, 2007; Güldemann, Forthcoming-a). While Hadza
appears to be totally unrelated to all the other Khoisan languages, a recent study
does note a promising relationship between Khoe-Kwadi and Sandawe (Güldemann and
Elderkin, Forthcoming).
 
The main branches of the Khoisan languages share few similarities other than the
fact that they use clicks as phonemic speech sounds. Many languages around the world,
however, use clicks as paralinguistic speech sounds. In addition, at least one other language,
in aboriginal Australia, which is neither related to nor in contact with the Khoisan language
family, uses clicks. Instead of genealogical relationships between all the Khoe-San
languages, it might be possible that there was an earlier linguistic macro-area that stretched
from eastern Africa to southern Africa with a linguistic-areal connection. The Bantu
expansion into eastern and southern Africa erased this connection by causing the
extinction of many local languages, which might have shared clicks as a common
phoneme type (Traunmüller, 2003; Güldemann, 2007).
 
To determine whether there is deeper genealogical structure within the main branches of
Khoisan, proto-languages predating the current languages were inferred. In doing so it was
discovered that the Khoe language was related to the now extinct language Kwadi. Khoe
and Kwadi would have formed two sister branches deriving from an ancestral language,
Proto-Khoe-Kwadi (Güldemann, Forthcoming-b; Güldemann and Elderkin, Forthcoming).
Khoe-Kwadi also showed promising links to the east African language Sandawe (the other
east African click language, Hadza, however, shows no relationship to any other language)
(Güldemann and Elderkin, Forthcoming).
 
1.2.1.2 Khoe-San History according to Linguistics 
The Ju and Tuu branches (non-Khoe branches) of the southern African Khoisan family
show some linguistic homogeneity, but the link is unclear from a historical perspective. It
may be due either to a very old common ancestor or to areal convergence of two
distinct lineages over a very long time. In cultural-ethnological terms this group consists of
foragers only and shows continuity from very old archaeological records (Güldemann, In
Press). The Tuu (southern) branch is thought to have a separation that goes back the
furthest in history, based on the degree of linguistic distance between languages within the
branch. The languages within the Tuu branch differ widely among themselves,
suggesting an extended process of divergent development. In the Ju branch the languages are
closely related to each other but not always mutually intelligible (Vossen, 1998;
Miller-Ockhuizen and Sands, 1999).
 
The Khoe-Kwadi branch is the largest attested Khoisan lineage; it contains considerable 
internal sub-branching and has a wide geographic spread. All of this suggests divergence 
and expansion of this family. The population is also diverse in cultural-ethnology terms and 
consists of both foragers and pastoralists (Vossen, 1998; Güldemann, In Press).
 
In historic times the Cape region had two groups of people that spoke two genealogically
unrelated Khoisan languages. One group spoke a language belonging to the Khoe branch
of Khoisan languages and the other group spoke an unrelated language belonging to the
Tuu branch of Khoisan languages. Only two languages in these two branches still have
active speakers in the Cape today. Nama (Khoe branch) is spoken by a few thousand
people in the Richtersveld area in the northwestern corner of South Africa, and N//u (Tuu
branch) is spoken by fewer than 20 individuals scattered over the Northern Cape region
north of the Orange River. A few extinct languages from this area, namely, !Ora (Khoe
branch) and ǂUngkue and /Xam (both Tuu branch), have sufficient recordings to be
linguistically analysed (Güldemann, 2006b).
 
The history of the Cape region of southern Africa, inferred from a linguistic perspective, can
be summarised as follows. The oldest known ethno-historical layer was the foraging
society of the San. In the Cape the group involved correlates with the !Ui linguistic unit.
From 2 000-2 500 years BP a new cultural type with animal husbandry appeared,
according to archaeological findings. In the Cape this group correlates with a distinct
linguistic group, the KhoeKhoe. The archaeological record of the trajectory of pastoral
expansion suggests that the KhoeKhoe entered the Cape from the north rather than the
east. Corroborating this is the fact that the linguistic groups most closely related to the
KhoeKhoe (the Kalahari Khoe) live in Botswana, Namibia and Angola. Because of their mode of
life, pastoralists did not inhabit inhospitable areas like the Karoo and Kalahari. In coastal
areas and areas around great rivers, however, a co-habitation of the !Ui foragers and the
KhoeKhoe pastoralists for around two millennia is assumed. Because of the asymmetric
relationship that usually exists between hunter-gatherers and pastoralists, it would be
expected that hunter-gatherer females were incorporated into the
pastoralist group, together with cultural and linguistic elements, but not the other way
around. From this it would follow that the !Ui languages influenced
KhoeKhoe. This can clearly be seen in a linguistic analysis of the KhoeKhoe language
compared to the !Ui languages. Compared to the other Tuu languages, the !Ui language
structure stayed relatively unchanged, while KhoeKhoe diverged from the other Khoe
languages (Kalahari Khoe and Kwadi) and incorporated many linguistic elements from the
!Ui branch of the Tuu languages, leading to a situation where KhoeKhoe has a strong
linguistic substrate of the Tuu languages. This scenario would imply that some gene flow
has occurred, probably from the San to the Khoe (through the incorporation of San females
by the Khoe). Thus, in a genetic sense, the gene flow from the southern San !Ui speakers
into the KhoeKhoe would be apparent in studies of mitochondrial DNA but not in
Y-chromosome studies, while autosomal markers would give an intermediate picture. The
KhoeKhoe of southern Africa later expanded and moved back into Namibia and became
the Nama of Namibia, but they still retain evidence of contact with the southern San !Ui
speakers in their language and presumably also in their genetics (Güldemann, 2006b).
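
The expectation stated above (a signal in mtDNA, none in the Y chromosome, and an intermediate picture for autosomal markers) can be illustrated with a simple calculation. The following is only a minimal sketch, assuming a single generation of admixture in which a fraction of the mothers and a fraction of the fathers of the recipient population come from the donor group; the function name and the example fractions are hypothetical and are not taken from the thesis data.

```python
def expected_donor_ancestry(female_fraction, male_fraction):
    """Expected donor ancestry per marker system after one admixture pulse.

    female_fraction: share of mothers drawn from the donor group (hypothetical).
    male_fraction:   share of fathers drawn from the donor group (hypothetical).
    """
    for name, value in (("female_fraction", female_fraction),
                        ("male_fraction", male_fraction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")
    return {
        # mtDNA is inherited through mothers only.
        "mtDNA": female_fraction,
        # The Y chromosome is inherited through fathers only.
        "Y-chromosome": male_fraction,
        # Autosomes come half from the mother and half from the father.
        "autosomal": (female_fraction + male_fraction) / 2,
    }


# Illustrative values only: 30% of mothers and no fathers from the donor group.
result = expected_donor_ancestry(0.30, 0.0)
for marker, share in result.items():
    print(f"{marker}: {share:.2f}")
```

With purely female-mediated gene flow the donor contribution appears in full in mtDNA, is absent from the Y chromosome, and is halved in autosomal markers, which is the intermediate picture described in the text.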
 
1.2.1.3 Khoe-San History according to Archaeology and Ethnography 
Archaeology is widely used to study and infer the history of human populations. An
advantage of archaeology is that some of the materials used in investigations are very
robust and withstand deterioration over time very well. Depending on the material
investigated (e.g. wood, bone or stone), the time depth that can be studied may be
very great. Since early hominid species used stone tools, archaeology can investigate
hominid-associated culture and demographics up to millions of years before present.
 
It is generally assumed that the presence of flaked stone artifacts in the archaeological
record indicates the presence of true humans of the genus Homo. The time period in which
members of the genus Homo had the ability to use and manipulate stone is known as the
Stone Age. The Stone Age started ~2.5 million years BP and is divided into three stages,
namely, the Earlier, Middle and Later Stone Age. Throughout all the stages of the Stone
Age, humans were present in southern Africa. Their signature was left behind in the
changes they caused in their environment and is studied by archaeologists.
 
By studying the archaeological record it is possible to identify certain demographic
tendencies in the human populations involved. For example, by looking
at the sizes and frequencies of archaeological sites it is possible to infer population
densities and thereby identify population expansions and contractions. These expansions and
contractions can then be linked to certain events in the paleoenvironment. Similarly, by
studying the genetic variation present in extant populations one can also identify historical
population expansion and contraction patterns, which can be dated to certain times in the
past. It is therefore one of the aims of this thesis to identify these genetic signatures of
population expansions and contractions and to try to correlate them with information
available from the field of archaeology. The next section thus reviews the different stages
in the archaeological record, their associated times, paleoenvironment and signatures of
human occupation.
 
The Earlier Stone Age (ESA) occupied the time period from 2.5 million years BP to 250 000
years BP in southern Africa and is characterized by the use of large rudimentary flaked
artifacts like handaxes. Throughout this stage there is evidence of the occupation of
southern Africa by various hominins (humans and their extinct relatives) (Deacon and
Deacon, 1999; Mitchell, 2002; Wadley, 2007).
 
The Middle Stone Age (MSA) saw the introduction of 'cores' (pieces of rock that are
skillfully prepared to produce flakes of regular size and shape) into the archaeological
record and stretched from 250 000 years BP to ~30 000 years BP in southern Africa. MSA
tools were generally smaller than ESA tools, and the large handaxes and cleavers are absent.
There is no consensus on the definition of the MSA. Some archaeologists believe that it is
a time-related sequence, while others identify it as a package of technologies. For some
archaeologists the MSA in southern Africa is associated with the appearance of
anatomically modern people (Homo sapiens) (Wadley, 2007; Lombard, 2008). This was
confirmed by the discovery of early modern human fossils dated to 90 000 and
110 000 - 120 000 years BP at the Klasies River site in the eastern Cape. Further proof
was the discovery of early modern human remains of a similar time period at a site named
Border Cave on the border between KwaZulu-Natal and Swaziland. Furthermore, some form of
symbolism can be dated as far back as 77 000 years BP. The shell beads from this period
found at Blombos Cave imply individual or group identity and symbolism. While these
cognizant and anatomically modern humans were roaming southern Africa, the European
landscape was still dominated by Neanderthals (Deacon and Deacon, 1999; Henshilwood
et al., 2002; Wadley, 2007; Lombard, 2008). The earliest known set of morphological
characteristics associated with modern humans, however, appears in fossil remains from
Ethiopia, dated to ~150 000 - 190 000 years BP (White et al., 2003; McDougall et al.,
2005). This finding does not exclude the possibility that modern morphological traits
existed in other regions of Africa (such as southern Africa) during this time. In other regions
specimens may have been less well preserved, or archaeological and paleontological
investigations may not yet have been conducted (Lahr and Foley, 1998; Reed and
Tishkoff, 2006). Presently a multiregional origin model for modern humans within Africa is
not as unlikely as it would be for global populations (Lahr and Foley, 1998; Campbell and
Tishkoff, 2008).
 
Regarding the paleoenvironment of the MSA, it was previously believed that the period
between 60 000 and 25 000 years BP was marked by very arid conditions in southern
Africa, which led to a continuous population decline (Klein, 2000; Klein et al., 2004). This
was partly inferred from an impoverished archaeological record for this period. Recently,
however, Mitchell (2008) summarised paleoenvironmental data that
refute the presence of hyperarid conditions in southern Africa during this period.
Furthermore, he showed that a substantial archaeological record does exist for this period,
albeit not as well studied as the periods that flank this stage (the earlier Stilbaai and
Howiesons Poort cultures and the later LSA period). In addition, many of the human foci that
exploited coastal resources during this period are today submerged, since the sea level
was 30-60 m below the present level (Mitchell, 2008).
 
The Later Stone Age (LSA) displays technology for producing small specialized tools, such as
microlithic tools, and saw the introduction of bows and arrows, needles, bored stones,
fishing equipment, etc. This period began between 30 000 and 20 000 years BP and lasted
until 2 000 years BP. The transition from the MSA to the LSA is poorly defined: while some
archaeologists believe the LSA began as early as 40 000 years BP, others insist that in
certain regions MSA technology can still be found as recently as 20 000 years BP. It has been
suggested that the division between the MSA and LSA might be more of an archaeological
construct than a real divide. The LSA does, however, have marked technological innovations
and a regular occurrence of behaviours that were only rarely found in the MSA (Wadley,
2007). It is almost certain that the LSA sites were occupied by the descendants of the
people who practiced MSA technology. Many sites have evidence for both complexes. San
art, tools, burials and other remains of the San hunter-gatherer lifestyle are associated with
the LSA and can be traced back with confidence as far as 22 000 years BP in the
archaeological record. San social structure is also very evident in archaeological remains
for the past 10 000 years. Archaeological deposits from the MSA suggest that the social
organization and rules of group behaviour did not change with the transition from the MSA
to the LSA and were the same for the last 100 000 years or more. It is most likely that the
MSA people who lived in southern Africa were the direct ancestors of the LSA people,
namely, the San (Deacon and Deacon, 1999).
  
Concerning the paleoenvironment of the LSA, the period leading up to the Last Glacial
Maximum (LGM) (28 000 to 19 500 years BP) is marked by the occurrence of 'higher
energy' human settlement during certain periods and at certain sites. These sites include
Lesotho, southern Cape, Caledon valley, southern Namibia and the southern Kalahari. The
LGM period (18 000 years BP) was associated with significantly colder conditions and
intensified aridity before moister and milder conditions returned after 16 000 years BP. The
LGM-associated period (19 000 - 15 000 years BP) is marked by a major downturn in
population size and distribution, which may have caused localized extinctions. The rise in
population numbers after the LGM was initially seen only at the few sites that persisted
through the LGM. The rise was slow until 13 500 years BP; thereafter population growth
accelerated, deserted sites were reoccupied and new sites were established. Distinct
technological traditions for this period are reported for sites from South Africa (Robberg
industry) compared to Namibian, northern Botswana and Mashonaland sites. It has been
suggested that this distinction could reach back to the distinct Tuu and Ju linguistic
traditions and possibly also genetic distinctness (Mitchell, 2002).
 
Relatively cool conditions persisted throughout the Pleistocene to Holocene transition (10
000 years BP) and maximum temperatures were only reached 8 000 years BP. The rise in
sea level was effectively completed around 9 000 years BP and submerged large areas of
previously exposed grassland. Groups became more concentrated and social exchange
between groups increased. The later Holocene sites (~4 000 years BP) document rising
populations, expansions into new habitats and elaboration of material culture, especially in
the Cape Fold Belt and Thukela basin. Technologies characterized by delayed
rather than immediate returns developed and increased. For instance, the
'firestick-farming' technology developed and practiced in the southern and eastern Cape, which
regulates the flowering and production times of geophytes, dramatically increased the food
production capabilities of populations (Mitchell, 2002).
 
The archaeological record from 2 000 years BP changed radically with the introduction of
pastoralism to southern Africa. This transition is marked by the introduction of pottery and
sheep remains in the archaeological record, followed by the introduction of cattle and
domesticated dogs. The herder way of life is associated with the people who spoke the
Khoe languages. The general feeling among current researchers is that a sheep herding
economy and ceramics were adopted by aboriginal Khoe-speaking hunter-gatherers from
Bantu-speaking agro-pastoralists. These agro-pastoralists were spreading south from east
Africa and arrived in Zambia/Zimbabwe ~2 100 years BP. Current theories suggest that
the transfer took place in southeastern Angola, southwestern Zambia or northern
Botswana. From the core area of northern Botswana the sheep, together with the
Khoe-speaking herders, migrated southwards and gradually settled in between the
hunter-gatherers of South Africa (Smith, 1983; Smith, 1992; Sadr, 1998).
 
Two migration routes have been proposed. The first hypothesizes that stock keepers came west
through northern Botswana and Namibia, down the Atlantic coast to the Cape and then
further along the south coast and inland Cape areas (Stow, 1905; Cooke, 1965). This
theory is based on the occurrence of paintings of sheep and shepherds in Zimbabwe and
the ecological improbability of moving through the central Kalahari. This theory is also
supported by records of oral traditions (Stow, 1905; Cooke, 1965). The second theory
proposes that Khoe groups from northern Botswana acquired livestock from their
Bantu-speaking Iron Age neighbors to the north. Subsequently their population and herds grew
and the population spread south along the Zimbabwe/Botswana border, east of the
Kalahari, towards the confluence of the Orange and Vaal Rivers. From there some groups
spread south to the coast following one of the river valleys, such as the Seekoe River, and
from there east and west along the coast. Other groups followed the Orange River to the
Atlantic, from where they spread north into Namibia and south into Namaqualand (Elphick,
1977). At the moment various archaeological findings lend more support to the Atlantic
coastal route through Namibia (Mitchell, 2002). The earliest dates for the arrival of sheep
and ceramics in the Cape are 2 100 years BP in the northern Cape and 1 900 years BP on
the southern Cape coast (Sealy and Yates, 1994; Henshilwood, 1996). It is further
suggested that some of the hunter-gatherers of the Cape area were recruited and
incorporated into Khoe culture. Those who remained hunter-gatherers moved into areas
unsuitable for domestic stock or settled into an established working and trading relationship
with the herders (Smith, 1983; Smith, 1992; Deacon and Deacon, 1999).
 
Both theories support a population movement from the northern Botswana Khoe groups,
together with the pastoralist culture, to the southern parts of Africa. This is supported by
the glottochronological finding that the KhoeKhoe languages of the south
diverged from the Kalahari Khoe languages ~2 000 years BP (Ehret and Posnansky,
1982). The homogeneity of the KhoeKhoe dialects further indicates a rapid recent
expansion. The KhoeKhoe expansion is, however, only one component of the spread of
the Khoe language group. The explosion of sites across Botswana in the last 2 000 years,
coupled with the oral traditions of Khoe-speaking San groups that they formerly owned
livestock, might be an indication of a Kalahari Khoe expansion linked to pastoralist
groups/culture (Walker, 1995; Mitchell, 2002).
 
Certain evidence in the archaeological record, however, indicates that a simple demic
diffusion model might not be sufficient to explain the spread of pastoralism. The spread of
ceramics is thought to be associated with the spread of pastoralism, with the two
technologies forming a package. The rapid spread of ceramics ahead of the
pastoralist culture and their occurrence in sites where herders never penetrated raise
questions. If ceramics and pastoralism were a package spread by the KhoeKhoe herders,
they would regularly co-occur, which is not always the case. There would also be a ceramic
stylistic chain that links assemblages in the Cape and southern Namibia to those in
Botswana from whence they came. The stylistic chain would thus mirror the migration
routes of the people. There are too few archaeological sites with sufficient material to draw
definite conclusions. Thus far, however, evidence of radical differences between styles
argues against a common origin (Sadr, 1998; Mitchell, 2002).
 
The theories supporting a demic diffusion argue that it would be very improbable for
hunter-gatherers to adopt the pastoralist culture and that a population that spread the
pastoralist culture is therefore essential. Two hypotheses exist about the interaction between
hunter-gatherers and herders. The first is that there is a great deal of overlap between these
social and economic categories, and hunters who obtained stock could easily convert to herding,
while herders who lost their stock could easily fall back on hunting (Elphick, 1977). This
theory thus holds that at least some of the first livestock diffused southward from one group
of hunter-gatherers to another (Deacon et al., 1978; Deacon, 1984; Klein, 1986; Kinahan,
1995). The other hypothesis supports separate social and economic groups that do not
interchange easily (Parkington, 1984; Parkington et al., 1986). These separate groups can
be identified archaeologically through different cultural signatures in deposits (Smith et al.,
1991; Smith, 1992). Hunter-gatherers are seen as groups living on the fringe of herding
society. They utilize wild resources, but occasionally interact with herders as clients or
through trading. The hunters also make forays against the herds of the herders, leading to
persecution and wars. Hunting and herding are thus quite discrete economic categories, with
hunter-gatherers occupying niches on the fringes of pastoralist society in a lower class or
subservient status. These theories argue that the pastoralist culture requires a
fundamental change in how hunter-gatherer social relations are organized and that the
conversion of hunter-gatherer culture to pastoralism is very improbable (Parkington, 1984;
Parkington et al., 1986; Smith, 1986; Smith et al., 1991; Smith, 1992; Boonzaier et al.,
1996).
 
Archaeology cannot, however, conclusively prove whether the spread of pastoralism is
associated with a demic diffusion of populations together with the pastoralist culture or a
diffusion of the culture on its own. An intermediate model, in which only a few individuals,
perhaps only males, spread and transferred the pastoralist tradition and their language to
resident hunter-gatherer groups further south, is also possible. A genetic approach using
male-specific and female-specific markers would be ideal in this case and is
addressed in one of the aims of this thesis.
 
The period in which iron and copper tools were introduced and worked in southern Africa is 
known as the Iron Age and is associated with the arrival of the pre-colonial Bantu-speaking 
farmers (Deacon and Deacon, 1999). The relationship and interaction between the 
hunter-gatherers and the in-moving Bantu-speakers is another hotly debated topic in the 
archaeological community. While some researchers view hunter-gatherers as affluent, 
independent communities (Marshall, 1976; Lee, 1979), others support the theory that 
in-moving Bantu-speakers marginalized, dispossessed and isolated San communities 
(Wilmsen, 1989; Wilmsen et al., 1990). There is also support for a theory that relations 
between the San and Bantu-speakers varied temporally and geographically. In some instances 
the San may have retained their independent hunter-gatherer lifestyles and in others they may 
have been marginalized and subjugated by Bantu-speakers (Campbell, 1990; Sadr, 1997). 
Furthermore, some communities may in fact have had active and beneficial trade relations 
with Bantu-speakers and therefore benefited indirectly from the cultivator/pastoralist 
culture (Nurse, 1983; Denbow and Wilmsen, 1986; Campbell, 1990; Sadr, 1997). Some 
resolution to this problem might be found in the analysis of population expansion and 
bottleneck/contraction signals found in genetic data. If in-moving pastoralists adversely 
affected San communities, a signal of a recent population contraction would be evident. 
Such a post-Neolithic population bottleneck was indeed proposed recently through analysis 
of hunter-gatherer genetic data (Excoffier and Schneider, 1999) (see further discussion in 
section 1.2.2.2). An investigation of genetic evidence for recent population bottlenecks 
associated with the in-moving herders will form the basis of one of the aims of the present 
thesis. 
 
The major events evident in the archaeological record in the history of the LSA San 
hunter-gatherers and the Khoe herders can be summarised as follows. During the MSA to LSA 
transition (30 000 - 20 000 years BP) the specialized LSA technology was introduced, 
and certain sites showed increases in population size, but only for limited 
periods. The population density only increased noticeably from 13 500 years ago and 
especially in the last 4 000 years. The hunter-gatherers from northern Botswana adopted a 
herding economy (and perhaps the Khoe language) 2 000 years BP and migrated 
southwards into South Africa. During the same time Bantu-speakers moved southwards 
from East Africa and settled in the eastern parts of South Africa (Deacon and Deacon, 
1999; Mitchell, 2002). There was trade and interaction between San, Khoe and the 
metal-working Bantu-speaking agriculturists of the Early Iron Age. At the time of European 
colonization the eastern part of southern Africa had been populated by Iron Age 
Bantu-speakers for about 1 000 years. Hunter-gatherers had developed working relationships 
with the Bantu-speakers as well as the Khoe herders, who had been settled in the southern 
and western parts of southern Africa for at least 1 500 years. This situation was disrupted 
by the loss of control over land with the start of European colonization (Deacon and 
Deacon, 1999). 
 
1.2.1.4 Khoe-San history according to Physical Anthropology 
While the archaeological record attests to a continuous human occupancy of southern Africa 
from the Earlier Stone Age to present times, it is difficult to directly link specific signatures 
and fossils in the record to ancestors of present day populations. Aside from ancient DNA 
studies, which up to now have not been successfully conducted on human fossils from 
southern Africa, physical anthropology provides a possible solution to the problem (Morris, 
2005; Morris, 2008). The field of physical anthropology studies the osteological features of 
fossils and compares them to current osteological features from different populations. 
Although this field of research is regarded by some scientists as controversial and/or 
obsolete, it is still an active area of research that uses state-of-the-art statistical 
procedures and contributes valuable hypotheses about the history of the Khoe-San in 
southern Africa (Morris, 2005; Morris, 2008). It should, however, be stressed that genetic 
studies conducted to date have not been able to show correlation between morphological 
features and genetic variants. In other words, it is not possible to explain how the different 
anthropometric traits found in modern humans have come about and which gene(s) are 
responsible for generating particular traits. The next section briefly outlines how studies 
based on methods used by physical anthropologists have contributed to reconstructing the 
early history of southern African populations. 
 
Craniometric studies suggest that the earliest appearance of the morphological traits found 
in South African Khoe-San can be traced to around the terminal Pleistocene and early 
Holocene period (around 12 000 BP) (Stynder et al., 2007a; Stynder et al., 2007b). The 
fossil evidence before 50 000 years BP is difficult to link to any contemporary population 
(Beaumont, 1980; Grun et al., 1990; Morris, 1992), while late MSA specimens 
such as the Hofmeyr cranium (36 000 years BP) fall outside the range of modern 
Khoe-San variation (but surprisingly fall within the range of European Upper Paleolithic cranial 
variation) (Grine et al., 2007). One hypothesis put forward is that the aboriginal 
populations from southern Africa developed the distinct Khoe-San morphological traits 
after a period of isolation, caused by the arid conditions of the LGM, in which drift and 
selection acted on the isolated southern African populations (Morris, 2002).  
 
In contrast to the uncertainty surrounding morphological features of fossils before the LGM, 
it has been shown that there was continuity in the morphological features of fossils from the 
terminal Pleistocene until the present day. Two periods of possible genetic discontinuity were 
identified: around 4 000 years BP, when population sizes increased dramatically, and 
around 2 000 years BP, when pastoralism was introduced. It was, however, concluded that 
the variations in morphology during these two stages were most likely due to in situ 
changes of populations in response to environmental factors (Stynder et al., 2007a). Aside 
from the evidence of population continuity across the time when pastoralism was 
introduced to the southern parts of Africa, further craniometric studies failed to find 
evidence for distinctive features between southern African hunter-gatherers and herders 
(Stynder, 2009). The study notes a small increase in variation during this period but fails to 
find support for a large-scale immigration of morphologically different herders or the 
long-term co-existence of two different populations. The two hypotheses offered to explain the 
small increase of variation were (1) a small-scale immigration of morphologically distinct 
herders or (2) increased morphological variation in response to lifestyle changes due to the 
adoption of pastoralist practices (Stynder, 2009). 
 
Another area in which physical anthropology has contributed valuable information is the 
question of whether a connection exists between the Khoe-San and east African 
populations. While evidence of recent common ancestry or contact between these two 
groups exists in the linguistic and genetic fields, contemporary physical anthropology has 
not found any support for morphological commonalities (Morris, 2003; Morris and Ribot, 
2006). For decades the presence of ancient Khoe-San populations in east Africa has been 
accepted in anthropological literature; however, a review of the initial studies and evidence 
failed to find any support for an overlap between east African and Khoe-San morphological 
variation (reviewed in Morris, 2003 and Morris, 2008). 

1.2.2 Khoe-San history according to molecular genetic studies 
 
During the course of the 20th century molecular biology and genetics started to contribute 
towards inferences of the histories of different population groups. Studies in physical 
anthropology that concentrated on morphological trait differences in the different groups 
were also common (reviewed in Tobias, 1985). However, because phenotypic traits are 
not inherited in a straightforward manner and are more sensitive to observational error and 
environmental influence, many reports and articles seem to be contradictory. While some 
of the studies could find differences between San and Khoe groups and their different 
subgroups, other studies failed to find significant differences (Jenkins, 1986). The first 
molecular biology study on Khoe-San groups was based on the use of ABO blood groups 
in 1932 (Pijper, 1932). Since then several other serogenetic markers have been used to 
examine patterns of genetic affinities of the Khoe-San (reviewed in Nurse et al., 1985; 
Jenkins, 1986 and Jenkins, 1988). Section 1.2.2.1 will give an overview of these 
serological studies and highlight the important findings. More recently, work on the 
hereditary material itself, DNA, has been published. These studies are, however, few and 
involve only a few selected Khoe-San groups. Sections 1.2.2.2 - 1.2.2.4 will review the 
published genetic studies on the Khoe-San.  
 
1.2.2.1 Serological studies 
 
1.2.2.1.1 Differences between San and Khoe 
When serological studies were conducted on the San and the Khoe, the most prominent 
differences between them were found using the ABO and Rhesus blood group systems as 
well as the haptoglobins. In the ABO system the B allele has a very low frequency in the 
San groups (including the Khoe-speaking San groups like the /Gui, //Gana and Naro), 
occurring at frequencies of less than 0.04. In the Khoe groups (Sesfontein Topnaars, Tsumaris 
Nama and Nama from southern Namibia) it was found at frequencies 4-8 times higher than 
in the San groups. These frequencies are similar to those of the Zimbabwean and Zambian 
Bantu-speakers and marginally lower than those of the South African Bantu-speakers (Pijper, 
1932; Pijper, 1935; Zoutendyk et al., 1955; Jenkins and Nurse, 1972; Jenkins, 1986). In the 
Rhesus system the allele frequencies differed significantly between 
San and Khoe groups, and the frequencies in Khoe groups correspond more closely to the 
frequencies found in Bantu-speakers. Haptoglobin Hp1 frequencies are also different in San 
and Khoe groups, with a low frequency in San groups and a higher frequency in Khoe 
groups, again corresponding to the higher frequencies in Bantu-speakers (Jenkins and 
Nurse, 1972; Jenkins, 1986). A possible explanation for the correspondence of the Khoe 
frequencies to those of the Bantu-speakers rather than to those of the San groups is 
the high amount of Dama (a subservient group with Bantu-speaking ancestry) admixture 
into the Nama groups. Since the Nama are the only extant Khoe group, the hypothesis 
cannot be tested by comparison with the frequencies of other Khoe groups (Jenkins, 1986). 
 
1.2.2.1.2 Differences between Khoe-San subgroups 
In 1971 Jenkins et al. combined allele frequencies from several serogenetic studies (blood 
groups, serum protein and red cell enzyme systems) in a multivariate analysis based on 
genetic distances. Data from gene frequencies in 10 loci in 18 southern African populations 
were compared through genetic distance measures coupled to clustering methods (Jenkins 
et al., 1971; Jenkins, 1986). Figure 1.3 A and B show a clustered tree and a Principal 
Component Analysis (PCA) plot adapted from the genetic distance matrix published in 
Jenkins (1986). 
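
The general workflow described above (allele frequencies, then pairwise genetic distances, then clustering) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the population labels and allele frequencies are invented, and Nei's standard genetic distance with average-linkage (UPGMA) clustering is only one possible choice. It does not reproduce the distance measure, the clustering method or the data of Jenkins et al. (1971).

```python
import math
from itertools import combinations

# Hypothetical data: population -> list of loci, each locus a list of
# allele frequencies summing to 1. Not taken from any published study.
FREQS = {
    "PopA": [[0.90, 0.10], [0.60, 0.40], [0.20, 0.80]],
    "PopB": [[0.85, 0.15], [0.55, 0.45], [0.25, 0.75]],
    "PopC": [[0.30, 0.70], [0.10, 0.90], [0.70, 0.30]],
}


def nei_distance(x, y):
    """Nei's standard genetic distance D = -ln(Jxy / sqrt(Jx * Jy))."""
    jx = sum(p * p for locus in x for p in locus)
    jy = sum(q * q for locus in y for q in locus)
    jxy = sum(p * q for lx, ly in zip(x, y) for p, q in zip(lx, ly))
    return -math.log(jxy / math.sqrt(jx * jy))


def upgma(freqs):
    """Average-linkage clustering; returns the merges as
    (cluster_1, cluster_2, mean_distance), in the order they occur."""
    base = {
        frozenset(pair): nei_distance(freqs[pair[0]], freqs[pair[1]])
        for pair in combinations(sorted(freqs), 2)
    }

    def average(c1, c2):
        # Mean of all pairwise distances between members of two clusters.
        total = sum(base[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in sorted(freqs)]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: average(*p))
        merges.append((c1, c2, average(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(tuple(sorted(c1 + c2)))
    return merges


if __name__ == "__main__":
    for left, right, height in upgma(FREQS):
        print(left, right, round(height, 4))
```

With these invented frequencies the two populations with similar allele frequencies are joined first and the third joins at a much greater distance, which is the kind of pattern a clustered tree summarises.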
 
In the cluster analysis (Figure 1.3 A) all the Khoe and Khoe descendant groups cluster 
together. The grouping of the Hai//om within the Khoe cluster is in contrast to the theories 
that the Hai//om are a !Xun group that acquired the Nama language. The Hai//om do not 
cluster with any of the four Ju groups (!Xun-Vasekele, !Xun-Kavango valley, Ju/'hoansi, 
?X?ao//??esi) but rather with two Nama groups. Interestingly, the Hai//om, who inhabit 
northern Namibia (Figure 1.1), cluster more closely with the two southern Nama groups of 
Keetmanshoop and !Kuboes (Richtersveld) than with the more northern Rehoboth Nama. This, 
however, might have to do with the varying amounts of admixture from Caucasoid and 
Bantu-speaking groups into the different Nama groups. The clustering of the Basters and 
Coloured populations within the Khoe cluster confirms the large inputs of Khoe groups into 
these two groups of hybrid ancestry. 
 
All of the San groups except the !Xun (previously also referred to as Vasekele) form a 
monophyletic cluster. The clustering within the San cluster does not conform to linguistic 
clustering but rather to geographic proximity. The speakers of northern San languages of 
the Ju linguistic group (Tsumkwe Ju/'hoansi, ?X?ao//??esi, Kavango valley !Xun and the 
Vasekele !Xun) do not form a uniform cluster and neither do the Khoe-speaking San 
groups of the central Kalahari (Naro, /Gui and //Gana). Rather, the Naro (Khoe linguistic 
group) and ?X?ao//??esi (Ju linguistic group), who have geographically overlapping 
territories, form the closest cluster, followed by the Ju/'hoansi (Ju), who are also 
geographically close. The G!ang!ai !Xun then join the cluster, also according to geographic 
distance. The two central Kalahari San groups, /Gui and //Gana, form a separate branch 
within the San cluster. This indicates that geographic separation has a greater influence on 
gene flow than linguistic barriers. The clustering of the !Xun with two Bantu-speaking groups 
suggests a higher amount of Bantu-speaking admixture. 
 
The Bantu-speaking groups (Kgalagari from Botswana, Ngwato - a Tswana group from 
Botswana, Herero from Namibia) form two separate clusters that also include the Khwe 
and Dama, who are classified as "Khoisan speaking Negros" by Jenkins (1986). The Khwe 
appear to be most closely related to the Herero, supporting observations that the Khwe 
phenotypically resemble Bantu-speakers rather than Khoe-San groups. The Khwe cluster 
closer to the Western Bantu-speaking Herero group than to the Eastern Bantu-speaking 
Ngwato. The Dama also clearly cluster with the Bantu-speakers, confirming 
historic accounts that the Dama were genetically similar to Bantu-speaking people before 
they adopted Nama as a language, possibly as a consequence of their enslavement by the 
Nama. This study, together with others (Nurse et al., 1976; Nurse and Jenkins, 1977), has 
thus shown that the Khwe (as well as the Dama) have genetic profiles that are more similar 
to those of Bantu-speakers. In a further study it was found that the Khwe most closely resemble 
their Mbukushu neighbours and an Ambo chiefdom (the Ndonga) (Nurse and Jenkins, 
1977). 
 
The first axis on the PCA plot contains more than half of the total variation (56%) and 
summarises the Bantu-speaking versus Khoe-San variance component. The Bantu-speaking 
groups cluster to the one side of this component while the northern San (NS) 
groups occupy the other extreme. Even though the Dama speak a Khoe language, they 
show very little admixture from Khoe-San groups. The Kgalagari are known to live in close 
contact with the /Gui and //Gana, and the relatively higher Khoe-San contribution into this 
group is evident in the first component. The Khwe, although grouping with the Bantu-speaking 
groups, show a higher contribution from the Khoe-San variance component than 
most of the Bantu-speaking groups. The Nama, Baster and Coloured groups are located 
between the San and Bantu-speaking groups on the first component, indicating higher 
admixture from Bantu-speaking groups into these groups. This evidence of gene flow from 
Bantu-speaking groups into the Khoe and Khoe descendant groups is expected, due to 
enhanced contact between these groups as a consequence of the sharing of the pastoralist 
culture (versus the absence of pastoralism in the San groups). With the exception of the 
Vasekele !Xun and the //Gana, all the other northern and central San groups contain little 
Bantu-speaking admixture. As mentioned before, the Vasekele !Xun lived in close 
association with the local Ambo (Ovambo) population for centuries, from whom they 
learned crop cultivation and herding. The higher Bantu-speaking component in this group 
is therefore not surprising. Also, the higher Bantu-speaking component in the //Gana 
(G//ana) is expected since it is believed by the //Gana themselves that they originated from 
an intermixing of the /Gui (G/wi) and the Kgalagari. 
 
The second and third components both seem to summarise a component of variance that 
exists between the Khoe groups and the !Xun groups. The second component (21%) 
summarises variation between the !Xun of the Kavango valley in northern Namibia and the 
Khoe groups. The /Gui, //Gana, Ju/'hoansi, ?X?ao//??esi and Naro occupy intermediate 
positions, with the /Gui and //Gana located more to the !Xun side and the other three groups 
more towards the Khoe side. The third component separates the Vasekele !Xun from the 
Khoe groups, with the Ju/'hoansi, ?X?ao//??esi and Naro intermediate. 
 
This thesis will investigate the genetic relationships between different Khoe and San 
groups further to see if the mitochondrial, Y-chromosome and autosomal genetic variation 
reflects group affinities that were apparent from serological studies. Through analysis of the 
genetic systems this study will investigate the genetic relatedness of the Khwe to the other 
Khoe-San populations and to the Bantu-speakers. The amount of admixture from 
Bantu-speakers into different Khoe-San groups will also be analysed. Furthermore, this study will 
investigate if the genetic systems suggest differences between San and Khoe populations. 
The study will also focus on how physical distance between groups influences their genetic 
relatedness, since cluster analyses of serological studies suggest a strong influence. 
 
Figure 1.3 A  Cluster analysis of distance matrix data from Jenkins (1986).  
11 loci in 18 populations. NS - Northern San, CS - Central San, BS - Bantu-speaking 
 
Figure 1.3 B  Principal Component Analysis of distance matrix data from Jenkins (1986).  
11 loci in 18 populations. (axis 1 = 56.1% variation, axis 2 = 21.1% variation, axis 3 = 8.1% variation) 

1.2.2.1.3 Commonalities between Hadza, Sandawe and Khoe-San 
Jenkins (1982) performed a correspondence analysis on 23 sub-Saharan populations for 11 
serogenetic systems containing 32 alleles. Correspondence analysis is a 
multivariate statistical method similar to principal component analysis, except that it applies to 
categorical data rather than continuous data. The analysis revealed that the Hadza group 
closest to the Babinga Pygmies and the Sandawe closest to east African Bantu-speakers, 
especially the Nyaturu. The Nyaturu are the Bantu-speaking neighbours of the Sandawe, 
with whom they have intermarried frequently. The Sandawe, however, also show similarities to the 
Dama on axis 1 of the correspondence plot. On the second axis they are similar to the 
Keetmanshoop Nama, but the presence of malaria-protective alleles in the Sandawe 
separates them from the Nama on the first axis. It might be that there are some genetic 
similarities between the Khoe and the Sandawe that are masked by the extensive 
intermarriage with the Nyaturu and the effects of selection on protein-coding alleles, which 
are affected by geography (Jenkins, 1982). It would be interesting to look for genetic 
similarities between these groups by examining genetic variation that is not affected by 
selection, through the study of neutral polymorphisms. 
 
As mentioned before, the Sandawe have possible linguistic links with the Khoe-San, but the 
Hadza do not. According to Ten Raa (1970) the southern Sandawe groups are also 
phenotypically similar to the Khoe-San, while the central Sandawe groups resemble 
Bantu-speakers and Nilotes. He describes the resemblance to the Khoe-San as 
follows: "a short stature, a yellow skin, peppercorn hair, the epicanthic fold, excessive 
wrinkling of the skin at an advanced age, and a typical pentagonal Bushman-like skull: 
even steatopygia appears to occur in some women". Phenotypic features are rarely used 
today, but this description contributes to the hypothesis that Khoe-San-like hunter-gatherer 
groups existed from Mount Kenya to the Cape of Good Hope before the Bantu expansions 
(Ten Raa, 1970; Traunmüller, 2003). Current physical anthropological research, however, 
has found no support for morphological similarities between east African and Khoe-San groups 
(see section 1.2.1.4). 
 
1.2.2.1.4 Khoe-San admixture into other population groups 
Immigrant Bantu-speakers had close contact with the indigenous Khoe-San people. This is 
evident from linguistic borrowings as well as morphological and genetic characteristics. The 
amount of Khoe-San admixture into various Bantu-speaking groups has been estimated 
by making use of an immunoglobulin allotype system known as Gm (Jenkins et al., 1970). 
This system contains a specific haplotype that is characteristically and almost exclusively 
Khoe-San. This specific haplotype was used to determine the amount of Khoe-San 
admixture into certain Bantu-speaking groups. The group with the highest admixture was 
the Cape Nguni (Xhosa) population, with frequencies of over 50%. Other southern African 
Bantu-speaking groups with appreciable frequencies were the Sotho/Tswana people and 
the other Nguni people. The frequency, however, declines in the more northern groups like 
the Pedi (14%) and the Tsonga (12%). The Namibian southwestern Bantu-speakers show 
very low frequencies of admixture. The Kavango group shows no admixture at all, while the 
Herero and Himba show slightly elevated frequencies at about 12%. Studies of other gene 
marker systems also confirmed these proportions (Jenkins and Corfield, 1972; Jenkins, 
1974; Jenkins and Dunn, 1981).  
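
The text above does not state how the frequency of the marker haplotype was converted into an admixture proportion. One standard way of doing so, assumed here purely for illustration, is Bernstein's estimator, m = (p_admixed - p_recipient) / (p_source - p_recipient), where the three values are the frequencies of the marker in the admixed group, in the unadmixed recipient population and in the source population. The function name and the example frequencies below are hypothetical and are not values from Jenkins et al. (1970).

```python
def admixture_proportion(p_admixed, p_source, p_recipient):
    """Bernstein's estimator of the proportion of ancestry contributed
    by the source population, based on one marker frequency.

    p_admixed   -- marker frequency in the admixed group
    p_source    -- marker frequency in the source population
    p_recipient -- marker frequency in the unadmixed recipient population
    """
    denominator = p_source - p_recipient
    if denominator == 0:
        raise ValueError("marker does not differ between parental populations")
    m = (p_admixed - p_recipient) / denominator
    # Sampling error can push the estimate outside [0, 1]; clamp it.
    return min(1.0, max(0.0, m))


if __name__ == "__main__":
    # Hypothetical frequencies: marker at 0.60 in the source population,
    # absent in the unadmixed recipient, and at 0.12 in the admixed group.
    print(round(admixture_proportion(0.12, 0.60, 0.0), 3))
```

A marker that is almost exclusive to one parental population, as described for the Gm haplotype, keeps the denominator large and makes an estimate of this kind less sensitive to sampling error.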
 
In this thesis, southwestern, southeastern and central African Bantu-speaking groups were 
included as comparative groups to the Khoe-San populations.  Although not the main focus 
of the thesis, the amount of admixture from Khoe-San groups into these Bantu-speaking 
groups will also be analysed using the different genetic systems. 
 
1.2.2.2 Mitochondrial DNA studies 
Following on the influential paper by Cann et al. concerning the value of mtDNA in 
reconstructing human origins, mtDNA studies have continued to advance our 
understanding of historical human migration routes and to assess population affinities 
(Cann et al., 1987). 
 
Mitochondrial DNA (mtDNA) is located in an extra-nuclear organelle, the mitochondrion (a 
cytoplasmic organelle involved in energy production in eukaryotic cells). The mitochondrial 
genome is a circular molecule of double-stranded DNA that contains 16 569 basepairs (bp) 
(Anderson et al., 1981). Each mtDNA molecule contains genes coding for 13 proteins, 22 transfer 
RNAs and two ribosomal RNAs (Anderson et al., 1981; Wallace, 1995). Nearly all the 
non-coding DNA of the mtDNA molecule is contained in a 1.122 kb region known as the control 
region or D-loop (Anderson et al., 1981). This non-coding region has an extremely high 
mutation rate and contains two hypervariable regions, named hypervariable segments 
I and II (HVS-I and HVS-II). These two regions have been used extensively in phylogenetic 
studies. Their positions vary between studies but roughly correspond to base pair positions 
16024-16400 for HVS-I and 57-372 for HVS-II (Stoneking and Soodyall, 1996; Stoneking, 
2000). 
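
As a small illustration of the coordinates given above, the following sketch extracts the two hypervariable segments from a complete mtDNA sequence. It assumes that the sequence is already aligned to the reference numbering of Anderson et al. (1981) and contains no insertions or deletions, so that string positions correspond directly to reference positions; the function and constant names are hypothetical.

```python
# Approximate segment boundaries quoted in the text (1-based, inclusive).
HVS_RANGES = {
    "HVS-I": (16024, 16400),
    "HVS-II": (57, 372),
}
MTDNA_LENGTH = 16569  # length of the reference sequence in base pairs


def extract_segment(sequence, start, end):
    """Return positions start..end (1-based, inclusive) of a sequence."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("segment lies outside the sequence")
    return sequence[start - 1:end]


def extract_hvs(sequence):
    """Return a dict with the HVS-I and HVS-II subsequences."""
    return {
        name: extract_segment(sequence, start, end)
        for name, (start, end) in HVS_RANGES.items()
    }


if __name__ == "__main__":
    # Placeholder sequence of the right length, not real mtDNA.
    dummy = ("ACGT" * 4143)[:MTDNA_LENGTH]
    for name, segment in extract_hvs(dummy).items():
        print(name, len(segment), "bp")
```

With these boundaries HVS-I spans 377 bp and HVS-II spans 316 bp, a small fraction of the 16 569 bp genome.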
 
The mtDNA phylogeny has played a central role in locating the human maternal most 
recent common ancestor (MRCA) to sub-Saharan Africa. It also indicated an initial and 
modest spread of humans within Africa more than 100 000 years BP, followed by a prominent 
expansion within Africa 60 000 - 80 000 years BP, leading ultimately to a single dispersal wave 
out of Africa that populated the rest of the world (Forster, 2004; Reed and Tishkoff, 2006; 
Torroni et al., 2006; Behar et al., 2008). 
 
Several factors make mtDNA ideal for phylogenetic analysis over the time scale of modern 
humans, i.e. the absence of recombination, combined with a high copy number and fast 
mutation rates. A caveat, however, is that due to the inheritance from mother to child, 
mtDNA captures the history of the maternal lineage only. Another problem that arises when 
using only the mtDNA control region is that this part of the mtDNA genome is subject to 
saturation due to excessive homoplasy because of the rapid mutation rate. Furthermore 
the distribution of mutations in the control region is non-random, leading to problematic rate 
heterogeneity issues when calculating divergence date estimates (Tamura and Nei, 1993; 
Excoffier and Yang, 1999; Meyer et al., 1999). Furthermore, there is an ongoing discussion 
on whether human mtDNA evolves neutrally. An assumption behind various population 
genetic analyses is the selective neutrality of the genetic markers employed. There have 
been reports on natural selection affecting mtDNA, with temperature being highlighted as a 
possible selective force (Torroni et al., 2001; Mishmar et al., 2003; Ruiz-Pesini et al., 
2004). Several other studies, however, concluded that human mtDNA sequence variation 
has not been significantly influenced by climate (Elson et al., 2004; Kivisild et al., 2006; 
Amo and Brand, 2007; Ingman and Gyllensten, 2007; Balloux et al., 2009). Despite these 
caveats, mtDNA remains by far the most widely used genetic marker in studies of human 
populations. 
 
Various methods, such as high-resolution restriction fragment length polymorphisms 
(Merriwether et al., 1991; Semino et al., 1991; Soodyall and Jenkins, 1992), control region 
sequencing (Vigilant et al., 1991; Richards et al., 1996; Yao et al., 2003), and a 
combination of these two methods (Torroni et al., 1996; Torroni et al., 1998; Macaulay et 
al., 1999), have been used to screen for mtDNA variation. The analysis of whole mtDNA 
sequence data has reaffirmed the observation deduced from other methods of mtDNA 
analysis that certain mtDNA polymorphisms show geographical differentiation (Ingman et 
al., 2000; Kivisild et al., 2006; Gonder et al., 2007; Behar et al., 2008).  
 
African mtDNA haplogroups are divided into seven macro-haplogroups (L0'1'2'3'4'5'6), 
while the rest of the world's lineages are classified as subgroups of macro-haplogroups M, 
N and R (Figure 1.4) (Behar et al., 2008). 
 
Figure 1.4  Tree showing global mtDNA macro-haplogroups according to the 
nomenclature of Behar et al. (2008) 

The first split in the human mtDNA phylogeny is between the two daughter branches, L0 
and L1'2'3'4'5'6 (L1-6), located on opposite sides of the root. They split from each other 
133 000 - 155 000 years BP (Behar et al., 2008). The archaeological record from this 
period is too poor to reliably propose hypotheses for this separation event. Recent 
studies, however, show that stressful climatic fluctuations known to have occurred 
throughout the MSA might have caused sporadic settlements of Homo sapiens in 
northwest Africa, the Near East, Chad, and southern Africa (Walter et al., 2000; 
Henshilwood et al., 2002; Bouzouggar et al., 2007). Today, the L1-6 branch haplogroups 
are far more widespread, while the L0 haplogroups (Figure 1.5) are limited to certain 
sub-Saharan African population groups. Studies that predate the recognition of L0 as sister to 
L1-6 suggest that the spread of the haplogroups now labeled as haplogroups within L0 and 
L1 is the result of an early expansion of modern humans from a location, often suggested to 
be East Africa, to most of the African continent (Maca-Meyer et al., 2001; Mishmar et al., 
2003). Traces of this early migration event were partially erased by a vast later expansion 
wave of L2 and L3 clades dated to 60 000 - 80 000 years BP (Watson et al., 1997; Forster, 
2004). Some traces of this early structure, however, still remain among certain 
hunter-gatherer groups, such as the localization of L1c1a to the Pygmy groups of central Africa 
(Quintana-Murci et al., 2008) and of L0d and L0k to the Khoe-San people. 
 
Previous studies reported high frequencies of haplogroups L0d and L0k among Khoe-San 
groups (Vigilant et al., 1991; Chen et al., 2000; Tishkoff et al., 2007). Haplogroup L0d was 
found in the !Xun and Khwe at frequencies of 51% and 16%, respectively, while L0k was 
found at frequencies of 26% in the !Xun and 23% in the Khwe (Chen et al., 2000) (Table 
1.2). The same groups were examined by Tishkoff et al., who reported frequencies of 
61% and 22% of L0d and L0k, respectively, in the combined group (Table 1.2) (Tishkoff et 
al., 2007). In addition, they found L0d at a frequency of 5% in the click-speaking Sandawe 
but not in the Hadzabe population from Tanzania (Tishkoff et al., 2007).  
 
In the Ju/'hoansi from Botswana, L0d was found to be the most prevalent haplogroup 
(96%), while the remaining mtDNA lineages (4%) were resolved into haplogroup L0k 
(Vigilant et al., 1991) (Table 1.2). L0d and L0k are absent or found at low frequencies in 
other sub-Saharan African populations. Salas et al. and Pereira et al. reported thirteen 
(out of 307) and eight (out of 109) L0d individuals, respectively, among Bantu-speakers 
from southeastern Africa (Pereira et al., 2001; Salas et al., 2002). There was 
also a report of an L0d individual from Lake Turkana, Kenya (Watson et al., 1997), one 
African American individual (Allard et al., 2005) and one individual from Kuwait (Behar et al., 2008).  
 
Table 1.2  MtDNA haplogroup frequencies in San populations studied to date 

Haplogroup   Ju/'hoansi 1   !Xun 2    Khwe 2    !Xun+Khwe 3
             (n=24)         (n=43)    (n=31)    (n=18)
L0d          0.958          0.512     0.161     0.611
L0k          0.042          0.256     0.226     0.222
L0a          -              0.023     0.097     -
L1b          -              -         0.032     -
L2*          -              -         -         0.167
L2a          -              -         0.097     -
L2b          -              0.047     0.065     -
L3b          -              0.116     0.032     -
L3e          -              0.047     0.290     -

1  Vigilant et al., 1991 
2  Chen et al., 2000 
3  Tishkoff et al., 2007 

Figure 1.5  Haplogroups within the L0 macro-haplogroup according to the 
nomenclature of Behar et al. (2008) 

Gonder et al. performed whole-genome mitochondrial sequencing on selected individuals from the 
group reported on by Tishkoff et al. (Gonder et al., 2007; Tishkoff et al., 2007). They 
proposed that the L0d clade be split into a San L0d clade and another L0d clade present in 
Tanzania. Behar et al. have presented an updated phylogeny and nomenclature of mtDNA 
haplogroups in Africa and have subsequently shown that the L0d haplogroup consists of 
L0d1, L0d2 and L0d3 and their associated sub-haplogroups (Figure 1.6) (Behar et al., 
2008). The Tanzanian L0d sequences fall into a subgroup of L0d3, while San individuals 
are represented in all seven of the L0d sub-haplogroups (Figure 1.6). 
 
When published data are classified according to the new nomenclature (Table 1.3) it 
becomes clear that the L0d sub-haplogroups do not have a homogenous distribution 
across the different San groups and need to be investigated independently. 
 
Figure 1.6  Sub-haplogroups within the L0d haplogroup according to the 
nomenclature of Behar et al. (2008) 

Table 1.3  Published mtDNA sub-haplogroup frequencies in San populations as fractions of the
total number of L0d/k haplotypes in the sample group

                 Haplogroup frequencies
Sub-haplogroup   Ju/'hoansi 1   !Xun 2    Khwe 2    !Xun+Khwe 3   Bantu-speakers 4,5
                 (n=24)         (n=33)    (n=12)    (n=15)        (n=21)
L0d1a            0.083          0.091     -         0.067         -
L0d1b            0.708          0.030     -         -             0.190
L0d1c            -              0.485     -         0.467         0.095
L0d2a            0.042          -         -         0.067         0.143
L0d2b            -              -         -         -             0.095
L0d2c            0.125          -         -         0.067         0.238
L0d2d            -              -         -         -             0.095
L0d3             -              -         -         -             0.048
L0dx             -              0.061     0.417     0.067         0.048
Unclassified     -              -         -         -             0.048
L0k1             0.042          0.333     0.583     0.267         -

1 Vigilant et al., 1991
2 Chen et al., 2000
3 Tishkoff et al., 2007
4 Salas et al., 2002
5 Pereira et al., 2001
 
Behar et al. proposed two hypotheses to explain how the two ancient lineages L0d and
L0k became largely localized to the Khoe-San groups of southern Africa, where they
remained isolated from other haplogroups for an extremely long period of between 50 000
and 100 000 years, until the development of LSA technologies (Behar et al., 2008).
 
In the first hypothesis an initial prolonged diffusion of anatomically modern humans from 
east Africa (200 000 - 100 000 years BP) is followed by a dispersal wave (~100 000 years 
BP) of a part of the population and the localization of L0d and L0k to southern Africa. 
 
In the second hypothesis an early division in the human population (~200 000 years BP) 
resulted in the localization of L0 in southern Africa and L1-6 in eastern Africa. The eastern 
and southern populations continued to evolve separately until a dispersal event (~150 000 
- 100 000 years BP) of the L0abf group from the southern population and its merger with 
the eastern population. This left a southern population composed only of L0d and L0k and
an eastern population composed of L1-6 and L0abf.
 
Information about ancestral population sizes and population growth parameters can be
used to infer past population demographics and expansion events. Various methods exist
to investigate such events. Star-like patterns in median networks indicate recent
expansions. Watson et al. used median networks to identify star-like patterns in L2 and L3
RFLP and D-loop sequence diversity, which suggested an expansion 60 000 - 80 000
years BP (Watson et al., 1997).
 
 
Coalescent theory provides another way to gain information about ancestral population
size and growth parameters from extant genetic variation (Kingman, 1982; Hudson, 1990;
Griffiths and Tavare, 1994). It has been applied to detect population expansions in
sequence data in the form of mismatch distributions (Rogers and Harpending, 1992) and by
testing the validity of summary statistics that predict expansions (Schneider and
Excoffier, 1999; Rozas et al., 2003). Coalescent analyses of past African population
demographics using RFLP and mtDNA sequence data (Harpending et al., 1993; Sherry et al.,
1994; Excoffier and Schneider, 1999; Harpending and Rogers, 2000; Pilkington et al., 2008)
revealed that most human populations show significant signs of expansion around 70 000
years BP. An exception, however, was that some hunter-gatherer populations from different
continents (including San and Pygmy populations from Africa) did not show these expansion
signals. The lack of an expansion signal around this time in hunter-gatherers was proposed
to be due to post-Neolithic population bottlenecks that led to the loss of previous
expansion signals (Excoffier and Schneider, 1999). Under this hypothesis, populations that
did not go through the Neolithic transition experienced a reduction in effective
population size because competing Neolithic farmers fragmented the hunter-gatherer
habitat (Excoffier and Schneider, 1999).
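
The mismatch distribution referred to above is the histogram of pairwise differences
between sequences in a sample. The following Python sketch shows only how such a histogram
is tallied. The function name and the toy alignment are hypothetical and are not taken from
any of the cited studies, and no model fitting or significance testing is attempted.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(sequences):
    """Return {number of differences: number of sequence pairs}."""
    lengths = {len(s) for s in sequences}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    for a, b in combinations(sequences, 2):
        # Count the sites at which the two aligned sequences differ.
        diffs = sum(1 for x, y in zip(a, b) if x != y)
        counts[diffs] += 1
    return dict(counts)


# Hypothetical toy alignment, for illustration only (not real data).
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGA", "TCGAACGA"]
distribution = mismatch_distribution(toy_alignment)
mean_pairwise = (sum(k * v for k, v in distribution.items())
                 / sum(distribution.values()))
```

In the approach of Rogers and Harpending (1992), the shape of this histogram is compared
with the shape expected under a model of population expansion.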
 
The coalescence analysis employed in mismatch distributions assumes a single
exponentially growing population and carries a large degree of statistical uncertainty.
In addition, when these methods are applied, earlier population expansions can be obscured
by recent population bottlenecks (Excoffier and Schneider, 1999). Recent improvements in
coalescence inference methods have increased accuracy without the need to assume a
single exponential growth curve (Shapiro et al., 2004; Atkinson et al., 2008). The Bayesian
skyline plot (BSP) is a useful coalescence procedure that uses Bayesian inference to
estimate effective population sizes through time (Drummond et al., 2005; Drummond and
Rambaut, 2007).
 
Atkinson et al. constructed BSPs for the four most common African mtDNA
macro-haplogroups, L0, L1, L2 and L3, using whole mitochondrial genomes (Atkinson et al.,
2009). The four haplogroups revealed very different patterns of growth. The patterns of
haplogroups L2 and L3 were significantly different from each other and from haplogroups
L0 and L1. Both L2 and L3 showed signals of rapid expansion, albeit at different times.
While L2 occurred at relatively low frequencies until a sudden period of fast growth
beginning 12 000 - 20 000 years BP, L3 expanded rapidly from its time to most recent
common ancestor (TMRCA), 61 000 - 86 000 years BP, onwards (Atkinson et al., 2009).
 
Haplogroups L0 and L1, in contrast, showed slow, constant growth over the last 100 000
- 200 000 years. Growth patterns throughout the history of these two lineages are not
significantly different from each other. This would be expected if both lineages formed part
of an early panmictic African population that contributed equally to the current African
mtDNA diversity. Various studies, however, support population structure deep in the
mtDNA tree based on the localization of L0d and L0k to the Khoe-San speakers of
southern Africa (Knight et al., 2003; Tishkoff et al., 2007; Behar et al., 2008). The BSP
analysis of Atkinson et al. showed that, if such deep population structure exists, it did not
generate considerably different population growth profiles between these early lineages
(Atkinson et al., 2009). A BSP constructed for the L0d and L0k lineages alone was also not
significantly different from the L0 and L1 growth profiles. This indicates that the
suggested deep divergence events were probably not connected to substantial changes in
available territory or mode of living (Atkinson et al., 2009).
 
This thesis will use mitochondrial DNA variation from the different included population 
groups to infer mitochondrial DNA phylogenies and networks. These will be used to group 
the mtDNA genomes into haplogroups and to compare the haplogroup profiles of the 
different populations in the study group. Furthermore, different coalescence methods will 
be employed to look for population expansions and contractions in the different 
haplogroups as well as in the different population groups. 
 
1.2.2.3 Y-chromosome studies 
Similar to the mitochondrial genome, large parts of the Y-chromosome are inherited as a
single haplotype and do not undergo recombination. The Y-chromosome contains the largest
non-recombining block in the human genome and is therefore extremely important for
evolutionary genetic studies. While studies on mtDNA describe the maternal history of a
population, the paternal history can be described using the Y-chromosome. The first
Y-chromosome polymorphism was reported in 1985 (Casanova et al., 1985), but more than a
decade elapsed before a well-resolved Y-chromosome tree was available (Underhill et al.,
2000; Hammer et al., 2001a; YCC, 2002).
 
Y-chromosome tree structure 
The Y-chromosome tree structure is primarily based on binary polymorphisms, and specific
branches are assigned to haplogroups following a hierarchical pattern. Currently the
Y-chromosome tree consists of 20 major clades (Figure 1.7) containing 311 distinct
haplogroups defined by 599 mutational events (Karafet et al., 2008). Furthermore, typing
Y-chromosome short tandem repeats (Y-STRs) generates haplotypes, which are then used for
finer resolution within the haplogroups (Underhill and Kivisild, 2007).
 
The two primary splits in the Y-chromosome tree lead to two branches, Haplogroups A and
B, which have a distribution restricted to Africa (Figure 1.7). These two clades are
genetically diverse and their haplogroups have different geographical distribution patterns.
This suggests population fragmentation, isolation and re-expansions in pre-historic Africa.
Haplogroups A and B are associated with the distribution of ancient hunter-gatherer tribes
before the expansions of pastoralists (Underhill et al., 2001; Underhill and Kivisild, 2007).
The rest of the Y-chromosome tree is defined by the M168 mutation and includes the
most common African lineages (Haplogroup E) as well as all the non-African clades (Figure
1.7).
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 1.7  Tree showing global Y-chromosome macro-haplogroups according to
the nomenclature of Karafet et al. (2008)

Haplogroup A is defined by the M91 and P97 mutations and contains 12 branches
determined by 45 internal mutations (Figure 1.8). A strict regional distribution is
particularly pronounced for haplogroup A. Within haplogroup A, A1 is found in Mali and
Morocco (Underhill et al., 2000; Scozzari et al., 2001), A3b2 is found in east Africa (Sudan,
Ethiopia, Tanzania, Kenya) and at lower frequencies in north Cameroon (Scozzari et al.,
1999; Underhill et al., 2000; Cruciani et al., 2002; Semino et al., 2002; Knight et al., 2003),
while A3b1 and A2 are found exclusively among the Khoe-San (Scozzari et al., 1999;
Underhill et al., 2000).
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 1.8  Sub-haplogroups within haplogroup A according to the
nomenclature of Karafet et al. (2008)

Haplogroup B is defined by four mutations (M60, M181, P85 and P90) and contains 17
branches with 28 internal markers (Figure 1.9) (Karafet et al., 2008). Haplogroup B occurs
throughout Africa but has high frequencies among Pygmies, Khoe-San and Hadza, with
some lineages being restricted to them (Underhill et al., 2000; Cruciani et al., 2002; Semino
et al., 2002; YCC, 2002; Knight et al., 2003). There is a clear-cut difference between the B
haplogroups associated with the Pygmies, Khoe-San and Hadza and those of all other
African populations. Pygmy, Khoe-San and Hadza populations have mainly Haplogroup B
haplotypes defined by the M112 mutation, while other populations have the M150 mutation.
Within haplogroup B-M112, haplogroups B2b2, B2b3 and B2b4b are restricted to the
Pygmy populations, while B2b1 (P6) and B2b4a (P8) are restricted to Khoe-San groups
(Hadza groups were not typed for B-M112 sub-groups). The B2b* ancestral haplotype
occurs in both Pygmy and Khoe-San groups (Underhill et al., 2000; Cruciani et al., 2002;
Semino et al., 2002; YCC, 2002; Knight et al., 2003).
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 1.9  Sub-haplogroups within haplogroup B according to
the nomenclature of Karafet et al. (2008)

Eighteen mutations currently define haplogroup E. Haplogroup E is the most mutationally
diverse of all the major Y-chromosome clades and contains 83 polymorphisms that define
56 distinct haplogroups (Figure 1.10) (Karafet et al., 2008). The E haplogroups are found at
high frequencies in Africa and at moderate frequencies in the Middle East and southern
Europe, and occur sporadically in Central and South Asia. Although Haplogroup E
lineages are widespread all over Africa, the distributions of the numerous distinctive
haplogroups are not homogeneous across the continent (Hammer and Horai, 1995;
Hammer et al., 1997; Qamar et al., 1999; Bosch et al., 2001; Hammer et al., 2001a;
Underhill et al., 2001; Cruciani et al., 2002; Cruciani et al., 2004).
 
Haplotypes carrying the mutations M75 (E2) and M33 (E1a) are present at low frequencies
across Africa but with different individual distributions. Haplogroups E1b1a and E1b1b are
the most frequent and widespread of the E haplogroups (Hammer et al., 2001a; Underhill
et al., 2001; Cruciani et al., 2002; Cruciani et al., 2004; Semino et al., 2004).
 
 
E1b1a, defined by M2 and seven other mutations, is mainly limited to sub-Saharan
populations and is associated with the expansion of Bantu-speaking populations (Hammer
et al., 1998; Passarino et al., 1998; Scozzari et al., 1999). The E1b1a subgroups have
differential distributions and frequencies. The M191 mutation defines the most frequent
E-M2 subgroup and is evidence of a founder effect that resulted from the Bantu expansions
(Hammer et al., 2001a; Underhill et al., 2001; Cruciani et al., 2002; Cruciani et al., 2004;
Semino et al., 2004).
 
The non-African distribution of haplogroup E is associated with haplogroup E1b1b,
characterized by the M35 and M215 mutations (Hammer et al., 1998; Semino et al., 2000;
Underhill et al., 2001; Semino et al., 2004). This haplogroup, however, also has a
widespread African representation (Hammer et al., 2001a; Underhill et al., 2001; Cruciani
et al., 2002; Cruciani et al., 2004; Semino et al., 2004). Compared to other E haplogroups,
M35 occurs at very low frequencies in Bantu speakers but is widely, though not uniformly,
dispersed throughout Africa. Among the different lineages carrying the M35 mutation,
haplotypes defined by M78 occur in east Africa, north Africa, the Middle East and Europe.
E-M78 is the E-M35 subgroup with the highest frequency and the widest distribution outside
Africa. This marker has a northeastern African origin and multiple exodus routes out of
Africa have been demonstrated (Cruciani et al., 2007). M123 haplotypes are present in
eastern Africa, northeast Africa, the Middle East and southeast Europe but do not reach
western Europe. M81 is found at high frequencies only in northern Africa and is almost
absent in Europe (with the exception of Sicily and Iberia) (Bosch et al., 2001; Cruciani et
al., 2004; Semino et al., 2004). In addition to these differentiated E1b1b lineages, many
haplotypes were classified as E-M35*, which occurred at high frequencies particularly in
Ethiopian, Kenyan, Tanzanian and Khoe-San groups (Cruciani et al., 2004; Semino et al.,
2004).
 
Recently a new Y-chromosome polymorphism (M293) was discovered, which grouped
these previously paraphyletic E-M35 groups into a monophyletic group (Henn et al., 2008).
The E-M293 haplogroup is concentrated in eastern and southern Africa, with maximum
frequencies in Tanzania and southern Africa. In eastern Africa high frequencies of M293
are observed in the Datog (43%), Burunge (28%), Sandawe (24%) and Hadza (11%). The
Datog are pastoralists who speak a Southern Nilotic language and the Burunge are
Afro-Asiatic agropastoralists. In southern Africa M293 was observed in the Khwe (31%) and
!Xun (11%). The Khwe and the !Xun were the only Khoe-San groups included in the
study. Network analysis revealed haplotype sharing and close similarities between
Khwe/!Xun haplotypes and Hadza/Sandawe haplotypes. M293 also occurs at low to
moderate frequencies in Bantu-speaking populations of eastern and southern Africa, which
likely reflects recent admixture with local populations after the Bantu expansions (Henn et
al., 2008).
 
E-M293 data from the Henn et al. study supported a demic diffusion model correlated with
the spread of sheep, cows and pottery along a tsetse-fly-free corridor between eastern and
southern Africa, 2 000 years BP (Sadr, 1998; Gifford-Gonzalez, 2000; Smith, 2005; Henn et
al., 2008). A previous model, in which pastoralism was transmitted from eastern Africa to
southern-central Africa with little to no population movement, was thus rejected (Sadr,
1998; Smith, 2005). The new model suggested that a small pastoralist population carrying
M293 migrated from east Africa into southern-central Africa with their livestock (Henn et
al., 2008). After arriving in southern Africa, these pastoralists could have mixed with local
populations, or expanded without substantial genetic exchange with local groups. Without
representation of more Khoe-San groups, however, the study could not address the
question of how pastoralism spread after it reached south-central Africa. The scale of the
migration from east Africa may have been small, with a minimum of four E-M293 male
individuals. It is possible that other male individuals who did not carry M293 were also
involved. For instance, E-M2 individuals could have been involved, but it would not be
possible to distinguish these from the E-M2 lineages introduced later during the Bantu
expansions. The Henn et al. study thus supports a migration, independent of the Bantu
expansion, of east Africans harbouring the E-M293 marker, which initially brought
pastoralism to southern Africa (Henn et al., 2008).
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 1.10  Sub-haplogroups within haplogroup E according to the nomenclature of Karafet et al.
(2008)

The age of the Y-chromosome tree 
The TMRCA of the human Y-chromosome tree has been estimated using microsatellites
(Wilson and Balding, 1998; Pritchard et al., 1999) and by sequencing parts of the
Y-chromosome (Thomson et al., 2000). Both the sequence-based and the microsatellite-based
studies supported a model of exponential growth for the Y-chromosome and also found
substantial continental structure in the data. The microsatellite studies estimated a
TMRCA of between 46 000 and 91 000 years, depending on the mutation model used. This
is a very young date compared to the TMRCA estimates for mitochondria, the X-chromosome
and autosomes (Horai, 1995; Harding et al., 1997; Harris and Hey, 1999; Kaessmann et al.,
1999). The mutation mechanisms of microsatellites are not well understood, which might
lead to dating errors. This young date was, however, confirmed by sequencing three genes
on the Y-chromosome. The sequence data were analyzed using a coalescent approach and
yielded a TMRCA of 59 000 years (Thomson et al., 2000).
 
Reasons for this young date might be that the ancestral population was very small or that
the Y-chromosome is subject to strong selection. The selection might take the form of
positive selection for advantageous mutations (hitchhiking) or negative selection against
deleterious mutations (background selection) (Thomson et al., 2000). A low effective
population size (Ne), which would lead to a younger date, could also have been caused by
higher variance in male reproductive success (Wilder et al., 2004).
 
Y-chromosome studies in the Khoe-San 
Published studies on the Khoe-San people covered three sample collections: the
Platfontein Khwe and !Xun (Scozzari et al., 1997), a mixed group of Ju/'hoansi and !Xun
(Underhill et al., 2000) and a mixed group of Ju/'hoansi, !Xun, Khwe, Nama and Dama
(Hammer et al., 1997; Wood et al., 2005). The group of Khwe and !Xun was originally
reported by Scozzari et al. (1997) and was reanalyzed or included in various other studies
(Scozzari et al., 1999; Cruciani et al., 2002; Jobling and Tyler-Smith, 2003; Knight et al.,
2003; Cruciani et al., 2004; Henn et al., 2008). The mixed group of Ju/'hoansi and !Xun
was originally reported by Underhill et al. (2000) and was subsequently included in
various comparative studies (Underhill et al., 2001; Semino et al., 2002; Knight et al.,
2003). The mixed Khoe-San group reported by Wood et al. included Khoe-San groups
reported on by Hammer et al. (Hammer et al., 1997; Hammer et al., 2001a; Wood et al.,
2005). Mainly the African haplogroups (A, B and E) have been found, at varying
frequencies, in the above-mentioned published Khoe-San groups (Table 1.4). These studies
revealed that:
- the Khoe-San people carry high frequencies of the most ancient lineages on the
Y-chromosome tree (Haplogroups A and B),
- some of these lineages are exclusive to the Khoe-San (A-M51 and A-M23 derived
lineages),
- other lineages in these ancient clades (A and B) were also identified at high
frequencies in other populations with recent hunter-gatherer ancestry (such as the
Pygmy populations and the Hadza and Sandawe of east Africa),
- Khoe-San populations have varying frequencies of the haplogroups associated with
Bantu speakers, and the more isolated populations (such as the Ju/'hoansi) have lower
frequencies of these haplogroups,
- the Khwe population differs from the other San populations in having lower
frequencies of haplogroups A and B, higher frequencies of the Bantu-associated E
haplogroups and higher frequencies of the E-M35 haplogroup.
 
Both Y-chromosome binary polymorphisms and Y-chromosome microsatellites will be
used in this thesis to assign the Y-chromosomes from the various population groups to
haplogroups. Y-chromosome haplogroup profiles will then be compared between the
different Khoe-San groups to assess their relatedness. Furthermore, haplogroup profiles
will be compared to those of neighboring groups to investigate the amount of admixture.
Additionally, the published high frequencies of specific sub-groups of haplogroups A and
B will be evaluated against the frequencies seen in Khoe-San groups from this study. The
prevalence and spread of the E-M35* haplogroup will also be examined to try to infer the
spread of pastoralism.
 
 
Table 1.4  Y-chromosome haplogroup frequencies (%) of Khoe-San populations studied to date

              Haplogroup frequencies (%)
Haplogroup    !Xun 1    Khwe 1    Ju/'hoansi and !Xun 2    Mixed Khoe-San 3
              (n=64)    (n=26)    (n=39)                   (n=90)
A-M51         28        12        28                       22
A-M14         5         -         13                       14
A-M114        3         -         3                        -
A-P28         -         -         -                        11
B-M182        -         -         -                        1
B-M112        8         -         28                       13 (P6 = 9 and P7 = 4)
E*-SRY4064    -         -         -                        1
E-M75         -         -         -                        1
E-M54         -         -         -                        1
E-M85         6         4         -                        -
E-M2          23        50        18                       14
E-M191        16        -         -                        10
E-M154        -         4         -                        -
E-M35         11        31        10                       7
J-12f2        -         -         -                        1
R-M343        -         -         -                        2

1 Scozzari et al. (1997)
2 Underhill et al. (2000)
3 Wood et al. (2005)


Y-chromosome and mtDNA comparative studies
The haploid nature of mtDNA and the Y-chromosome allows us to study the history of
maternal and paternal lineages separately because of their unilateral transmission. It
further allows us to compare their dynamics and deduce female vs. male migration rates and
effective population sizes.

Wood et al. investigated the effects of male vs. female gene flow in various African
populations (Wood et al., 2005). Mantel tests and AMOVA found strong correlations
between Y-chromosome genetic distance and linguistic distance, but no correlation
between Y-chromosome genetic distance and geographic distance. Conversely, the
mitochondrial genetic distances between populations showed weak correlations with
both geographic distance and linguistic distance. When Bantu speakers were removed,
however, the correlation with linguistic variation disappeared for the Y-chromosome and
strengthened for mtDNA (Wood et al., 2005).
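
The Mantel tests reported by Wood et al. (2005) correlate the entries of two distance
matrices and assess significance by permuting the population labels of one of them. The
Python sketch below is a minimal version of that general idea, not a reconstruction of the
analysis in the cited study. The function names and the two toy matrices are hypothetical,
and a one-sided test with 999 permutations is assumed.

```python
import random
from itertools import combinations
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def upper_triangle(matrix, order):
    """Off-diagonal entries after relabelling rows and columns by `order`."""
    return [matrix[order[i]][order[j]]
            for i, j in combinations(range(len(order)), 2)]


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """Return (r, one-sided p) for the correlation between two distance matrices."""
    identity = list(range(len(dist_a)))
    flat_b = upper_triangle(dist_b, identity)
    r_obs = pearson(upper_triangle(dist_a, identity), flat_b)
    rng = random.Random(seed)
    at_least_as_large = 1  # the observed arrangement is counted once
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute population labels of one matrix only
        if pearson(upper_triangle(dist_a, order), flat_b) >= r_obs:
            at_least_as_large += 1
    return r_obs, at_least_as_large / (permutations + 1)


# Hypothetical distances between four populations (not real data).
geographic = [[0.0, 1.0, 2.0, 3.0],
              [1.0, 0.0, 1.0, 2.0],
              [2.0, 1.0, 0.0, 1.0],
              [3.0, 2.0, 1.0, 0.0]]
genetic = [[2.0 * d for d in row] for row in geographic]
r, p = mantel(genetic, geographic)
```

Whole rows and columns are permuted together, rather than individual matrix entries,
because the pairwise distances within a matrix are not independent of one another.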
From this it is clear that patterns vary between populations. Differences between mtDNA
and Y-chromosome gene flow can be related to sociocultural practices in the
populations involved. Seielstad et al. inferred patrilocality in African populations from
the observation that inter-population variation was much higher for the Y-chromosome
than for mtDNA (Seielstad et al., 1998). From this, higher female than male migration
rates were calculated. A study by Hammer et al., however, found contrasting results
(Hammer et al., 2001a). The gene flow in this study was male biased and supported a
greater mobility of male individuals, which led to lower inter-population Y-chromosome
distances than mtDNA distances. The discrepancy between the two studies was explained
by the fact that the Seielstad et al. study only considered food-producing populations while
the study of Hammer et al. included hunter-gatherer populations (Destro-Bisol et al.,
2004). In a comparative study of food producers and hunter-gatherers, a marked
heterogeneity in the distribution of the unilaterally transmitted markers was found
(Destro-Bisol et al., 2004). While in food producers the gene flow was female biased
because of patrilocality, hunter-gatherer populations had a male-biased gene flow. The
male-biased gene flow in hunter-gatherers was explained as a combined effect of
asymmetric gene flow between the food producers and hunter-gatherers and
different levels of polygyny and patrilocality in the two groups (Destro-Bisol et al.,
2004).
 
Wood and colleagues (2005) also investigated the paternal and maternal signatures in
food-producers and hunter-gatherers by sequencing parts of the mitochondrial genome
and Y-chromosome. The resultant data also supported dissimilar male and female histories
and differences between hunter-gatherers and food-producers. For mitochondrial data the
food producers fit a model of population expansion and the hunter-gatherers a model of
population stationarity, while for the Y-chromosome both groups best fit a model of
constant population size. The reason proposed for the dissimilar Y-chromosomal and
mtDNA results was that food-producers in the past had smaller effective population
sizes (Ne) and lower migration rates (m) than hunter-gatherers. Cultural practices that lead
to a lower Nem are polygyny and patrilocality (Wood et al., 2005).
 
Polygyny leads to variance in reproductive success between males, which lowers their Ne
relative to that of females (Low, 1988; Wilder et al., 2004). Generally, food-producers are
described as more polygynous than hunter-gatherers (Cavalli-Sforza, 1986; Biesele and
Royal, 1999). Additionally, patrilocality can result in lower rates of male migration
(Murdock, 1981). Most agricultural societies are patrilocal (Murdock, 1967), but
hunter-gatherer groups are bilocal, spending time living with both the male's and the
female's families (Marlowe, 2004). These practices would have changed and shifted as
populations converted from foraging to food-producing lifestyles. This may have played an
important role in the distinctive patterns observed for mtDNA and the Y-chromosome.
 
In the present thesis, correlations between physical geographic distances and genetic
distances will be calculated for lineages representing the male lines (Y-chromosomes) as
well as lineages representing the female lines (mtDNA). Positive correlations between
Y-chromosome genetic distances and physical distances are expected if gene flow is female
biased, as was seen previously in food producers (Seielstad et al., 1998; Destro-Bisol et
al., 2004). Conversely, if gene flow is male biased, we expect to see correlations
between physical geographic distance and mtDNA genetic distance, as was previously
seen in hunter-gatherer societies (Hammer et al., 2001a; Destro-Bisol et al., 2004).
 
1.2.2.4 Autosomal DNA studies 
Compared to Y-chromosome and mtDNA phylogenetic studies, studies on the autosomes
are complicated by recombination. This problem can be partly overcome by
studying short stretches of linked polymorphisms and inferring haplotypes. The inference of
haplotypes has been made easier by the development of various algorithms that use
homozygous group frequencies to infer the phase of heterozygous loci (Excoffier and
Slatkin, 1995; Stephens et al., 2001; Niu et al., 2002; Scheet and Stephens, 2006).
Consequently, these short stretches of inferred haplotypes can be treated as lineages in the
same way as the non-recombining mtDNA and Y-chromosome DNA. An early example of
an autosomal haplotype study is that of the 2.7 kb region on chromosome 11 that
encompasses the β-globin gene (Harding et al., 1997). The phylogeny obtained from the 326
haplotypes reflected results from Y-chromosome and mitochondrial studies. The root of the
tree was in Africa, with many lineages that were exclusive to Africa. Since then several
other loci have been studied, all supporting an African root (Clark et al., 1998; Harris and
Hey, 1999; Harding et al., 2000). Similar to mtDNA and Y-chromosome studies, however,
the history and dynamics of the lineage under investigation reflect the history of a single
locus and thus only a small part of the genome. Some of these loci might be heavily
influenced by selection, which would violate the assumptions of population genetic models
and would not give a true picture of the population history. Ultimately, to obtain the true
history of a population or of the human species, one should take all of the separate loci
into account.
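
The haplotype inference step described earlier in this section can be illustrated with an
expectation-maximization (EM) scheme in the spirit of Excoffier and Slatkin (1995). The
Python sketch below is deliberately restricted to two biallelic loci, where only double
heterozygotes have ambiguous phase. The function name, the genotype coding and the toy
sample are hypothetical, and the published programs handle many loci and use more
elaborate models.

```python
def em_two_locus(genotypes, iterations=50):
    """Estimate frequencies of haplotypes '00', '01', '10' and '11'.

    genotypes: list of (g1, g2), each g being the count (0, 1 or 2) of
    allele 1 at that locus. Only (1, 1) individuals have ambiguous phase.
    """
    haps = ["00", "01", "10", "11"]
    fixed = dict.fromkeys(haps, 0.0)
    n_ambiguous = 0
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_ambiguous += 1
            continue
        # Phase is unambiguous: build the two haplotypes allele by allele.
        a = (0, 0) if g1 == 0 else (1, 1) if g1 == 2 else (0, 1)
        b = (0, 0) if g2 == 0 else (1, 1) if g2 == 2 else (0, 1)
        for x, y in zip(a, b):
            fixed[f"{x}{y}"] += 1

    total = 2 * len(genotypes)
    freq = dict.fromkeys(haps, 0.25)  # uninformative starting point
    for _ in range(iterations):
        # E-step: probability that a double heterozygote is 00/11 vs 01/10.
        cis = freq["00"] * freq["11"]
        trans = freq["01"] * freq["10"]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: add expected haplotype counts to the unambiguous counts.
        expected = dict(fixed)
        expected["00"] += n_ambiguous * w
        expected["11"] += n_ambiguous * w
        expected["01"] += n_ambiguous * (1 - w)
        expected["10"] += n_ambiguous * (1 - w)
        freq = {h: expected[h] / total for h in haps}
    return freq


# Hypothetical sample: homozygotes make the 00/11 phase far more plausible.
sample = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
estimated = em_two_locus(sample)
```

In this toy sample the homozygous individuals carry only the 00 and 11 haplotypes, so the
two double heterozygotes are resolved as 00/11 rather than 01/10.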
 
Another way to utilize information contained in the autosomes is to use genotypes of
unlinked markers spread over the whole genome, instead of inferred haplotypes. Using
AMOVA on such multilocus genotypes (microsatellites, single nucleotide
polymorphisms (SNPs) and insertion/deletions), it was found that 79-94% (depending on
the marker type) of variation represents variation between individuals within the same
population (Barbujani et al., 1997; Jorde et al., 2000; Romualdi et al., 2002; Rosenberg et
al., 2002). Genotypic variation is thus not homogeneously distributed across the human
species: 6-21% of the variation is due to differences between populations and
continental groups. This led to the question of whether a genotype from an individual can
be assigned to the correct population or continent of origin.
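
The partitioning of variation into within-population and between-population components
can be illustrated with a much simpler statistic than AMOVA. The Python sketch below
computes an F_ST-like quantity for a single biallelic locus from expected heterozygosities,
assuming equally weighted populations. It is a stand-in for illustration only, not the
method used in the cited studies, and the allele frequencies are hypothetical.

```python
def fst_biallelic(freqs):
    """F_ST-like statistic for one biallelic locus, equal population weights.

    freqs: frequency of one of the two alleles in each population.
    """
    p_bar = sum(freqs) / len(freqs)
    # Expected heterozygosity of the pooled sample.
    h_total = 2 * p_bar * (1 - p_bar)
    # Mean expected heterozygosity within populations.
    h_within = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    if h_total == 0:
        return 0.0  # locus is monomorphic everywhere
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies in three populations (not real data).
between_fraction = fst_biallelic([0.2, 0.5, 0.8])
within_fraction = 1 - between_fraction
```

Here `between_fraction` plays the role of the share of variation attributed to differences
between groups, and `within_fraction` the share found between individuals within groups.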
 
The earliest method used to explore this question calculated pairwise distances between
individuals based on allele sharing (Bowcock et al., 1994). These distances were then used
to construct a tree of individual genotypes without taking into account any prior
information on population origin. The aim was to see whether the tree showed clusters
corresponding to populations or continents. The resulting tree correctly assigned 88% of
genotypes to continent-specific clusters. The population specificity was less precise, but
64% of populations formed clusters that included more than half of their individuals
(Bowcock et al., 1994). Since then more powerful genotype assignment methods have been
developed (Pritchard et al., 2000; Corander et al., 2003; Falush et al., 2003; Francois et
al., 2006; Falush et al., 2007).
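
A distance based on allele sharing, of the kind described above, can be sketched as one
minus the average proportion of alleles that two individuals share per locus. The Python
sketch below assumes diploid genotypes coded as pairs of allele labels, with None marking a
missing genotype. The exact definition used by Bowcock et al. (1994) may differ in detail,
and the function names and genotypes are hypothetical.

```python
from collections import Counter


def shared_alleles(g1, g2):
    """Number of alleles (0, 1 or 2) that two diploid genotypes have in common."""
    return sum((Counter(g1) & Counter(g2)).values())


def allele_sharing_distance(ind1, ind2):
    """1 - (mean shared alleles per locus) / 2, over loci typed in both."""
    pairs = [(a, b) for a, b in zip(ind1, ind2)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("no loci typed in both individuals")
    shared = sum(shared_alleles(a, b) for a, b in pairs)
    return 1 - shared / (2 * len(pairs))


# Hypothetical microsatellite genotypes at three loci (allele sizes in bp).
individual_a = [(120, 124), (98, 98), (200, 204)]
individual_b = [(120, 120), (98, 102), (204, 208)]
distance_ab = allele_sharing_distance(individual_a, individual_b)
distance_aa = allele_sharing_distance(individual_a, individual_a)
```

A matrix of such pairwise distances can then be passed to any standard tree-building
method to look for clusters of individuals.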
 
A widely used technique, implemented in the program STRUCTURE, is based on the 
Bayesian clustering of individuals into K number of clusters (Pritchard et al., 2000; Falush 
 72 
et al., 2003; Falush et al., 2007). The user specifies the number of clusters, K, and the 
program assigns each genotype, or a proportion of each genotype, to a cluster with a certain 
probability. If the sample is structured, a signature of population structure emerges 
through the unequal assignment of individuals or partial genotypes to the clusters. For 
instance, if K=2 the program divides the total variation of the whole study group 
optimally into two clusters and then assigns each individual, with a certain probability, to 
each of the two clusters. When there is structure in the sample group, individuals from 
population x will be preferentially assigned to one cluster, for instance cluster 1, while 
individuals from population y will be preferentially assigned to cluster 2. If individuals 
from population z resulted from an admixture event between populations x and y, 
these individuals will be assigned with certain probabilities to both clusters 1 and 2, 
depending on the marker contribution from each population to the individual. When an 
admixture model is assumed, individuals are not assigned to a cluster with a certain 
probability; rather, a part of their genome (made up of the markers included in the study) is 
assigned to each cluster. The procedure usually followed when running STRUCTURE 
is to run the analysis for each K from K=2 to K=10 and then to test which K is best 
supported, either by comparing the posterior likelihood scores or by using the deltaK 
method, which takes into account the rate of change in likelihood between successive 
values of K (Evanno et al., 2005). 
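
The deltaK idea can be illustrated with a minimal sketch. It assumes that the Ln P(D) value of every replicate run is available for each K, and it computes deltaK from the replicate means as the absolute second-order difference in mean Ln P(D) divided by the standard deviation of Ln P(D) at that K, which is one common way the statistic is calculated. The function name and all likelihood values below are hypothetical and serve only as illustration; they are not results from this study.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Return {K: deltaK} for every K that has both K-1 and K+1 available.

    lnp_by_k maps K to a list of Ln P(D) values, one per replicate run.
    deltaK(K) = |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / sd L(K)
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            second_difference = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_difference / stdev(lnp_by_k[k])
    return result


# Hypothetical Ln P(D) values for three replicate runs at each K.
toy_runs = {
    1: [-5000, -5002, -4998],
    2: [-4600, -4604, -4596],
    3: [-4300, -4302, -4298],
    4: [-4290, -4300, -4280],
    5: [-4285, -4305, -4265],
}

scores = delta_k(toy_runs)
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Because deltaK needs both neighbouring values of K, it cannot be evaluated at the smallest or largest K tested, and the replicates at a given K must not all have identical likelihoods (the standard deviation would be zero).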
 
The first genotypic studies were based on a limited number of markers and individuals 
(Bowcock et al., 1987; Nei and Livshits, 1989; Bowcock et al., 1991a; Bowcock et al., 
1991b). RFLPs were individually typed from isolated DNA, which was cloned or 
transformed to increase the quantity. These laborious processes limited the size of the 
experiments. During the past 20 years, however, techniques have developed rapidly that 
enable high-throughput marker typing. The newest techniques are able to type thousands 
of markers (Jakobsson et al., 2008; Li et al., 2008; Tishkoff et al., 2009). 
 
Despite the small size of the first studies it was immediately apparent that the split between 
African and non-African genetic variation represents the earliest divergence in human 
history (Bowcock et al., 1987; Nei and Livshits, 1989; Bowcock et al., 1991a; Bowcock et al., 
1991b; Bowcock et al., 1994). Africans had higher levels of nucleotide diversity than non-Africans. 
Furthermore, the genetic diversity in non-African populations represents a subset of the 
genetic diversity in sub-Saharan Africa. Also, more private alleles and haplotypes are 
observed in Africa than in other regions. All of this strongly supported the Out of Africa 
model that was suggested by mitochondrial studies. Additionally, these low-resolution 
studies were already able to distinguish individuals on a continental basis. 
 
Increasing the number of loci increased the accuracy of the continental assignment of 
genotypes and facilitated the emergence of sub-clusters that correspond to populations 
within continents (Rosenberg et al., 2002; Rosenberg et al., 2005; Jakobsson et al., 2008; 
Li et al., 2008; Tishkoff et al., 2009). Most of these studies utilized the HGDP-CEPH panel 
(Rosenberg et al., 2002; Rosenberg et al., 2005; Jakobsson et al., 2008; Li et al., 2008). 
The panel consists of cell lines from 1064 individuals from 51 populations from sub-Saharan 
Africa, North Africa, Europe, the Middle East, South/Central Asia, East Asia, Oceania and 
the Americas (Cann et al., 2002). This data set is freely available and allows a detailed 
characterization of worldwide genetic variation. The Khoe-San representation in this panel 
is, however, limited: only seven individuals, from a location south of Tsumkwe in Namibia, 
are included. These individuals are indicated as "San relatives" and, based on 
the geographic location, probably belong to the Ju/'hoansi or the ?X?ao//??esi groups.  
 
Rosenberg et al. typed 377 autosomal microsatellite loci on the HGDP-CEPH 
panel (Rosenberg et al., 2002). They found that worldwide variation could be clustered into 
six clusters, of which five correspond to major geographic regions. Furthermore, they could 
infer sub-clusters within these major regions. The sub-Saharan African cluster is optimally 
divided into four sub-clusters, which represent Bantu-speaking + pre-Bantu-speaking, San, 
Mbuti Pygmy and Biaka Pygmy clusters (Rosenberg et al., 2002; Rosenberg et al., 2005).  
 
Li et al. typed 650 000 SNP markers on the HGDP-CEPH panel and also found clustering 
into the five continental groups at K=5 (Li et al., 2008). At K=6 South/Central Asia separates 
from Europe and the Middle East, and at K=7 the Middle East separates from Europe. 
Many populations, however, have representation from more than one cluster. This can be 
an indication of recent admixture or of shared ancestry before divergence. Additionally, PCA 
showed that the largest part of the variation (56%) can be summarised as variation between 
African and non-African populations. In the population distance tree, African populations 
lay closest to the root. The San group forms the earliest branch, followed by the Mbuti 
Pygmy group, the Biaka Pygmy group and thereafter the Bantu-speakers + pre-Bantu-speakers 
(Li et al., 2008).  
 
Jakobsson et al. typed both SNPs (525 910) and STRs (396) on the HGDP-CEPH panel 
(Jakobsson et al., 2008). This study found that on the global scale, similar to Rosenberg et 
al. (2002), populations optimally grouped into six clusters, of which five correspond to 
major geographic regions. The clustering within Africa, however, yielded interesting results. 
While Rosenberg et al. optimally identified four clusters corresponding to the two Pygmy 
groups, the San and the Bantu- and pre-Bantu-speakers, Jakobsson et al. identified only 
three clusters. One of the three clusters represented the Bantu- and pre-Bantu-speakers 
grouped together. The Bantu-speakers from South Africa showed the largest contributions 
from the San/Pygmy clusters in addition to the Bantu-speaking cluster, followed by the 
Kenyans, the Yoruba and the Mandenka. The remaining two clusters were present at highest 
frequency in the Pygmy and San populations. Aside from small amounts of admixture from the 
Bantu/pre-Bantu-speaking cluster, the Mbuti belonged almost exclusively to one of these 
clusters. The Biaka predominantly belonged to the third cluster but also had large 
contributions from the Mbuti cluster. The San contained both the Mbuti and Biaka clusters, 
but with a larger contribution from the Mbuti cluster. It thus appears that the San and Mbuti 
groups are more closely related to each other than to the Biaka (Jakobsson et al., 2008). 
 
A recent study by Tishkoff et al. included 2 432 African individuals from 113 geographically 
diverse populations (Tishkoff et al., 2009). For evaluation against non-African groups the 
HGDP-CEPH panel was also included. The San representation was better in this 
study, with a group of !Xun/Khwe samples included in addition to the HGDP-CEPH San 
samples. Additionally, a group of mixed Cape Coloured individuals was typed. In 
these samples 1 327 polymorphic markers (microsatellites and insertion/deletions) were 
typed. Similar to previous studies, African populations contained the highest levels of 
genetic diversity. Globally, diversity declines with distance from Africa. Within Africa, the 
Pygmy and San populations had the highest genetic diversities, while the San groups had 
the most private alleles. In the tree analysis, the two Khoe-San populations cluster together 
and are most distant from the other populations. The Cape Coloured population shows 
high levels of non-African admixture and is located between the African and non-African 
groups. Using PCA, 72 significant global Principal Components (PCs) were identified. The 
first PC (19.5%) separates African from non-African populations. The Hadza are separated 
from other populations at PC3 (3.5%).  
 
Using STRUCTURE analysis, the populations showed clustering according to major 
geographic region, both on a global scale and within Africa (Tishkoff et al., 2009). Globally, 
14 ancestral population clusters were identified, nine of which were found in Africa 
(Tishkoff et al., 2009). A cluster emerged (at K=5) that is present in the Hadza and, to a 
lesser extent, in the Pygmy, San and Sandawe hunter-gatherers. Subsequently (at K=6) this 
cluster split into a Hadza/Sandawe cluster and a Pygmy/Khoe-San cluster. The Mbuti Pygmy 
and San groups split from the other Pygmy groups at K=11, indicating common ancestry 
between these groups. Results from this study showed that the San, Hadza, Sandawe and 
Pygmy populations contain shared genetic variation that distinguishes them from other 
African populations (Tishkoff et al., 2009). This led to the suggestion that these groups are 
the remnants of a proto-Khoe-San/Pygmy/Hadza/Sandawe population of hunter-gatherers. 
MtDNA and Y-chromosome analyses suggest a divergence of >35 000 years BP (Semino 
et al., 2002; Gonder et al., 2007; Tishkoff et al., 2007; Behar et al., 2008; Tishkoff et al., 
2009).  
 
The Hadza are genetically the most distinct from the other African groups (Tishkoff et al., 
2009), which is consistent with linguistic evidence that the Hadza language is unrelated to 
other Khoisan languages (Sands, 1998; Güldemann and Elderkin, Forthcoming; 
Güldemann, In Press). The Hadza are an isolated population that had little interaction with 
surrounding groups and have maintained their hunter-gatherer lifestyle up to recent times. 
They show only very low levels of asymmetric gene flow from surrounding groups. The 
Sandawe, on the other hand, adopted mixed farming practices and show evidence of 
bi-directional gene flow with neighbouring groups (Newman, 1995). Populations from northern 
Tanzania, southern Ethiopia and northern Kenya show evidence of the Sandawe-associated 
genetic cluster (Tishkoff et al., 2009). Aside from the association shown by 
autosomal DNA results, other commonalities between these two east African groups and 
the Khoe-San groups are: the language connection between Sandawe and Khoisan; 
similarities between Tanzanian and San rock art; and a trance dance, formerly performed by 
the Sandawe, that is similar to San trance dances. There is also evidence of a pan-San belief 
system across all of southern Africa to as far north as Zimbabwe (Huffman, 1983; 
Lewis-Williams, 1986; Güldemann and Elderkin, Forthcoming). 
 
The clustering of the Khoe-San groups with the Pygmies (Tishkoff et al., 2009) suggests 
that they may have a common genetic history. Pygmy populations might have spoken a 
Khoisan-related language before it was replaced by Bantu languages. 
Anthropological support for this theory comes from the shared music styles of the 
Khoe-San and Pygmy groups (Lomax, 1968; Tishkoff et al., 2009). The San populations 
show a closer shared genetic ancestry with the Mbuti Pygmy than with the Biaka Pygmy groups 
(Jakobsson et al., 2008; Tishkoff et al., 2009). The Mbuti live in the Ituri rainforest of the 
eastern DRC while the Biaka (also called Baka, part of the Mbenga group) live to the west 
of the Mbuti in Cameroon, Gabon and the Republic of Congo (Figure 1.11). Another main 
group of Pygmies, the Twa or Ba-Twa and Cwa, live in dispersed groups south-central to 
the Mbuti and Mbenga (Cavalli-Sforza, 1986). These groups live in swamps and deserts far 
from the forest. No genetic data are available for them, and it is not known whether they are 
indigenous to the area or more recent migrants from the forest. It may be that before the 
Bantu expansions these Pygmy groups formed a continuous network of related groups 
that also had contact and gene flow with their Khoe-San neighbours to the south and their 
Hadza and Sandawe neighbours to the east. 
 
To summarise, the genetic evidence emerging from the cluster analyses of the 
hunter-gatherer populations supports linguistic data suggesting that Khoe-San ancestors 
may once have extended from Somalia through eastern Africa and into southern Africa, and 
possibly also into western Africa (Ambrose, 1982; Tishkoff et al., 2009; Güldemann and 
Elderkin, Forthcoming).  
 
 
 
 
To better understand the evolutionary history of the Khoe-San, this study made use of 
a number of autosomal SNPs that were typed in the various representative Khoe-San 
groups and their neighbours. These SNPs are spread over all 22 autosomes and are 
thus representative of the whole autosomal genome. By typing and analysing these SNPs 
we expect to find a picture of population structure and affinities that is intermediate between 
those found for the Y-chromosome and mtDNA, which represent the male and female 
lineage histories, respectively. 
 
 
 
 
 
 
 
 
 
Figure 1.11  Distribution of Pygmies according to Cavalli-Sforza (1986). The Hadza do not 
form part of the Pygmy groups but are included to indicate proximity.  
Map obtained from Wikipedia (http://en.wikipedia.org/wiki/Pygmy) 
1.3 Aims 
 
In this thesis the genetic structure of some living Khoe and San populations will be 
examined, making use of different genetic markers (mtDNA, Y-chromosome and 
autosomal DNA). The study critically examines how females (mtDNA) and males 
(Y-chromosome) have contributed to shaping the gene pool of Khoe and San populations. The 
additional investigation of autosomal DNA markers will give an all-inclusive view of the 
population structures within the Khoe and San. The three genetic systems will also give 
insight into the amount and mode of admixture from various neighbouring population groups 
into the Khoe-San groups. The ancestral association of San and Khoe 
populations will be assessed using various analytical methods. The resultant 
information from the genetic data will then be discussed in conjunction with linguistic, 
archaeological, historical and anthropological data to contribute to the writing of the history 
of the Khoe and the San.  
 
In previous sections certain aspects of the presently known history of the Khoe-San 
were highlighted and elaborated upon. Other disciplines have contributed most of these 
historical perspectives regarding the Khoe-San, and the aim of this thesis is to address 
these aspects from a genetic point of view. In particular, the thesis concentrates on the 
following questions: 
 
- Evidence of genetic distinction between the groups that represent the linguistic Ju, 
Tuu and Khoe divisions 
The grouping of the Khoe-San into separate populations is largely based on a linguistic 
classification system. In sections 1.1.1 and 1.2.1.1 the linguistic classification system is 
reviewed in detail in conjunction with the demography and geographic localization of the 
groups involved. The history of the Khoe and San populations as inferred from the 
linguistic classification is discussed in sections 1.2.1.1 and 1.2.1.2. Linguistics supports a 
hierarchical relatedness of Khoe-San groups within the three main branches of the Khoisan 
linguistic family (Ju, Tuu and Khoe). It further supports the possibility that the Ju and Tuu 
branches may share a very deep common ancestor and were associated with the original 
San hunter-gatherers, while the Khoe branch was introduced to the area later in conjunction 
with pastoralism. 
 
This study aims to investigate whether the genetic relatedness between the groups correlates 
with the classification based on linguistics. The genetic relatedness of representatives from 
the three main Khoisan linguistic branches will be evaluated to see whether groups within a 
branch are more closely related to each other than to representatives from the other 
linguistic branches.  
 
- Evidence of a relationship between physical geographic distance and genetic 
distance between groups regarding males and females in hunter-gatherer 
communities 
Serological studies (discussed in section 1.2.2.1) suggested that relatedness between different 
Khoe-San groups follows geographic distance rather than linguistics. In this thesis the 
relationship between genetic distance and physical geographic distance will be investigated 
for all three genetic systems (mtDNA, Y-chromosome, autosomal). Section 1.2.2.3 
discussed results from previous studies suggesting that either Y-chromosomal genetic 
distance (male line) or mtDNA genetic distance (female line) correlates with 
geographic distance, depending on whether the populations involved are food-producers or 
hunter-gatherers. Food-producers practise patrilocality, which limits male migration and causes 
strong correlations between Y-chromosome genetic distance and physical distance. 
The reverse applies to hunter-gatherers, where mtDNA genetic distance correlates 
with geographic distance but Y-chromosome genetic distance does not. In this study 
the correlations between genetic and physical geographic distances for the different 
genetic systems will be compared to identify differences between the female and male 
migration histories. 
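
A correlation between a genetic distance matrix and a geographic distance matrix is commonly tested with a permutation approach such as the Mantel test. The following is a minimal sketch of that general technique and is not necessarily the implementation used for the analyses in this thesis. The function name, the five population positions and the simulated genetic distances are hypothetical and only illustrate the calculation.

```python
import numpy as np


def mantel_test(dist_a, dist_b, permutations=999, seed=0):
    """One-sided Mantel test for a positive correlation between two
    square, symmetric distance matrices over the same populations."""
    a = np.asarray(dist_a, dtype=float)
    b = np.asarray(dist_b, dtype=float)
    n = a.shape[0]
    upper = np.triu_indices(n, k=1)          # each population pair once
    r_obs = np.corrcoef(a[upper], b[upper])[0, 1]

    rng = np.random.default_rng(seed)
    as_extreme = 0
    for _ in range(permutations):
        order = rng.permutation(n)
        permuted = a[np.ix_(order, order)]   # relabel populations in matrix a
        r_perm = np.corrcoef(permuted[upper], b[upper])[0, 1]
        if r_perm >= r_obs:
            as_extreme += 1
    p_value = (as_extreme + 1) / (permutations + 1)
    return r_obs, p_value


# Hypothetical example: five populations placed along a line (positions in km).
positions_km = np.array([0.0, 100.0, 250.0, 600.0, 900.0])
geographic = np.abs(positions_km[:, None] - positions_km[None, :])

# Simulated genetic distances that increase with geographic distance plus noise.
noise = np.random.default_rng(1).uniform(-0.005, 0.005, size=geographic.shape)
noise = (noise + noise.T) / 2.0
np.fill_diagonal(noise, 0.0)
genetic = 0.0002 * geographic + noise

r, p = mantel_test(genetic, geographic)
print(round(r, 3), p)
```

Running the same test separately on mtDNA-based and Y-chromosome-based distance matrices would allow the two correlations with geography to be compared. Rows and columns are permuted together because the entries of a distance matrix are not independent observations.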
 
- The genetic affinities of the Khwe population 
The Khwe group is discussed in sections 1.1.1.2.2 and 1.1.1.5.6. Although the Khwe speak 
a Khoe language, their classification as a Khoe-San group has been questioned. They 
phenotypically resemble Bantu-speakers and it is not clear whether they are a Khoe-San group 
with extensive Bantu-speaking admixture, Bantu-speakers that lost their cattle, another 
pastoralist population closely related to Bantu-speakers who occupied the region before 
the Bantu expansions or a mixture of various refugee groups driven from the grazing 
grounds into the Okavango swamps. Serological studies (section 1.2.2.1.2) found them to 
be closely related to Bantu-speakers. Y-chromosome studies (discussed in section 1.2.2.3) 
suggested that the Khwe might be related to east African groups who introduced 
pastoralism into southern Africa (possibly together with the Khoe languages). 
 
By typing the three genetic systems in the Khwe and comparing their genetic profile with 
those of other Khoe-San groups and of Bantu-speakers, this study aims to establish the 
genetic identity of the Khwe. Furthermore, Y-chromosome evidence will be assessed to 
evaluate the claim that the Khwe are descended from east African populations who 
introduced pastoralism into southern Africa.  
 
- Whether genetic evidence supports a cultural or demic diffusion of pastoralism. 
The possibility of a combination of cultural/linguistic and demic diffusion and the 
likelihood of gender-biased demic diffusion will also be considered 
Both archaeology and linguistics contributed to the theory of how pastoralism spread from 
the area of northern Botswana towards the south (see sections 1.2.1.3 and 1.2.1.2 for 
discussion). Y-chromosome genetic studies suggested how pastoralism was introduced 
into northern Botswana from east Africa (discussed in section 1.2.2.3). Without 
representation of more Khoe-San groups the study, however, could not address the 
question of how pastoralism spread after it reached the area around northern Botswana.  
 
Linguistics couples the large amount of variation and the many dialects in the Khoe language 
branch to a rapid expansion related to the spread of pastoralism. According to the linguistic 
theory (see section 1.2.1.2) pastoralism was introduced to northern Botswana by a group from 
east Africa (hence the link that exists between Khoisan and Sandawe). Thereafter there was a 
rapid diversification of the language that formed the Kalahari Khoe branches. It is not known 
whether the language expansion and diversification that formed the Kalahari Khoe branches 
were accompanied by the diversification and expansion of the east African immigrant groups. 
Thus, it is not certain whether all the groups that speak Kalahari Khoe languages are descended 
from the east African immigrants, or are descended from hunter-gatherers that adopted 
pastoralism and language from the east African immigrants with limited admixture.  
Thereafter pastoralism, the Khoe language and possibly the pastoralist groups themselves 
spread south into the present-day Cape Province of South Africa. Here the KhoeKhoe 
branch diverged from the Kalahari Khoe branches by incorporating elements from the !Ui 
language group of the Tuu linguistic division, spoken by resident San hunter-gatherers.  
 
The archaeological explanation for the spread of pastoralism (see section 1.2.1.3) is based 
on the appearance of pottery and sheep remains in the archaeological record. Two 
alternative routes were suggested for the southward spread of the pastoralists based on a 
demic diffusion model. Certain aspects of the archaeological record, however, suggest that 
a clear-cut demic diffusion model might not be the best explanation (see section 1.2.1.3). 
 
Neither archaeology nor linguistics can conclusively prove whether the spread of 
pastoralism is associated with a demic diffusion of populations together with the pastoralist 
culture or with a diffusion of the culture on its own. An intermediate model, in which only a few 
individuals, perhaps only males, spread and transferred the pastoralist tradition and their 
language to resident hunter-gatherer groups further south, is also possible. A genetic 
approach using male-specific (Y-chromosome) and female-specific (mtDNA) markers 
will be employed in this thesis. A specific Y-chromosome marker has been coupled to the 
introduction of pastoralism, and this marker was strongly associated with the Khwe 
population (see section 1.2.2.3). Therefore, this Y-chromosome profile as well as the male 
and female profiles of the Khwe will be examined in this study to see whether Khwe-associated 
Y-chromosome and/or mtDNA markers are prevalent in the Khoe groups and other southern 
Khoe-San groups. 
 
- Whether population growth signals in genetic data reflect population 
expansions in the archaeological record 
- Following from the above, whether these signals indicate a recent population 
contraction due to a post-Neolithic population bottleneck induced by pastoralist 
groups 
The archaeological record contributed extensively to the inference of Khoe-San history. In 
section 1.2.1.3 a broad overview of the Stone Age history from southern Africa is 
presented. Temporally associated population expansions are discussed in conjunction with 
the factors that possibly caused these expansions. In the present thesis various methods 
will be used to infer expansion signals from the mtDNA sequence data. These signals will be 
dated to specific times in the past and compared with the archaeological record to identify 
possible temporal overlaps. 
 
Section 1.2.1.3 discussed how published studies explained results from mismatch 
distributions by inferring post-Neolithic bottlenecks in hunter-gatherer societies induced by 
pastoralists. In the archaeological community there is also disagreement about how in-moving 
pastoralists affected hunter-gatherer communities (see section 1.2.1.3). This study will 
investigate mtDNA sequence data for evidence of recent bottlenecks associated with 
in-moving pastoralists. 
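
A mismatch distribution is the frequency distribution of the number of nucleotide differences between all pairs of sequences in a sample. As a minimal sketch, assuming aligned sequences of equal length, it could be tabulated as follows. The function name and the four short sequences are hypothetical and are not data from this study.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(sequences):
    """Return {number of differences: proportion of sequence pairs}."""
    counts = Counter()
    for seq_1, seq_2 in combinations(sequences, 2):
        if len(seq_1) != len(seq_2):
            raise ValueError("sequences must be aligned to the same length")
        differences = sum(1 for x, y in zip(seq_1, seq_2) if x != y)
        counts[differences] += 1
    total_pairs = sum(counts.values())
    # Counter returns 0 for classes that were never observed.
    return {d: counts[d] / total_pairs for d in range(max(counts) + 1)}


# Hypothetical aligned sequence fragments.
toy_sequences = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "TCGTACGT"]
distribution = mismatch_distribution(toy_sequences)
print(distribution)
```

The shape of the resulting distribution, rather than any single value, is what is examined when looking for signals of past expansion or contraction.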
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
2. SUBJECTS AND METHODS 
 
2.1 Subjects 
 
A total of 551 individuals were included in this study for Y-chromosome, mitochondrial and 
autosomal SNP screening. An additional 161 individuals were included for the validation of 
a mitochondrial minisequencing panel. All samples were collected with the subjects' 
informed consent. This study was approved by the Human Research Ethics Committee 
(Medical) at the University of the Witwatersrand, Johannesburg, South Africa (Himla Soodyall, 
Protocol Number M980553 and Carina Schlebusch, Protocol Number M050902; Appendix 
A). The study and the participation of San individuals were furthermore approved by the 
South African San Council and the Working Group of Indigenous Minorities in Southern 
Africa (WIMSA). The additional 161 DNA samples used for the validation of the 
minisequencing panel were contributed by Prof. S.W. van der Merwe from the Department 
of Immunology, University of Pretoria, as part of a collaborative project with Prof. Soodyall. 
 
Due to gender, family relations or missing data particular individuals were excluded from 
certain parts of the analyses. Table 2.1 and Figure 2.1 summarise the numbers of 
individuals included in the different parts of the project, their place of sampling, their 
population group and the population group code used throughout this manuscript. 
 
At the time of sampling either 10 ml of blood was collected into EDTA tubes or buccal swabs 
were taken from volunteers. Information was collected from the subjects on their place of 
birth, the language spoken by them and by their parents, and their self-classified ethnicity. 
This information was used to group individuals. 
 
 
 
 
 
 
 
Table 2.1  Number of individuals in which mtDNA, Y-chromosome and autosomal variation were examined, 
their group and group code, and place of sampling and origin 

Source | Group name                    | Group code | Place of sampling (country)             | Place of origin (if different) | N (mtDNA) | N (Y-Chr) | N (autosomal SNPs)
-------|-------------------------------|------------|-----------------------------------------|--------------------------------|-----------|-----------|-------------------
*      | Karretjie people              | KAR        | Colesberg (SA)                          |                                | 30        | 19        | 25
*      | Karoo Coloured                | COL        | Colesberg (SA)                          |                                | 77        | 35        | 22
#      | Cape Coloured                 | CAC        | Wellington (SA)                         |                                | 20        | 3         | 20
*      | ǂKhomani                      | KHO        | Askham (SA)                             |                                | 57        | 37        | -
*      | Northern Cape Coloured        | CNC        | Askham (SA)                             |                                | 40        | 23        | -
*      | //Xegwi                       | XEG        | Chrissiesmeer (SA)                      |                                | 3         | 3         | -
*      | Duma San                      | DUM        |                                         |                                | 1         |           | -
#      | Nama                          | NAM        | Windhoek (NM)                           |                                | 28        | 14        | 28
#      | /Gui, //Gana and Kgalagari    | GUG        | Kutse Game reserve (BT)                 |                                | 22        | 19        | 21
*      | Naro                          | NAR        | Johannesburg (SA)                       | Ghanzi (BT)                    | 2         | 2         | -
#      | Ju/'hoansi                    | JOH        | Tsumkwe (NM)                            |                                | 42        | 28        | 41
#      | !Xun                          | XUN        | Omega camp (NM) and Schmidtsdrift (SA)  |                                | 49        | 48        | 45
#      | Khwe                          | KWE        | Omega camp (NM) and Schmidtsdrift (SA)  |                                | 18        | 13        | 19
#      | Manyanga                      | DRC        | Luozi (DRC)                             |                                | 14        | 14        | 14
#      | Herero                        | HER        | (NM)                                    |                                | 15        | 15        | 14
#*     | Sotho, Tswana                 | SOT        | Various (SA)                            |                                | 22        | 21        | See SEB
*      | Swazi                         | SWZ        | Chrissiesmeer (SA)                      |                                | 5         | 2         | See SEB
#*     | Zulu, Xhosa                   | ZUX        | Various (SA)                            |                                | 36        | 30        | See SEB
#*     | Afrikaner                     | AFR        | Various (SA)                            |                                | 21        | 13        | 15
#*     | European                      | EUR        | Various (SA)                            | Europe and Canada              | 11        | 3         | 15
#*     | Indian                        | IND        | Various (SA)                            |                                | 25        | 11        | 25
#*     | South-eastern Bantu-speakers  | SEB        | Various (SA)                            |                                |           |           | 48
       | Total                         |            |                                         |                                | 538       | 353       | 352

Country codes: AN = Angola; BT = Botswana; DRC = Democratic Republic of Congo; NM = Namibia; SA = South Africa 
* Collected during field trips conducted by the author 
# Other samples collected by Profs H. Soodyall and T. Jenkins 
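
As a simple consistency check, the column totals of Table 2.1 can be recomputed from the individual rows. The short sketch below transcribes the per-group counts from the table; entries that are blank, marked "-" or marked "See SEB" are treated as missing so that the individuals pooled into the SEB group are not counted twice. The variable and function names are illustrative only.

```python
# (group code, N mtDNA, N Y-chromosome, N autosomal SNPs)
# None marks a blank cell, "-" or "See SEB" in Table 2.1.
TABLE_2_1 = [
    ("KAR", 30, 19, 25), ("COL", 77, 35, 22), ("CAC", 20, 3, 20),
    ("KHO", 57, 37, None), ("CNC", 40, 23, None), ("XEG", 3, 3, None),
    ("DUM", 1, None, None), ("NAM", 28, 14, 28), ("GUG", 22, 19, 21),
    ("NAR", 2, 2, None), ("JOH", 42, 28, 41), ("XUN", 49, 48, 45),
    ("KWE", 18, 13, 19), ("DRC", 14, 14, 14), ("HER", 15, 15, 14),
    ("SOT", 22, 21, None), ("SWZ", 5, 2, None), ("ZUX", 36, 30, None),
    ("AFR", 21, 13, 15), ("EUR", 11, 3, 15), ("IND", 25, 11, 25),
    ("SEB", None, None, 48),
]


def column_total(rows, column):
    """Sum one count column, skipping missing entries."""
    return sum(row[column] for row in rows if row[column] is not None)


totals = tuple(column_total(TABLE_2_1, column) for column in (1, 2, 3))
print(totals)
```

The recomputed totals agree with the "Total" row of the table (538 for mtDNA, 353 for the Y-chromosome and 352 for the autosomal SNPs).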
 
 
Figure 2.1  Map indicating the place of origin for the Coloured and Khoe-San individuals who 
participated in the study     
 
The samples from the Coloured, Khoe and San groups were collected on specific sampling trips 
to specific locations. The samples from South-eastern Bantu-speaking, Afrikaner and European 
individuals were assembled from various sampling groups and originate from various 
locations. They, together with the Herero and Manyanga, were used as comparative data 
to test admixture proportions in the San, Khoe and Coloured groups. 
 
During the collection in Colesberg, the KAR samples were collected at the "outspans" of 
the Karretjie people, while the COL samples were from the Coloured school in the 
Lowryville township adjacent to Colesberg. The samples collected in and around Askham 
were also divided into two groups. Individuals who indicated their ethnicity as Coloured, 
Griqua or Nama were assigned to the CNC group, while the individuals who identified 
themselves and/or their parents as ǂKhomani or "Bushmen" were assigned to the KHO 
group. The KHO and CNC samples had not yet been collected when the autosomal SNP 
work was conducted and are therefore absent from the autosomal SNP analyses. 
 
The //Xegwi, Duma and Naro groups had very few representative individuals and were only 
used in individual-based analyses, and not in group-based analyses. The CAC group was 
also excluded from group-based analyses of the Y-chromosome since the group contained 
only three males. The EUR and AFR groups were combined into the AFE group for 
Y-chromosome analysis due to the low number of males. 
 
The /Gui, //Gana and Kgalagari (GUG) were a mixed group of San and Bantu-speaking 
individuals who had ancestries from both /Gui and //Gana San groups as well as the 
Kgalagari Bantu-speaking group. 
 
The 161 additional samples screened for the validation of the minisequencing panel 
included 156 individuals contributed by Prof. S. van der Merwe and 5 additional individuals 
from the HGDDRU laboratory. The samples contributed by Prof. van der Merwe included 
29 Khoe-San individuals (9 ǂKhomani, 11 !Xun, 7 Khwe and 2 unspecified San) and 127 
south-eastern Bantu-speakers (SEB) from various ethnic groups. The extra 5 individuals 
from the HGDDRU laboratory comprised 3 individuals from Zanzibar and 2 additional SEB 
individuals. 
 
2.2 Methods 
 
Details of reagents used in molecular methods are available in Appendix B. 
 
2.2.1 DNA extraction 
 
DNA was extracted from either EDTA-blood or buccal swabs.  
 
DNA extraction from EDTA-blood was done using the salting-out method described by 
Miller et al. (1988), with some modifications. The modified procedure was 
as follows. After thawing the EDTA-blood tubes, the blood was decanted into 
centrifugation tubes, which were filled to the 30 ml mark with chilled Sucrose-Triton-X Lysing 
buffer. The tubes were inverted several times to mix and centrifuged for 15 min at 1000 g (4°C). 
The supernatant was discarded and 20 ml Sucrose-Triton-X Lysing buffer was added to the 
pellet. Tubes were then vortexed to break up the pellet and put on ice for 5 min. After 
centrifugation for 10 min at 1000 g (4°C) the supernatant was again discarded. The pellet 
was resuspended and digested overnight at 42°C with 1.5 ml T20E5, 0.1 ml 10% SDS and 
0.25 ml freshly made Proteinase-K mix. After digestion, 0.5 ml of saturated NaCl was 
added to each tube, which was shaken vigorously for 15 s and put on ice for 10 min, followed by 
centrifugation for 30 min at 1000 g (4°C). The DNA-containing supernatant was poured 
into a clean tube and the protein pellet was discarded. Two volumes of room-temperature 
100% ethanol were added and the tube was gently agitated. The visible DNA was spooled, 
washed in ice-cold 70% ethanol and transferred to an empty Eppendorf tube. After 
air-drying for 30 min the DNA was resuspended in 500-1000 µl TE buffer. DNA was allowed to 
dissolve overnight before quantification.  
 
If no DNA was visible after precipitation with 100% ethanol the following procedure was 
followed. The tube was left at -20°C overnight. The next day the tube was centrifuged for 
20 min at 1000 g (4°C), the supernatant discarded and 10 ml of 70% ethanol was added to 
the pellet. The tube was centrifuged for 20 min at 1000 g (4°C), the supernatant discarded 
and the pellet allowed to air-dry. After air-drying the DNA was resuspended in 100-200 µl 
TE buffer. DNA was allowed to dissolve overnight before quantification. 
 
Extraction from buccal swabs was done using the PureGene® Genomic DNA Purification 
Kit (Gentra Systems) according to the manufacturer's instructions.  

DNA was quantified using the NanoDrop ND-1000 Spectrophotometer (Coleman 
Technologies Inc., LabVIEW®) and diluted to the required concentration using double 
distilled water (ddH2O). 
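
The dilution step follows the standard relation C1 x V1 = C2 x V2. As a minimal sketch, the volumes of stock DNA and ddH2O needed for a working dilution could be calculated as follows. The function name and the example concentrations are hypothetical and are not the working concentrations used in this study.

```python
def dilution_volumes(stock_conc, target_conc, final_volume):
    """Return (stock volume, diluent volume) from C1 * V1 = C2 * V2.

    Concentrations must share one unit (e.g. ng/ul); volumes are returned
    in the unit of final_volume (e.g. ul).
    """
    if stock_conc <= 0 or target_conc <= 0 or final_volume <= 0:
        raise ValueError("concentrations and volume must be positive")
    if target_conc > stock_conc:
        raise ValueError("target concentration exceeds stock concentration")
    stock_volume = target_conc * final_volume / stock_conc
    return stock_volume, final_volume - stock_volume


# Hypothetical example: 250 ng/ul stock diluted to 10 ng/ul in 100 ul.
dna_ul, water_ul = dilution_volumes(250.0, 10.0, 100.0)
print(dna_ul, water_ul)
```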
 
2.2.2 MtDNA methods 
 
To assign individuals to mitochondrial haplogroups, two approaches were followed. Firstly, a 
mitochondrial minisequencing panel was designed to target specific polymorphisms in the 
mtDNA coding region. The minisequencing panel allocates the mtDNA to one of the 10 
major macro-haplogroups found in mitochondrial variation worldwide. The design and 
implementation of the minisequencing panel have recently been published (Schlebusch et 
al., 2009). Secondly, HVS-I and HVS-II were sequenced to further classify the haplotypes 
into sub-haplogroups and for use in phylogenetic and population genetic analyses.  
  
2.2.2.1 MtDNA minisequencing method 
The minisequencing procedure is based on a single base extension of an unlabelled 
primer. The reaction mix contains ddNTPs labelled with four different colours, and a fifth 
colour is used for the internal lane standard (LIZ 120). The primers are designed to bind 
directly adjacent to the 5' side of the mutation of interest. During the extension cycles the 
primer is extended by only one base, which is the colour-labelled ddNTP. Primers are 
designed to differ in size by attaching poly(GATC) tails to the hybridization part. When the 
products are separated on the Genetic Analyser an electropherogram of differently sized 
peaks results. The colour of each peak indicates the allele present at the site of interest. 
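
The size-separation principle can be illustrated with a minimal sketch that pads each hybridising sequence with a repeating GATC tail until the primer reaches a chosen total length. The sketch assumes that the tail is attached to the 5' end, so that the 3' end still anneals directly next to the SNP. The function name, the three hybridising sequences and the target lengths are hypothetical and are not the primers of the published panel.

```python
def add_size_tail(hybridising_seq, target_length, repeat="GATC"):
    """Prefix a poly(GATC) tail so that the primer has target_length bases."""
    tail_length = target_length - len(hybridising_seq)
    if tail_length < 0:
        raise ValueError("target length is shorter than the hybridising part")
    repeats_needed = -(-tail_length // len(repeat))   # ceiling division
    tail = (repeat * repeats_needed)[:tail_length]
    return tail + hybridising_seq


# Hypothetical hybridising parts (written 5' to 3') and primer lengths.
hybridising_parts = [
    "ACCTGAGTTCCATGACCTTA",
    "TTGACCGATTAGGCATCAGT",
    "CGTTAGCATTGACCATGGAC",
]
target_lengths = [24, 30, 36]

primers = [
    add_size_tail(seq, length)
    for seq, length in zip(hybridising_parts, target_lengths)
]
for primer in primers:
    print(len(primer), primer)
```

Spacing the total lengths a few bases apart keeps the single-base extension products of different SNPs from overlapping on the electropherogram.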
 
For the design of the minisequencing protocol the ABI PRISM® SNaPshot™ Multiplex Kit 
was used and the general guidelines of the protocol were followed with some minor 
modifications. Whereas the supplier's protocol was originally optimized using POP-4 
polymer, our method was optimized using the POP-7 polymer, following suggestions proposed 
by Applied Biosystems in a subsequent bulletin (Applied Biosystems Manual P/N: 
4367258).  
 
The minisequencing protocol was designed to distinguish between the seven African L 
mitochondrial macro-haplogroups as well as the three non-African macro-haplogroups M, 
N and R. The panel tests for 14 SNP variations that define these 10 macro-haplogroups 
(Figure 2.2). It was designed in such a way that for every split in the tree there is a SNP 
that defines both branches in the split. For instance where L1 splits from the rest of the tree 
there is a SNP defining the L1 branch (Figure 2.2) and a SNP defining the L2-6 clade 
(Figure 2.2). 
Figure 2.2  Tree showing the 10 mtDNA macro-haplogroups (dark-grey) that are distinguished by typing 14 SNPs (light-grey). L4 (*) is identified by 
a HVS polymorphism that is not included in the panel. R and F indicate whether the reverse or forward primer orientation was used. Branch 
nomenclature on the tree is according to Behar et al., (2008) 
2.2.2.1.1 PCR-multiplex amplification 
The PCR-multiplex preceding the minisequencing reaction consisted of the simultaneous 
amplification of 6 PCR fragments of various lengths (Table 2.2). The binding sites for the 
14 minisequencing primers are contained within these 6 fragments (SNPs that are closely 
positioned are co-amplified in the same amplicon). Multiplex primers were designed by 
selecting specific regions and adjusting amplicon lengths so that annealing temperatures 
were similar enough to allow multiplexing. Only regions with little or no sequence 
polymorphism were considered as possible primer binding sites. 
Primers for the multiplex were designed using Primer 3 software (Rozen and Skaletsky, 
2000) and checked with the Autodimer program (Vallone and Butler, 2004) (forward and 
reverse primer sequences are given in Table 2.2). PCR multiplex primers were manufactured and 
HPLC purified by Metabion. The primers were diluted to 100 µM and stored at -20°C. 
 
The concentrations of primers, MgCl2 and DNA template were optimised. The reaction was 
not sensitive to variation in DNA concentration as long as amounts above 5 ng were used. 
In the final optimised PCR procedure, the reaction volume was 25 µl, including 10 ng DNA 
template, 1 µl of premixed 25x primer mix (see Table 2.2 for final reaction concentrations), 
2 U FastStart Taq (Roche Applied Science), 1x FastStart Taq buffer (containing no added 
MgCl2), 3.5 mM MgCl2, 0.3 mM dNTPs and ddH2O to make up the reaction volume to 25 
µl.  
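The dilution arithmetic behind the premixed primer mix can be sketched as follows. This is only an illustration: the function name is hypothetical, and it assumes that "25x" means each primer is 25 times more concentrated in the premix than in the final reaction (1 µl of premix in a 25 µl reaction). The final concentrations are taken from Table 2.2.

```python
def premix_concentration(final_um, fold=25.0):
    """Concentration (µM) a primer needs in an N-fold premix (assumed meaning of '25x')."""
    return final_um * fold

# Final concentrations (µM) of three primer pairs, taken from Table 2.2.
final_conc = {"MTSS_1": 0.04, "MTSS_4": 0.20, "MTSS_6": 0.03}
premix = {name: premix_concentration(c) for name, c in final_conc.items()}

# Adding 1 µl of premix to a 25 µl reaction dilutes each primer back 25-fold.
reaction_volume_ul = 25.0
premix_volume_ul = 1.0
back_calculated = {
    name: conc * premix_volume_ul / reaction_volume_ul
    for name, conc in premix.items()
}
```

Under this assumption a primer with a final concentration of 0.04 µM would be present at about 1 µM in the premix.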
 
Thermal cycling conditions were as follows: initial step at 95°C for 6 min followed by 35 
cycles of denaturation at 95°C for 1 min 30 s, annealing at 60°C for 1 min 30 s and 
extension at 72°C for 2 min; final extension for 10 min at 72°C. All the PCR reactions 
were performed on a 9700 GeneAmp® PCR System (Perkin-Elmer, Applied Biosystems). 
During optimisation PCR product sizes were checked on a 2% agarose gel with ethidium 
bromide staining (1 x TBE running buffer, Bromophenol blue Ficoll dye loading buffer, 1Kb 
DNA ladder size standard (Gibco BRL)).  
 
Post-PCR purification was done by adding 1 U of Shrimp Alkaline Phosphatase (USB 
Corporation) and 2 U of Exonuclease I (New England Biolabs) to 5 µl PCR product in a 
total reaction volume of 7 µl. The reaction was incubated at 37°C for 1 h followed by 15 min 
at 75°C for enzyme inactivation.  
 
Table 2.2   Primer sequences, binding sites, amplicon sizes and concentrations for multiplex PCR 
amplification of 6 fragments 

Primer name  Amplicon   PCR primer sequences (5′ - 3′)    Mitochondrial region of  Final concentration
             size (bp)                                    primer binding site *    (µM) **
MTSS_1f      210        CCGGCGTAAAGAGTGTTTTAGAT           931-953                  0.04
MTSS_1r                 TTCTGGCGAGCAGTTTTGTT              1121-1140                0.04
MTSS_2f      502        CCCTATTCTCAGGCTACACCCTA           7096-7118                0.03
MTSS_2r                 TGCATGTGCCATTAAGATATATAGGA        7572-7597                0.03
MTSS_3f      1051       CAGTGAAATGCCCCAACTAAATAC          8359-8382                0.05
MTSS_3r                 TGGTATGTGCTTTCTCGTGTTAC           9387-9409                0.05
MTSS_4f      868        CTCTTTTAGTATAAATAGTACCGTTAACTTCC  9992-10023               0.20
MTSS_4r                 TAATTAGGCTGTGGGTGGTTGT            10838-10859              0.20
MTSS_5f      1577       CAGCTATCCATTGGTCTTAGGC            12281-12302              0.20
MTSS_5r                 TAGGTAGTTGAGGTCTAGGGCTGTTA        13832-13857              0.20
MTSS_6f      672        CCACGACCAATGATATGAAAAAC           14694-14716              0.03
MTSS_6r                 TGTTTGATCCCGTTTCGTG               15347-15365              0.03
 
* Numbering according to the revised Cambridge reference sequence. 
** The final concentration of the primers in the reaction mix 
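
As a consistency check on Table 2.2, each amplicon size should equal the span from the first base of the forward primer to the last base of the reverse primer. A minimal sketch, using only the coordinates and sizes listed in the table (the variable and function names are illustrative):

```python
# (forward primer start, reverse primer end, reported amplicon size in bp),
# numbered according to the revised Cambridge reference sequence (Table 2.2).
amplicons = {
    "MTSS_1": (931, 1140, 210),
    "MTSS_2": (7096, 7597, 502),
    "MTSS_3": (8359, 9409, 1051),
    "MTSS_4": (9992, 10859, 868),
    "MTSS_5": (12281, 13857, 1577),
    "MTSS_6": (14694, 15365, 672),
}

def amplicon_length(start, end):
    """Inclusive length of a fragment spanning positions start..end."""
    return end - start + 1

# Names of amplicons whose coordinates disagree with the reported size.
mismatches = [
    name for name, (start, end, size) in amplicons.items()
    if amplicon_length(start, end) != size
]
```

For the values in Table 2.2 the list of mismatches is empty, i.e. all six reported sizes agree with the primer coordinates.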
 
 
2.2.2.1.2 Minisequencing reaction 
Minisequencing primers were designed using Primer 3 software (Rozen and Skaletsky, 
2000) and checked with the Autodimer program (Vallone and Butler, 2004) 
(minisequencing primer sequences in Table 2.3). In minisequencing, primer sizes and 
different fluorochrome colours are important in the separation and detection of the 
extension products. Therefore, the primers were designed to differ in length (by at least 
5 nucleotides) through the addition of poly(GATC) tails at the 5′ end to ensure good separation in 
the electropherogram (Table 2.3). Minisequencing primers were manufactured and HPLC 
purified by Metabion. 
 
The minisequencing reaction had a total volume of 5 µl containing 1.5 µl of purified PCR 
product, 1 µl of ABI PRISM® SNaPshot™ Multiplex Ready Reaction Mix, 1 µl of premixed 
5x minisequencing primer mix (see Table 2.3 for final concentrations) and 1.5 µl ddH2O. 
Thermal cycling was performed for 35 cycles with denaturation at 96°C for 10 s, annealing 
at 50°C for 5 s and extension at 60°C for 30 s.  
 
Post-extension treatment was done in a total volume of 7 µl containing 5 µl minisequencing 
reaction product, 0.5 U Shrimp Alkaline Phosphatase (USB Corporation), 1x Shrimp 
Alkaline Phosphatase buffer and ddH2O to make up the reaction volume. The reaction was 
incubated at 37°C for 1 h followed by 15 min at 75°C for enzyme inactivation. 
 
Two µl of cleaned minisequencing reaction product was then mixed with 7.5 µl Hi-Di 
formamide (Applied Biosystems) and 0.5 µl of GeneScan-LIZ 120 internal size standard 
(Applied Biosystems). After a denaturing step for 2 min at 95°C followed by cooling to 4°C, 
the fragments were separated on an ABI PRISM® 3130xl Genetic Analyzer (Applied 
Biosystems) according to ABI PRISM® SNaPshot™ Multiplex Kit instructions and analysed 
using GeneMapper ID v3.2 software (Applied Biosystems). The resultant electropherogram 
displayed the different sized products (Table 2.4 gives the expected band sizes and peak 
colours).
Table 2.3   Minisequencing primers used to distinguish haplogroups L0-L6, M, N and R 

PCR amplicon  SNP       Haplogroup         Minisequencing primer sequences (5′ - 3′) *  Minisequencing  Mitochondrial     Final
primers (see  sequence  resolved (see tree                                              primer          region of primer  concentration
Table 2.2)    variation in Figure 2.2)                                                  orientation     binding site **   (µM) ***
MTSS1 F+R     1018G     L3                 (GATC)CAGATATGTTAAAGCCACTTTCGTAGT            R               1019-1045         0.20
MTSS1 F+R     1048C     L1-6               CCC(GATC)2CCAGTTTGGGTCTTAGCTATTGTGT          R               1049-1073         0.10
MTSS2 F+R     7256C     L3'4               (GATC)5CGATGCATACACCACATGAAA                 F               7235-7255         0.20
MTSS2 F+R     7521G     L3'4'6             (GATC)4TGACAAAGTTATGAAATGGTTTTTCTAATA        R               7522-7551         0.20
MTSS3 F+R     8468C     L2-6               (GATC)6CCAACTAAAAATATTAAACACAAACTACCAC       F               8437-8467         0.20
MTSS3 F+R     8701A     N                  (GATC)11CTAATCAAACTAACCTCAAAACAAATGATA       F               8671-8700         0.40
MTSS3 F+R     9347G     L0                 (GATC)9ATTGGTATATGGTTAGTGTGTTGGTTAG          R               9348-9375         0.20
MTSS4 F+R     10115C    L2                 (GATC)10AACACCCTCCTAGCCTTACTACTAATAAT        F               10086-10114       0.20
MTSS4 F+R     10810T    L2'3'4'6           (GATC)12CAACAATTATATTACTACCACTGACATGACT      F               10779-10809       0.14
MTSS5 F+R     12432T    L5                 CC(GATC)15CAATGGATTTTACATAATGGGG             R               12433-12454       0.50
MTSS5 F+R     12705C    R                  CC(GATC)14CGGTAACTAAGATTAGTATGGTAATTAGGAA    R               12706-12736       0.50
MTSS5 F+R     13789C    L1                 C(GATC)18CGAGGGCTGTGAGTTTTAGGT               R               13790-13810       0.50
MTSS6 F+R     14783C    M                  CCC(GATC)18CGCAAAATTAACCCCCTAATAAAA          F               14759-14782       0.50
MTSS6 F+R     15289C    L6                 C(GATC)20ACCCTCACACGATTCTTTACCTT             F               15266-15288       0.30
   
 
   
* The non-specific primer tail is underlined and in italic 
** Numbering according to the revised Cambridge reference sequence. 
*** The final concentration of the primers in the reaction mix 
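
The size spacing of the tailed primers can be checked by expanding the tail notation used in Table 2.3. The sketch below does this for the first five primers in the table. The helper name and the regular expression are illustrative assumptions, and a tail written as "(GATC)" without a number is assumed to mean a single repeat.

```python
import re

def expand_primer(notation):
    """Expand e.g. 'CCC(GATC)2CCAG...' into the full primer sequence."""
    def repeat(match):
        unit, count = match.group(1), match.group(2)
        # A repeat unit without a number is assumed to occur once.
        return unit * (int(count) if count else 1)
    return re.sub(r"\(([ACGT]+)\)(\d*)", repeat, notation)

# First five minisequencing primers, copied from Table 2.3.
primers = {
    "1018G": "(GATC)CAGATATGTTAAAGCCACTTTCGTAGT",
    "1048C": "CCC(GATC)2CCAGTTTGGGTCTTAGCTATTGTGT",
    "7256C": "(GATC)5CGATGCATACACCACATGAAA",
    "7521G": "(GATC)4TGACAAAGTTATGAAATGGTTTTTCTAATA",
    "8468C": "(GATC)6CCAACTAAAAATATTAAACACAAACTACCAC",
}

lengths = {snp: len(expand_primer(p)) for snp, p in primers.items()}
sorted_lengths = sorted(lengths.values())
# Smallest length difference between neighbouring primers.
min_gap = min(b - a for a, b in zip(sorted_lengths, sorted_lengths[1:]))
```

For these five primers the expanded lengths are 31, 36, 41, 46 and 55 nucleotides, which agree with the band sizes listed for the same SNPs in Table 2.4 and are spaced at least 5 nucleotides apart.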
Table 2.4   Chromatogram band profile for identifying haplogroups L0-L6, M, N and R 

Mutation  Electropherogram  Haplogroup resolved        Primer       Peak colour  Peak colour
          band size         (see tree in Figure 2.2)   orientation  negative     positive
1018G     31                L3                         R            t-red        c-black
1048C     36                L1-6                       R            a-green      g-blue
7256C     41                L3'4                       F            t-red        c-black
7521G     46                L3'4'6                     R            t-red        c-black
8468C     55                L2-6                       F            t-red        c-black
9347G     64                L0                         R            t-red        c-black
10115C    69                L2                         F            t-red        c-black
8701A     74                N                          F            g-blue       a-green
10810T    79                L2'3'4'6                   F            c-black      t-red
12432T    84                L5                         R            g-blue       a-green
12705C    89                R                          R            a-green      g-blue
13789C    94                L1                         R            a-green      g-blue
14783C    99                M                          F            t-red        c-black
15289C    104               L6                         F            t-red        c-black
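
The profile in Table 2.4 amounts to a lookup from band size and peak colour to a positive or negative call for each SNP. A minimal sketch of such a lookup is given below. It is not the GeneMapper ID procedure used in the study: the function name and the size tolerance of 2 are assumptions made for illustration, and observed fragment sizes may deviate from the expected values.

```python
# (band size, mutation, haplogroup, negative colour, positive colour) from Table 2.4.
PROFILE = [
    (31, "1018G", "L3", "red", "black"),
    (36, "1048C", "L1-6", "green", "blue"),
    (41, "7256C", "L3'4", "red", "black"),
    (46, "7521G", "L3'4'6", "red", "black"),
    (55, "8468C", "L2-6", "red", "black"),
    (64, "9347G", "L0", "red", "black"),
    (69, "10115C", "L2", "red", "black"),
    (74, "8701A", "N", "blue", "green"),
    (79, "10810T", "L2'3'4'6", "black", "red"),
    (84, "12432T", "L5", "blue", "green"),
    (89, "12705C", "R", "green", "blue"),
    (94, "13789C", "L1", "green", "blue"),
    (99, "14783C", "M", "red", "black"),
    (104, "15289C", "L6", "red", "black"),
]

def call_peak(size, colour, tolerance=2.0):
    """Return (mutation, haplogroup, 'positive' or 'negative') for a peak, or None."""
    for expected, mutation, haplogroup, negative, positive in PROFILE:
        if abs(size - expected) <= tolerance:
            if colour == positive:
                return mutation, haplogroup, "positive"
            if colour == negative:
                return mutation, haplogroup, "negative"
            return None  # size matches but colour is unexpected
    return None  # no expected band near this size
```

Because neighbouring bands are at least 5 units apart, a tolerance of 2 cannot assign one peak to two SNPs.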
2.2.2.2 HVS amplification and sequencing 
Mitochondrial sequencing of HVS-I and II was done to cover regions 16024-16569 for 
HVS-I and 57-630 for HVS-II. Amplification and sequencing were done according to 
two previously published methods (Vigilant et al., 1989; Behar et al., 2007). Initially the 
protocol of Vigilant et al. (1989) was followed; it was later replaced by the protocol of Behar 
et al. (2007). Amplification and sequencing primers are shown in Table 2.5 and procedures 
in Table 2.6. 
 
Table 2.5   Sequences of primers used to amplify and sequence HVS-I and II 
Primer Description Primer sequence 5′-3′ Reference 
PCR primers 
L15996 PCR forward CTCCACCATTAGCACCCAAGC (Vigilant et al., 1989) 
H408 PCR reverse CTGTTAAAAGTGCATACCGCCA (Vigilant et al., 1989) 
15876F PCR forward TCAAATGGGCCTGTCCTTGTAG (Behar et al., 2007) 
639R PCR reverse GGGTGATGTGAGCCCGTCTA (Behar et al., 2007) 
 
Cycle sequencing primers 
L15996 HVS-I forward CTCCACCATTAGCACCCAAGC (Vigilant et al., 1989) 
H16401 HVS-I reverse TGATTTCACGGAGGATGGTG (Vigilant et al., 1989) 
15946F HVS-I forward CAAGGACAAATCAGAGAAAA (Behar et al., 2007) 
132R HVS-I reverse GACAGATACTGCGACATAGG (Behar et al., 2007) 
L29 HVS-II forward GGTCTATCACCCTCTTAACCAC (Vigilant et al., 1989) 
H408 HVS-II reverse CTGTTAAAAGTGCATACCGCCA (Vigilant et al., 1989) 
639R HVS-II reverse GGGTGATGTGAGCCCGTCTA (Behar et al., 2007) 
    
 
 
 
 
 
 
 
 
 
Table 2.6   PCR ingredients and cycling conditions for amplification and sequencing of HVS-I and II. Final 
concentrations of ingredients are shown 

Description                               Concentrations / Conditions according to:

PCR

Ingredients                               (Vigilant et al., 1989)   (Behar et al., 2007)
DNA                                       ~50 ng                    ~50 ng
FastStart 10x Buffer (with added MgCl2)   1 x                       1 x
Primer 1                                  0.4 µM                    0.4 µM
Primer 2                                  0.4 µM                    0.4 µM
dNTPs                                     0.1 mM                    0.1 mM
BSA                                       1 mg/ml                   -
FastStart Taq (Roche Applied Science)     1 U                       1 U
Total volume                              50 µl                     50 µl

                    (Vigilant et al., 1989)                      (Behar et al., 2007)
Cycling conditions  Temperature (°C)  Time (min:sec)  Cycles     Temperature (°C)  Time (min:sec)  Cycles
Initiation          95                5:00                       95                5:00
Denaturation        94                1:00            30         95                0:30            35
Annealing           56                1:00            30         55                0:30            35
Extension           74                1:00            30         72                2:00            35
Final extension     74                10:00                      72                10:00
Hold                4                 Hold                       4                 Hold

Cycle sequencing

Ingredients                               (Vigilant et al., 1989)   (Behar et al., 2007)
PCR product                               4-8 µl                    2 µl
Big Dye                                   4 µl                      1 µl
Primer                                    0.165 µM                  0.33 µM
Total volume                              20 µl                     10 µl

                    (Vigilant et al., 1989)                      (Behar et al., 2007)
Cycling conditions  Temperature (°C)  Time (min:sec)  Cycles     Temperature (°C)  Time (min:sec)  Cycles
Initiation                                                       96                1:00
Denaturation        96                0:30            25         96                0:10            25
Annealing           50                0:15            25         50                0:05            25
Extension           60                4:00            25         60                4:00            25
Hold                4                 Hold                       4                 Hold
 
PCR cleanup was performed using MultiScreen® PCRµ96 Plates (Millipore) according to kit 
instructions. Product sizes of the PCR were checked on a 2% agarose gel with ethidium 
bromide staining (1 x TBE running buffer, Bromophenol blue Ficoll dye loading buffer, 1Kb 
DNA ladder (Gibco BRL) size standard). Sequencing reaction cleanup was done using 
Montage SEQ96 Sequencing Reaction Cleanup Plates (Millipore). All thermal cycling was 
performed on a 9700 GeneAmp® PCR System (Perkin-Elmer, Applied Biosystems). 
Sequencing products were separated on an ABI PRISM® 3130xl Genetic Analyzer 
(Applied Biosystems) and analysed using Sequencing Analysis Software v5.2 (Applied 
Biosystems). 
 
2.2.2.3 MtDNA data analysis 
The designed minisequencing method was used to group samples into their major 
haplogroups. Further classification was achieved by analysing HVS-I and II. 
 
HVS-I and II sequences were aligned to the control region reference sequence (Andrews et 
al., 1999) using the Clustal W algorithm (Thompson et al., 1994) implemented in BioEdit 
v.7.0.5.3 (Hall, 1999). HVS-I and II sequences (15997-16569 and 57-607) were then 
combined into one sequence of 1124 bp for further analysis. Unique haplotypes were 
identified using DnaSP v4.10 (Rozas et al., 2003) and variant sites were recorded 
electronically using S-compare (Nelson, 2006). Using the variant positions together with a 
phylogenetic approach, haplogrouping was done according to the nomenclature of Behar 
et al. (2008).  
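
The combined length of 1124 bp follows directly from the two coordinate ranges given above, as this short check shows (the function name is illustrative):

```python
def segment_length(start, end):
    """Inclusive number of positions between start and end."""
    return end - start + 1

hvs1_length = segment_length(15997, 16569)  # HVS-I range used in the analyses
hvs2_length = segment_length(57, 607)       # HVS-II range used in the analyses
combined_length = hvs1_length + hvs2_length
```

The two segments contribute 573 and 551 positions respectively, giving the 1124 positions used in the mutation rate calculations that follow.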
 
Variation in the HVS-II region 303-315 was not considered or reported in any of the 
analyses. Insertions in the poly-C repeat tract at position 568-573 were taken as a 1 bp C 
insertion. All other regions were considered, although some regions were differentially 
weighted as outlined in the analysis description. 
 
Phylogenetic tree analyses of sequences were done through Maximum likelihood analysis 
using PHYML (Guindon et al., 2005). The HKY substitution model with Gamma distributed 
rates and Invariable sites was selected as the best-fitting model through likelihood ratio 
tests using Modeltest 3.7 (Posada and Crandall, 1998) in conjunction with PAUP v4.0b10 
(Swofford, 1998) and was implemented in the Maximum likelihood analysis. The tree 
topology search employed was nearest neighbour interchange (NNI). An approximate 
likelihood ratio test (aLRT) was computed to determine branch support (Anisimova and 
Gascuel, 2006). Trees were visualized in MEGA4 (Tamura et al., 2007). 
 
Networks of the sequences were constructed using the Median Joining algorithm (Bandelt 
et al., 1999) of Network v4.5.0.0 (Fluxus-engineering, 2008). Networks were subjected to 
maximum parsimony post-analysis using the Steiner maximum parsimony algorithm (Polzin 
and Daneschmand, 2003) within Network 4.5.0.0. For network analysis the epsilon 
parameter (Network program parameter for quick calculation of sparse networks) was set 
to 2 and transversions were weighted 3x the weight of transitions. Furthermore, the weight 
of the 16189 position was reduced 10x and the weight of each of the CA repeats at position 
523 was reduced 5x per nucleotide in the repeat.  
 
Sequences from other sources included in phylogenetic and network analyses were 
Neanderthal (Genbank accession number: NC_011137) (Green et al., 2008) and the 
control region reference sequence (Andrews et al., 1999). Additional L0d sequences 
published in the literature (Gonder et al., 2007; Tishkoff et al., 2007; Behar et al., 2008) 
were included in the L0d network for comparison with our results. The data sets of Gonder 
et al. (2007) and Tishkoff et al. (2007) overlapped in some of the subjects, and in each such 
case only one of the two sequences was selected. 
 
Time estimates of L0d subgroups were calculated using the Rho statistic (Forster et al., 
1996) with the associated standard deviation, sigma (Saillard et al., 2000), using a 
mutation rate of 2.5 x 10^-6 per nucleotide per generation (Ward et al., 1991) (25 yrs per 
generation; 1124 nucleotides). Time estimates were also calculated using other published 
mutation rates, i.e. 1.75 x 10^-6 per nucleotide per generation (Horai et al., 1995), 4.5 x 10^-6 
per nucleotide per generation (Forster et al., 1996) and 2.1 x 10^-6 per nucleotide per generation 
(Soodyall et al., 1996), but because of its intermediate value the mutation rate of Ward et 
al. (1991) was used in subsequent discussions and analyses. A generation 
time of 25 years was used throughout. 
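
The conversion from Rho to years can be sketched as follows. This is an illustrative sketch and not the software used in the study: it assumes the usual definitions, in which Rho is the mean number of mutations separating the sampled sequences from the root haplotype and sigma^2 = (n1^2 l1 + ... + nm^2 lm) / n^2, where branch i carries l_i mutations and leads to n_i sampled sequences. The toy branch list is hypothetical.

```python
import math

MU_PER_SITE = 2.5e-6    # per nucleotide per generation (Ward et al., 1991)
SITES = 1124            # combined HVS-I and HVS-II length
GENERATION_YEARS = 25

def years_per_mutation():
    """Average time (years) for one mutation to arise in the 1124 bp segment."""
    return GENERATION_YEARS / (MU_PER_SITE * SITES)

def rho_and_sigma(branches, n):
    """branches: (mutations on branch, sampled descendants) for every branch; n: sample size."""
    rho = sum(l * d for l, d in branches) / n
    sigma = math.sqrt(sum(l * d * d for l, d in branches)) / n
    return rho, sigma

# Hypothetical toy tree: four sequences, three branches leaving the root.
toy_branches = [(1, 2), (1, 1), (2, 1)]
rho, sigma = rho_and_sigma(toy_branches, n=4)
age_years = rho * years_per_mutation()
sd_years = sigma * years_per_mutation()
```

With the Ward et al. rate, one mutation in the 1124 bp segment corresponds to roughly 8 900 years, so each unit of Rho adds about that much to the age estimate.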
 
Haplogroup isofrequency maps were generated applying the Kriging method (Oliver and 
Webster, 1990; Xue et al., 2005) incorporated in the Surfer v.8.06.39 program 
(Golden-Software, 2006). Mitochondrial contour plots were based on frequencies of the L0d/k 
subgroups on the background of the L0d/k group as a whole. This was done to eliminate 
the effects that admixture from Bantu-speakers and non-Africans would have on the 
distribution of the L0d/k subgroups. When frequencies were calculated, sample size effects 
were corrected by adjusting the total sample sizes in all groups to the same value. 
 
Mismatch distributions of populations and haplogroups were calculated in Arlequin v.3.11 
(Excoffier et al., 2005). From these the validity of demographic expansions and the dates of 
expansions were inferred. The demographic expansion scenario is tested by simulating a 
population going through an expansion and testing whether the observed data differ 
significantly from the simulated expansion scenario. A non-significant Sum of Squared 
deviation (SSD) p-value therefore indicates that a population/group of sequences is 
consistent with having gone through an expansion. The parameters calculated are θ1, θ0 and τ. 
Dividing θ1 by θ0 gives an indication of the magnitude of the expansion, while τ gives an 
indication of the time of the expansion. The mutation rate of 2.5 x 10^-6 per nucleotide per 
generation (Ward et al., 1991) and a generation time of 25 years were used to convert τ (Tau) 
to T (Time BP when expansion took place) using the equation T = (τ/2µ) x generation time. 
In the equation, µ is the mutation rate per gene per generation, i.e. 2.5 x 10^-6 per nucleotide 
per generation (Ward et al., 1991) x 1124 sites, which results in µ = 2.81 x 10^-3. 
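
The conversion of Tau to a date can be written out as a small function. The sketch follows the equation T = (Tau / 2µ) x generation time given above; the function name is illustrative and the Tau value in the example is hypothetical, not a result from this study.

```python
def expansion_time_years(tau, mu_per_sequence=2.5e-6 * 1124, generation_years=25):
    """T = (tau / 2u) x generation time, with u the mutation rate per sequence per generation."""
    return tau / (2.0 * mu_per_sequence) * generation_years

# Hypothetical example value of tau, for illustration only.
example_tau = 3.0
example_time = expansion_time_years(example_tau)
```

Because T is proportional to Tau, doubling Tau doubles the inferred expansion time, and a higher assumed mutation rate gives a proportionally younger date.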
 
The summary statistics (number of sequences, haplotype number, gene diversity (Nei, 
1987) and nucleotide diversity (Nei, 1987)) for each group were calculated in DnaSP v4.10 
(Rozas et al., 2003). Using DnaSP v4.10, the population mutation parameter (θ) was 
estimated from segregating sites (θs per nucleotide site) as well as with the Watterson 
estimator (W-θs per sequence) (Tajima, 1996). From W-θs the effective population size 
(Ne) was estimated by dividing W-θs by 2µ, where µ is the mutation rate per gene per 
generation of 2.81 x 10^-3 (Ward et al., 1991) as explained in the previous paragraph. The 
selective neutrality tests of Tajima's D (Tajima, 1989), Fu's Fs statistic (Fu, 1997) and the 
R2 statistic (Ramos-Onsins and Rozas, 2002) were also calculated using DnaSP v4.10. 
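
The estimate of effective population size from the Watterson estimator can be sketched as below. The sketch assumes the standard Watterson formula (number of segregating sites divided by the harmonic number a_n) and the relation Ne = θ / 2µ stated above. It is not the DnaSP implementation, and the input values in the example are hypothetical.

```python
def watterson_theta(segregating_sites, n_sequences):
    """Watterson estimator per sequence: S / a_n, with a_n = 1 + 1/2 + ... + 1/(n-1)."""
    a_n = sum(1.0 / i for i in range(1, n_sequences))
    return segregating_sites / a_n

def effective_size(theta_per_sequence, mu_per_sequence=2.5e-6 * 1124):
    """Ne = theta / (2 * mu), with mu per sequence per generation."""
    return theta_per_sequence / (2.0 * mu_per_sequence)

# Hypothetical example: 11 segregating sites observed in 4 sequences.
example_theta = watterson_theta(11, 4)
example_ne = effective_size(example_theta)
```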
 
To visually represent the effective population size changes through time, Bayesian Skyline 
Plots (BSP) (Drummond et al., 2005) were constructed. For each of the haplogroups, BSPs 
of effective population size through time were constructed using a Markov Chain Monte 
Carlo (MCMC) sampling algorithm, as implemented in BEAST v. 1.4.8 (Drummond and 
Rambaut, 2007). The population size function of the BSP can be implemented using either 
a piecewise constant or a piecewise linear function of population size change. In the 
present study, a piecewise linear model made up of 10 control points was used. The 
general time-reversible (GTR) substitution model with estimated base frequencies and a 
Gamma + Invariant Sites heterogeneity model was used to infer the ancestral gene trees 
for each haplogroup. The mean substitution rate was fixed to the rate of Ward et al. 
(1991) and a relaxed molecular clock (Uncorrelated Lognormal) was employed. Each 
MCMC run consisted of 40 000 000 generations, sampled every 4 000, with the 
first 4 000 000 generations discarded as burn-in. All runs had an effective sample size of at 
least 1 000 for the parameters of interest. Each independent run was repeated at least 
twice and results were combined using the LogCombiner v1.4.8 tool included in the BEAST 
package. BSPs were visualized in TRACER v. 1.4 (Rambaut and Drummond, 2007). 
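
The MCMC settings above imply the following number of retained samples per run. This is simple bookkeeping, ignoring the initial state that a sampler may also log; the variable names are illustrative.

```python
chain_length = 40_000_000   # generations per run
sample_every = 4_000        # sampling interval
burn_in = 4_000_000         # generations discarded as burn-in

samples_total = chain_length // sample_every
samples_discarded = burn_in // sample_every
samples_kept = samples_total - samples_discarded
burn_in_fraction = burn_in / chain_length
```

Each run therefore contributes about 9 000 post-burn-in samples, with 10% of the chain discarded.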
 
Population pairwise differences were calculated with Arlequin v3.11 (Excoffier et al., 2005) 
by using Fst distances (Reynolds et al., 1983) incorporating the nucleotide correction 
model of Tamura and Nei (Tamura and Nei, 1993) and a gamma correction of 0.532. An 
exact test of population differentiation (Raymond and Rousset, 1995) was also calculated 
using Arlequin v3.11 (Excoffier et al., 2005). The distance matrix was visualized through 
PCA and cluster analysis in PAST v.1.54 (Hammer et al., 2001b). 
 
The relationship between physical and genetic distances was investigated in the Khoe-San 
and Coloured groups by doing a linear regression using R v.2.5.0 (R-Project, 2006). 
The regression was applied to a scatter plot resulting from pairwise comparisons of 
distance matrices based on physical and genetic distances. The linear regression model 
tests a curved line, a straight line with a gradient and a straight line with a gradient of zero 
against one another and assigns significance values to each model. Additionally, a Mantel 
test implemented in Arlequin v3.11 (Excoffier et al., 2005) was done to test the 
correlation between the two distance matrices. 
 
The physical distance matrix was constructed by obtaining latitude and longitude 
information of the different sampling locations from the website "Google Maps Latitude, 
Longitude Popup" (Gorissen, 2008) and calculating the great circle distance (in km) 
between the points using the "Latitude/Longitude Distance Calculation" website (Michels, 
1997). The physical distance matrix is included in Appendix C. 
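
For readers who wish to reproduce such distances, a great circle distance can be computed with the haversine formula. The sketch below is an assumption about how such a calculation can be done and is not the method used by the cited website. The mean Earth radius of 6371 km is also an assumption, so results may differ slightly from those in Appendix C.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def distance_matrix(points):
    """Symmetric matrix of pairwise distances for a list of (lat, lon) tuples."""
    return [[great_circle_km(*p, *q) for q in points] for p in points]
```

Applying `distance_matrix` to the list of sampling coordinates yields a symmetric matrix with zeros on the diagonal, which is the form needed for a comparison with the genetic distance matrix.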
 
Inter-population genetic distances were used in Analyses of Molecular Variance (AMOVA), 
implemented in Arlequin v.3.11 (Excoffier et al., 2005). The distribution of variance among 
three hierarchical levels was tested in order to assess relationships among groups of 
populations. The lowest level is the variation contained between individuals within the 
same population. The next level contains the variation that exists between populations 
(populations in this case were the groups defined in Table 2.1). The third level contains the 
variation between groupings of these populations. Different groupings of populations were 
attempted, based on geographic distribution, language and self-identification of 
populations. 
 
 
2.2.3 Y-chromosome methods 
 
A total of 353 male samples were typed for Y-chromosome variation. Analyses of the 
Y-chromosomes were performed at two levels: firstly, haplogroup-defining bi-allelic markers 
were typed using restriction fragment length polymorphism (RFLP) assays or by using 
several SNaPshot minisequencing systems designed by the HGDDRU laboratory (Naidoo 
et al., Unpublished). Secondly, microsatellite repeat-length analysis of short tandem repeat 
loci (Y-STRs) was done to determine intra-haplogroup variation. 
 
The RFLP assays were used initially and were gradually replaced as new minisequencing 
panels were developed and became available in the HGDDRU laboratory. The two 
techniques combined use 83 polymorphisms in the non-recombining part of the Y-chromosome to 
assign individuals to 71 haplogroups. Figure 2.3 illustrates the Y-chromosome tree with 
nomenclature according to Karafet et al. (2008) and highlights the mutations 
used through different methods applied in the HGDDRU laboratory. 
 
 
 
 
 
 
 
Figure 2.3  The Y-chromosome haplogroup tree with nomenclature according to Karafet et al., (2008) indicating the branch-defining mutations screened for 
by using SNaPshot minisequencing panels and RFLP assays in the HGDDRU laboratory 
2.2.3.1 Y-chromosome RFLP 
For the assignment of the 353 Y-chromosomes in the sample group a total of 24 of the 
Y-chromosome RFLPs were typed in a hierarchical manner (listed in Table 2.7) using the tree 
provided in Figure 2.3. The assays consisted of PCR amplification followed by restriction 
digests and separation on agarose gels. 
 
The PCR amplification reaction had a total volume of 25 µl, which contained ~50 ng DNA, 
0.1 mM dNTPs, 1 U FastStart Taq polymerase and optimised amounts of MgCl2, BSA, 
primer and spermidine (Table 2.8). All PCRs were performed on a 9700 GeneAmp® PCR 
System (Perkin-Elmer, PE Applied Biosystems), with the following thermal cycling 
conditions: initial step at 95°C for 6 min followed by 35 cycles of denaturation at 95°C for 
30 s, annealing at the appropriate temperature (Table 2.8) for 30 s and extension at 
72°C for 30 s, finishing with a 7 min final extension at 72°C.  
 
Restriction digests were done in a 30 µl volume containing 25 µl PCR product, 1x 
restriction enzyme buffer, 0.1 U restriction enzyme and in some cases (Table 2.8) added 
BSA (final concentration = 0.3 mg/ml). Digestion temperatures and reaction specific 
conditions are listed in Table 2.8. After digestion the fragments were separated on agarose 
gels of appropriate concentrations (Table 2.8) (1 x TBE running buffer, Bromophenol blue 
Ficoll dye loading buffer, 1Kb DNA ladder size standard (Gibco BRL)). During every RFLP 
assay, control samples known to be ancestral and derived for the polymorphism were 
included, as well as a PCR blank control containing no DNA. Separated fragments were 
visualized under UV light and gel photographs were taken using the G:Box gel 
documentation system (Vacutec, SynGene, Cambridge, England) and GeneSnap v6.08 
software (Synoptics Ltd., SynGene, Cambridge, England).  
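
Scoring an RFLP assay amounts to comparing the observed fragment sizes with the expected ancestral and derived patterns in Table 2.8. The sketch below illustrates this for three markers. In the study, alleles were scored from gel photographs, so the function name, the size tolerance of 10 bp and the requirement that every fragment be observed are assumptions made for illustration. In practice very small fragments may not be visible on a gel.

```python
# Expected fragment sizes (bp) for three markers, taken from Table 2.8.
EXPECTED = {
    "M51": {"ancestral": (339,), "derived": (307, 32)},
    "M23": {"ancestral": (173, 154), "derived": (327,)},
    "M2": {"ancestral": (105, 43), "derived": (148,)},
}

def call_rflp(marker, observed, tolerance=10):
    """Return 'ancestral', 'derived' or 'uncalled' for a list of observed fragment sizes."""
    obs = sorted(observed, reverse=True)
    for state, expected in EXPECTED[marker].items():
        exp = sorted(expected, reverse=True)
        # Every expected fragment must be matched by one observed fragment.
        if len(obs) == len(exp) and all(
            abs(o - e) <= tolerance for o, e in zip(obs, exp)
        ):
            return state
    return "uncalled"
```

A pattern that matches neither state is left uncalled, which mirrors the role of the ancestral and derived control samples run alongside each assay.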
 
 
 
 
 
 
 
Table 2.7   SNPs typed in RFLP assays to determine Y-chromosome haplogroup 

Initial typing
  SRY10831-1 (G-A)
  YAP (a/p)
  M213 (T-C)
  M168 (C-T)

African groups          Eurasian groups
Haplogroup A            Haplogroup C
  M51 (G-A)               M130 (C-T)
  M23 (A-G)             Haplogroup R
Haplogroup B              M9 (C-G)
  M112 (G-A)              M207 (A-G)
  M150                    SRY10831-2 (A-G)
  M129 (G-A)              M17 (a/p - G)
  M169 (T-C)            Other groups (J,C,I,H,L)
  M211 (C-T)              p12f2 (a/p - 88 bp)
Haplogroup E              M172 (T-G)
  M2 (A-G)                M52 (A-C)
  M191 (T-G)              M170 (A-C)
  M75 (G-A)               L-M11 (A-G)
  M35 (G-C)

a/p - absent or present
 
 
 
 
 
 
 
 
 
 
 
Table 2.8   Conditions and concentrations used during Y-chromosome RFLP typing 
Marker SRY10831 M51 M23 M168 M150 M112 
Mutation A-G; reversion G-A G-A A-G C-T C-T G-A 
Haplogroup(s) defined by derived state SRY10831.1: B - R  A - M51 A - M23 E - R B - M150 B - M112 
  SRY10831.2: R       
PCR stock solutions 
            
MgCl2 (25 mM) 1.5 mM 2 mM 2.5 mM 2.5 mM 1.5 mM 2 mM 
primer F (10 uM) 0.4 uM 0.4 uM 0.4 uM 0.4 uM 0.4 uM 0.4 uM 
primer R (10 uM) 0.4 uM 0.4 uM 0.4 uM 0.4 uM 0.4 uM 0.4 uM 
BSA (10 mg/ml) - - - - - - 
spermidine (2.5 mM) - - - - - - 
        
PCR conditions 
          PCR in 50-ul vol. 
annealing temperature (°C) 60 58 58 56 60 61 
        
Digestion  
          in 50-ul vol.; overlay with oil 
PCR product size (bp) 167 339 327 473 167 227 
restriction enzyme  Dra III + BSA Hind III Xba I + BSA Hinf I Aat II TspR I + BSA 
digestion conditions (°C)   37 37 37 37 37 65 
gel detection 3% agarose 2% agarose 2% agarose 3% agarose 3% agarose 2% agarose 
ancestral allele - product sizes (bp) 167  (A) 339  (G) 173 + 154  (A) 234 + 105 + 81 + 52  (C ) 146 + 21  (C) 155 + 72  (G) 
derived allele - product sizes (bp) 112 + 55  (G) 307 + 32  (A) 327  (G) 234 + 186 + 52  (T) 167  (T) 227  (A) 
        
Comments reverse mut. G-A      
  in HG R      
References 
            
Reference: polymorphism (Whitfield et al., 1995) (Underhill et al., 2000) (Underhill et al., 2000) (Shen et al., 2000) (Underhill et al., 2000) (Underhill et al., 2000) 
Reference: primers (Santos et al., 1999) (Underhill et al., 2000) (Underhill et al., 2000) (Underhill et al., 2000) unpublished  unpublished  
Reference: PCR-RFLP assay (Santos et al., 1999) unpublished  unpublished unpublished  unpublished  unpublished  
          mismatch primer   
 
 
 
Table 2.8 (continued)   Conditions and concentrations used during Y-chromosome RFLP typing 
Marker M129 M169 M211 M130 (RPS4Y) YAP M2 (sY81) 
Mutation G-A T-C C-T C-T absence - presence of YAP A-G 
Haplogroup(s) defined by derived state B - M129 B - M169 B - M211 C - M130 D and E E - M2 
  
      
PCR stock solutions 
            
MgCl2 (25 mM) 1.5 mM 1.5 mM 2 mM 1.5 mM 1.5 mM 1.5 mM 
primer F (10 uM) 0.4 uM 0.4 uM 0.4 uM 0.4 uM 0.4 uM 0.2 uM 
primer R (10 uM) 0.4 uM 0.4 uM 0.4 uM 0.4 uM 0.4 uM 0.2 uM 
BSA (10 mg/ml) - - - - 1 ug/ul - 
spermidine (2.5 mM) - - - - - - 
        
PCR conditions 
            
annealing temperature (°C) 62 58 58 50 51 58 
        
Digestion  
            
PCR product size (bp) 255 200 208 91  150  (YAP-) /  450  (YAP+)   148 
restriction enzyme  Msp I Dra I (Roche) Rsa I Bsl I - Nla III  + BSA 
digestion conditions (°C)   37 37 37 55 - 37 
gel detection 2% agarose 2% agarose 2% agarose 3% agarose 2% agarose 3% agarose 
ancestral allele - product sizes (bp) 219 + 36  (G) 106 + 94  (T) 208  (C) 57 + 34  (C)  150  (YAP-)    105 + 43  (A) 
derived allele - product sizes (bp) 255  (A) 200  (C) 137 + 71  (T) 91  (T) 450  (YAP+)   148  (G) 
        
Comments 
       
  
       
References 
            
Reference: polymorphism (Underhill et al., 2000) (Shen et al., 2000) (Shen et al., 2000) (Bergen et al., 1999) (Hammer, 1994) (Seielstad et al., 1994) 
Reference: primers (Underhill et al., 2000) unpublished  unpublished  (Kayser et al., 2000) (Hammer and Horai, 1995) (Thomas et al., 1999) 
Reference: PCR-RFLP assay unpublished  unpublished  unpublished  (Kayser et al., 2000) - (Thomas et al., 1999) 
              
 
 
 
Table 2.8 (continued)   Conditions and concentrations used during Y-chromosome RFLP typing 
Marker M191b M35 M75 M213 M170 M52 
Mutation T-G G-C G-A T-C A-C A-C 
Haplogroup(s) defined by derived state E - M191 E - M35 E - M75 F - R I - M170 H - M52 
  
      
PCR stock solutions 
            
MgCl2 (25 mM) 2.5 mM 2 mM 2 mM 2.5 mM 4 mM 2.5 mM 
primer F (10 uM) 0.3 uM 0.4 uM 0.3 uM 0.4 uM 0.4 uM 0.4 uM 
primer R (10 uM) 0.3 uM 0.4 uM 0.3 uM 0.4 uM 0.4 uM 0.4 uM 
BSA (10 mg/ml) - - - - - - 
spermidine (2.5 mM) - - - - - - 
        
PCR conditions 
            
annealing temperature (°C) 60 58 55 56 59 60 
        
Digestion  
            
PCR product size (bp) 156 186 189 409 129 164 
restriction enzyme  Mbo I Bsr I Nla III + BSA Nla III  + BSA Nla III  + BSA HpyCH4 IV 
digestion conditions (°C)   37 65 37 37 37 37 
gel detection 3% agarose 2% agarose 3% agarose 2% agarose 3% agarose 3% agarose 
ancestral allele - product sizes (bp) 156  (T) 122 + 64  (G) 189  (G) 290 + 119 (T) 109 + 20  (A) 164  (A) 
derived allele - product sizes (bp) 129 + 27  (G) 186  (C) 165 + 24  (A) 409 (C ) 129  (C) 138 + 26  (C) 
        
Comments 
    incomplete   
  
    digestion  
References 
            
Reference: polymorphism (Shen et al., 2000) (Underhill et al., 2000) (Shen et al., 2000) (Shen et al., 2000) (Shen et al., 2000) (Underhill et al., 2000) 
Reference: primers unpublished  unpublished  unpublished  (Underhill et al., 2001) unpublished  unpublished  
Reference: PCR-RFLP assay unpublished  unpublished  unpublished  unpublished  unpublished  unpublished  
  mismatch primer   mismatch primer   mismatch primer mismatch primer 
 
 
 
Table 2.8 (continued)   Conditions and concentrations used during Y-chromosome RFLP typing 
Marker p12f2 M172 M11 M9 M207 M17 
Mutation no del - del T-G A-G C-G A-G WT-del G 
Haplogroup(s) defined by derived state J - p12f2 (Eu 10) J - M172 (Eu 9) L - M11 O - R R - M207 R-M17 
  
      
PCR stock solutions 
            
MgCl2 (25 mM) 1.5 mM 2 mM 3.5 mM 1.5 mM 1.5 mM 1.5 mM 
primer F (10 uM) 0.3 uM 0.4 uM 0.4 uM 0.2 uM 0.4 uM 0.3 uM 
primer R (10 uM) 0.3 uM 0.4 uM 0.4 uM 0.2 uM 0.4 uM 0.3 uM 
BSA (10 mg/ml) - - - - - - 
spermidine (2.5 mM) - - - - - - 
  M2-F and M2-R  (0.3 uM each)      
PCR conditions 
            
annealing temperature (°C) 58 58 58 54 56 56 
        
Digestion  
            
PCR product size (bp) p12f2 = 88; M2 = 148  148 215 340 423 124 
restriction enzyme  - Nla III  + BSA Msp I Hinf I Dra I (Roche) Afl III 
digestion conditions (°C)   - 37 37 37 37 37 
gel detection 2% agarose 2% agarose 3% agarose 3% agarose 2% agarose 3% agarose 
ancestral allele - product sizes (bp) 148 + 88  (no del) 148  (T) 215  (A) 181 + 95 + 64  (C ) 356 + 77 (A) 124 (+G) 
derived allele - product sizes (bp) 148  (del) 122 + 26   (G) 193 + 22  (G) 245 + 95  (G) 423 (G) 101 (-G) 
  (co-amplification with M2)      
References 
            
Reference: polymorphism (Casanova et al., 1985) (Shen et al., 2000) (Underhill et al., 1997) (Underhill et al., 1997) (Shen et al., 2000) (Underhill et al., 1997) 
Reference: primers (Rosser et al., 2000) (Nebel et al., 2001) (Qamar et al., 2002) (Underhill et al., 1997) (Underhill et al., 2001) (Thomas et al., 1999) 
Reference: PCR-RFLP assay - (Nebel et al., 2001) (Qamar et al., 2002) unpublished  unpublished  (Thomas et al., 1999) 
    mismatch primer mismatch primer     mismatch primer 
2.2.3.2 Y-chromosome minisequencing 
Seven Y-chromosome minisequencing panels were developed in the HGDDRU laboratory 
by T. Naidoo (Naidoo et al., Unpublished). The "Y-SNP1" panel resolves some of the basal 
branches in the Y-chromosome tree (SRY10831.1, M168, M89) and thereafter targets 
Eurasian haplogroups (Figure 2.3 and Table 2.9). The other six panels concentrate on 
resolving African haplogroups. The "Haplogroup E" panel resolves the main haplogroup E 
branches, while the "E1b1a" and "E1b1b" panels focus on these two common subgroups. 
The main branches of haplogroup B are resolved by the "Haplogroup B" panel, and the 
"B2b" panel focuses on the branches of the B2b subgroup. The subgroups of haplogroup A 
are fully resolved by one "Haplogroup A" panel (Figure 2.3 and Table 2.9). 
 
The various SNPs in the minisequencing panels, their ancestral and derived states and 
their electropherogram profiles are listed in Table 2.9. 
 
The methods for implementing these panels are according to T. Naidoo (Naidoo et al., 
Unpublished). Each panel involves one multiplex PCR amplification followed by a PCR 
cleanup, minisequencing reactions with labelled ddNTPs, minisequencing reaction cleanup 
and analysis on a sequencer, similar to the mitochondrial minisequencing methodology 
described earlier. 
 
 
 
 
 
 
 
 
 
 
 
 
Table 2.9  Information on the seven Y-chromosome minisequencing panels used to resolve haplogroups 
according to Figure 2.3 
Marker name   Electropherogram peak number   Ancestral allele   Electropherogram label color (ancestral)   Derived allele   Electropherogram label color (derived) 
      
Haplogroup A      
M91 1 T RED A GREEN 
M31 2 C BLACK G BLUE 
M14 3 A GREEN G BLUE 
M114 4 A GREEN G BLUE 
P28 5 C BLACK T RED 
M28 6 A GREEN C BLACK 
M51 7 C BLACK T RED 
M13 8 G BLUE C BLACK 
M171 9 C BLACK G BLUE 
M118 10 A GREEN T RED 
Haplogroup B      
M60 1 A GREEN T RED 
M146 2 T RED G BLUE 
M182 3 C BLACK T RED 
M150 4 C BLACK T RED 
M152 5 C BLACK T RED 
M108 6 A GREEN G BLUE 
M43 7 A GREEN G BLUE 
M112 8 G BLUE A GREEN 
Haplogroup B2b      
P6 1 G BLUE C BLACK 
M115 2 C BLACK T RED 
M30 3 G BLUE A GREEN 
P7 4 T RED C BLACK 
P8 5 G BLUE A GREEN 
M211 6 C BLACK T RED 
Haplogroup E      
M40 1 C BLACK T RED 
M33 2 A GREEN C BLACK 
M44 3 C BLACK G BLUE 
M75 4 G BLUE A GREEN 
M41 5 G BLUE T RED 
M85 6 G BLUE T RED 
P2 7 G BLUE A GREEN 
M2 8 T RED C BLACK 
M35 9 C BLACK G BLUE 
Haplogroup E1b1a      
M58 1 G BLUE A GREEN 
M116.1 2 A GREEN C BLACK 
M149 3 G BLUE A GREEN 
M154 4 A GREEN G BLUE 
M155 5 C BLACK T RED 
M10 6 T RED C BLACK 
M191 7 T RED G BLUE 
Haplogroup E1b1b      
M78 1 C BLACK T RED 
M148 2 T RED C BLACK 
M81 3 C BLACK T RED 
M107 4 A GREEN G BLUE 
M165 5 T RED C BLACK 
M123 6 G BLUE A GREEN 
M34 7 G BLUE T RED 
M136 8 G BLUE A GREEN 
M281 9 C BLACK T RED 
Y-SNP1      
SRY10831 2 A GREEN G BLUE 
M168 3 C BLACK T RED 
M89 4 G BLUE A GREEN 
M201 5 G BLUE T RED 
M69 6 T RED C BLACK 
M170 7 A GREEN C BLACK 
M172 8 T RED G BLUE 
M9 9 C BLACK G BLUE 
M207 10 A GREEN G BLUE 
M198 11 C BLACK T RED 
M343 12 C BLACK A GREEN 
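
As an illustration of how a single peak in Table 2.9 translates into an allelic state, the following minimal sketch maps the dye colour of a peak to its base and compares it with the ancestral and derived alleles of the marker. The colour-to-base mapping and the three markers are copied from Table 2.9; the function and variable names are hypothetical and are not part of the unpublished protocol.

```python
# Dye colour of each labelled ddNTP, as listed in Table 2.9.
BASE_BY_COLOUR = {"GREEN": "A", "BLACK": "C", "BLUE": "G", "RED": "T"}

# (ancestral, derived) alleles for three markers copied from the "Haplogroup A" panel.
PANEL_A = {"M91": ("T", "A"), "M31": ("C", "G"), "M14": ("A", "G")}


def call_state(marker, peak_colour, panel=PANEL_A):
    """Return 'ancestral', 'derived' or 'unexpected' for one electropherogram peak."""
    base = BASE_BY_COLOUR[peak_colour.upper()]
    ancestral, derived = panel[marker]
    if base == ancestral:
        return "ancestral"
    if base == derived:
        return "derived"
    return "unexpected"  # colour matches neither allele; the peak should be re-inspected
```

A call such as `call_state("M91", "GREEN")` reports the derived state, because a green peak corresponds to A and M91 is a T to A change.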
      
 
 
2.2.3.3 Y-chromosome STR 
Twelve Y-chromosome STRs on the Y-chromosome non-recombining region were typed 
using the PowerPlex® Y System (Promega) according to kit instructions with some 
modifications. The kit allows for the co-amplification of 12 Y-STR loci (DYS19, DYS385a/b, 
DYS389I/II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, and DYS439) in a 
single multiplex reaction. The multiplex reaction was performed in a total volume of 6.25 µl, 
including 5 ng DNA, 1 x PowerPlex® Y Buffer, 1 x PowerPlex® Y Primer mix, 5 U FastStart 
Taq polymerase (Roche Applied Science) and ddH2O to make up the reaction mix. 
Thermal cycling conditions are shown in Table 2.10. 
 
1 µl of the multiplex PCR reaction product was mixed with 8.5 µl Hi-Di formamide (Applied 
Biosystems) and 0.5 µl of ILS600 internal size standard (Promega). After a denaturing step 
for 2 min at 95°C followed by cooling to 4°C, the fragments were separated on an ABI 
PRISM® 3130xl Genetic Analyzer (Applied Biosystems) according to PowerPlex® Y 
System Kit instructions and analysed using GeneMapperID v3.2 software (Applied 
Biosystems). 
 
Table 2.10   Y-STR PCR Thermal Cycler Conditions 
Temperature (and Ramp Speed) Time (min:sec) 
95°C             11:00 
96°C             01:00 
94°C (ramp 100%) 00:30 
60°C (ramp 29%) 00:30 
70°C (ramp 23%) 00:45 
10 cycles 
90°C (ramp 100%) 00:30 
58°C (ramp 29%) 00:30 
70°C (ramp 23%) 00:45 
20 cycles 
60°C             30:00 
4°C              ∞ (hold) 
 
 
 
 
2.2.3.4 Y-chromosome data analysis 
Using the bi-allelic polymorphisms, the Y-chromosomes in the study group were allocated 
to haplogroups according to the nomenclature of Karafet et al. (2008). Intra-population 
Y-chromosome variation was calculated using STR haplotypes to infer gene 
diversities in Arlequin v3.11 (Excoffier et al., 2005). 
 
Networks of STR haplotypes were constructed using the Median Joining algorithm (Bandelt 
et al., 1999) of Network v4.5.0.0 (Fluxus-engineering, 2008). Networks were subjected to 
maximum parsimony post-analysis using the Steiner maximum parsimony (MP) algorithm 
(Polzin and Daneschmand, 2003) within Network 4.5.0.0. For network analysis the epsilon 
parameter was set to 0 and the median vector criterion was set to "Connection Cost". Loci 
were not weighted differently, but repeat counts at the DYS389II locus were modified. DYS389 is 
a composite locus that contains regions that are phylogenetically informative as well as fast 
evolving regions that obscure phylogenetic structure. To alleviate this problem, the DYS389I 
repeat count was subtracted from the DYS389II repeat count to give DYS389c, which excludes 
some of the uninformative data. DYS389I and DYS389c were used in all further analyses. 
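
The DYS389 adjustment can be sketched as follows. This is a minimal illustration only: representing a haplotype as a dictionary of repeat counts, and the example values, are assumptions made for the sketch.

```python
def add_dys389c(haplotype):
    """Return a copy of an STR haplotype (locus -> repeat count) in which
    DYS389II is replaced by DYS389c = DYS389II - DYS389I."""
    result = dict(haplotype)  # leave the caller's haplotype untouched
    result["DYS389c"] = result.pop("DYS389II") - result["DYS389I"]
    return result


# Hypothetical repeat counts, for illustration only.
example = {"DYS19": 15, "DYS389I": 13, "DYS389II": 30, "DYS390": 21}
adjusted = add_dys389c(example)
```

After the adjustment the haplotype carries DYS389I and DYS389c, and DYS389II no longer appears.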
 
TMRCA for the haplogroups was estimated from the median joining networks using a 
mutation rate of μ = 6.9 x 10^-4 per locus per generation with a generation time of 25 years 
(Zhivotovsky et al., 2004). 
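
The text does not state which estimator was applied to the networks. The sketch below assumes the rho statistic that is commonly used with median joining networks, that is, the mean number of mutational steps between each sampled haplotype and the assumed root haplotype. The function name, its inputs and the number of loci are hypothetical.

```python
def rho_tmrca(step_counts, n_loci, mu=6.9e-4, generation_years=25):
    """Estimate TMRCA in years from a list of mutational step counts.

    step_counts: one entry per sampled chromosome, giving the number of
    single-repeat steps (summed over all loci) separating it from the root.
    """
    rho = sum(step_counts) / len(step_counts)  # mean steps to the root
    generations = rho / (mu * n_loci)          # steps accumulate at mu * n_loci per generation
    return generations * generation_years
```

With a mean of two steps across ten loci, for example, the estimate is 2 / (6.9 x 10^-4 x 10) generations, or roughly 7,250 years at 25 years per generation.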
 
Individual STR variation was also subjected to distance analysis using the (δμ)² distance 
measure (Goldstein et al., 1995), as employed in Populations v.1.2.30 (Langella, 2002). 
The (δμ)² statistic is a genetic distance specifically developed for microsatellite loci, 
incorporating features of the stepwise mutation model. Distance matrices for all 
haplogroups were calculated and used to construct Neighbour Joining (NJ) trees in 
MEGA4 (Tamura et al., 2007) and Multidimensional Scaling (MDS) plots in PAST v.1.54 
(Hammer et al., 2001b). 
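
For reference, the Goldstein et al. (1995) distance, (δμ)², is the squared difference in mean repeat number between two groups, averaged over loci. The sketch below is a minimal implementation under that definition; when individuals are compared, each group holds a single haplotype. The data layout (lists of repeat counts in a fixed locus order) is an assumption, and the Populations software may handle missing data differently.

```python
def delta_mu_squared(group_a, group_b):
    """(delta mu)^2 between two groups of STR haplotypes.

    group_a, group_b: lists of haplotypes, each haplotype being a list of
    repeat counts given in the same locus order.
    """
    n_loci = len(group_a[0])
    total = 0.0
    for locus in range(n_loci):
        mean_a = sum(h[locus] for h in group_a) / len(group_a)
        mean_b = sum(h[locus] for h in group_b) / len(group_b)
        total += (mean_a - mean_b) ** 2
    return total / n_loci  # average of the squared differences across loci
```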
 
To visualize haplogroup frequency distributions, haplogroup isofrequency maps were 
generated applying the Kriging method (Oliver and Webster, 1990; Xue et al., 2005) 
incorporated in the Surfer v.8.06.39 program (Golden-Software, 2006). 
 
The relationships among the population groups were analysed using haplogroup frequency 
data as well as STR haplotype data. Inter-population distances were calculated by 
generating Fst distance matrices from haplotype frequency data and Rst (Slatkin, 1995) 
distance matrices for STR haplotypes, both calculated in Arlequin v3.11 (Excoffier et al., 
2005). For both data types, population differentiation was examined using an exact test 
(Raymond and Rousset, 1995) implemented in Arlequin v3.11. The matrices were 
visualized through PCA plots and cluster analysis in PAST v.1.54 (Hammer et al., 2001b). 
The correlation between the two matrices was tested through a Mantel test applied in 
Arlequin v3.11. 
 
The two genetic distance matrices were also compared to a physical distance matrix 
(Appendix C). This was accomplished by fitting a linear regression in R v.2.5.0 
(R-Project, 2006) to the scatter plot resulting from pairwise comparisons of distance matrices 
based on physical and genetic distances. A Mantel test implemented in Arlequin v3.11 
(Excoffier et al., 2005) was also performed to test the correlation between each of the two 
genetic distance matrices and the physical distance matrix. 
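
The logic of a Mantel test can be illustrated with a small permutation sketch. This is not the Arlequin implementation; it is a generic version in which the correlation between the off-diagonal entries of two distance matrices is compared with the correlations obtained after randomly relabelling the populations in one matrix. All function names are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def upper(matrix, order):
    """Off-diagonal entries (each pair once) after relabelling by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """Return (r, one-sided p-value) for two square symmetric distance matrices."""
    identity = list(range(len(dist_a)))
    fixed = upper(dist_b, identity)
    observed = pearson(upper(dist_a, identity), fixed)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # relabel the populations in one matrix only
        if pearson(upper(dist_a, order), fixed) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)
```

Permuting rows and columns together, rather than shuffling individual cells, preserves the dependence among distances that involve the same population, which is why an ordinary correlation test is not appropriate for distance matrices.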
 
The Fst and Rst distance matrices were both used in AMOVA analysis, performed in 
Arlequin v3.11 (Excoffier et al., 2005). The distribution of variance among three hierarchical 
levels was tested in order to assess relationships among groups of populations. The lowest 
level is the variation contained between individuals within the same population. The next 
level contains the variation that exists between populations (populations in this case were 
the groups defined in Table 2.1). The third level contains the variation between groupings 
of these populations. Different groupings of populations were attempted, which were based 
on geographic distribution, language and self-identification of populations. 
 
 
 
 
2.2.4 Autosomal SNP methods 
 
To analyse information contained in the autosomes, 220 autosomal SNPs were specifically 
selected in the following way:  
- Ten SNPs per chromosome (chromosomes 1 to 22) were selected.  
- The ten SNPs were selected in two groups of five linked SNPs.  
- The two groups were completely unlinked from one another (Figure 2.4). 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
The five SNPs in each five-SNP group were selected to be on the same haploblock. To 
select the SNPs, the software SNPbrowser™ v3.1 (Applied Biosystems) was used and 
both HapMap and Applied Biosystems (ABI) SNP databases were considered. In the ABI 
database, haplotype blocks from the African American study group were considered and in 
the HapMap database, haplotype blocks from the Yoruba study group were considered. 
Neither of these two study groups is Khoe-San, but these were the most closely related 
population groups for which sufficient SNP data were available at the time. SNPs were 
selected to be on the same haploblock in the Yoruba group and preferentially also on the 
same haploblock in the African American group. The average distance between 
consecutive selected SNPs in the same haploblock was 4347 bp (SD = 3730.8 bp; MIN = 192 
bp; MAX = 22332 bp). The haploblocks that contained the SNPs were not associated with 
any known coding part of the genome. Neutral genetic variation was therefore targeted 
and the influence of selection minimized. Furthermore, SNPs were selected to have a minor 
allele frequency above 10% in the African population groups, in order to select 
SNPs that are polymorphic in African populations. The full details of the selected SNPs 
(their chromosome number, the group they sorted into, their alternate name for the 
analyses, position on chromosome, distances from other SNPs and minor allele 
frequencies in the Yoruba and African American groups) are listed in Appendix C.  

Figure 2.4  SNP selection strategy illustrated on a chromosome. Each chromosome carries two 
groups of five linked SNPs, each group lying in one haploblock; the two groups are completely 
unlinked to each other. 
 
SNPs were selected in this fashion to allow for multiple types of analyses using the same 
dataset. Firstly, the selection allows for the compilation of multiple different genotype sets 
with 44 unlinked polymorphisms in each, by selecting one SNP per SNP group. Analyses 
of these different sets of 44 SNPs can then be compared with one another to see if 
similar results are obtained. If 100 such sets are selected, analysed and compared, this is 
comparable to 100 separate studies with 44 unlinked SNPs in each. Furthermore, if 
haplotypes are inferred (using haplotype inference software) for SNPs in the same 
haploblock, these haplotypes can be used in analyses different from those applied to 
unlinked genotypic SNPs. 
 
All autosomal SNPs were typed by a commercial company (Harvard-Partners Center for 
Genetics and Genomics, Genotyping Facility, Cambridge, Massachusetts, United States). 
The company used Sequenom iPLEX SNP genotyping, which allows interrogation of up to 
40 assays in one well of a 384-well plate and therefore reduces the per-genotype cost. The 
technique is based on a multiplexed PCR followed by a minisequencing reaction in a single 
well. iPLEX chemistry involves the extension of minisequencing probes by a single 
mass-modified dideoxynucleotide using a proprietary enzyme from Sequenom. The size of 
reaction products is determined directly by MALDI-TOF mass spectrometry, yielding 
genotype information. Specialized equipment for this work includes a Pre- and Post-PCR 
Biomek FX liquid handling system, a Multimek liquid handler, a nanoliter plotting robot for 
spotting the extension products onto chips, and a Bruker Compact mass spectrometer.  
 
Multiplex PCR assays were designed by the company using Sequenom 
SpectroDESIGNER software (version 3.0.0.3) by inputting sequence containing the SNP 
site and 100 bp of flanking sequence on either side of the SNP. The SNPs are grouped into 
multiplexes so that no extended product overlaps in mass with any other 
oligonucleotide present in the reaction mix, and so that no non-specific primer-primer or 
primer-product interactions occur.  
 
Resultant SNP data were downloaded from the company web-based database and edited 
into formats suitable for computational analyses. Seven of the 220 loci were discarded 
because of poor assay quality (indicated in Appendix C). 
 
2.2.4.1 Autosomal SNP data analysis (Genotypic) 
The panel of 220 SNPs (consisting of two unlinked groups of five linked SNPs per 
chromosome) was used to generate 100 different random combinations of 44 unlinked 
SNPs. 
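
The random draw of one SNP per linked group can be sketched as follows. The SNP naming scheme is hypothetical (the real names are listed in Appendix C), and the sketch simply draws from whatever SNPs are present in each group, so a group that lost a SNP to assay failure would still contribute one marker.

```python
import random


def random_unlinked_sets(snp_groups, n_sets=100, seed=0):
    """Draw n_sets marker sets, each holding one randomly chosen SNP per group.

    snp_groups: dict mapping a group label to the list of SNP names that
    lie in that haploblock.
    """
    rng = random.Random(seed)
    labels = sorted(snp_groups)
    return [[rng.choice(snp_groups[label]) for label in labels] for _ in range(n_sets)]


# Hypothetical naming scheme: chromosome number, group (a or b), SNP index 1-5.
groups = {f"chr{c}{g}": [f"chr{c}{g}_snp{i}" for i in range(1, 6)]
          for c in range(1, 23) for g in "ab"}
sets = random_unlinked_sets(groups)
```

Each of the 100 resulting sets contains 44 SNPs, no two of which come from the same haploblock.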
 
The proportion of polymorphic loci, heterozygosity (Weir, 1996a) and gene diversity (Weir, 
1996b) of each of the 100 different SNP datasets were calculated for each of the 14 
populations analysed as well as for the total dataset using GDA v 1.0 (Lewis and Zaykin, 
2001). The averages as well as the standard deviations of these three summary statistics 
were then calculated across the 100 datasets. The heterozygosity estimate is the 
proportion of heterozygous individuals in the population, and gene diversity (often referred 
to as expected heterozygosity) is defined as the probability that two randomly chosen 
alleles from the population are different. 
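
The two definitions can be made concrete for a single SNP with the sketch below. The genotype encoding (two-character strings, with None for a missing call) is an assumption, and the gene diversity function uses the common sample-size-corrected estimator, which may differ in detail from the estimator implemented in GDA.

```python
def observed_heterozygosity(genotypes):
    """Proportion of called genotypes that are heterozygous.

    genotypes: list of 2-character strings such as 'AG'; missing calls are None.
    """
    called = [g for g in genotypes if g is not None]
    return sum(g[0] != g[1] for g in called) / len(called)


def gene_diversity(genotypes):
    """Expected heterozygosity: (m / (m - 1)) * (1 - sum of squared allele
    frequencies), where m is the number of sampled alleles."""
    alleles = [a for g in genotypes if g is not None for a in g]
    m = len(alleles)
    freqs = [alleles.count(a) / m for a in set(alleles)]
    return m / (m - 1) * (1 - sum(f * f for f in freqs))
```

For the genotypes AA, AG, GG and AG, half of the individuals are heterozygous, while the gene diversity is (8/7) x 0.5, or about 0.57.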
 
To test if there was a correlation between the variation between the different runs (standard 
deviation) and the average heterozygosity of each population, a scatter plot was generated 
with the average heterozygosity of each population on the Y-axis and the standard 
deviation (SD) between the heterozygosities in the different datasets on the X-axis. Using 
R v.2.5.0 (R-Project, 2006), a linear regression was done to find the function that best 
described the relationship between the points. 
 
Analysis of population structure on the 100 different SNP sets was done using the 
model-based Bayesian clustering approach implemented in STRUCTURE v2.2 (Pritchard et al., 
2000; Falush et al., 2003; Falush et al., 2007). For each of the 100 sets of 44 SNPs, 
10 iterations were run at each value of K from K=1 to K=10, each with a burn-in of 50,000 
steps followed by 100,000 repetitions. The correlated allele frequencies model and the 
admixture model were assumed for all runs. The 10 iterations at each 
K for each of the 100 SNP sets were then collapsed into one consensus run using CLUMPP v1.1.1 
(Jakobsson and Rosenberg, 2007). Thereafter, the 100 sets of random SNPs were 
collapsed into a consensus run at each K, again using CLUMPP v1.1.1. Results were visualized 
using DISTRUCT (Rosenberg, 2002). Figure 2.5 illustrates this process.  
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
The K value with the highest average likelihood and the highest delta K value (Evanno et 
al., 2005) were calculated and compared to one another to identify the best cluster 
assignment. 
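
The delta K statistic of Evanno et al. (2005) can be sketched as follows. This minimal version takes the absolute second-order difference of the mean log likelihoods at successive K values and divides it by the standard deviation of the replicate log likelihoods at K. The input layout is an assumption, and implementations differ on whether the absolute value is taken before or after averaging over replicates.

```python
from statistics import mean, stdev


def delta_k(log_likelihoods):
    """Return {K: delta K} for every K that has both K-1 and K+1 available.

    log_likelihoods: dict mapping K to the list of ln P(D) values obtained
    from replicate runs at that K (at least two replicates, not all equal).
    """
    means = {k: mean(values) for k, values in log_likelihoods.items()}
    result = {}
    for k in sorted(log_likelihoods):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_diff / stdev(log_likelihoods[k])
    return result
```

Delta K cannot be computed for the smallest or the largest K tested, which is one reason to compare it with the mean likelihood rather than to rely on it alone.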
Figure 2.5  Diagram illustrating how STRUCTURE results for 100 SNP sets were condensed into one 
consensus run, starting at the right with 10 x 100 sample sets for each K value. The 10 iterations for each of 
the 100 runs were condensed into one run, leaving 100 different SNP sets at each K. These 100 different SNP 
sets at each K were then condensed into one result at each K value. 
[Figure labels: "100 Different SNP sets at each K combined into 1 run"; "10 iterations for each of the 
10 SNP sets at each K value combined into 1 run"] 
 
Results of assignments of K=3 were plotted on a triangle plot using R v.2.5.0 (R-Project, 
2006), while incorporating the ADE4 R-library.  
 
The assignments to the K clusters in the 100 different runs were compared with one 
another. The mean population assignments of each run at K=2 to K=5 were plotted on a 
graph using R v.2.5.0 to illustrate the variation between runs. Furthermore, the 
differences between the runs were tested by doing pairwise correlations between runs at 
K=3 using Pearson's correlation coefficient (r) implemented in PAST v.1.54 (Hammer et 
al., 2001b). 
 
The same 100 SNP datasets used in the STRUCTURE analysis were also used in 
distance-based analysis. To construct inter-population distance matrices for each of the 100 
datasets, Reynolds distance (Reynolds et al., 1983) was used as implemented in 
Powermarker v3.25 (Liu and Muse, 2005). To condense the 100 different population 
distance matrices into one output, two alternative approaches were followed. In the first 
approach, 100 different NJ trees were constructed in Powermarker v3.25 and were then condensed 
into one Majority Rule consensus tree using CONSENSE implemented in PHYLIP v.3.65 
(Felsenstein, 2004). The tree was then visualized in Dendroscope (Huson et al., 2007). In 
the second approach, an average distance matrix was calculated by taking the average of 
each pairwise comparison in the 100 distance matrices. This average distance matrix was 
then used to construct an NJ tree using NEIGHBOR, implemented in PHYLIP v.3.65 
(Felsenstein, 2004). The average population distance matrix was also used for 
PCA in PAST v.1.54 (Hammer et al., 2001b). 
 
The NJ consensus tree is useful in illustrating the number of times that a particular 
branch is supported by the 100 separate trees, but the branch lengths of this tree are not an 
indication of distance between populations. The tree based on the average distance matrix, 
on the other hand, does not show the number of times a particular branch is supported by 
the 100 distance matrices, but gives a good indication of the distances between 
populations through mean branch lengths. 
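
The second approach, averaging each pairwise comparison across the distance matrices, amounts to an element-wise mean. A minimal sketch, assuming the matrices are held as equally sized nested lists in the same population order:

```python
def average_matrix(matrices):
    """Element-wise mean of equally sized square matrices (nested lists)."""
    n = len(matrices[0])
    count = len(matrices)
    return [[sum(m[i][j] for m in matrices) / count for j in range(n)]
            for i in range(n)]
```

The resulting matrix keeps the scale of the original distances, which is why a tree built from it has interpretable branch lengths.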
 
Inter-individual pairwise distance matrices for the 352 individuals in the 100 different SNP 
datasets were also constructed using Reynolds distance (Reynolds et al., 1983) in 
Powermarker v3.25 (Liu and Muse, 2005). The average of the 100 individual distance 
matrices was calculated by taking the average of each pairwise comparison. The average 
individual distance matrix was then used for PCA in PAST v.1.54 (Hammer et al., 2001b). 
 
To investigate the relationship between physical distance and genetic distance using autosomal 
SNPs in the Khoe-San and Coloured populations, the composite distance matrix of the 100 
datasets (Reynolds distance) (Reynolds et al., 1983) was compared to a physical distance 
matrix (Appendix C). Pairwise comparisons between physical distance (X-axis) and genetic 
distance (Y-axis) were plotted on graphs and a linear regression was done using R v.2.5.0 
(R-Project, 2006) to determine the line with the best fit through the points. A Mantel test 
implemented in Arlequin v.3.11 (Excoffier et al., 2005) was also done to test the correlation 
between the two distance matrices. 
 
Five random sets from the 100 datasets were chosen to do AMOVA, implemented in 
Arlequin v3.11 (Excoffier et al., 2005). The average values of the five sets were reported. 
The distribution of variance among three hierarchical levels was tested in order to assess 
relationships among groups of populations. The lowest level is the variation contained 
between individuals within the same population. The next level contains the variation that 
exists between populations (populations in this case were the groups defined in Table 2.1). 
The third level contains the variation between groupings of these populations. Different 
groupings of populations were attempted, which were based on geographic distribution, 
language and self-identification of populations. 
 
2.2.4.2 Autosomal SNP data analysis (Haplotypic) 
For haplotype analysis, the five linked SNPs on each haploblock were used to infer 
haplotypes at 44 haplotype loci, each haplotype consisting of five SNP positions. The 
haplotypes were inferred separately for each 
population and each SNP set of five using Powermarker v3.25 (Liu and Muse, 2005). The 
frequencies of the different haplotypes at the 44 haplotype loci were calculated in 
Powermarker v3.25 (Liu and Muse, 2005) and represented in the form of bar charts using 
Microsoft Excel.  
 
To condense the information from the 44 separate haplotype loci, two approaches 
were followed. In the first approach, the 88 haplotypes (two at each locus) of each individual 
were concatenated into two haplotypes for each individual. The order in which the two alleles of 
one locus are combined with the two alleles of any other locus is not important, since 
the alleles at the 44 loci segregate independently in the population. Individuals with >50% 
missing data at any of the 5-SNP loci were excluded from the analysis. Of the 352 
individuals, 298 remained, giving 596 haplotypes. Since some of the loci were very 
polymorphic and contained many different haplotypes, the combination of several such loci 
leads to high haplotype diversities. Concatenating haplotypes in individuals led to 594 
unique haplotypes in the total of 596 haplotypes. The individual haplotypes were then used 
to construct distance matrices. Both population and individual distance matrices were 
constructed using the Maximum composite likelihood algorithm in MEGA4 (Tamura et al., 
2007). These distance matrices were then used for PCA and cluster analysis in PAST 
v.1.54 (Hammer et al., 2001b). 
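
The first approach can be sketched as below. The sketch assumes that each locus is stored as a pair of phased 5-SNP strings with "?" marking a missing call, and it reads the exclusion rule as "more than half of the allele calls at a locus are missing". Both the data layout and that reading are assumptions.

```python
def concatenate_individual(loci, missing="?", max_missing=0.5):
    """Concatenate the per-locus haplotype pairs of one individual.

    loci: list of (hap1, hap2) tuples, one pair of 5-SNP haplotypes per locus.
    Returns the two concatenated haplotypes, or None when any locus has more
    than max_missing of its allele calls missing.
    """
    for hap1, hap2 in loci:
        calls = hap1 + hap2
        if calls.count(missing) / len(calls) > max_missing:
            return None  # individual excluded from the analysis
    first = "".join(hap1 for hap1, _ in loci)
    second = "".join(hap2 for _, hap2 in loci)
    return first, second
```

Which of the two haplotypes at a locus is placed in the first or the second concatenated sequence is arbitrary, in line with the independent segregation of the 44 loci.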
 
In the second approach, only the haplotype with the highest frequency in each 
specific population at each of the 44 loci was selected. These were taken as the 44 
representative small haplotypes of each population. The 44 small haplotypes were then 
concatenated into one haplotype sequence for each population. These 14 population 
representative sequences were then used to construct a distance matrix using the 
Maximum composite likelihood method in MEGA4 (Tamura et al., 2007). The distance 
matrix was then used for PCA and cluster analysis in PAST v.1.54 (Hammer et al., 2001b). 
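
Building the representative sequence of one population can be sketched as follows. The input layout is an assumption, and the source does not say how ties between equally frequent haplotypes were resolved; in this sketch the haplotype encountered first wins.

```python
from collections import Counter


def representative_sequence(haplotypes_by_locus):
    """Concatenate the most frequent haplotype at each locus.

    haplotypes_by_locus: list with one entry per locus, each entry being the
    list of haplotype strings observed in the population at that locus.
    """
    return "".join(Counter(haps).most_common(1)[0][0]
                   for haps in haplotypes_by_locus)
```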
 
The 44 separate haplotypes that are concatenated into one haplotype will have different 
evolutionary histories, and a single unique tree will not best characterize the phylogenetic 
relationships of the concatenated haplotype. An approach should rather be followed where 
data are not forced into a single tree. One such approach is the Neighbour-Net 
method (Bryant and Moulton, 2002). Data are decomposed into several splits and 
represented in the form of a splits graph. Ideal data will yield a tree, but data that do not 
support a single unique tree will yield a tree-like network representing different 
incompatible phylogenies. Although this method does not force data into a single tree, it 
gives a good indication of how tree-like a dataset is. The dataset with the single 
representative haplotype of each population was used to generate a Neighbour-Net 
network using SplitsTree4 (Huson and Bryant, 2006). 
 
The population matrices of the two different approaches were also used to test the 
relationship of physical distance and genetic distance using autosomal SNP haplotypes in 
the Khoe-San and Coloured populations. The two genetic distance matrices were 
compared to a physical distance matrix (Appendix C). Pairwise comparisons between 
physical distance (X-axis) and genetic distance (Y-axis) were plotted on graphs and a 
linear regression was done using R v.2.5.0 (R-Project, 2006) to determine the line with the 
best fit through the points. A Mantel test implemented in Arlequin v.3.11 (Excoffier et al., 
2005) was also applied to test the correlation between the distance matrices. 
 
Linkage Disequilibrium (LD) analyses would have been interesting but were not done for the 
present study since the marker coverage was very low. High resolution SNP typing of Khoe 
and San individuals is in progress and these studies should give a much better picture of the 
LD patterns in the Khoe-San. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
3. MITOCHONDRIAL-DNA STUDIES 
 
Only a few studies thus far have concentrated on the maternal genetic history of Khoe-San 
people (Vigilant et al., 1991; Chen et al., 2000; Tishkoff et al., 2007; Behar et al., 
2008). These studies covered only three groups of San people: the two Ju-speaking 
groups, namely the !Xun, who were originally from Angola (now located in Platfontein, SA), 
and the linguistically closely related Ju/'hoansi (from northern Botswana and Namibia), and 
the Khoe-speaking San group, the Khwe (also originally from Angola but now located in 
Platfontein, SA). All three of the San groups were originally from either Angola or northern 
Namibia, positioning them in the northern parts of the original distribution of Khoe-San 
people. This leaves a gap, with no studies having been done on groups representative of the 
southern San people and the Khoe people. Furthermore, studies published thus far 
concentrated on the L0d and L0k lineages in the San groups as a whole, without 
looking into the unique histories and distributions of the L0d sub-haplogroups. Data 
collected in this study have facilitated an understanding of the sub-structure of the L0d and 
L0k haplogroups and their distribution among various additional groups with Khoe and San 
ancestry. The following sections present the analyses of this dataset, compare results 
to published data and discuss their relevance to Khoe-San history. 
 
In the first part of this chapter the results from the minisequencing protocol, which have 
now been published (Schlebusch et al., 2009), will be provided. Thereafter results from the 
analysis of the HVS will be presented. Firstly, the haplogroup assignments, phylogenetic 
trees and networks assembled from the sequence data will be shown. Subsequently, the 
further analyses of the distribution of L0d/k subgroups (in the form of isofrequency maps) 
and the analyses of haplogroup expansion and contraction signals (in the form of summary 
statistics, mismatch distributions and Bayesian Skyline Plots (BSPs)) will be provided. All 
results regarding specific L0d/k subgroups will thereafter be discussed in detail. Next, the 
results regarding the genetic relationships between the different Khoe-San groups included 
in the study will be presented and discussed. 
 
 
3.1 Minisequencing 
 
A minisequencing protocol was designed to distinguish between the seven African mtDNA 
macro-haplogroups (L0-L6) as well as the three non-African macro-haplogroups M, N and 
R (Figure 2.2). The panel types 14 SNPs that define these 10 macro-haplogroups. The 
panel was validated by successfully screening 699 individuals and assigning them into their 
correct macro-haplogroups. These comprised 538 individuals included for mitochondrial 
HVS-I and II analysis plus 161 additional individuals as outlined in Chapter 2. Results were 
compared to the HVS-based classification, which used a phylogenetic approach, and no 
inconsistencies were found (Table 3.1).  
 
The PCR amplification of the regions that encompass the 14 SNPs was optimized into 
one multiplex reaction that amplifies these regions in six amplicons of various sizes (Figure 
3.1). After the minisequencing reaction the products are separated on the genetic analyzer 
and an electropherogram displays the different sized products (Figure 3.2 gives example 
electropherograms for haplogroups L0, L1, L3 and M). Due to differences in 
electrophoretic mobility, the detected band size on the electropherogram differs from the 
real size (Table 2.4) for certain products. These differences in electrophoretic mobility are 
influenced by the length of the sequence, the nucleotide composition and the dye that 
labels the extended primer. The effect of nucleotide composition is generally greater 
for the shorter fragments. The actual sizes of successive products differed by at least five 
nucleotides. Even with this difference, however, certain bands 
still migrated on top of one another. This did not affect our classification of the 
sequences.  
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 3.1  A 2% agarose gel showing the six amplified fragments that result from the 
multiplex PCR (100 bp ladder). 

As can be seen in Figure 3.2, bands 2 and 3 (resolving branches L1-6 and L3'4 according 
to the tree in Figure 2.2) migrated on top of one another, but because of the difference in the 
color of the dye for the two polymorphisms (green/blue for L1-6 and red/black for L3'4) they 
are easily distinguished. Furthermore, bands 11 and 12 (resolving haplogroups R and L1 
according to the tree in Figure 2.2) occasionally migrated on top of one another. When both R 
and L1 are ancestral there are two green peaks, which in some cases cannot be 
distinguished (see L3 in Figure 3.2). When either is derived, however, a clear blue peak 
becomes visible, as can be seen in the L1 panel in Figure 3.2. When the two green peaks 
appear on top of one another, a null allele in one of the bands could go undetected. This 
problem is, however, overcome by the fact that the other bands in the 
hierarchical typing confirm the state of the two polymorphisms. The final two peaks, 13 
and 14 (resolving haplogroups M and L6 according to the tree in Figure 2.2), also occasionally 
overlapped (shown in Figure 3.2; L0). This results in the presence of a single red peak 
(instead of two). This is further exacerbated by the low peak amplitude of the M peak. 
When M is derived, however, the black peak can clearly be seen (Figure 3.2; M). As in 
the case of L1 and R, the presence of other peaks will hierarchically confirm the presence 
and state of the M and L6 SNPs. In order to resolve the separation issues at peaks 11 and 
12, and 13 and 14, it would be practical to add one to three bases to the tails of the L1 and 
L6 primers, thereby changing their mobility. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 3.2  Electropherogram examples showing peak profiles of haplogroups L0, L1, L3 and M 
(panels labelled L0, L1, L3 and M). 
Peaks from left to right, in the format 
"polymorphism position (defined haplogroup according to Fig. 2.2 and Table 2.4)", are: 
1018G-A (L3), 1048C-T (L1-6), 7256C-T (L3'4), 7521G-A (L3'4'6), 8468C-T (L2-6), 9347A-G 
(L0), 10115T-C (L2), 10398A-G (N), 10810T-C (L2'3'4'6), 12432C-T (L5), 12705C-T (R), 13789T-C 
(L1), 14783T-C (M), 15289T-C (L6) 
Notwithstanding these minor problems in the panel, 699 individuals were successfully 
classified to their correct macro-haplogroups. Haplogrouping based on HVS variation, 
using a phylogenetic approach, was compared to the minisequencing coding region 
classification and no inconsistencies were found. Table 3.1 summarises the haplogroup 
classification based on HVS sequences and the macro-haplogroup classification based on 
the minisequencing coding region classification. There were a few instances where one of 
the bands in the profile failed or reverted. The band failure was most probably due to 
polymorphisms in the primer binding sites, and the bands that failed were phylogenetically 
very specific (Table 3.1). None of this, however, had any effect on the classification of the 
implicated sequences. 

Table 3.1  Results of the minisequencing screening and classification of 699 sequences compared to classification based on HVS sequences

MtDNA haplogroup   Number of   Macro-haplogroup   Number of   Problems observed during screening
based on HVS       sequences   identified using   sequences
sequence                       minisequencing
L0a                40          L0                 447         - L5 peak fail in haplogroup L0d1b (40% of L0d1b sequences)
L0d                372                                        - L2 peak fail in L0d3, due to 10114C mutation in L0d3 (92% of L0d3
L0k                35                                           sequences); also occurs in singletons of L0d1c1 and L0d2a
L1b                3           L1                 18          - L1-6 mutation is positive (incorrectly) in L1c2 (3 sequences)
L1c                15
L2*                1           L2                 71          - L0 peak fail in L2b3 (1 sequence)
L2a                63
L2b                5
L2c                2
L3b                1           L3                 92
L3c                1
L3d                44
L3e                38
L3f                8
L4                 7           L4                 7
L5                 3           L5                 3
M                  18          M                  18
N                  6           N                  6
R                  37          R                  37          - L2 is positive (incorrectly) in two of the sequences of haplogroup H
                                                                with the same HVS profile (2 sequences)
Total              699         Total              699

3.2 HVS-I and II variation 
 
The 538 samples used in mitochondrial analysis were first classified into macro-haplogroups
using the minisequencing method. Further, finer scale classification was
achieved by analysing HVS-I and II sequences.

A total of 1124 bp in the combined HVS-I and II were analysed (HVS-I: positions 15997-16569
and HVS-II: positions 57-607). There were 205 (18.2%) variable positions in the
combined sequence; HVS-I had 122 (21.3%) variable sites while HVS-II had 83
(15.1%). Fourteen sites had three different alleles and seven sites had four different
alleles (16093, 16188, 16265, 16266, 16286, 16291, 16293). The transversion :
transition ratio was 1 : 5.6. Insertions were observed at four sites (291, 455, 523, 573)
while deletions occurred at seven sites (16183, 16179, 16325, 247, 249, 498, 523). All
deletions involved 1 bp except in the 523 region of HVS-II, which contains an "AC" repeat
motif that was inserted or deleted in several sequences. Insertions at 291 and 455 involved
1 bp, except for one sequence that had a 2 bp insertion at 455. All insertions in
the poly C repeat tract at positions 568-573 were taken as a 1 bp C insertion.
 
Using the coding polymorphisms implemented in the minisequencing procedure as well 
as the 205 variable sites from HVS-I and II, the 538 sequences were classified into 18 
haplogroups encompassing 245 haplotypes (Figure 3.3 and 3.4). A full haplotype list 
with HVS-I and II variant sites and their population assignment is included in Appendix 
E. 
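
The segment lengths and percentages of variable sites quoted above for HVS-I and HVS-II follow directly from the stated position ranges. The short Python sketch below is only an arithmetic check; the function names are illustrative and are not part of the analysis pipeline used in this study.

```python
# Position ranges as given in the text (both ends inclusive).
HVS1 = (15997, 16569)
HVS2 = (57, 607)


def segment_length(start, end):
    """Number of positions in an inclusive range."""
    return end - start + 1


def percent_variable(n_variable, length):
    """Percentage of positions that are variable, rounded to one decimal."""
    return round(100.0 * n_variable / length, 1)


len1 = segment_length(*HVS1)
len2 = segment_length(*HVS2)
total = len1 + len2

print("combined:", total, "bp,", percent_variable(205, total), "% variable")
print("HVS-I:   ", len1, "bp,", percent_variable(122, len1), "% variable")
print("HVS-II:  ", len2, "bp,", percent_variable(83, len2), "% variable")
```

The 122 and 83 variable sites of the two segments add up to the 205 variable positions used for haplogroup classification.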
 
 
 
 
 
 
 
Group N Haplogroup Frequencies 
KAR 30 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
COL 77 0.649 0.078 0 0.013 0 0 0.130 0.026 0 0 0 0 0.013 0.013 0 0.026 0.026 0.026 
CAC 20 0.450 0.150 0 0 0.050 0 0.050 0 0 0 0 0 0 0.100 0 0.100 0.050 0.050 
KHO 57 0.982 0.018 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
CNC 40 0.925 0 0 0 0 0 0 0 0 0.025 0 0 0 0 0 0.025 0 0.025 
XEG 3 0.667 0.333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
DUM 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
NAM 28 0.714 0 0.071 0 0.036 0.036 0 0 0 0 0 0 0.107 0 0.036 0 0 0 
GUG 22 0.909 0 0 0 0.091 0 0 0 0 0 0 0 0 0 0 0 0 0 
NAR 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
JOH 42 0.714 0 0.238 0 0 0 0 0 0 0.048 0 0 0 0 0 0 0 0 
XUN 49 0.653 0 0.265 0 0 0 0.041 0.020 0 0 0 0 0.020 0 0 0 0 0 
KWE 18 0.111 0.056 0.278 0 0.056 0 0.222 0.056 0 0 0 0 0 0.222 0 0 0 0 
DRC 14 0 0.071 0 0.143 0 0 0.071 0 0.071 0 0.071 0 0.071 0.286 0.214 0 0 0 
HER 15 0.067 0 0 0 0.067 0 0 0 0 0 0 0.067 0.600 0.067 0.133 0 0 0 
SOT 22 0.227 0.136 0 0 0.045 0 0.273 0.045 0 0.045 0 0 0.045 0.182 0 0 0 0 
SWZ 5 0.400 0 0 0 0 0 0 0 0 0 0 0 0 0.600 0 0 0 0 
ZUX 36 0.444 0.083 0.028 0 0.056 0.028 0.139 0 0.028 0 0 0 0.083 0.083 0.028 0 0 0 
AFR 21 0.048 0.095 0 0 0 0 0 0 0 0 0.048 0 0 0 0 0.095 0.095 0.619 
EUR 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 
IND 25 0.040 0 0 0 0 0 0 0 0 0 0 0 0.040 0 0 0.480 0.040 0.400 
Total 1 0.589 0.039 0.058 0.006 0.017 0.004 0.054 0.009 0.004 0.010 0.004 0.002 0.037 0.041 0.013 0.040 0.010 0.070 
Seq/HG 538 317 21 31 3 9 2 29 5 2 4 2 1 20 22 7 19 6 38 
Ht/HG 245 111 12 5 1 8 2 10 4 2 4 2 1 6 16 3 18 5 35 
Hd 0.984 0.962 0.905 0.738  0.972  0.852 0.900     0.558 0.957 0.524 0.994 0.933 0.994 
pi 0.012 0.007 0.006 0.001  0.011  0.003 0.003     0.003 0.006 0.002 0.006 0.011 0.008 
 
R L0a L0k L1b L1c L2a L2b L2c L3b L3c L3d L3e L3f L4 L5 M N L0d 
Figure 3.3  Mitochondrial haplogroup tree with nomenclature according to Behar et al., (2008), listing haplogroup frequencies in the different populations in the 
study group. The number of sequences per haplogroup (Seq/HG), number of haplotypes per haplogroup (Ht/HG), Haplotype Diversities (Hd) and Nucleotide 
Diversities (pi) in the different haplogroups are also indicated. 

Figure 3.4  Graphical illustration of percentage mitochondrial haplogroup assignment in the populations used in comparative population
analysis
[Chart. Y-axis: 0% to 100%. X-axis: KAR, COL, CAC, KHO, CNC, NAM, GUG, JOH, XUN, KWE, DRC, HER, SOT, ZUX, AFR, EUR, IND.
Legend: L0a, L0d, L0k, L1b, L1c, L2a, L2b, L2c, L3b, L3c, L3d, L3e, L3f, L4, L5, M, N, R.]

3.3 Haplogroup assignment and structure 
 
 Haplogroups other than L0d were found at very low frequencies in the total sample group 
(no other haplogroup >7% total frequency). High frequencies of these non-L0d 
haplogroups were mostly seen in the comparative groups and not in the Khoe-San or 
Coloured groups. L0d was the most frequent haplogroup in the total sample comprising 
59% of all sequences. L0d had high frequencies in all of the Khoe-San and Coloured 
groups ranging from 45% in the Cape Coloured to 100% in the Karretjie group. L0d 
frequencies in the Coloured groups of South Africa (CAC - 45%, COL - 65%, CNC - 93%, 
KAR - 100%) compared well with frequencies in San (KWE - 11%, XUN - 65%, JOH - 71%, 
GUG - 91%, KHO - 98%) and Khoe (NAM - 71%) groups. 
 
Relationships of haplotypes within the main haplogroups were assessed using maximum 
likelihood trees (Figure 3.5 a and b) and parsimony based network analysis (Figure 3.6). 
Overall, identified sub-haplogroups did group together on the tree (Figure 3.5 a and b). In
some instances, however, especially within the L3, L4, M, N and R branches, the tree
lacked structure and identified sub-groups did not group together. This again illustrates that
the high rate of back mutation and lack of variation in HVS-I and II necessitate the use
of coding region variation to indicate and direct the overall classification and structure of
haplogroups. Control region variation should then be used for finer within sub-haplogroup
structuring.
 
 
 
 
 
 
 
 
 
 
 
Figure 3.5a  Maximum likelihood tree representing the substructure of L1 to L5. Individuals are labeled with
numbers corresponding to the haplotype list in Appendix E and their classified haplogroup. A Neanderthal sequence
forms the outgroup. Branch support (%) was calculated through aLRT.
 
Figure 3.5b  Maximum likelihood tree showing the relationships of the different mtDNA haplotypes within
haplogroup L0. Individuals are labeled with numbers corresponding to the haplotype list in Appendix E and their
classified haplogroup. A Neanderthal sequence forms the outgroup. Branch support (%) was calculated through
aLRT.

3.3.1 Haplogroup L0d/k 
 
The relationships of haplotypes in the L0 branch are shown in the phylogenetic tree in
Figure 3.5b and in the network presented in Figure 3.6. A schematic tree showing the
substructure and population frequencies of haplogroups L0d and L0k is shown in Figure
3.7. In addition, Figure 3.8 represents the population frequencies of L0d and L0k
sub-haplogroups in the form of bar charts.

Coalescent times (Time to Most Recent Common Ancestors - TMRCA) for all the L0d/k
subgroups and times at which their lineages diverged from the other lineages were
calculated from the network. The ρ and σ values as well as the values in years according
to various mutation rates are represented in Table 3.2. Although years according to all of
the most widely used rates are represented in Table 3.2, rates according to Ward et al.
(1991) will be used in the description henceforth. Figure 3.9 represents coalescent
and divergence times of Table 3.2 in a graphic format. 
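
The conversion from ρ and σ to years described for Table 3.2 is a simple scaling. The sketch below illustrates it under a stated assumption: the years-per-mutation factors are not quoted from the cited papers but are back-calculated here from the L0d2d TMRCA row of Table 3.2 (years divided by ρ = 4.0000). Because the tabulated ρ and σ values are rounded, results can differ from the table by about a year.

```python
# ASSUMPTION: years per mutation, back-calculated from the L0d2d TMRCA row
# of Table 3.2 (Years / rho, with rho = 4.0000). These factors are derived
# from the table itself, not taken from the original publications.
YEARS_PER_MUTATION = {
    "Horai": 50839 / 4.0,
    "Soodyall": 42824 / 4.0,
    "Ward": 35587 / 4.0,
    "Foster": 19771 / 4.0,
}


def rho_to_years(rho, sigma, rate_name="Ward"):
    """Scale rho and its standard deviation sigma to years."""
    factor = YEARS_PER_MUTATION[rate_name]
    return rho * factor, sigma * factor


# rho and sigma of the L0k1 TMRCA row in Table 3.2
age, sd = rho_to_years(1.5161, 0.9596)
print(f"L0k1 TMRCA (Ward rate): {age:.0f} +/- {sd:.0f} years")
```

Passing `rate_name="Horai"`, `"Soodyall"` or `"Foster"` gives the corresponding columns of the table.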
 
 
 
 
Figure 3.6  Median joining network representing L0 substructure in the different populations of the study group. Stars (* and **) indicate
median vectors that are discussed in the text. CRS - Control Region Sequence, NEAN - Neanderthal - Root. Numbers indicate mutations
according to HVS base pair number. Circles represent haplotypes and are proportional to the number of sequences represented. The colour
key indicates from which populations different haplotypes originated.

[Schematic tree. Node labels: Root; L1-6, L0; L0abfk, L0d; L0k, L0abf; L0d1,2; L0d1, L0d2.
Tip labels: CRS, L0d3, L0a, L0k1, L0d1a, L0d1b, L0d1c, L0d2a, L0d2b, L0d2c, L0d2d, L0dx.
Branch mutation labels, in the order extracted from the figure: 146C, 263A!, 16320G, 182C!, 152C, 16519C, 195C, 247A, 523delCA,
16129A, 16187T, 16189C, 182T, 16278T, 16311C, 16223T, 73G, 263G; 150T, 316A, 523insCA, 16290T, 16300G, 16243C, 498delC, 16278C!,
No CRS variant sites, No CRS variant sites, 294A, 16212G, 16069T, 16169T, 198T, 597T, 16390A, 456T, 16129G!, 16234T; 523insCA,
16239T, 16294T, 199C, 16223C!, 16234T, 16266G, 198T, 207A, 16129G!, 16209C; 93G, 146T!, 236C, 16148T, 16188G, 16278C!, 16320T,
16519T!, 189G, 16172C, 188G, 523insCA, 16179T]

Sub-haplogroup frequencies
Group     N     L0d3   L0d1a  L0d1b  L0d1c  L0d2c  L0d2a  L0d2b  L0d2d  L0dx   L0k1   L0a    Other
KAR       30    0.133  0.067  0.200  0      0      0.600  0      0      0      0      0      0
COL       77    0.104  0.130  0.143  0.013  0.026  0.208  0.026  0      0      0      0.078  0.273
CAC       20    0      0      0.250  0      0.050  0.150  0      0      0      0      0.150  0.400
KHO       57    0.018  0.175  0.263  0.070  0.123  0.333  0      0      0      0      0.018  0
CNC       40    0.100  0.150  0.200  0.125  0.025  0.300  0.025  0      0      0      0      0.075
XEG       3     0      0      0.333  0      0      0.333  0      0      0      0      0.333  0
DUM       1     0      0      1.000  0      0      0      0      0      0      0      0      0
NAM       28    0.036  0.036  0.214  0.036  0.143  0.214  0.036  0      0      0.071  0      0.214
GUG       22    0      0.091  0.091  0.682  0      0      0.045  0      0      0      0      0.091
NAR       2     0      0      0      0.500  0      0.500  0      0      0      0      0      0
JOH       42    0      0.095  0.310  0.262  0      0      0      0.048  0      0.238  0      0.048
XUN       49    0.020  0.020  0.061  0.408  0.020  0.082  0      0      0.041  0.265  0      0.082
KWE       18    0      0      0      0      0      0      0      0      0.111  0.278  0.056  0.556
BS        92    0.022  0.011  0.087  0.011  0      0.120  0.011  0      0      0.011  0.076  0.652
OTH       57    0      0      0      0      0.018  0      0      0.018  0      0      0.035  0.930
Total fq  1     0.039  0.069  0.147  0.110  0.032  0.169  0.011  0.006  0.007  0.058  0.039  0.314
N Seq     538   21     37     79     59     17     91     6      3      4      31     21     169
N Ht      245   7      21     29     12     6      27     4      2      3      5      12     117
Hd        0.984 0.710  0.964  0.926  0.760  0.588  0.722  0.867  0.667  0.833  0.738  0.905  0.988
pi        0.012 0.001  0.004  0.003  0.003  0.001  0.001  0.005  0.004  0.002  0.001  0.006  0.012

Figure 3.7  L0d structure as published in Behar et al., 2008 (black). Suggested changes according to this thesis are highlighted: in blue,
two new, previously unidentified clades; in red, mutations suggested to be removed as clade defining mutations. The table summarises the
frequencies, Haplotype Diversities (Hd) and Nucleotide Diversities (pi) of the various L0d subgroups as well as L0k1 and L0a.
BS - Bantu-speaking, OTH - Other

Figure 3.8  Graphical illustration of percentage L0d/k sub-haplogroup assignment in the populations used in comparative population analysis.
Published comparative data is according to Table 1.3
[Bar chart. Y-axis: 0% to 100%. X-axis: KAR, COL, CAC, CNC, KHO, NAM, GUG, JOH, XUN, KWE, BS (Salas/Perreira), Ju\'hoansi (Vigilant),
!Xun (Chen), Khwe (Chen), !Xun+Khwe (Tishkoff). Legend: L0k1, L0dx, L0d3, L0d2d, L0d2c, L0d2b, L0d2a, L0d1c, L0d1b, L0d1a.]

Table 3.2  TMRCA calculated for the L0d/k subgroups. Four different mutation rates are applied
Rate references: Horai (1), Soodyall (2), Ward (3), Foster (4)

Split from other lineages
Haplogroup ρ σ Years(1) SD(1) Years(2) SD(2) Years(3) SD(3) Years(4) SD(4)
L0d 10.8580 2.2635 138002 28768 116247 24233 96601 20138 53668 11188 
L0d1a 7.1351 2.0865 90685 26519 76389 22338 63480 18563 35267 10313 
L0d1b 4.6709 1.5502 59366 19703 50007 16597 41556 13792 23087 7662 
L0d1c 6.7119 2.2906 85306 29113 71858 24523 59714 20379 33175 11322 
L0d2a 3.8242 1.7380 48604 22089 40942 18607 34023 15463 18902 8590 
L0d2b 8.5000 2.4777 108032 31491 91002 26527 75623 22044 42013 12247 
L0d2c 3.2941 1.5519 41867 19724 35267 16615 29307 13807 16282 7671 
L0d2d 5.0000 1.8559 63549 23588 53531 19869 44484 16512 24714 9173 
L0d3 8.0000 2.6273 101678 33392 85649 28128 71174 23375 39542 12986 
L0dx 4.0000 1.5411 50839 19587 42824 16499 35587 13711 19771 7617 
L0k1 8.5161 2.8144 108237 35770 91174 30131 75766 25039 42093 13911 
           
TMRCA
Haplogroup ρ σ Years(1) SD(1) Years(2) SD(2) Years(3) SD(3) Years(4) SD(4)
L0d 9.8580 2.0307 125292 25810 105541 21741 87705 18067 48725 10037 
L0d1 6.4286 1.2576 81706 15984 68825 13464 57194 11189 31775 6216 
L0d2 4.8718 1.6199 61919 20588 52158 17343 43343 14412 24080 8007 
L0d1a 4.1351 1.1634 52556 14786 44271 12455 36789 10351 20439 5750 
L0d1b 3.6709 1.1845 46656 15055 39301 12681 32659 10538 18144 5855 
L0d1c 4.7119 1.8019 59887 22902 50446 19291 41921 16031 23290 8906 
L0d2a 1.8242 1.0143 23185 12891 19530 10859 16230 9024 9016 5013 
L0d2b 6.5000 2.0344 82613 25857 69590 21780 57829 18100 32128 10055 
L0d2c 2.2941 1.1867 29157 15083 24561 12705 20410 10558 11339 5866 
L0d2d 4.0000 1.5635 50839 19872 42824 16739 35587 13910 19771 7728 
L0d3 4.0000 1.7037 50839 21654 42824 18240 35587 15157 19771 8421 
L0dx 3.0000 1.1726 38129 14903 32118 12554 26690 10432 14828 5796 
L0k1 1.5161 0.9596 19269 12197 16232 10274 13488 8538 7494 4743 
 
Years are calculated from ρ by multiplying with the specific mutation rate
Standard deviations (SD) are calculated from σ by multiplying with the specific mutation rate
1 Horai et al., (1995)
2 Soodyall et al., (1996)
3 Ward et al., (1991)
4 Foster et al., (1996)

Figure 3.9  Graphic representation of coalescent times and times of divergence of the mtDNA sub-haplogroups
of L0d and L0k. The mutation rate estimated by Ward et al., (1991) was used in these
estimates.

3.3.2 Khoe-San associated haplogroups L0d and L0k - Further analysis

The sub-haplogroups of the Khoe-San associated haplogroups L0d/k were differentially
distributed in the different sample groups included in this study (Figure 3.7 and 3.8).

The differences were immediately apparent from the distribution over the different sampling
groups in the form of bar-charts (Figure 3.8). They were especially clear
between the southern-San/Coloured/Khoe groups (KAR, COL, CAC, KHO, CNC, NAM)
and the San groups located north of them (GUG, JOH, XUN). To further investigate these
differential distributions, analysis of sub-haplogroup distribution was done.

Sample groups were arranged in a southeast to northwest direction and coloured with
increasing shade from the southeast to the northwest; the resultant distribution of L0d/k
subgroups is represented in Figure 3.10. A clear clinal pattern for all of the haplogroups
was observed. L0d2a and L0d3 seemed to have a more southeastern distribution (lighter
shades), while L0d2b, L0d2c and L0d1a had an intermediate central pattern. L0d1c, L0k1 as
well as the few sequences belonging to L0dx and L0d2d, however, were much darker and
seem to predominate in the northern groups.  
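
The composition shown for each sub-haplogroup in a bar graph of this kind can be approximated from the group sample sizes and frequencies. The sketch below does this for L0k1, using the sample sizes and L0k1 frequencies listed in the frequency tables of Figures 3.3 and 3.7; the southeast to northwest ordering of the groups and the rounding of frequency times N back to whole sequences are assumptions made for illustration only.

```python
# (group, N, L0k1 frequency); ordering assumed to run southeast -> northwest.
GROUPS = [
    ("KAR", 30, 0.0), ("COL", 77, 0.0), ("CAC", 20, 0.0), ("CNC", 40, 0.0),
    ("KHO", 57, 0.0), ("NAM", 28, 0.071), ("GUG", 22, 0.0),
    ("JOH", 42, 0.238), ("XUN", 49, 0.265), ("KWE", 18, 0.278),
]


def composition(groups):
    """Return (share per group, count per group) for one sub-haplogroup."""
    # Frequencies are rounded in the source tables, so counts are recovered
    # by rounding N * frequency to the nearest whole sequence.
    counts = {name: round(n * freq) for name, n, freq in groups}
    total = sum(counts.values())
    shares = {name: count / total for name, count in counts.items()}
    return shares, counts


shares, counts = composition(GROUPS)
for name, share in shares.items():
    if share > 0:
        print(f"{name}: {counts[name]} sequences, {100 * share:.1f}% of L0k1")
```

Only the four north-western groups contribute to L0k1 in this calculation, which is the pattern described for the darker shades above.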
 
 
 
 
 
 
 
 
 
 
 
 
To further investigate these apparent clinal distributions, contour plots of the haplogroups
were constructed with the Surfer v.8.06.39 program and are shown in Figure 3.11.

Figure 3.10  Bar-graph indicating the clinal distribution of the L0d/k subgroups. Darker
shades are north-western groups and lighter shades are southern groups
[Bar chart. Y-axis: 0% to 100%. X-axis: L0d1a, L0d1b, L0d1c, L0d2a, L0d2b, L0d2c, L0d2d, L0d3, L0dx, L0k1.
Legend: KWE, XUN, JOH, GUG, NAM, KHO, CNC, CAC, COL, KAR.]

The contour plots reflected the distributions of the L0d/k subgroups as discussed in section 
3.1.3.2. Certain haplogroups (L0d3 and L0d2a) had higher frequencies in the southeast 
than the northwest; others had a more gradual and central distribution (L0d1a, L0d1b, 
L0d2c and L0d2b), while some had higher frequencies in the north (L0k1 and to a certain 
extent L0d1c). 
 
Figure 3.11  Contour plots indicating the frequency distributions of L0d/k subgroups

In the contour plots of Figure 3.11, all the haplogroups except L0d2b and L0d1c seemed to
have a unimodal distribution with a single point of highest frequency and then decreasing
frequencies from there in a clinal fashion. L0d2b showed two peaks, represented by the
NAM and GUG; however, this sub-haplogroup was observed at too low frequencies to
have any significance. L0d1c also showed a bimodal distribution pattern. To analyse this
further the L0d1c group was split up into two groups. The first group was the L0d1c1
sequences as defined by Behar et al. (2008), which are represented by the
star-like expansion pattern within L0d1c in the network (Figure 3.6). The second group,
L0d1c-, was the remaining L0d1c sequences after the L0d1c1 sequences were removed.
Contour plots of these groups are presented in Figure 3.12.

Figure 3.12  Contour plots of L0d1c split into two subgroups, L0d1c1 and the remaining
L0d1c sequences (L0d1c-). Panels: A - L0d1c, B - L0d1c1, C - L0d1c-.

From Figure 3.12 it can be seen that L0d1c originally had a unimodal clinal distribution but
a subsequent expansion in the L0d1c1 subgroup occurred that caused elevated
frequencies in the XUN. L0d1c1 did not occur in the KWE and was at low frequencies in
the JOH. This led to the overall bimodal distribution of L0d1c.
 
To further analyse the individual haplogroup histories, to test if they had notable
expansions and to date these expansions, mismatch distributions of the sub-haplogroup
sequences were constructed (Figure 3.13 and Table 3.3).
 
Table 3.3  Mismatch distribution statistics (haplogroups)

HG     Raggedness  τ       T *     Theta0  Theta0 qt        Theta1     Theta1 qt               Model (SSD)
       index                               5% - 95%                    5% - 95%                p-value
L0d1a  0.013       6.285   27 958  0.017   0.000 - 1.325    31.250     14.805 - 99999.000      0.700
L0d1b  0.033       6.805   30 271  0.002   0.000 - 1.130    11.646     6.190 - 84.458          0.230
L0d1c  0.080       5.971   26 561  0.000   0.000 - 0.751    5.188      3.883 - 99999.000       0.160
L0d2a  0.042       1.545   6 873   0.000   0.000 - 0.366    579.075    3.714 - 99999.000       0.680
L0d2b  0.276       17.441          0.000   0.000 - 57.600   55.181     39.615 - 99999.000      0.000#
L0d2c  0.135       0.000           0.000   0.000 - 0.000    428.125    0.000 - 0.000           0.000#
L0d2d  1.000       10.088          0.002   0.000 - 6.161    99999.000  99999.000 - 99999.000   0.000+
L0d3   0.053       4.500   20 018  0.000   0.000 - 0.508    3.483      1.619 - 99999.000       0.600
L0dx   0.528       8.813           0.002   0.000 - 14.400   9.384      5.060 - 99999.000       0.180#
L0k1   0.055       1.393   6 197   0.000   0.000 - 0.028    99999.000  4.918 - 99999.000       0.600
L0a    0.049       15.965  71 019  0.002   0.000 - 4.502    11.197     6.043 - 166.353         0.360
M      0.019       9.559   42 522  0.290   0.000 - 2.406    86.104     47.432 - 99999.000      0.450
R      0.014       10.424  46 370  1.752   0.000 - 2.387    34.883     23.632 - 183.008        0.490

*  T - Time before present that expansion took place (calculation explained in section 2.2.2.3)
# expansion hypothesis rejected - 95% CI overlap
+ excluded - too few sequences
HG - Haplogroup, SSD - Sum of Squared deviation
 
 
Mismatch distribution statistics and the results for a spatial expansion test are shown in
Table 3.3 (the p-value indicates the probability that the SSD simulated under an
expansion scenario is not significantly different from the observed SSD). All L0d
subgroups except L0d2b, L0d2c and L0dx tested positive for expansions. The haplogroups
that indicated expansions with the highest significance were L0d1a and L0d2a. Their
mismatch distributions showed smooth unimodal distributions with low raggedness values.
Their τ (Tau) values, however, differed, with the τ value of L0d2a indicating a much more
recent expansion. τ values of the L0d1 haplogroups were similar (indicating expansions of
around 27 000 years BP) while L0d3 had a smaller τ value and L0d2a and L0k1 the
smallest (indicating expansions around 6 000 years BP). Both M and R haplogroups
experienced expansions ~ 40 000 to 50 000 years BP. 
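
The quantities summarised in Table 3.3 can be illustrated with a minimal Python sketch: the observed mismatch distribution, Harpending's raggedness index, and the conversion of τ to years through τ = 2ut. This is not the software used for the analysis, and the calculation actually applied is the one described in section 2.2.2.3. The toy sequences are hypothetical, and the years-per-mutation value is an assumption back-calculated from the Ward column of Table 3.2 (35587 years for ρ = 4.0000).

```python
from collections import Counter
from itertools import combinations


def pairwise_differences(a, b):
    """Count the positions at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def mismatch_distribution(seqs):
    """Relative frequency of each pairwise-difference class 0..d."""
    counts = Counter(pairwise_differences(a, b) for a, b in combinations(seqs, 2))
    n_pairs = sum(counts.values())
    return [counts.get(d, 0) / n_pairs for d in range(max(counts) + 1)]


def raggedness(freqs):
    """Harpending's raggedness index: sum of squared steps between classes."""
    padded = list(freqs) + [0.0]  # x_(d+1) = 0 closes the distribution
    return sum((padded[i] - padded[i - 1]) ** 2 for i in range(1, len(padded)))


def expansion_time(tau, years_per_mutation):
    """Years since expansion from tau = 2ut, i.e. t = tau / (2u)."""
    return tau * years_per_mutation / 2.0


toy = ["AAAA", "AAAT", "AATT"]  # hypothetical aligned sequences
freqs = mismatch_distribution(toy)
print("mismatch classes:", freqs, "raggedness:", raggedness(freqs))

# ASSUMPTION: years per mutation back-calculated from Table 3.2 (Ward column).
print("L0d1a expansion (tau = 6.285):", expansion_time(6.285, 35587 / 4.0))
```

With this assumed rate the τ values of Table 3.3 scale to the tabulated T values to within rounding.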
 

Figure 3.13  Mismatch distributions of L0d/k sub-haplogroups and comparative groups. # expansion hypothesis rejected - 95% CI
overlap

Caveats associated with the coalescence analysis employed in mismatch distributions are
the assumption of a single exponentially growing population and the large degrees of
statistical uncertainty. Also, when these methods are applied, earlier population expansions can
be obscured by recent population bottlenecks (Excoffier and Schneider, 1999). Mismatch
distributions have been reported previously to have less ability to detect population
expansions than neutrality test summary statistics such as Tajima's D (Tajima, 1989), Fu's
Fs (Fu, 1997) and the R2 statistic (Ramos-Onsins and Rozas, 2002). Diversity estimates
together with the neutrality tests for the L0d sub-haplogroups are shown in Table 3.4. Also
included as comparative samples are other sub-haplogroups in the study group that had
more than 10 representative sequences. 
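
As an illustration of the diversity estimates reported in Table 3.4, the sketch below computes haplotype diversity (Hd) and nucleotide diversity (pi) with the standard unbiased estimators. It is assumed, not confirmed by the text, that the software used in this study applies the same estimators; the haplotype labels and toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(haplotypes):
    """Hd = n / (n - 1) * (1 - sum of squared haplotype frequencies)."""
    n = len(haplotypes)
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site (aligned sequences)."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / len(pairs) / length


# Three sequences in two haplotypes, and four sequences in three haplotypes:
# the only configurations possible for L0d2d (3 seq, 2 Ht) and L0dx (4 seq, 3 Ht).
print(round(haplotype_diversity(["h1", "h1", "h2"]), 3))
print(round(haplotype_diversity(["h1", "h1", "h2", "h3"]), 3))
print(round(nucleotide_diversity(["AAAA", "AAAT", "AATT"]), 3))
```

The first two values reproduce the Hd entries for L0d2d and L0dx in Table 3.4, since the number of sequences and haplotypes fixes the haplotype counts in those two groups.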
 
Table 3.4  Diversity statistics and neutrality tests of L0d/k subgroups and comparative haplogroups

Group  N seq  N Ht  Hd     pi       θS       W-θS    Ne    Tajima's D  Tajima's D  Fs       Fs         R2      R2
                                                                       p-value              p-value            p-value
L0d    317    111   0.962  0.00732  0.01419  15.342  2730  -1.45547    0.028*      -33.984  <0.001***  0.0421  0.069
L0d1a  37     21    0.964  0.00436  0.00584  6.468   1151  -0.84168    0.210       -8.871   0.001**    0.0849  0.120
L0d1b  79     29    0.926  0.00279  0.00551  6.072   1081  -1.51856    0.040*      -17.250  <0.001***  0.0488  0.040*
L0d1c  59     12    0.760  0.00248  0.00388  4.305   766   -1.09831    0.136       -1.628   0.273      0.0714  0.186
L0d2a  91     27    0.722  0.00110  0.00463  5.116   910   -2.29659    <0.001***   -23.082  <0.001***  0.0240  0.005**
L0d2b  6      4     0.867  0.00458  0.00393  4.380   779   1.03370     0.864       1.229    0.701      0.2429  0.738
L0d2c  17     6     0.588  0.00129  0.00239  2.662   474   -1.65319    0.032*      -1.475   0.134      0.1135  0.128
L0d2d  3      2     0.667  0.00357  0.00359  4.000   712   na          na          na       na         na      na
L0d3   21     7     0.710  0.00121  0.00124  1.390   247   -0.08107    0.511       -1.287   0.170      0.1327  0.432
L0dx   4      3     0.833  0.00238  0.00244  2.727   485   na          na          na       na         na      na
L0k1   31     5     0.738  0.00110  0.00089  1.001   178   0.58176     0.742       -0.044   0.518      0.1522  0.721

L0a    21     12    0.905  0.00590  0.00526  5.837   1039  0.30935     0.676       -1.317   0.290      0.1475  0.725
L2a    29     10    0.852  0.00332  0.00367  4.074   725   -0.29380    0.430       -0.739   0.382      0.1153  0.472
L3d    20     6     0.558  0.00281  0.00253  2.819   502   0.41429     0.700       2.092    0.849      0.1506  0.662
L3e    22     16    0.957  0.00575  0.00545  6.035   1074  0.08399     0.601       -5.350   0.056      0.1345  0.611
M      19     18    0.994  0.00620  0.01340  14.592  2596  -2.17227    0.004**     -11.280  <0.001***  0.0462  <0.001***
R      38     35    0.994  0.00793  0.01702  19.326  3439  -1.93058    0.010**     -27.889  <0.001***  0.0466  <0.001***

*     p-value < 0.05
**    p-value < 0.005
***   p-value < 0.001
 
The effective population size of females (Ne) was estimated from W-θS as explained in
section 2.2.3. The two non-African macro-haplogroups had very large effective population
sizes while the Ne of the African haplogroups was smaller. In the L0d subgroups the largest Ne
was detected in L0d1a, L0d1b and L0d2a while L0d3 and L0k1 had the smallest Ne.  
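
The relationship between Watterson's θ and Ne in Table 3.4 can be sketched as follows. The estimator itself is standard; the conversion θ = 2 Ne μ for haploid, maternally inherited mtDNA and the mutation rate used below are assumptions, with μ back-calculated from the ratio of Ne to W-θS in Table 3.4 (about 177.94) rather than taken from section 2.2.3. The number of segregating sites in the example is hypothetical, chosen so that θ equals the tabulated value for L0d2d.

```python
def harmonic_sum(n):
    """a_n = sum of 1/i for i = 1 .. n-1, for a sample of n sequences."""
    return sum(1.0 / i for i in range(1, n))


def watterson_theta(segregating_sites, n):
    """Watterson's estimator of theta per sequence: S / a_n."""
    return segregating_sites / harmonic_sum(n)


def effective_size(theta, mu_per_generation):
    """Female Ne under the assumed relation theta = 2 * Ne * mu."""
    return theta / (2.0 * mu_per_generation)


# ASSUMPTION: mutations per sequence per generation, back-calculated so that
# Ne / theta matches the ratio seen in Table 3.4 (2730 / 15.342 = 177.94).
MU = 1.0 / (2.0 * 177.94)

theta = watterson_theta(6, 3)  # hypothetical S = 6 in a sample of n = 3
print("theta:", theta, "Ne:", round(effective_size(theta, MU)))
print("Ne for W-thetaS = 15.342:", round(effective_size(15.342, MU)))
```

Because Ne is a constant multiple of W-θS under this relation, the ranking of haplogroups by Ne is the same as their ranking by W-θS.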
 
Under neutral expectations with random mating, constant population sizes and no selection,
pi and θ should be equal (Jobling et al., 2004c). Neutrality tests were done to detect
deviations from the assumptions of neutrality and constant population size. Significantly
negative Tajima's D and Fs values and significantly low R2 values indicate population
growth and/or positive selection. The Fs and R2 statistics have been reported to detect
population expansions very successfully (Ramos-Onsins and Rozas, 2002; Pilkington et
al., 2008). Fs is based on the probability of observing, in a sample drawn from a population
of constant size, a number of haplotypes greater than or equal to the number observed. R2
is based on the difference between the average number of nucleotide differences and the
number of singleton mutations. The R2 statistic is especially powerful when sample sizes
are small (~10) and Fs has a greater ability to detect population expansions when sample
sizes are large (~50) (Ramos-Onsins and Rozas, 2002; Pilkington et al., 2008). 
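
The comparison of pi and θ that underlies Tajima's D can be written out in a few lines. The sketch below implements the published formula of Tajima (1989) on aligned sequences without gaps; it is an illustration only and not the program used to generate Table 3.4, and the two toy samples are hypothetical.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for aligned, equal-length sequences without gaps."""
    n = len(seqs)
    # S: number of segregating (polymorphic) alignment columns
    s = sum(len(set(column)) > 1 for column in zip(*seqs))
    if s == 0:
        return float("nan")
    # pi: mean number of pairwise differences (per sequence, not per site)
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


star_like = ["TAAA", "ATAA", "AATA", "AAAT"]   # every variant is a singleton
two_clades = ["AAAA", "AAAA", "TTTT", "TTTT"]  # variants at intermediate frequency
print("star-like sample:", tajimas_d(star_like))
print("two deep clades: ", tajimas_d(two_clades))
```

An excess of singletons, as expected after population growth, pulls pi below S/a1 and gives a negative D, while variants at intermediate frequency give a positive D.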
 
In the comparative groups the non-African haplogroups M and R tested positive for
population expansion in all three neutrality tests with highly significant p-values. The L0d
group as a whole had significant D and Fs values but a non-significant R2 value. R2, however,
does not perform reliably at large sample sizes (Ramos-Onsins and Rozas, 2002). Of the L0d
subgroups L0d2a had the highest significance in all three neutrality tests. L0d1b also
attained significance in all three tests while L0d1a had a very significant Fs value but did
not reach significance in the Tajima's D and R2 tests.
 
While neutrality tests are widely employed to test hypotheses of population expansion
events, recent improvements in coalescence inference methods have led to increased accuracy,
without the need to assume a single exponential growth curve (Shapiro et al., 2004;
Atkinson et al., 2008). One of these methods, the Bayesian Skyline Plot (BSP) (Drummond
et al., 2005), was employed to visually represent the changes in Ne through time; a BSP
was constructed for each haplogroup (Figure 3.14).
 
The BSPs of all the L0d sub-haplogroups, except L0d1a, indicated a recent increase in Ne
(Figure 3.14). L0d1a had an increase that started around 25 000 - 30 000 years BP and a
recent decrease that started around 5 000 years BP. L0d1c had a constant population size
over an extended period and then, similar to L0d1a, started to decrease around 5 000 years
BP. Around 1 000 years BP, however, it increased rapidly. L0d1b had an increase that
started around 14 000 years BP and a further increase recently. Despite a shallow
coalescence time, L0d2a showed a dramatic increase from 8 000 years BP onwards and a
further recent increase. The L0d3 BSP profile that included east African and the Kuwait
haplotypes (L0d3+) showed a slow decline over an extended period followed by a recent
increase in Ne. The L0d3 profile that only included the southern African L0d3 haplotypes
(L0d3-) showed a more intense decline and an increase that started later than in the L0d3+
profile.

Figure 3.14  Bayesian Skyline plots of haplogroups showing changes in Ne through time. A log scale of Ne is
represented on the Y-axis, while years before present are represented on the X-axis, with the present indicated
by 0. L0d3+ is L0d3 including the east African and Kuwait sequences. L0d3- includes only L0d3 sequences
from the present study. The black bold vertical lines indicate the coalescence date and the lighter vertical lines
the 95% confidence intervals for the coalescence. The blue lines indicate the 95% confidence intervals for the
plot-lines. Panels: L0d3-, L0d3+, L0d2a, L0d1b, L0d1c, L0d1a.

3.3.3 Discussion of analyses of Khoe-San associated haplogroups L0d and L0k
 
In the following section each of the L0d/k sub-haplogroups will be discussed with regard to 
the different analyses presented in previous sections. The placing of the haplogroup on the 
network and tree, the TMRCA dating and the frequencies in the different populations will be 
discussed and compared to published findings. Phylogenetic results and dating might differ 
from published studies, especially the whole genome sequencing studies, due to different 
lengths of sequence investigated and different methods employed. These differences are 
also highlighted in following sections. Furthermore patterns in the network, the 
geographical spread of the haplogroup together with the evidence of population growth 
signals are interpreted and linked with evidence from other disciplines such as 
archaeology, linguistics and ethnography to infer possible histories for the lineages 
involved. 
 
L0k 
While many sequences (31) belonged to haplogroup L0k, they were represented by only
five different haplotypes, all belonging to L0k1. The L0k branch on the tree in Figure 3.5b
grouped with the L0d branch and not with the L0a branch as was established previously
using whole genome sequencing (Behar et al., 2008). Both branches that separated L0k
from L0a, however, had aLRT branch support of just above 60%. Comparing the network
(Figure 3.6) with the classification of Behar et al. (2008) (represented in Figure 3.7),
one can see that the 263 and the 146 mutations separating L0k and L0a on the network
should move to be L0 defining mutations. Instead L0a and L0k should group on the same
branch with 189G and 16172C defining the common branch (in the network they were on
separate L0a and L0k branches). Furthermore the L0k clade defining mutations in the
network (16166, 16209, 16214, 16291, 198) compared well with Behar et al. (2008).
The only exception is 207A, which should also be a clade defining mutation and
then revert in a subgroup of the sequences according to Behar et al. In the present network
it should thus move to precede the group of KWE haplotypes and then subsequently revert
to the ancestral type in this group only.

L0k has two sub-haplogroups, L0k1 and L0k2, that separated ~40 000 years BP (Behar et
al., 2008). As yet there has been only one report of L0k2, in an individual from Yemen
(Behar et al., 2008), while L0k1 was found exclusively in the San groups (Vigilant et al.,
1991; Chen et al., 2000; Tishkoff et al., 2007; Behar et al., 2008). All sequences in the
current sample group were L0k1 and coalesced 13 488 years BP (+/- 8 538) and diverged
from other sequences in L0 75 766 years BP (+/- 25 039) (Table 3.2). These are shallow times
compared to what was found previously for L0k (39 683 +/- 8 730 for the coalescence and
142 860 +/- 11 905 for the split) (Behar et al., 2008).
 
In the present study, L0k was limited to the northern Khoe-San groups whilst the southern 
groups and the central Kalahari groups from Botswana showed no presence of the L0k 
haplogroup (Figure 3.3, 3.4). The northern San groups contained the highest percentages 
while the Nama contained lower levels (Figure 3.3, 3.4). This can be explained by the fact 
that the Nama originally came from an area currently known as the northern parts of the 
Cape Province (SA) and recently moved into the Namibia area (Barnard, 1992). They 
therefore could be regarded as a southern Khoe-San group rather than a northern group. 
The L0k1 found in the Nama is thus most likely because of recent gene flow from the San 
people of northern Namibia (such as the Ju\?hoansi and !Xun).  
 
The frequencies of L0k1 in the !Xun and Khwe from this study (27%, 28%) (Figure 3.3, 3.4) 
agreed with previous studies that reported frequencies of 26% in the !Xun and 23% in the 
Khwe (Chen et al., 2000) (Table 1.2). Tishkoff et al. reported a frequency of 22% in a 
combined !Xun and Khwe group (Table 1.2) (Tishkoff et al., 2007). Our results for the 
Ju\'hoansi (JOH in Figure 3.3, 3.4), however, differed somewhat from what was found 
previously, where only 4% of the Ju\'hoansi lineages were resolved into haplogroup L0k1 
while L0d was found to be the most prevalent haplogroup (96%) (Vigilant et al., 
1991) (Table 1.2). In our study L0k1 represented 24% of the Ju\'hoansi group and L0d 71%. 
The two Ju\'hoansi groups were not from the same locations. While the group of the present 
study was sampled in Tsumkwe, the published group was sampled in Botswana (Dobe) as 
well as in Namibia. 
 
Since previous studies only reported on the mtDNA haplogroup frequencies of the three 
northern San groups (!Xun, Khwe and Ju\'hoansi), the low frequency of L0k1 in the Khoe 
and its absence in the southern San and Coloured groups have never been noted before. 
Salas et al., however, noted its complete lack in southeastern Bantu-speakers, in contrast 
to L0d (Salas et al., 2002). Previously, it was thought that the histories of L0d and L0k 
were closely intertwined and synonymous with Khoe-San history (Salas et al., 2002; Behar 
et al., 2008; Atkinson et al., 2009). From the present study it was clear that, although all 
groups in this study with Khoe-San ancestry had L0d in common, L0k was only associated 
with the northern Khoe-San groups (Figure 3.3, 3.4, 3.11). 
 
The history of the L0k1 haplogroup might be more closely tied to the Khwe than to the 
rest of the San groups. It was the haplogroup with the highest frequency in the Khwe, while 
in the other northern San groups (!Xun and Ju\'hoansi) it was secondary to L0d groups 
(Figure 3.3, 3.4) and might have been introduced to these groups through gene flow with 
the Khwe and other Khoe-speaking San groups. The low L0k1 haplogroup diversities 
suggest only a few founders (Figure 3.3). In the network and tree (Figure 3.6 and 3.5b) it 
could be seen that all the Khwe sequences belonged to one haplotype and that the Khwe 
haplotype was ancestral to the haplotypes observed in the !Xun, Ju\'hoansi and Nama. 
This suggested that L0k1 was originally a Khwe haplogroup and spread to the other 
northern San groups, where it diverged further. In the study by Chen et al., L0k1 was also 
the predominant haplogroup in the Khwe (Table 1.2) (Chen et al., 2000). Furthermore, all 
seven L0k1 sequences identified in the Khwe by Chen et al. were identical to the L0k1 
Khwe haplotype of the present study, while ten of the eleven L0k1 sequences in the !Xun 
were derived from the ancestral Khwe haplotype (one !Xun sequence had the Khwe 
haplotype) (Chen et al., 2000). 
 
It is unclear where the Khwe originally came from. Theories are that they are Khoe-San 
groups with extensive Bantu-speaking admixture, Bantu-speakers that lost their cattle, 
another pastoralist population closely related to Bantu-speakers who occupied the region 
before the Bantu expansions or maybe a mixture of various refugee groups driven from the 
grazing grounds into the Okavango swamps (Cashdan, 1986). Genetic results from the 
present study indicated that the maternal lines of the Khwe showed contributions from 
southeastern Bantu-speakers and Khoe-San (Figure 3.8). In addition they might have had 
a unique contribution from an unknown pastoralist or hunter-gatherer population that 
carried the L0k1 maternal lineage, whose identity has since been lost. The discovery of the 
L0k2 haplogroup in an individual from Yemen (Behar et al., 2008) suggests that the L0k 
haplogroups might have had an extensive spread in prehistoric Africa, but that remnants of 
the haplogroup in other populations have been lost due to drift or have not been detected 
due to insufficient sampling.  
 
It would be interesting to know the L0k1 frequency in the other Khoe-speaking San groups. 
Linguistically, the Khwe belong to one of the three main groups in the western branch of 
the Khoe-speaking San groups, the other two groups being the Naro and the /Gui and 
//Gana (Güldemann, 2006b). No L0k1 haplogroups were found in the group of /Gui + 
//Gana + Kgalagari individuals (GUG), and the two Naro individuals had only L0d 
haplogroups. Furthermore, serogenetic studies showed that the Naro were genetically more 
similar to the Ju\'hoansi and ?X?ao//??esi than to the Khwe (Jenkins, 1982). To date, 
no genetic studies have been done on the eastern Khoe-speaking San groups, including 
the Tshua and Shua of eastern Botswana. Phenotypically they have more in common with 
the Khwe than with the western Khoe-speaking San, in that they resemble Bantu-speakers 
(Dornan, 1975; Barnard, 1992). The Tshua and Shua may be genetically more closely related 
to the Khwe even though the Naro and the /Gui and //Gana are linguistically more closely 
related. A very interesting linguistic connection is that one of the eastern Khoe-speaking 
San languages, Hietshware, is the Khoisan language most closely related to the extinct 
Kwadi language of western Angola, which in turn is connected to the click language of the 
Sandawe of eastern Africa (Güldemann, Forthcoming-b; Güldemann and Elderkin, 
Forthcoming). No traces of L0k have been found in Tanzania and Kenya so far, but the 
presence of L0k2 in Yemen suggests a trans-African spread of this haplogroup. If the 
eastern Khoe-speaking San groups contain high frequencies of L0k1, such as found in the 
Khwe, it might be indicative of another ancient hunter-gatherer population that lived 
northeast of the Khoe-San groups, prior to the spread of the Bantu-speaking groups. This 
group might have had linguistic and genetic connections with both the Khoe-San and 
Sandawe. As discussed in section 1.2.2.4, an ideal candidate for such a group might be the 
Pygmy groups that lived north of the Khoe-San before the Bantu expansions. Autosomal 
studies show genetic similarities 
between Mbuti Pygmy groups of east Africa and the Khoe-San. MtDNA studies, however, 
have found no L0d or L0k in the Pygmy groups studied so far (Quintana-Murci et al., 2008). 
Pygmy groups are mostly assigned to a specific L1c haplogroup. The southern Ba-Twa 
Pygmies have, however, not been studied genetically and it is possible that they might 
have maternal genetic connections to the Khwe.  
 
All Pygmy groups lost their original language, which makes it impossible to trace linguistic 
connections. The linguistic connection between Hietshware, Kwadi and Sandawe is 
extremely ancient and barely distinguishable (Güldemann, Forthcoming-b; Güldemann and 
Elderkin, Forthcoming). The limit of tracing relationships between languages is ~10 000 
years. If there is a genetic counterpart to the linguistic connection between the Sandawe 
and Khoe-San, the age of convergence of genetic lineages cannot be much older than this 
limit. The TMRCA of all L0k1 lineages from this study was between 7 000 and 19 000 
years BP depending on the mutation rate employed (Table 3.2). To establish whether L0k1 
is the maternal genetic counterpart to the linguistic connection between the Sandawe and 
Khoe-San, more sampling of African sequences is necessary to determine if and where 
other L0k sequences are found in Africa.  
 
L0d 
Altogether, 317 sequences were resolved into haplogroup L0d and its sub-haplogroups, 
and 111 unique haplotypes were identified. Haplotypes were grouped into the seven 
sub-haplogroups according to Behar et al. (Behar et al., 2008) and two additional, previously 
unidentified haplogroups (Figure 3.7). Overall, there was good agreement between the 
resolution of haplotypes in the present study (Figure 3.6) and the study based on whole 
genome sequences (Figure 3.7) published by Behar et al. (Behar et al., 2008). The L0d 
clade-defining mutation, 16243 T-C, is a very stable mutation and did not reoccur or revert 
in our sample set of 538 sequences.  
 
The L0d haplogroup was estimated to have a coalescence time of 87 705 years BP 
(+/- 18 067) and to have diverged from the other L0 groups 96 601 years BP (+/- 20 138) 
(Table 3.2). This compared well to whole genome studies (Behar et al., 2008), which 
calculated the coalescence at 100 795 years BP (+/- 10 317) and the divergence at 
152 384 years BP (+/- 12 698). 
 
All of the Khoe-San and Coloured groups, with the exception of the Khwe, had L0d as their 
most frequent haplogroup. The Khwe L0d frequencies were lower than those found in the 
southeastern Bantu-speaking groups (Figure 3.3 and 3.4). In the remaining Khoe-San and 
Coloured groups the frequencies of L0d ranged from exclusive (Karretjie people - 100%), to 
very high (≠Khomani - 98%, Coloured-Northern Cape - 93%, /Gui + //Gana + Kgalagari - 
90%), moderate (Nama - 71%, Ju\'hoansi - 71%, !Xun - 65%, Coloured-Colesberg - 65%) 
and lower (Coloured-Wellington - 45%) (Figure 3.3 and 3.4).  
 
Other studies found similar L0d frequencies in the !Xun and the Khwe. Haplogroup L0d 
was found in the !Xun and Khwe at frequencies of 51% and 16%, respectively (Chen et al., 
2000) (Table 1.2) while in the present study the frequencies were 65% and 11%. Tishkoff 
et al. reported frequencies of 61% in a combined !Xun and Khwe group (Table 1.2) (Tishkoff 
et al., 2007). 
 
Again, as was noted for L0k, the L0d frequency in the Ju\'hoansi from our study (71%) 
(Figure 3.3, 3.4) differed from what was published for a different Ju\'hoansi group 
(96%) (Vigilant et al., 1991) (Table 1.2). Although both Ju\'hoansi groups showed little 
admixture from Bantu-speaking groups (Figure 3.3, 3.4, Table 1.2), the Ju\'hoansi from the 
present study had proportionally more L0k1 and less L0d than the Ju\'hoansi from the 
published study (Vigilant et al., 1991) (Table 1.2). As the published group was 
sampled in Botswana (Dobe) as well as in Namibia while the group of the present study 
was sampled in Tsumkwe, it is possible that the Tsumkwe group had more admixture with 
the neighboring !Xun (Figure 1.2). 
 
Interesting patterns are overlooked when only the distribution of the L0d group as a whole 
is considered among the different Khoe-San and Coloured groups. The distribution of L0d 
sub-haplogroups in published studies was by no means homogeneous across the different 
groups (Table 1.3). In the present study differential distribution of the L0d sub-haplogroups 
was also observed, and their distributions are visually represented by different contour plots 
(Figure 3.11). The most striking feature was the absence or low frequencies of L0d1c in the 
southern groups, as well as of L0d3 and L0d2a in the northern groups. Furthermore, 
comparison of the expansion dynamics of the L0d/k sub-haplogroups through mismatch 
distributions, neutrality tests and Bayesian Skyline Plots (BSPs) showed that they had 
different associated histories (Table 3.3, 3.4 and Figure 3.13). Their varied distribution 
coupled with dissimilar expansion patterns clearly indicated that the history of L0d as a 
whole is not homogeneous over sub-groups and may not correctly represent individual 
dynamics. Rather, each sub-haplogroup must be studied separately in accordance with the 
histories of its carrier population group and region of occurrence. 
 
L0d3 
L0d3 has previously been identified as the oldest L0d clade (Behar et al., 2008) and also 
occurred as the earliest L0d branch on our tree and network (Figure 3.5 and 3.6). The 
divergence from the other L0d sequences dated to 71 174 years BP (+/- 23 375) (Table 
3.2). This was the earliest split in the L0d branch (excluding the problematic L0d2b branch, 
which will be discussed later). L0d3 sequences coalesced 35 587 years BP (+/- 15 157). 
According to the whole genome study of Behar et al., it separated from the other L0d 
groups ~100 000 years BP (Behar et al., 2008). Only two L0d3 sequences formed part of 
the whole genome study, one from a San individual and one from an individual from 
Kuwait; these two sequences coalesced ~31 000 years BP (Behar et al., 2008).  
 
The seven clade-defining mutations (150T, 316A, 523insCA, 16290T, 16300G) that were 
previously proposed were based on only two sequences (Behar et al., 2008). The L0d3 clade 
in the present study is represented by seven haplotypes and, based on the data displayed in 
the network (Figure 3.6), it is suggested that 16300G and 523delCA should be removed as 
clade-defining mutations for L0d3, since earlier sequences in our network did not contain 
these mutations.  
 
Prior to the Behar et al. study (Behar et al., 2008), Gonder et al. did whole genome 
studies on a wide range of African sequences, including the Platfontein !Xun and Khwe and 
the Sandawe from Tanzania (Gonder et al., 2007). The !Xun/Khwe and Sandawe were the 
only groups that contained L0d sequences (Table 1.2). When the sequences from Gonder 
et al. were classified according to the classification introduced by Behar et al., all the 
!Xun/Khwe L0d sequences belonged to L0d1 or L0d2 (Table 1.3), while all the Sandawe L0d 
sequences belonged to L0d3. Since L0d sub-haplogroup classification was not formalized 
at that time, Gonder et al. named the L0d1+2 and L0d3 groups they observed in their 
phylogeny L0d-South Africa and L0d-Tanzania, respectively. According to the Gonder et al. 
whole genome study, the two L0d branches separated ~58 000 years BP, the L0d-South 
Africa branch coalesced 90 000 years BP and the L0d-Tanzania branch 31 000 years BP 
(Gonder et al., 2007). 
 
L0d3 sequences (L0d-Tanzania from Gonder et al.) were identified in various Khoe-San 
groups in the present study. When these L0d3 sequences were analyzed together with the 
L0d3 sequences from Tanzania and Kuwait (Behar et al., 2008), two clearly separate groups 
were formed. The Tanzania/Kuwait group formed a subgroup of the southern Africa L0d3 
group (Figure 3.15). It is suggested that the Tanzania/Kuwait sequences be designated an 
L0d3 subgroup, L0d3a, defined by the 16129, 16274 reversion and 16399 mutations. The 
haplotype in the southern Africa branch of L0d3 most closely related to the Tanzania/Kuwait 
branch (L0d3a) occurred in the Karretjie and Coloured groups from Colesberg. A haplotype 
found in the Karretjie and Coloured groups was directly ancestral to the L0d3a branch. 
When the Tanzanian, Kuwait and present study L0d3 sequences were put together, the 
whole L0d3 clade diverged from other L0d sequences ~83 000 years BP while all the L0d3 
sequences coalesced 47 000 years BP (Table 3.2). The divergence of the L0d3a 
sub-haplogroup from the southern Africa haplotypes was dated at ~41 000 years BP. L0d3a 
sequences converged ~28 000 years BP. 
 
The present study confirmed that the L0d3 branch was not limited to Tanzania (Gonder et 
al., 2007; Tishkoff et al., 2007). L0d3 was present in the southern Khoe-San and Coloured 
groups but almost absent in the northern groups (Figure 3.7 and 3.8). Although L0d3 had 
low frequencies compared to the other L0d subgroups in all the southern groups, its 
distribution clearly showed a south-north cline. L0d3 had the highest frequency in the 
southern groups, declined northwards in the central groups and was absent in the 
northern groups (only one !Xun individual was assigned to L0d3) (Figure 3.7 and 3.8). The 
earliest haplotypes in the L0d3 clade were the !Xun individual and a BS individual. The BS 
individual, however, was from the south, a Zulu from the Drakensberg area where the 
Duma San individuals were collected (in close proximity to the Karretjie and Coloured 
groups). 
 
The low frequency in the northern groups is confirmed by the Gonder et al. and Tishkoff 
et al. studies, which found no L0d3 sequences in their group of !Xun and Khwe (Gonder et 
al., 2007; Tishkoff et al., 2007). Also, when the haplogroups of the !Xun and Khwe from the 
Chen et al. study (Chen et al., 2000) and the Ju\'hoansi group from Vigilant et al. (Vigilant 
et al., 1991) were classified according to the Behar et al. nomenclature (Behar et al., 2008), 
no L0d3 sequences were observed. When the results from the three studies mentioned 
above were taken together with results from the present study, a group of 225 !Xun, Khwe 
and Ju\'hoansi were screened and only one !Xun individual (from the present study) 
contained an L0d3 sequence. This indicates an extremely low incidence of L0d3 in the 
northern San groups. 
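
As a rough illustration of how low this incidence is, the pooled count quoted above (one 
L0d3 sequence among 225 individuals) can be turned into a point estimate with an exact 
binomial upper bound. The sketch below is not part of the original analysis: the function 
names, the one-sided 95% level and the choice of a Clopper-Pearson style bound are 
assumptions made purely for illustration. 

```python
from math import comb


def binom_cdf(k, n, p):
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_upper_bound(k, n, alpha=0.05, tol=1e-10):
    """One-sided exact upper bound on a proportion (assumed 95% level).

    Finds, by bisection, the proportion p at which observing k or fewer
    carriers among n individuals has probability alpha.
    """
    if k >= n:
        return 1.0
    lo, hi = k / n, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > alpha:
            lo = mid  # bound lies above mid
        else:
            hi = mid  # bound lies at or below mid
    return (lo + hi) / 2


# Counts taken from the text: 225 pooled northern San individuals, one L0d3 carrier.
n_screened = 225
n_l0d3 = 1
frequency = n_l0d3 / n_screened
upper = exact_upper_bound(n_l0d3, n_screened)
print(f"L0d3 frequency: {frequency:.4f}, one-sided upper bound: {upper:.4f}")
```

With these counts the point estimate is below 0.5% and the upper bound is roughly 2%, so 
even allowing for sampling error the pooled data are compatible only with a low L0d3 
frequency in the northern San groups. 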
 
Figure 3.15  L0d3 branch after adding comparative published 
sequences. Tanzanian (dark green), Kuwait (purple), other colours 
according to Figure 3.6. Yellow clade - southern African branch. Light 
green clade - Tanzanian and Kuwait branch. 

Tishkoff et al. discussed the possibility that the linguistic connection between the Sandawe 
and northern Khoe languages was associated with the L0d genetic connection (Tishkoff et 
al., 2007). They concluded that the maternal genetic connection between the two groups 
was very deep (>15 000 years) and that it was unlikely that a linguistic trace could be 
detected that far back. 
 
From the present study it was clear that it is unlikely that the linguistic connection of the 
Sandawe to the northern Khoisan-speaking groups was associated with the L0d3 lineage. 
Although L0d3 was the exclusive L0d lineage in the Sandawe, it was almost completely 
absent in the northern Khoisan-speaking groups. In contrast to this absence in the northern 
groups, the southern groups contained higher L0d3 frequencies, and the frequencies were 
highest in the Karretjie group from Colesberg. Furthermore L0d3 sequences were 
detected in a Bantu-speaking individual from Mozambique (Salas et al., 2002) as well as 
an individual from northern Kenya (Watson et al., 1997) and an individual from Kuwait 
(Behar et al., 2008). This suggests an L0d3 spread along the eastern part of Africa forming 
a connection between the southeastern Khoe-San groups and the Tanzanian Sandawe 
rather than between the northwestern Khoe-San groups and the Sandawe as the linguistic 
connection suggests. 
 
To investigate the expansion history associated with the L0d3 haplogroup that led to its 
geographic spread, the expansion dynamics were investigated. While the expansion 
hypothesis was not rejected using mismatch distributions, all three neutrality tests used 
rejected an expansion for L0d3 (Table 3.4, Figure 3.13 and Table 3.3). BSP analysis 
(Figure 3.14) indicated that the southern African L0d3 showed a steady decline from the 
coalescence point onwards, with a sharp increase starting ~2 000 years BP. When the east 
African sequences were included the decrease was not as severe and the recent expansion 
started earlier (~4 000 years BP). The recent expansion phases of the haplogroup 
correlated with the introduction of pastoralism in east Africa and southern Africa, 
respectively. It is therefore likely that the populations that carried L0d3 either adopted the 
herding economy or benefited from it.  
 
L0d1 and L0d2 
The L0d1,2 branch was separated from its ancestral node by two mutations, 489delC 
and 16278C! (! indicates back mutation compared to the Cambridge Reference Sequence) 
(Figure 3.6 and 3.7). No HVS mutations differentiate between L0d1 and L0d2 (Figure 3.7); 
they are, however, defined by 5 and 6 coding region mutations, respectively (Behar et al., 
2008). When only the HVS is analyzed, the whole level separating L0d1 from L0d2 
collapses. This can be seen in the network where all subgroups met at two central nodes 
(marked with * and ** in Figure 3.6). The absence of the 523insCA mutation (seen as an 
L0d1a-L0d1b defining mutation in Figure 3.7) separated the L0d2 sequences from the 
L0d1 sequences in the network. In Figure 3.7 L0d1a/b had the 523insCA mutation but 
L0d1c did not. In the network all three L0d1 sub-clades as well as the newly identified 
L0dx had the 523insCA mutation, but in L0d1c it was seen as a back-mutation early in the 
clade. All of this caused the L0d2 groups to converge at node ** in Figure 3.6 and the 
L0d1 groups at node *.  
 
L0d1 
In the network (Figure 3.6) L0d1c and L0d1a grouped on one branch because of the 
common 16234T mutation. According to whole genome sequencing, however, this was not 
a common ancestral event in these two branches and should rather be separate events on 
each branch (Figure 3.7). The 523insCA should group L0d1a and L0d1b together and not 
occur in all three L0d1 clades and then be lost due to back-mutation in L0d1c as seen on 
the network and described earlier. 
 
Results from the present study indicated that the three L0d1 sub-haplogroups (L0d1a, 
L0d1b and L0d1c) showed differential spread among the different Khoe-San and Coloured 
groups and had different associated histories. According to whole genome studies, L0d1 
diverged from L0d2 ~90 000 years BP and all L0d1 sequences coalesce 53 000 years BP 
(Behar et al., 2008). The present study dated the coalescence at 57 000 years BP (Table 
3.2).  
 
L0d1a 
L0d1a was further defined by 16223C!, 199C and 16266G. The 199C mutation occurred in 
one other African sequence (one L0dx sequence), four haplogroup M sequences and one 
haplogroup N sequence. The 16223 and 16266 mutations are both highly reoccurring 
mutations. The 16266G position mutated to a 16266A further on in the L0d1a clade in a 
subset of sequences. 
 
The haplotype diversity for L0d1a (0.96) was the highest of all the L0d sub-haplogroups in 
the study (Figure 3.7). The present study contained 37 HVS sequences that comprised 21 
haplotypes. The 21 haplotypes converged 37 000 years BP (Table 3.2 and Figure 3.9). 
This is a much older date than the whole genome study indicated (Behar et al., 2008). The 
whole genome study included three L0d1a sequences (one Khoe-San and two 
Bantu-speaking), which converged ~18 000 years BP. The difference can be explained by 
the fact that the whole genome study did not include any of the haplotypes from the early 
branches of L0d1a identified in the network compiled from the present study (Figure 3.6). 
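
The haplotype diversity quoted above can be illustrated with a short sketch. It assumes the 
commonly used unbiased estimator H = n/(n-1) * (1 - sum of squared haplotype 
frequencies); the text does not state which estimator was applied, and the haplotype labels 
and counts in the example are hypothetical, not the L0d1a data. 

```python
from collections import Counter


def haplotype_diversity(haplotype_labels):
    """Unbiased haplotype diversity for a list of per-sequence haplotype labels.

    Assumed estimator: H = n / (n - 1) * (1 - sum(p_i ** 2)), where p_i is
    the sample frequency of haplotype i and n is the number of sequences.
    """
    n = len(haplotype_labels)
    if n < 2:
        raise ValueError("at least two sequences are required")
    counts = Counter(haplotype_labels)
    sum_squared_freqs = sum((count / n) ** 2 for count in counts.values())
    return n / (n - 1) * (1.0 - sum_squared_freqs)


# Hypothetical sample: six sequences falling into three haplotypes.
example_labels = ["h1", "h1", "h1", "h2", "h2", "h3"]
example_diversity = haplotype_diversity(example_labels)
print(f"haplotype diversity: {example_diversity:.3f}")
```

Under this estimator a value close to 1, such as the 0.96 reported for L0d1a, means that two 
sequences drawn at random from the sample almost always carry different haplotypes. 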
 
L0d1a had a central distribution with the highest frequencies in the regions occupied by the 
≠Khomani (Figure 3.7, 3.8 and 3.11). This haplogroup had low frequencies in most of the 
populations (<20%) but was geographically widespread and present in most groups (Figure 
3.7 and 3.8). Even though the L0d1a frequencies were much lower in the northern groups 
than in the central and southern groups, the northern groups contained the oldest L0d1a 
haplotypes in the network and trees (Figure 3.5b and Figure 3.6). The two 
earliest branches of L0d1a contained only individuals from northern groups while the later 
branches contained mostly central and southern groups with no northern group haplotypes. 
The northern group haplotypes were not directly ancestral to the central and southern 
haplotypes but were more closely related to the common ancestor (Figure 3.6).  
 
The BSP of L0d1a (Figure 3.14) showed a clear indication of an expansion that started 
between 25 000 and 30 000 years BP as well as a recent decline in population size from 
3 000 to 4 000 years BP to the present. The L0d1a network (Figure 3.6) showed a star-like 
expansion pattern associated with the southern groups. The central haplotype of the 
pattern was small and derivative sequences had accumulated several mutations. This 
pattern is indicative of an older expansion in which the central haplotype declined over 
time and derivative haplotypes accumulated mutations. A mismatch distribution of L0d1a 
showed a smooth unimodal distribution with a low raggedness index, which indicated a 
single expansion of the haplogroup at a time in the past given by the τ value (Figure 3.13 
and Table 3.3). When τ was converted to years the expansion was dated to ~28 000 years 
BP. Of the three neutrality tests employed, only the Fs test showed a significant indication 
of an expansion (Table 3.4). The Fs statistic, however, is used widely and several studies 
have shown it to be an accurate indicator of expansions (Ramos-Onsins and Rozas, 2002; 
Ramirez-Soriano et al., 2008).  
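
The conversion of the mismatch parameter to calendar years can be sketched as follows. 
The sketch assumes the standard sudden-expansion relationship tau = 2ut, where u is the 
mutation rate for the whole analysed sequence per year and t is the time since expansion in 
years. The mutation rate and sequence length used in the example are hypothetical 
placeholders and are not the values used in this study (see Table 3.3 for the estimates 
actually obtained). 

```python
def expansion_time_years(tau, mu_per_site_per_year, n_sites):
    """Convert a mismatch-distribution tau into years since expansion.

    Assumes tau = 2 * u * t, with u = mu_per_site_per_year * n_sites the
    mutation rate of the whole sequence per year, so t = tau / (2 * u).
    """
    if tau < 0 or mu_per_site_per_year <= 0 or n_sites <= 0:
        raise ValueError("tau must be >= 0; rate and length must be > 0")
    u = mu_per_site_per_year * n_sites
    return tau / (2.0 * u)


# Hypothetical inputs for illustration only (not the values used in the thesis).
example_years = expansion_time_years(tau=2.0, mu_per_site_per_year=1e-7, n_sites=1000)
print(f"estimated expansion time: {example_years:.0f} years BP")
```

Because t scales inversely with the assumed mutation rate, any uncertainty in the rate 
carries over directly to the expansion date. 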
 
The high genetic diversity of L0d1a, its widespread geographic distribution and its 
location-specific expansion patterns suggested population fragmentation, isolation and 
re-expansion. The expansion of L0d1a is chronologically associated with the start of the LSA 
(20 000 to 30 000 years BP) in the archaeological record. The archaeological record 
indicates technological innovation and the emergence of belief systems during this time. 
Certain sites, including Lesotho, southern Cape, Caledon valley, southern Namibia and the 
southern Kalahari, have indications of "higher energy" human settlement (Deacon and 
Deacon, 1999; Mitchell, 2002). These sites more or less overlap with the distribution of 
L0d1a (Figure 3.11). A period of population growth was clearly indicated in the BSP of 
L0d1a and the carriers of this haplogroup contributed to population expansions during this 
period. In a period spanning ~25 000 years the L0d1a Ne increased from 20 000 to more 
than 100 000. The BSP of L0d1a furthermore showed a recent decline that started ~4 000 
years BP. The archaeological record, however, indicates a drastic further increase in 
population size in the last 4 000 years. This increase was more prominent from 2 000 
years BP onwards when herding was introduced in southern Africa (Deacon and Deacon, 
1999; Mitchell, 2002). Reasons for the decline might be that groups carrying the L0d1a 
haplogroup in high frequencies were out-competed and displaced by other groups that 
expanded during this stage. These might be population groups moving in from other areas, 
or the decline might reflect drift effects of other haplogroups increasing within the same 
population.  
 
The L0d1a BSP thus demonstrated the complete opposite picture from L0d3. While L0d3 
showed no evidence of a population expansion during the technological innovation period 
of the early LSA, a clear expansion pattern was observed for L0d1a. The population 
dynamics of the last 4 000 years were also reversed. While the Ne of L0d3 showed a 
sharp increase during the last 4 000 years, at the time of the introduction of pastoralism, 
the L0d1a Ne showed a decline. Thus while the populations that carried L0d3 benefited 
from the introduction of pastoralism (either by directly adopting the lifestyle or benefiting 
from it through trade relations), the groups that carried L0d1a were negatively affected by 
pastoralism. It could be that L0d1a was the predominant group in the hunter-gatherer 
people that were displaced by the pastoralist groups or lifestyle. 
 
L0d1b 
The L0d1b clade is further defined by 16239T and 16294T (Figure 3.7). A large subset of 
sequences that were analysed (5 haplotypes containing 10 sequences), however, did not 
contain the 16239 mutation. They grouped on the network (Figure 3.6) as a sister group to 
an early branch in the L0d1b group that had not yet acquired the 16239 mutation. It 
might be that this group, rather than being an early branch without the mutation, lost the 
16239 mutation. This hypothesis will have to be tested with whole genome 
sequencing, but for now it is suggested that 16239 should not be used as a clade-defining 
mutation for L0d1b. Only four haplotypes were used in the whole genome classification of 
L0d1b (Behar et al., 2008) and such a back-mutation was not present in these haplotypes. 
The 16239 mutation furthermore occurred in four L3e1g sequences and in one sequence 
each in haplogroups U and H. The 16294T mutation is a highly reoccurring mutation. 
 
The coalescence times of the 29 HVS haplotypes from the present study (~33 000 years 
BP) and the four haplotypes employed in the whole genome study (~35 000 years BP) 
(Behar et al., 2008) were similar (Table 3.2 and Figure 3.9).  
 
L0d1b had a distribution that was concentrated in the south and declined towards the north 
(Figure 3.11). The Cape Coloured group had L0d1b as their most prevalent L0d 
haplogroup, while in the other southern groups it was the second most prevalent (Figure 
3.7 and 3.8). Interestingly, it was also the most prevalent group in the Ju\'hoansi while 
frequencies in the other northern groups were lower (Figure 3.7 and 3.8). Published studies 
also found L0d1b to be the predominant haplogroup in the Ju\'hoansi (Vigilant et al., 1991) 
while occurring at low frequencies in the !Xun (Chen et al., 2000) (Table 1.3). 
 
The expansion dates for L0d1b indicated by the BSP and mismatch distributions matched 
the LSA archaeological record of the southern parts of Africa perfectly. According to 
archaeological sites the population density increased markedly from 13 500 years ago and 
particularly in the last 4 000 years (Deacon and Deacon, 1999; Mitchell, 2002). This is 
almost exactly the pattern observed for the L0d1b BSP (Figure 3.14). The first expansion 
began ~14 000 years BP and the second expansion ~3 000 years BP. In the first 
expansion the female Ne increased from ~20 000 to ~70 000 in a period of ~12 000 years. 
The second expansion was more rapid and the female Ne increased from ~70 000 to ~110 
000 in a period of ~3 000 years (Figure 3.14). The network also showed several star-like 
expansion patterns indicating that the haplogroup went through more than one phase of 
population growth (Figure 3.6). Furthermore, the multimodal mismatch distribution indicated 
more than one expansion (a recent and an older expansion) (Figure 3.13). Additionally, all 
three neutrality tests yielded significant statistics that indicated expansion (Table 3.4).  
 
From the L0d1b network, the groups involved in the expansions could be identified (Figure 
3.6). The older expansion (~14 000 years BP) consisted of two star-like expansion 
patterns: a smaller expansion that involved mainly the northern groups and a larger 
expansion that involved mainly the southern groups. Derivative haplotypes of the larger 
expansion, however, included individuals from northern groups, indicating a possible 
migration of individuals from the southern groups to the northern groups. The later 
expansion pattern (~3 000 years BP) had Ju\'hoansi haplotypes as the central haplotype 
and southern group haplotypes as derivative haplotypes. The central haplotype, however, 
was not as large as one would expect from a recent expansion. It might be that a population 
group that contains high frequencies of the central haplotype was not sampled in this 
study. 
 
Thus L0d1b was associated with the southern groups as well as the Ju\'hoansi but 
occurred at low frequencies in the other northern groups. The network suggested several 
instances of migration between the southern groups and the Ju\'hoansi. Furthermore, the 
expansion times of the L0d1b haplogroup reflected expansions noted in the archaeological 
history of southern Africa.  
 
L0d1c 
In addition to 16234, L0d1c was further defined by 456T and 16129G!. While 16129 is a 
highly reoccurring mutation, 456T only appeared in two other sequences (one L0d1b 
sequence and one L0d2a sequence).  
 
The L0d1c haplogroup contained 59 L0d1c HVS sequences described by 12 unique 
haplotypes. The 12 haplotypes coalesced ~42 000 years BP and separated from L0d1a/b 
~60 000 years BP (Table 3.2 and Figure 3.9). According to the whole genome study L0d1c 
separates from L0d1a/b ~53 000 years BP and the six genomes studied coalesced 24 000 
years BP (Behar et al., 2008). 
 
L0d1c was completely absent or at very low frequencies in the southern groups (Figure 
3.11). It increased northwards in the central groups but the highest frequencies were in the 
northern groups, where it was the predominant L0d group (except in the Ju\'hoansi) (Figure 
3.7 and 3.8). Interestingly, the L0d1c frequency was lower in the Ju\'hoansi, where L0d1b 
was instead the most prevalent group (Figure 3.7 and 3.8). Results from the present study 
were also supported by published results, where L0d1c was the predominant group in the 
!Xun (Chen et al., 2000) and was undetected in the Ju\'hoansi (Vigilant et al., 1991) (Table 
1.3). 
 
Behar et al. (2008) furthermore classified a sub-clade of L0d1c, namely L0d1c1, defined by 
the 16242T, 16167T and 198T mutations. Four of the six L0d1c haplotypes 
in the whole genome study belonged to the sub-haplogroup L0d1c1 (Behar et al., 2008). In 
our study 6 of the 12 haplotypes belonged to L0d1c1; most L0d1c sequences, however, fell 
into L0d1c1 (Figure 3.7). L0d1c1 could be seen on the network as a large star-like pattern 
at the tip of the L0d1c network, which indicated a recent expansion (Figure 3.6). The 
L0d1c1 sub-group contained more !Xun haplotypes than the earlier L0d1c haplotypes. 
When the contour plot of L0d1c was split between the early L0d1c haplotypes and the 
L0d1c1 haplotypes it was apparent that the early L0d1c haplotypes had their highest 
frequencies in the central /Gui + //Gana + Kgalagari group and were almost absent in the !Xun 
(Figure 3.12). L0d1c1 haplotypes, however, had their highest frequency in the !Xun. It 
therefore seems that L0d1c was originally present in the /Gui and //Gana and then spread 
to the !Xun before L0d1c1 expanded. L0d1c1 was also present in the ?Khomani and 
Ju\?hoansi, although at lower frequency.  
 
The low frequency in the Ju\?hoansi was surprising given that the Ju\?hoansi are 
geographically located between the !Xun and the /Gui + //Gana. The Ju\?hoansi and !Xun 
lifestyles, however, are vastly different. While the Ju\?hoansi continued to live a foraging 
lifestyle, the !Xun adopted crop cultivation and herding from the local Ovambo population 
with whom they have lived in close association for centuries (De Almeida, 1965; Barnard, 
1992). The effect of such a lifestyle change can be observed in the BSP for L0d1c (Figure 
3.14). The BSP indicated that the Ne started to decline at the time of the introduction of 
pastoralism to the area but then turned around dramatically about 1 000 years BP and 
increased rapidly. It might be that the groups carrying L0d1c did not initially adopt the 
pastoralist lifestyle and were outcompeted by pastoralists. The situation, however, changed 
dramatically when the groups (such as the !Xun) adopted pastoralism, which led to a rapidly 
increasing Ne. In this short period the Ne doubled from ~20 000 to ~40 000 (Figure 3.14). 
 
In the mismatch distribution the expansion hypothesis was not rejected, but L0d1c had the 
lowest value of all the L0d haplogroups for which the hypothesis was accepted (Table 3.3). 
The τ value, however, indicated that the expansion evaluated was around the start of the 
LSA and not the recent expansion (Table 3.3). The recent expansion was, however, 
noticeable in the mismatch graph (Figure 3.13). The recent expansions of L0d3 and L0d1b 
could also be seen in the mismatch graph, but there too it was the LSA expansions that 
were evaluated (as indicated by the τ value) (Figure 3.13 and Table 3.3). Thus the mismatch 
graphs showed the recent expansions, but the mismatch analysis did not test their 
significance or report their τ values. Therefore, when the expansion hypothesis for L0d1c 
was not rejected it was based on an expansion during the LSA transition. The BSP plot 
(Figure 3.14) showed a slight increase in population size from around 25 000 years BP until 
5 000 years BP. This increase, however, was not comparable to the dramatic increase seen 
in L0d1a and L0d1b. This could also be seen in the Model (SSD) p-value of the mismatch 
distribution (Table 3.3), where the value of L0d1c was much lower than the values of L0d1a 
and L0d1b, although the time frames were more or less the same.
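
The way a mismatch τ value translates into an expansion date can be illustrated with a minimal sketch. It assumes the standard sudden-expansion relation τ = 2ut, where u is the mutation rate for the whole sequence and t is the time since the expansion. The function name and all numerical values below are hypothetical; they are not the τ values of Table 3.3 or the mutation rate used in this thesis.

```python
def expansion_time_years(tau, rate_per_site_per_year, n_sites):
    """Time since expansion from tau = 2 * u * t.

    u is the mutation rate for the whole sequence per year,
    so the returned time is expressed in years.
    """
    u = rate_per_site_per_year * n_sites
    return tau / (2.0 * u)


# Hypothetical inputs chosen only to make the arithmetic easy to follow.
example_t = expansion_time_years(tau=4.0, rate_per_site_per_year=1e-7, n_sites=1000)
```

Because t scales linearly with τ, a larger τ always corresponds to an older expansion for a fixed mutation rate, which is why a single τ estimate can reflect the LSA expansion while a more recent expansion remains visible only in the shape of the graph.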
 
The expansion hypothesis was rejected in all three neutrality tests (Table 3.4). The recent 
expansion in L0d3 was, however, also not detected by the neutrality tests and it might be 
that neutrality tests, similar to mismatch distributions, are not sensitive to recent 
expansions. 
 
To summarise, L0d1c showed slight evidence of population growth at the LSA transition. This 
stage was, however, not as prominent as observed for L0d1a. The response of the L0d1c Ne 
to the introduction of pastoralism in the area was more complex than seen for the other 
L0d haplogroups. Initially the groups that carried L0d1c were negatively affected by this 
stage but the situation turned around resulting in a steep increase in Ne. This turnaround is 
likely to be due to the adoption of pastoralism and cultivation practices as seen in the !Xun, 
in whom the L0d1c1 haplogroup was the predominant haplogroup. 
 
L0d2 
Three subgroups within L0d2 were previously identified; L0d2a, L0d2b and L0d2c (Behar 
et al., 2008). L0d2c split first from L0d2a/b (Figure 3.7). The present study had 
representation across these three L0d2 haplogroups and also identified a fourth group; 
henceforth called L0d2d (Figure 3.7). L0d2d grouped with L0d2a/b and all three of these 
groups were defined by the 16212G mutation. The 16212 mutation is relatively stable and 
occurred in only one other sequence in the sample group (haplogroup M) and was seen to 
revert to the ancestral state in one of the L0d2a sequences. In the present study all L0d2 
haplotypes coalesced 43 000 years BP (Table 3.2). The whole genome study calculated 
coalescence to ~64 000 years BP (Behar et al., 2008).  
 
L0d2a 
L0d2a was further defined by 597T and 16390A (Figure 3.6). Behar et al. (2008) 
suggested 198T as an L0d2a defining mutation as well, but one of the L0d2a 
sequences did not contain the mutation. It might, however, be that this sequence contained 
a back mutation rather than being an ancestral sequence to the other L0d2a sequences as 
seen in the network (Figure 3.6). The 16390 mutation is a reoccurring mutation that 
occurred in several other non-L0d2a sequences. The 597 mutation occurred in only one 
other place in the total sample, in one L2a1f sequence.
 
The 27 L0d2a HVS haplotypes of the present study had a TMRCA of 16 000 years BP 
(Table 3.2 and Figure 3.9). Coalescence analysis (applied in the BSP - Figure 3.14), 
however, dated the coalescence of L0d2a at ~8 000 years BP. Based on the eleven 
haplotypes in the whole genome study, L0d2a coalesced 9 000 years BP (Behar et al., 
2008). 
 
The L0d2a haplogroup had a distribution concentrated in the south (Figure 3.11). Its 
highest frequency was in the Karretjie where it was the most prevalent L0d group (Figure 
3.7 and 3.8). It was also the most prevalent L0d group in all the southern groups except the 
Cape Coloured, where it was the second most prevalent after L0d1b. All the 
southern groups had either L0d2a or L0d1b as their most prevalent and second most 
prevalent groups (Figure 3.7 and 3.8). L0d2a was absent in most northern groups and at 
low frequencies in the !Xun. Interestingly L0d2a was the L0d group that had the highest 
incorporation into the Bantu-speaking groups (Figure 3.7 and 3.8).  
 
L0d2a formed a large star-like expansion pattern that is indicative of a recent expansion in 
the population groups represented in the haplogroup. In the network L0d2a had the most 
pronounced star-like expansion pattern of all the L0d/k haplogroups (Figure 3.6). This 
indicated a massive expansion associated with the southern groups. All three neutrality 
tests detected an expansion in L0d2a with the highest associated significance of all the 
L0d/k subgroups (Table 3.4). The mismatch distribution did not reject an expansion 
hypothesis and the mismatch graph showed a smooth unimodal curve that indicated a 
recent expansion (Figure 3.13 and Table 3.3). The τ value dated the expansion to around 
7 000 years BP (Table 3.3). In the BSP (Figure 3.14) one can see an immediate 
dramatic increase in L0d2a Ne from the coalescence date (~8 000 years BP) onwards until 
present. A further, recent expansion (~1 000 years BP) was also evident. In the span of 
8 000 years the Ne increased from ~5 000 to ~110 000 (Figure 3.14). This 
remarkable increase necessitates an explanation.
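
To give a sense of the scale of this increase, the sketch below converts the start and end Ne values read from the BSP into an average growth rate. It assumes constant exponential growth over the whole interval, which is a simplification that the BSP itself does not impose; the function name is illustrative only.

```python
import math


def exponential_growth_rate(n_start, n_end, years):
    """Average per-year growth rate r such that n_end = n_start * exp(r * years)."""
    return math.log(n_end / n_start) / years


# Ne values and time span as quoted in the text for L0d2a (Figure 3.14).
rate = exponential_growth_rate(5_000, 110_000, 8_000)
doubling_time = math.log(2) / rate
```

Under this assumption the L0d2a Ne would have doubled roughly every 1 800 years, a 22-fold increase in total over the 8 000 years.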
 
In most haplogroups the introduction of pastoralism apparently led to abrupt increases in 
population sizes (Figure 3.14). The expansion of L0d2a, however, predated these 
expansions. L0d2a did have a more recent expansion phase that correlated with the 
introduction of pastoralism, but the major part of the L0d2a expansion phase 
predated the introduction of sheep into the southern regions. The steep expansion in Ne 
indicated that carriers of L0d2a had a distinct advantage over carriers of other L0d 
haplogroups during this time, so that L0d2a Ne increased more rapidly. This rapid increase 
might be part of the increase noted in the archaeological record that occurred from 14 000 
years BP onwards. This was, however, before the coalescence time indicated on the BSP. 
From archaeological and paleoenvironmental studies we do know that the period between 
10 000 BP and 5 000 BP is associated with the attainment of maximum temperatures after 
the LGM and the completion of the rise in sea level. It might be that these events 
concentrated populations and increased social networking, which led to the spread of 
technologies between groups in the south. Expansions into new habitats and elaboration of 
material culture and technology, especially in the Cape Fold Belt and Thukela basin, are 
noted in the archaeological record from 4 000 years BP onwards. 
 
It is difficult to judge which populations contained ancestral haplotypes to the L0d2a 
expansion haplotype. It seemed that such an ancestral haplotype is present in the Karretjie 
and Coloured groups from Colesberg. This was based on just one mutation (198) and the 
fact that this haplotype did not contain the mutation could have been due to a reversion. 
Whole genome sequencing is important to see if this haplotype was indeed ancestral to the 
haplotype central to the expansion. 
 
The L0d2a haplotypes had a very shallow coalescence time (~8 000 years BP) and the 
L0d2a haplogroup should be much older than this date. L0d2a diverged from the other L0d2 
groups ~34 000 years BP (Table 3.2). We, however, do not have representative haplotypes 
from these earlier times. They might be present in other populations that have not been 
studied yet. If these earlier haplotypes were incorporated, one would get a clearer picture 
of where and when the L0d2a expansion started. 
 
L0d2b 
The defining mutations of L0d2b were 16069T and 16169T. Both mutations are stable. The 
16069 mutation occurred in no other African sequence but did occur in the European 
haplogroup J. Only one other sequence in the sample group (an L4b2a2 sequence) 
contained the 16169 mutation. 
 
Four of the six sequences in L0d2b were separated by a very long branch (9 mutations) 
from the other two sequences, indicative of a very long separation time between the 
sequences. The L0d2b node as a whole also had a high coalescence and divergence time, 
higher than the L0d2 branch as a whole. This is indicative of an inconsistency. A possible 
explanation might be that the mutations 16182, 16183 (and possibly 16187) were one 
mutational event. Also the 152 mutation is a highly reoccurring mutation. If the weights of 
these mutations were decreased it would reduce the coalescence times. Another 
explanation might be that the terminal nodes in this group were grouped incorrectly within 
L0d2b. They did contain the 16212, 16069 and 16169 mutations, but judging from the 
coalescence times and how they compared with other haplogroups, these haplotypes 
might well be representative of another L0d haplogroup (not a subgroup within L0d2b) 
(Table 3.2 and Figure 3.9). Whole genome sequencing for these samples needs to be 
done to precisely assess their relationship to other groups. In the whole genome study 
(Behar et al., 2008), the one L0d2b sample included contained only the 16212, 16069 and 
16169 mutations and not the additional mutations that defined the three terminal groups in 
the L0d2b clade on the network.
 
L0d2b was detected at very low frequencies in the present study (only six sequences in 
total, representing four haplotypes) (Figure 3.7 and 3.8). The L0d2b haplogroup was 
represented by only one haplotype in the whole genome study (Behar et al., 2008). It was 
also not detected previously in the !Xun, Khwe or Ju\?hoansi (Vigilant et al., 1991; Chen et 
al., 2000). In the present study L0d2b was detected at levels < 5% in four groups and had 
its highest prevalence in the /Gui + //Gana + Kgalagari and Nama (Figure 3.7 and 3.8). Due 
to the low frequency of this haplogroup it was impossible to draw any conclusions about 
its history, but this group seemed to be associated more with the central 
groups in our study group.
 
L0d2d 
L0d2d is a new group suggested in this thesis. It grouped in the same clade as L0d2a and 
L0d2b (defined by 16212G) and was further defined by 188A-G. It did not contain the 
16390, 597 and 16069, 16169 clade-defining mutations of L0d2a and L0d2b. The 188G 
mutation, however, is a reoccurring mutation and whole genome sequencing of the 
representative sequences of L0d2d would be necessary to affirm its position in the L0d 
clade.  
 
Although L0d2d was not identified in the study of Behar et al. (2008), 
haplotypes that can be classified as L0d2d using the new nomenclature were reported 
previously in Bantu-speakers (Salas et al., 2002) and in the !Xun/Khwe (Tishkoff et al., 
2007). In the present study L0d2d was confined to the Ju\?hoansi, where it represented 5% 
of L0d/k haplogroups (Figure 3.7 and 3.8). Interestingly, this rare haplogroup was also 
found in an Indian individual. Frequencies of the haplogroup were too low to extract any 
information regarding its history; its distribution did, however, seem to 
be limited to the northern San groups.
 
 
L0d2c 
The L0d2c sub-haplogroup consisted of 17 sequences grouped into six haplotypes. The 
coalescence of the L0d2c haplotypes was dated to ~20 000 years BP. The divergence 
from L0d2abd was dated to ~29 000 years (Table 3.2 and Figure 3.9). This date was much 
more recent than the date from the whole genome study. In the whole genome study, four 
L0d2c haplotypes were included that coalesced ~21 000 years BP and split from L0d2a/b 
~64 000 years BP (Behar et al., 2008). The more recent date of the present study can be 
explained by the relatively little HVS variation that defines the L0d2c haplogroup compared 
to other haplogroups. Only the HVS-II mutation 294a distinguishes L0d2c from the 
L0d1'2 core haplotype (Figure 3.7). Several coding region mutations, however, separate 
L0d2c from L0d2abd and also L0d2 from L0d1'2 (Behar et al., 2008).
 
L0d2c was found at low frequencies in the sample group (Figure 3.7 and 3.8). L0d2c 
had its highest frequencies in the ?Khomani and Nama; in other groups it was < 5% of 
L0d/k haplogroups (Figure 3.7 and 3.8). A star-like expansion pattern in the network could 
be seen and seemed to be associated with the ?Khomani group (Figure 3.6). Due to the 
low frequency of the haplotype the expansion hypothesis was rejected in the mismatch 
distribution and only one neutrality test detected evidence for an expansion (Table 3.3 and 
3.4). A signature of a recent expansion could, however, be observed in the mismatch graph 
(Figure 3.13). This recent expansion in L0d2c correlated temporally with the recent 
expansions in L0d1c and L0d3 and is likely to be associated with the introduction of 
pastoralism. 
 
L0dx 
L0dx is the second new haplogroup suggested in this thesis. Its position on the tree was, 
however, unresolved and can only be affirmed through whole genome sequencing. It did 
appear that it might group with L0d1a and L0d1b due to the presence of the 523insCA 
mutation. This cannot, however, be said with certainty due to the instability of this length 
repeat mutation, hence the preliminary designation L0dx. This group was further defined 
by the 16179T mutation, which occurred only once elsewhere in the total sample set, in one 
L0d2a sequence.
 
L0dx was found only in the two northern-most groups, Khwe (11%) and !Xun (4%) (Figure 
3.7 and 3.8). L0dx was the only L0d haplogroup found in the Khwe. In the study of Chen et 
al. (2000), L0dx was found at similar frequencies in the !Xun (6%) but at much higher 
frequency in the Khwe (42%), where it also was the only L0d haplogroup. 
 
Both the present study and the study of Chen et al. (2000) thus found L0k and 
L0dx to be the only haplogroups in the Khwe not associated with Bantu-speakers. From the 
network (Figure 3.6) it could be seen that the Khwe L0dx sequences all belonged to one 
haplotype and that a !Xun haplotype was ancestral to the Khwe haplotype. It therefore 
seemed that L0dx was an original !Xun haplotype and moved to the Khwe through gene 
flow. This was the reverse of the situation seen for L0k. The number of representative L0dx 
haplotypes was, however, very low and more L0dx haplotypes need to be sampled before 
any deductions can be made with certainty (the !Xun and Khwe L0dx haplotypes reported in 
Chen et al. (2000) did not include the 16399 and 574 regions (see network in Figure 3.3 and 
3.4) and therefore could not be resolved further).
 
3.3.4 Summary of haplogroup histories 
 
All the L0d1 haplogroups (L0d1a, L0d1b, and L0d1c) showed signs of expansion during the 
LSA period that coincides with the development of advanced technologies and belief 
systems. The two haplogroups with a current southern distribution, L0d1a and L0d1b, had 
stronger expansion signals than the L0d1c haplogroup, which is currently associated with 
northern San groups. L0d1a had a growth signal that preceded the L0d1b growth phase by 
at least 10 000 years. 
 
It was difficult to judge the start of expansion in L0d2a because of the shallow coalescence 
time of haplotypes. It might be that the L0d2a growth started in the same timeframe as 
L0d1b. L0d2a and L0d1b were the main groups in the southern populations and might share 
similar histories. The growth curve in L0d2a was, however, much steeper than in L0d1b. 
Overall it seemed that the population growth signals of the early and middle LSA had a 
stronger association with the haplogroups presently found in the southern Khoe-San 
groups. 
 
In contrast to the above-mentioned haplogroups, L0d3 showed no evidence of the LSA 
associated expansions, yet it also had a southern distribution. Drift effects could have 
caused this haplogroup to decrease while other haplogroups in the same populations 
increased. Another explanation could be that this haplogroup was not subjected to similar 
conditions as the other L0d haplogroups during the early and middle LSA and thus might 
have only been introduced to these territories after this stage.
 
Most haplogroups showed expansions during the start of the Iron Age accompanied by the 
introduction of pastoralism to the southern parts of Africa. An exception was haplogroup 
L0d1a that showed a decrease during this time. This decrease again could be due to drift 
or could indicate that the groups carrying L0d1a in high frequencies were negatively 
affected by this stage. It is historically known that when pastoralists enter a territory they 
displace the hunter-gatherers to fringe areas that are unsuitable for their animals. This 
would then impact on the success of the hunter-gatherer population as measured through 
population growth. From this it is deduced that carriers of L0d1a could possibly have been 
populations that continued their hunter-gatherer lifestyles and did not adopt pastoralism or 
enter into favorable relationships with pastoralists. Initially L0d1c also started to decline, 
similar to L0d1a, but then turned around. This turnaround might be associated with the 
recent adoption of pastoralism practices in the !Xun (see discussion above). 
 
3.3.5 Haplogroup contributions from neighboring population groups 
 
In addition to the L0d/k groups in the Khoe-San and Coloured groups there was also a 
contribution of haplogroups resulting from admixture from Bantu-speakers and Eurasian 
groups (Figure 3.3 and 3.4). Of the groups that represent the people with southern 
Khoe-San ancestry, the Karretjie and ?Khomani groups had almost exclusively L0d maternal 
lines, while the Coloured people from the Northern Cape also had very high percentages of 
L0d. The Coloured group with the largest proportion of admixture was the sample group 
from Wellington, with 20% Eurasian admixture and 35% Bantu-speaking admixture. The 
Colesberg Coloured group also had large proportions of Bantu-speaking (27%) and 
Eurasian (8%) admixture. The Coloured group from the Northern Cape had 5% Eurasian 
admixture and 2.5% Bantu-speaking admixture. The three Coloured groups were the only 
groups with Eurasian admixture; the admixture in the remaining Khoe-San groups was 
due to gene flow from the Bantu-speaking groups (Figure 3.3 and 3.4). 
 
The Khwe group had the largest input from Bantu-speaking groups (61%), with two of 
the most common haplogroups associated with southeastern Bantu-speakers, L2a and L3e, 
making up the largest part (22% each) (Figure 3.3 and 3.4). The Nama had 21.5% 
Bantu-speaker admixture, which in this case was indicative of 
admixture with southwestern Bantu-speakers, with the L1c, L3d and L3f haplogroups 
contributing. The remaining San groups had < 10% Bantu-speaker admixture (Figure 3.3 
and 3.4).
 
3.4 Mitochondrial genetic relationships between different Khoe, San, 
Coloured and neighboring groups 
 
The previous section investigated the properties and differential distribution of the 
haplogroups in the various sample groups. To further investigate the population group 
diversities and their relationship to each other and their neighbours, genetic distances 
between the groups were considered. 
 
To investigate the genetic differentiation and gene flow between groups, pairwise Fst values 
between the different groups were calculated. Table 3.5 gives the Fst values and Figures 
3.16 and 3.17 give graphical representations of the Fst distance matrix in the form of PCA 
plots with minimum spanning trees and a cluster analysis tree.
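
As an illustration of what a pairwise Fst value measures, the sketch below computes one simple sequence-based estimator: one minus the ratio of the mean within-group to the mean between-group number of pairwise differences. This is not necessarily the estimator used for Table 3.5, and the short sequences are hypothetical toy data rather than HVS sequences from the study.

```python
from itertools import combinations


def pairwise_differences(seq_a, seq_b):
    """Number of positions at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(seq_a, seq_b))


def mean_within(group):
    """Mean pairwise difference among sequences of one group (needs >= 2 sequences)."""
    pairs = list(combinations(group, 2))
    return sum(pairwise_differences(a, b) for a, b in pairs) / len(pairs)


def mean_between(group_1, group_2):
    """Mean pairwise difference between sequences drawn from two different groups."""
    total = sum(pairwise_differences(a, b) for a in group_1 for b in group_2)
    return total / (len(group_1) * len(group_2))


def simple_fst(group_1, group_2):
    """Fst = 1 - Hw / Hb, with Hw averaged over the two groups."""
    h_within = (mean_within(group_1) + mean_within(group_2)) / 2
    h_between = mean_between(group_1, group_2)
    return 0.0 if h_between == 0 else 1 - h_within / h_between


# Hypothetical aligned sequences for two groups.
group_a = ["AAAA", "AAAT"]
group_b = ["TTTT", "TTTA"]
fst = simple_fst(group_a, group_b)
```

Values close to 0 indicate that sequences from different groups are about as similar as sequences from the same group, whereas values approaching 1 indicate strong differentiation.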
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Table 3.5  Mitochondrial population pairwise Fst values 
 AFR CAC COL DRC EUR CNC GUG HER IND JOH KAR KHO KWE NAM SOT XUN ZUX 
AFR 0.000                 
CAC 0.155* 0.000                
COL 0.240*** 0.010 0.000               
DRC 0.078* 0.110* 0.193** 0.000              
EUR 0.031 0.273 0.326** 0.206 0.000             
CNC 0.398*** 0.094* 0.031 0.365*** 0.495*** 0.000            
GUG 0.457*** 0.204*** 0.149*** 0.418*** 0.563*** 0.128*** 0.000           
HER 0.217*** 0.216*** 0.260*** 0.082** 0.363*** 0.432*** 0.489*** 0.000          
IND 0.034 0.213 0.278*** 0.110 0.082 0.433*** 0.489*** 0.245** 0.000         
JOH 0.393*** 0.129*** 0.089*** 0.346*** 0.495*** 0.078*** 0.108*** 0.410*** 0.435*** 0.000        
KAR 0.518*** 0.187** 0.081 0.487*** 0.633*** 0.036** 0.263*** 0.557*** 0.547*** 0.179*** 0.000       
KHO 0.468*** 0.152** 0.073*** 0.450*** 0.567*** 0.016*** 0.182*** 0.511*** 0.505*** 0.122*** 0.057*** 0.000      
KWE 0.211*** 0.055*** 0.106*** 0.092*** 0.320** 0.227*** 0.272*** 0.193*** 0.258*** 0.165*** 0.319*** 0.308*** 0.000     
NAM 0.322** 0.033 0.012** 0.257 0.436 0.023** 0.147*** 0.307 0.366* 0.057*** 0.084*** 0.057*** 0.128*** 0.000    
SOT 0.127** 0.006 0.063* 0.044* 0.229 0.193*** 0.268*** 0.129*** 0.177* 0.208*** 0.277*** 0.274*** 0.024*** 0.113*** 0.000   
XUN 0.428*** 0.178*** 0.125*** 0.369*** 0.523*** 0.125*** 0.079*** 0.428*** 0.461*** 0.038*** 0.204*** 0.180*** 0.172*** 0.102*** 0.235*** 0.000  
ZUX 0.181** 0.000 0.013* 0.100 0.280 0.095*** 0.186*** 0.163** 0.230 0.124*** 0.152*** 0.158*** 0.037** 0.029 0.000 0.146*** 0.000 
 
Abbreviations: 
*  significant difference, P<0.05 
**  significant difference, P<0.01 
*** significant difference, P<0.001 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 3.16  A - Principal component analysis of Fst values between different populations in the study group. A minimum spanning tree connects 
populations. Component 1 = 73.6% of the variation, Component 2 = 19.2% of the variation. B - Loadings for Component 1, C - Loadings for Component 2. 
 
[Figure 3.16, panels A to C: plots not reproduced. Loadings are listed in the population order 
AFR, CAC, COL, DRC, EUR, CNC, GUG, HER, IND, JOH, KAR, KHO, KWE, NAM, SOT, XUN, ZUX.] 
Loadings, Component 1: -0.2827, 0.05204, 0.1534, -0.2213, -0.3202, 0.3034, 0.2806, -0.195, 
-0.2932, 0.2664, 0.36, 0.3419, -0.004907, 0.2274, -0.05572, 0.2675, 0.06073 
Loadings, Component 2: -0.2226, -0.3015, -0.2149, -0.3617, -0.1707, -0.1349, -0.09898, -0.3641, 
-0.1948, -0.1532, -0.1394, -0.1029, -0.3608, -0.2218, -0.3475, -0.1319, -0.2896 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
As can be seen from Figure 3.17, the first split was between populations with a considerable 
amount of Khoe-San ancestry (CNC, KHO, NAM, COL, KAR, GUG, JOH and XUN) and 
the other populations (SOT, ZUX, DRC, HER, EUR, AFR and IND). KWE and CAC also fell 
in this last cluster. As is evident from their haplogroup frequencies 
(Figure 3.3 and 3.4), both these groups had high amounts of admixture from Bantu-speakers 
(KWE) or from Europeans and Bantu-speakers (CAC), causing them to group with 
these groups rather than with the Khoe-San groups. The PCA plot (Figure 3.16) also 
summarised this variation in the first component, which contained 74% of the total variation. 
As reflected in the loadings of the first component (Figure 3.16 - B), the Khoe-San and 
Coloured populations were separated from the non-African groups and the HER and DRC. 
The KWE, CAC and also the ZUX and SOT were intermediate because of their various 
amounts of input/admixture from the Khoe-San populations. 
 
Figure 3.17  Cluster analysis tree representing mitochondrial Fst values between different 
populations in the study group. 
 
The second component on the PCA plot contained 19% of the variation (Figure 3.16). This 
component summarised the variation between Bantu-speakers and the rest of the groups 
(Figure 3.16 - C). 
 
Similarly in the tree (Figure 3.17), the subsequent split was between non-African groups 
(AFR, EUR, IND) and Bantu-speakers (SOT, ZUX, DRC, HER and CAC, KWE). Thereafter 
the northern San groups (GUG, JOH, XUN) split from the southern Khoe-San-Coloured 
groups (CNC, KHO, NAM, COL, KAR).  
 
The northern San groups, JOH and XUN grouped together with GUG on its own. On the 
other branch, containing the southern Khoe-San-Coloured groups, CNC and KHO grouped 
together, while COL and NAM grouped together, with KAR forming its own group. 
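
The grouping order described above can be retraced with a small clustering sketch. The clustering algorithm behind Figure 3.17 is not stated in this section, so average linkage (UPGMA) is assumed here purely for illustration; the pairwise Fst values for the five southern Khoe-San-Coloured groups are taken from Table 3.5, and the function names are hypothetical.

```python
from itertools import combinations

# Pairwise Fst values for the southern groups, copied from Table 3.5.
FST = {
    ("CNC", "KHO"): 0.016, ("CNC", "NAM"): 0.023, ("CNC", "COL"): 0.031,
    ("CNC", "KAR"): 0.036, ("KHO", "NAM"): 0.057, ("KHO", "COL"): 0.073,
    ("KHO", "KAR"): 0.057, ("NAM", "COL"): 0.012, ("NAM", "KAR"): 0.084,
    ("COL", "KAR"): 0.081,
}
DIST = {frozenset(pair): value for pair, value in FST.items()}


def average_distance(cluster_1, cluster_2, dist):
    """Mean of all original distances between members of two clusters."""
    total = sum(dist[frozenset((a, b))] for a in cluster_1 for b in cluster_2)
    return total / (len(cluster_1) * len(cluster_2))


def upgma_merges(names, dist):
    """Return the sequence of merges (cluster_1, cluster_2, distance)."""
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: average_distance(pair[0], pair[1], dist),
        )
        merges.append((c1, c2, average_distance(c1, c2, dist)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


merges = upgma_merges(["CNC", "KHO", "NAM", "COL", "KAR"], DIST)
```

Under this assumption COL and NAM merge first, CNC and KHO second, and KAR joins last, which agrees with the grouping described for the southern branch of the tree.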
 
Furthermore the DRC and HER grouped together while the southeastern BS (ZUX and 
SOT) grouped with the KWE and CAC. This showed the considerable admixture from 
southeastern Bantu-speakers in CAC and, interestingly, showed that the KWE group 
grouped with southeastern BS rather than with the southwestern BS or a central African BS 
group. 
 
From the graphical visualizations of Fst values it appeared that there might be an 
association between genetic distance and physical distance within the Khoe-San-Coloured 
groups. Groups that were genetically closely related were also not far removed from one 
another in terms of physical distance. To test whether genetic distances and physical 
geographic distances (km) were correlated within the Khoe-San-Coloured groups, the 
genetic distance matrix (Table 3.5) was compared with a physical distance matrix 
(Appendix C). (For this part of the analysis the KWE group was not included as part of the 
Khoe-San-Coloured groups because previous analysis showed their maternal lineages to 
be genetically more similar to BS than to Khoe-San-Coloured groups.) In Figure 3.18 
pairwise comparisons between physical distance (X-axis) and genetic distance (Y-axis) 
were plotted on a graph. A linear regression was done to 
determine the line with the best fit through the points. The best fit to the graph was a 
straight line with a slope of 0.00005263 (p = 0.0149). Furthermore a Mantel test was 
conducted to further affirm the relationship between the two distance matrices. It confirmed 
the relationship between genetic and physical geographic distance (p = 0.027) with 16% of 
the genetic distance being explained by physical geographic distance and a correlation 
coefficient (r) of 0.402750 between the two matrices.
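
The logic of a Mantel test can be sketched as follows: the correlation between the corresponding off-diagonal entries of the two distance matrices is compared with the correlations obtained after randomly permuting the population labels of one matrix. The implementation below is a minimal illustration and not the software used in this study; the four populations, their positions and the number of permutations are hypothetical.

```python
import math
import random


def upper_triangle(matrix):
    """Off-diagonal entries above the diagonal, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel_test(matrix_a, matrix_b, n_permutations=999, seed=0):
    """Return (observed r, one-sided permutation p-value)."""
    rng = random.Random(seed)
    n = len(matrix_a)
    a_values = upper_triangle(matrix_a)
    r_observed = pearson(a_values, upper_triangle(matrix_b))
    order = list(range(n))
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(order)  # permute rows and columns of matrix_b together
        permuted = [[matrix_b[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(a_values, upper_triangle(permuted)) >= r_observed:
            at_least_as_large += 1
    return r_observed, (at_least_as_large + 1) / (n_permutations + 1)


# Hypothetical populations placed along a line (km); genetic distance is made
# exactly proportional to geographic distance, so r should be 1.
positions = [0, 100, 300, 700]
geographic = [[abs(p - q) for q in positions] for p in positions]
genetic = [[0.0001 * d for d in row] for row in geographic]
r_obs, p_value = mantel_test(geographic, genetic)
```

The proportion of genetic distance explained by geographic distance corresponds to the square of the correlation coefficient: 0.402750 squared is approximately 0.16, in line with the 16% reported above.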
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 3.18  Pairwise comparisons between physical geographic distance (X-axis) and 
mitochondrial Fst genetic distance (Y-axis) 
 
Some of the Coloured and Khoe groups (especially CAC, COL, and NAM) had a 
considerable amount of admixture from Bantu-speakers and/or non-African groups (see 
Figure 3.4). This would be reflected in their Fst values, which would be reproduced in 
graphical representations of Fst values such as the PCA plot (Figure 3.16) where CAC, 
COL and NAM grouped closer to these groups. This recent admixture of BS groups into 
San groups might obscure historical relationships between Khoe-San groups that existed 
before the BS expansions and non-African influx. To investigate this historical relationship 
between putative Khoe-San groups before BS and European admixture, all non-L0d/k 
haplogroups were removed from the sample. The different Khoe-San and Coloured groups 
were again compared with one another to see if the relationship between them changed 
(Figure 3.19 and 3.20). It is acknowledged that this might not be a true representation of 
what the haplogroup structure might have looked like before the influx of BS. It is, however, 
generally seen that groups with less BS admixture have higher L0d percentages and that in 
BS populations the amount of L0d admixture increases towards the southern parts of Africa 
where there was contact with Khoe-San people. For this part of the analysis, it was 
therefore assumed that L0d and L0k might have been the predominant mitochondrial 
haplogroups of the Khoe-San before the Bantu expansions and it was investigated how 
these L0d/k carriers might have been related to one another. 
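
The filtering step described above can be sketched as follows. The example only shows the removal of non-L0d/k haplogroups and the recalculation of within-group proportions; in the actual analysis the remaining sequences were used to recompute Fst values. The haplogroup counts are hypothetical and the function name is illustrative.

```python
def restrict_to_l0dk(haplogroup_counts):
    """Drop all haplogroups other than L0d/L0k and return the remaining proportions."""
    kept = {
        haplogroup: count
        for haplogroup, count in haplogroup_counts.items()
        if haplogroup.startswith(("L0d", "L0k"))
    }
    total = sum(kept.values())
    if total == 0:
        return {}
    return {haplogroup: count / total for haplogroup, count in kept.items()}


# Hypothetical counts for one sample group, including admixed lineages.
hypothetical_group = {"L0d1b": 10, "L0d2a": 6, "L0k": 4, "L2a": 15, "L3e": 5}
l0dk_only = restrict_to_l0dk(hypothetical_group)
```

In this toy example half of the sequences are discarded, which illustrates why groups with heavy admixture contribute fewer sequences to the L0d/k-only comparison.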
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 3.19  A - Principal component analysis of L0d/k Fst values between different populations in the study group. Component 1 = 74.5% of the variation, 
Component 2 = 13.9% of the variation. B - Loadings for Component 1, C - Loadings for Component 2.  
[Figure 3.19, panels A to C: plots not reproduced. Loadings are listed in the population order 
CAC, CNC, COL, GUG, JOH, KAR, KHO, KWE, NAM, XUN.] 
Loadings, Component 1: -0.4967, -0.3318, -0.345, -0.04285, -0.02122, -0.4205, -0.3763, 0.3211, 
-0.3013, 0.09815 
Loadings, Component 2: 0.07838, -0.02733, 0.05023, -0.852, -0.1106, 0.1488, 0.01084, 0.3897, 
0.07898, -0.2686 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
In Figure 3.19 and 3.20 the divide between the northern San groups (GUG, JOH, XUN) 
and southern Coloured and Khoe-San groups (KAR, KHO, NAM, CAC, CNC, COL) can be 
seen clearly. With the removal of the BS and non-African admixture, CAC moved to the 
southern San-Khoe-Coloured cluster. The KWE group, however, was still a clear outlier. 
Groups within the southern San-Khoe-Coloured group clustered very closely together while 
the northern San (GUG, JOH, XUN) were more distant from one another. 
 
The best fit to the graph of genetic vs. physical distance in this case was still a straight line 
with a significantly non-zero slope (p = 0.00587) (graph included in Appendix F). The slope of the 
line when non-L0dk sequences were removed was 0.00009086, which was steeper than the 
gradient when non-L0dk sequences were included (0.00005263). The Mantel test also 
showed a significant relationship between the two distance matrices (p = 0.034). In this 
case 20% of the genetic distance was explained by physical distance (r = 0.449854). 
This correlation was stronger than in the case with non-L0dk sequences included 
(16% of the genetic distance explained by physical distance, r = 0.402750).  
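
The comparison of the two distance matrices can be illustrated with a small permutation Mantel test. This is only a sketch under stated assumptions: the function name `mantel_test`, the toy coordinates and the one-sided p-value are illustrative choices, not the software or data used in this study. The squared correlation returned as the second value corresponds to the "percentage of genetic distance explained by physical distance" quoted in the text (for example, 0.449854 squared is approximately 0.20).

```python
import numpy as np

def mantel_test(genetic, geographic, permutations=9999, seed=0):
    """Permutation Mantel test between two symmetric distance matrices.

    Returns (r, r squared, one-sided p-value).
    """
    genetic = np.asarray(genetic, dtype=float)
    geographic = np.asarray(geographic, dtype=float)
    n = genetic.shape[0]
    upper = np.triu_indices(n, k=1)          # each population pair once
    x = geographic[upper]
    r_obs = np.corrcoef(genetic[upper], x)[0, 1]

    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(permutations):
        order = rng.permutation(n)
        # Permute rows and columns together so the matrix stays a valid
        # distance matrix with relabelled populations.
        permuted = genetic[np.ix_(order, order)]
        if np.corrcoef(permuted[upper], x)[0, 1] >= r_obs:
            hits += 1
    p_value = (hits + 1) / (permutations + 1)
    return r_obs, r_obs ** 2, p_value

# Hypothetical toy data: five populations along a line, with genetic
# distance increasing linearly with geographic distance.
coords = np.array([0.0, 100.0, 250.0, 400.0, 900.0])
geo = np.abs(coords[:, None] - coords[None, :])
gen = 0.0001 * geo + 0.01 * (geo > 0)
r, r2, p = mantel_test(gen, geo, permutations=999)
```

Permuting rows and columns jointly, rather than shuffling individual matrix entries, keeps the dependence between the pairwise distances of each population intact, which is the reason a Mantel test is used instead of an ordinary correlation test.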
Figure 3.20   Cluster analysis tree representing L0d/k Fst values between different 
populations in the study group. 
 
To see what influence the presence of L0k has on the separation of the northern groups 
from the southern groups (L0k was found in the northern groups but not in the southern 
groups), the analysis was repeated with only L0d sequences; the results are shown in 
Figures 3.21 and 3.22. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 3.21  A: Principal component analysis of L0d Fst values between different populations in the study group. Component 1 = 63.2% of the variation, Component 2 = 27.1% of the variation. B: Loadings for Component 1. C: Loadings for Component 2. 
[Figure residue omitted: panels B and C are bar charts of the component loadings for CAC, CNC, COL, GUG, JOH, KAR, KHO, KWE, NAM and XUN.]
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
With the removal of L0k, the KWE group was still an outlier, but this has to be viewed with 
caution because the group is represented by only two L0d sequences. The JOH group now 
grouped with the southern Coloured and Khoe-San groups rather than with GUG and XUN, 
but was still removed from them. This could also be seen in the L0d haplogroup 
frequencies (Figures 3.7 and 3.8): JOH had far lower L0d1c frequencies than GUG and 
XUN but higher L0d1b frequencies. The other southern Coloured and Khoe-San groups 
still grouped tightly together in a cluster. 
 
In the graph of genetic vs. physical distance for L0d sequences only, the best-fitted line 
was still a straight line with a significantly non-zero slope (p = 0.00742) (Appendix F). The 
slope of the line when only L0d sequences were included was 0.00009472, which was steeper 
than when non-L0dk sequences were included (0.00005263) and marginally steeper than the 
slope for the L0dk sequences (0.00009086). The Mantel test again showed a significant 
relationship between the two distance matrices (p = 0.033). When only L0d sequences were 
used, 19% of the genetic distance was explained by physical distance (r = 0.439081). This 
correlation was higher than in the case with non-L0dk sequences included (16% of the 
genetic distance explained by physical distance, r = 0.402750) and slightly lower than in 
the case where non-L0dk sequences were excluded (20%, r = 0.449854). 

Figure 3.22  Cluster analysis tree representing L0d Fst values between different 
populations in the study group. 
 
To test the apportionment of variation in the different groups, an AMOVA analysis was done to 
see how much variation is contained between defined groups, between the different 
populations in the study and within the populations. Table 3.6 gives the results of the 
AMOVA analysis with various groupings at the highest level. 
 
Table 3.6  Results from mitochondrial AMOVA analysis using different groupings on the first level (values are percentages of variation) 

Grouping  Groups on first level                                          Between   Between       Between
                                                                         groups    populations   individuals
                                                                                   within        within
                                                                                   groups        populations

A         [afr, eur, ind]                                                21.95     7.7           70.35
          [col, cnc, kar, kho, nam, joh, xun, gug, kwe, cac]
          [drc, her, sot, zux]

B         [afr, eur, ind]                                                24.46     6.69          68.85
          [col, cnc, kar, kho, nam, joh, xun, gug]
          [drc, her, sot, zux]

C         [afr, eur, ind]                                                29.28     10.03         60.69
          [col, cnc, kar, kho, nam, joh, xun, gug, drc, her, sot, zux]

D         [col, cnc, kar, kho, nam, joh, xun, gug]                       21.82     9.09          69.09
          [drc, her, sot, zux, afr, eur, ind]

E         [afr, eur, ind]                                                21.16     4.74          74.1
          [col, cnc, kar, kho, nam]
          [drc, her, sot, zux]
          [gug, joh, xun]

F         [col, cnc, kar, kho, nam, joh, xun, gug]                       13.25     5.35          81.39
          [drc, her, sot, zux]

G         [col, cnc, kar, kho, nam]                                      8.3       4.94          86.76
          [joh, xun, gug]

H         [col, cnc, kar]                                                1.93      8.42          89.65
          [nam, joh, xun, gug, kho]

I         [col, cnc, kar]                                                1.15      8.8           90.05
          [nam]
          [joh, xun, gug, kho]

For the first analysis, three groups were assigned, namely BS, non-African and 
Khoe-San-Coloured (grouping A in Table 3.6). In this analysis 22% of the variation was between these 
three groups, 8% between populations within the groups and 70% between individuals 
within the populations. CAC and KWE were assigned to the Khoe-San-Coloured (KSC) 
group but, as was seen previously, their placement in this group is 
ambiguous because the maternal line in KWE was more closely related to BS groups and CAC 
had a very admixed origin. When they were left out, more of the variation could be assigned 
to variation between the three main groups (24.5%) (grouping B in Table 3.6). These two 
groups were left out of the subsequent AMOVA analyses.  
 
When two groups were assigned, African and non-African, the among-group 
variation was 29%, the highest of all the AMOVA analyses (grouping C in Table 3.6). Next, 
the KSC groups were split into northern San and southern KSC (grouping E in Table 3.6). 
The variation between the resultant four groups (non-African, BS, northern San and 
southern KSC) was 21%, not much different from the three-group analysis of BS, KSC and 
non-African. When non-African groups were left out of the analysis, the variation between 
BS and KSC was 13% between groups, 5% between populations and 81% within populations 
(grouping F in Table 3.6). The BS groups were then left out and the KSC group was split into 
the northern San group and southern KSC groups (as previous analyses suggested such a 
split) (grouping G in Table 3.6). In this scenario more of the variation was still contained at 
the group level (8%) than at the population level (5%). 
 
When populations were split into the groups that self-identify as Coloured and those that 
self-identify as Khoe-San, only 2% of the variation was explained at the group level (grouping H in 
Table 3.6). In this case, inter-population variance explained more of the variance (8%). 
Furthermore, if the Khoe were split from the San group to give the populations that 
self-identify as San, Khoe and Coloured, even less variation (1%) was explained at the group level 
(grouping I in Table 3.6). 
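
The way AMOVA apportions variation can be sketched with a simplified, one-level version that splits the variation into an among-population and a within-population component only. The analyses in Table 3.6 use three hierarchical levels (groups, populations within groups, individuals within populations), so the following is an illustration of the principle rather than a reproduction of those analyses. The function name `amova_one_level` and the toy haplotypes are hypothetical; squared distances are taken as the number of pairwise differences between haplotypes.

```python
import numpy as np

def amova_one_level(sq_dist, labels):
    """Percentage of variation among and within populations.

    sq_dist: matrix of squared distances between individuals
             (here: number of pairwise differences).
    labels:  population label of each individual.
    """
    d2 = np.asarray(sq_dist, dtype=float)
    labels = np.asarray(labels)
    n_total = len(labels)
    pops = np.unique(labels)
    n_pops = len(pops)

    # Sums of squared deviations from the pairwise squared distances.
    ssd_total = d2.sum() / (2.0 * n_total)
    ssd_within = 0.0
    sizes = []
    for pop in pops:
        idx = np.where(labels == pop)[0]
        sizes.append(len(idx))
        ssd_within += d2[np.ix_(idx, idx)].sum() / (2.0 * len(idx))
    ssd_among = ssd_total - ssd_within

    # Variance components from the mean squares.
    sizes = np.array(sizes, dtype=float)
    n0 = (n_total - (sizes ** 2).sum() / n_total) / (n_pops - 1)
    var_within = ssd_within / (n_total - n_pops)
    var_among = (ssd_among / (n_pops - 1) - var_within) / n0
    total = var_among + var_within
    return 100.0 * var_among / total, 100.0 * var_within / total

# Hypothetical toy data: two strongly differentiated populations.
haplotypes = np.array([
    [0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0],   # population "north"
    [1, 1, 1, 0], [1, 1, 1, 1], [1, 1, 1, 0],   # population "south"
])
pop_labels = ["north"] * 3 + ["south"] * 3
pairwise = (haplotypes[:, None, :] != haplotypes[None, :, :]).sum(axis=2)
pct_among, pct_within = amova_one_level(pairwise, pop_labels)
```

In the hierarchical version an additional variance component is estimated for the grouping level, which is why the three percentages in each row of Table 3.6 sum to 100.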
 
To investigate the expansion dynamics of the different groups involved in the study, 
mismatch distributions of the nucleotide variation in the sequences were 
constructed. Figure 3.23 shows the mismatch distributions of the 10 Khoe-San / Coloured 
populations and two comparative groups, ZUX (Bantu-speaking) and EUR (European).  
 
Other Bantu-speaking and non-African groups (not shown) showed unimodal 
distributions similar to those of the two groups used as comparative data. The AFR 
distribution, however, was slightly multimodal due to admixture from African groups. In CAC, 
COL, NAM, JOH, ZUX and EUR the model of demographic expansion was not rejected. The 
model was rejected in KHO, KAR and KWE (p=0.01) as well as CNC (p=0.05) and GUG (95% CI 
of θ0 and θ1 overlap). The model was not rejected in XUN but would be rejected at a 10% 
level (p=0.1). Table 3.7 shows the statistics of the mismatch distributions and the tests for 
demographic expansion for all of the groups. 
 
The raggedness index can also be used as an indicator of rapid demographic growth 
(Table 3.7) but does not correlate in all cases with the confidence test for demographic 
expansion (see the low raggedness values for KAR and CNC, where the expansion model was 
nevertheless rejected). 
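
As an illustration of what is being summarised here, the sketch below builds a mismatch distribution from aligned sequences and computes Harpending's raggedness index as the sum of squared differences between neighbouring classes of the distribution. The function names and the four toy sequences are hypothetical, and the expansion-model fit (SSD test) reported in Table 3.7 is not reproduced.

```python
from collections import Counter
from itertools import combinations

def mismatch_distribution(sequences):
    """Relative frequency of 0, 1, 2, ... pairwise differences."""
    counts = Counter(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(sequences, 2)
    )
    n_pairs = sum(counts.values())
    max_diff = max(counts)
    return [counts.get(i, 0) / n_pairs for i in range(max_diff + 1)]

def raggedness(freqs):
    """Raggedness index: sum over i = 1..d+1 of (x_i - x_(i-1))^2,
    where d is the largest observed number of differences and the
    class d+1 has frequency zero."""
    padded = list(freqs) + [0.0]
    return sum((padded[i] - padded[i - 1]) ** 2 for i in range(1, len(padded)))

# Hypothetical toy alignment of four short sequences.
toy = ["AAAA", "AAAT", "AATT", "ATTT"]
dist = mismatch_distribution(toy)
rag = raggedness(dist)
```

A smooth, unimodal distribution gives a low index, while a multimodal distribution gives a high one, which is why the index is read as a rough indicator of past growth.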
 
θ1 divided by θ0 gives an indication of the magnitude of the expansion, while τ gives an 
indication of the time of the expansion. Of the eight groups that do not reject the 
expansion hypothesis (NAM, JOH, SOT, CAC, COL, DRC, ZUX, EUR), the BS groups 
appeared to have had the greatest expansions (θ0 is 16 to 19 times smaller than θ1), while 
the magnitude in the Khoe-San / Coloured groups was lower (θ0 is 7 to 10 times smaller 
than θ1). τ was converted to the time BP (T) at which the expansion happened, as outlined in 
section 2.2.2.3, using the mutation rate of 2.5 x 10^-6 per nucleotide per generation (Ward et 
al., 1991) (Table 3.7). The expansions of the CAC and NAM populations seemed to 
have happened ~40 000 years BP. The BS populations had earlier expansion times of 
around 90 000 years BP for ZUX and SOT and 60 000 years BP for DRC. The 
representative haplogroups of JOH and COL also showed expansion times of 75 000 to 
90 000 years BP. The European expansion time dated to around 40 000 years BP. 
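
The conversion from τ to T can be sketched from the relationship τ = 2ut, where u is the mutation rate for the whole sequence per generation and t is the time in generations. The exact calculation is given in section 2.2.2.3; in the sketch below the sequence length of 1124 nucleotides and the generation time of 25 years are assumptions, chosen only because together with the stated rate of 2.5 x 10^-6 they are consistent with the T values listed in Table 3.7. The function name is hypothetical.

```python
def expansion_time_years(tau, mu_per_site=2.5e-6, n_sites=1124,
                         generation_years=25):
    """Convert tau to years before present using tau = 2 * u * t.

    mu_per_site:      mutation rate per nucleotide per generation (from the text)
    n_sites:          ASSUMED sequence length (not stated in this section)
    generation_years: ASSUMED generation time (not stated in this section)
    """
    u = mu_per_site * n_sites            # mutation rate for the whole sequence
    generations = tau / (2.0 * u)
    return generations * generation_years

# Tau values taken from Table 3.7.
t_nam = expansion_time_years(9.168)
t_joh = expansion_time_years(20.098)
```

Under these assumptions the NAM and JOH values round to 40 783 and 89 404 years, matching the T column of Table 3.7.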
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 3.23  Mismatch distributions of populations in the study group. # Expansion hypothesis rejected - 95% CI overlap. * Expansion hypothesis rejected on 99% confidence level. ** Expansion hypothesis rejected on 95% confidence level. (*) Expansion hypothesis will be rejected on a 90% confidence level. 
Table 3.7  Mismatch distribution statistics (Groups) 

Group  Raggedness  Tau     T           Theta0  Theta0 qt        Theta1     Theta1 qt              Model (SSD)
       index               (years BP)          5%-95%                      5%-95%                 p-value
KWE    0.068       20.330              5.462   0.000 - 10.642   66.709     35.889 - 1839.209      0.010**
HER    0.106       19.438              0.004   0.000 - 4.676    20.012     14.233 - 179.074       0.020*
IND    0.006       10.488              0.000   0.000 - 2.496    59.219     32.827 - 99999.000     0.040*
KHO    0.027       12.559              0.007   0.000 - 2.960    37.056     23.384 - 267.368       0.040*
CNC    0.025       10.656              5.054   0.000 - 13.212   49.175     28.076 - 2319.175      0.050*
XUN    0.030       20.305              0.004   0.000 - 4.470    26.216     19.036 - 198.403       0.060(*)
NAM    0.011       9.168   40 783      10.158  0.000 - 27.229   99.863     40.376 - 99999.000     0.180
JOH    0.021       20.098  89 404      2.902   0.000 - 7.597    29.502     17.932 - 109.268       0.230
SOT    0.020       20.805  92 549      5.161   0.000 - 16.367   83.696     50.707 - 99999.000     0.270
GUG    0.076       37.102              0.000   0.000 - 12.057   13.971     7.276 - 99999.000      0.290#
CAC    0.017       9.070   40 347      16.432  0.000 - 37.615   116.484    46.565 - 99999.000     0.440
AFR    0.028       6.938               8.903   0.000 - 24.725   51.567     24.391 - 99999.000     0.490#
COL    0.003       16.949  75 396      5.908   0.000 - 5.950    58.853     36.606 - 162.915       0.630
DRC    0.027       14.445  64 257      4.243   0.000 - 13.044   66.885     41.692 - 99999.000     0.640
ZUX    0.005       20.234  90 009      3.614   0.000 - 11.990   70.107     45.382 - 436.357       0.730
EUR    0.025       9.262   41 201      2.949   0.000 - 4.967    25.376     18.518 - 99999.000     0.810
Mean   0.030       15.167              4.165   0.000 - 11.765   5933.746   5903.052 - 53264.869   0.290
SD     0.028       8.264               4.491   0.000 - 10.013   24240.077  24214.474 - 51097.852  0.280

T: Time before present that expansion took place (calculation explained in section 2.2.2.3) 
SSD: Sum of squared deviation 
# Expansion hypothesis rejected - 95% CI overlap.  
* Expansion hypothesis rejected on 99% confidence level.  
** Expansion hypothesis rejected on 95% confidence level.  
(*) Expansion hypothesis will be rejected on a 90% confidence level. 
 
 
Summary statistics such as Tajima's D (Tajima, 1989), Fu's Fs (Fu, 1997) and the R2 
statistic (Ramos-Onsins and Rozas, 2002) have been reported to have greater sensitivity 
in detecting population expansions than mismatch distributions (Pilkington et al., 2008). 
Table 3.8 shows summary statistics in the form of diversity estimates and neutrality tests 
for the different groups. 
 
 
 
 
 
 
 
Table 3.8   Diversity statistics and neutrality tests for populations in the study group 

Group  N seq  N Ht  Hd     pi       θS       W-θS    Ne       Tajima's  Tajima's D  Fs       Fs         R2      R2
                                                     (θS/2μ)  D         p-value              p-value            p-value
KAR    30     13    0.864  0.00611  0.00756  8.330   1482     -0.82038  0.230       -0.392   0.462      0.0961  0.237
COL    77     49    0.971  0.01245  0.01855  19.941  3548     -1.13707  0.120       -18.138  <0.001***  0.0683  0.152
CAC    20     15    0.963  0.01349  0.01667  18.040  3210     -0.70963  0.248       -1.186   0.310      0.1038  0.199
KHO    57     23    0.947  0.00750  0.01176  12.794  2277     -1.25000  0.089       -2.663   0.209      0.0714  0.137
CNC    40     23    0.927  0.00914  0.01435  15.516  2761     -1.26573  0.087       -3.828   0.116      0.0709  0.078 (*)
XEG    3      3     -      -        -        -       -        -         -           -        -          -       -
DUM    1      1     -      -        -        -       -        -         -           -        -          -       -
NAM    28     23    0.984  0.01035  0.01543  16.703  2972     -1.24601  0.093       -7.691   0.007**    0.0753  0.048*
GUG    22     8     0.853  0.00843  0.01154  13.027  2318     -0.98584  0.174       3.736    0.930      0.1025  0.224
NAR    2      2     -      -        -        -       -        -         -           -        -          -       -
JOH    42     17    0.943  0.00980  0.01087  12.087  2151     -0.32461  0.432       0.638    0.635      0.1056  0.499
XUN    49     23    0.894  0.00934  0.01345  14.578  2594     -1.06493  0.138       -2.155   0.262      0.0750  0.149
KWE    18     9     0.889  0.01468  0.01692  18.316  3259     -0.53662  0.317       3.946    0.940      0.1199  0.355
DRC    14     12    0.978  0.01158  0.01382  15.094  2686     -0.61070  0.290       -1.812   0.171      0.1134  0.169
HER    15     6     0.648  0.00809  0.01094  11.994  2134     -1.12248  0.128       4.011    0.948      0.1031  0.094 (*)
SOT    22     18    0.970  0.01540  0.01887  20.582  3662     -0.77810  0.230       -2.593   0.134      0.1020  0.232
SWZ    5      5     -      -        -        -       -        -         -           -        -          -       -
ZUX    36     31    0.989  0.01377  0.01918  20.983  3734     -1.07663  0.133       -11.823  0.002**    0.0791  0.115
AFR    21     18    0.981  0.01050  0.01486  16.121  2869     -1.08335  0.143       -5.193   0.023*     0.0872  0.066 (*)
EUR    11     11    1.000  0.00757  0.00990  10.925  1944     -1.02790  0.155       -4.759   0.010*     0.0918  0.011*
IND    25     25    1.000  0.00965  0.01924  20.657  3676     -1.96876  0.01*       -17.291  <0.001***  0.0492  <0.001***
All    538    236   0.984  0.01239  0.02967  31.052  5525
 
*    p < 0.05 
**   p < 0.01 
***  p < 0.001 
(*)  p < 0.1 
 
Haplotype diversities were high in most of the groups. Groups with lower diversities, in 
ascending order, were HER, GUG, KAR, KWE and XUN. Nucleotide diversities were 
generally higher in African populations than in non-African populations. With the exception 
of HER, BS populations had higher nucleotide diversities than Coloured-Khoe-San 
populations. Populations with lower nucleotide diversities included KAR, KHO, HER, GUG, 
CNC, XUN and JOH. Theta was estimated using segregating sites (θS per site) and 
Watterson's θS (W-θS per sequence). The female effective population size (Ne) was 
estimated from W-θS as explained in section 2.2.3. Smaller effective population sizes were 
present in KAR, JOH, HER, GUG and KHO. Under neutral expectations with random mating, 
constant population sizes and no selection, pi and θ should be equal (Jobling et al., 2004c). 
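
Two of the quantities in Table 3.8 can be sketched briefly: haplotype diversity, computed here as Nei's unbiased estimator, and the female effective population size obtained by dividing the per-sequence θ by twice the per-sequence mutation rate. The actual procedure is described in section 2.2.3; the per-nucleotide rate of 2.5 x 10^-6 is taken from the mismatch analysis above, and the sequence length of 1124 nucleotides is an assumption chosen because it is consistent with the Ne column of Table 3.8. Function names are hypothetical.

```python
from collections import Counter

def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of p_i^2)."""
    n = len(haplotypes)
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1.0 - sum(f * f for f in freqs))

def female_effective_size(theta_per_sequence, mu_per_site=2.5e-6, n_sites=1124):
    """Ne = theta / (2 * u), with u the mutation rate for the whole sequence.

    n_sites is an ASSUMED sequence length, not a value stated in this section.
    """
    return theta_per_sequence / (2.0 * mu_per_site * n_sites)

# Eleven sequences that are all different, as in the EUR sample (Hd = 1.000).
hd_all_unique = haplotype_diversity(list(range(11)))

# W-thetaS values for KAR and COL taken from Table 3.8.
ne_kar = female_effective_size(8.330)
ne_col = female_effective_size(19.941)
```

Because mtDNA is haploid and maternally inherited, the factor is 2u rather than the 4u used for autosomal loci, which is why the result is a female effective size.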
A neutrality test was done to detect deviations from the assumptions of neutrality and 
constant population size. Significantly positive Tajima's D values indicate balancing 
selection and/or population subdivision, while significantly negative values indicate 
population growth and/or positive selection (Jobling et al., 2004b). All of the Tajima's D 
values were negative; only the IND value, however, reached significance, indicating 
population growth. KHO, CNC and NAM would have been significant at a 10% level. 
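
The logic of Tajima's D, which compares the mean number of pairwise differences with the number of segregating sites scaled by a sample-size constant, can be sketched as follows. The function name and the toy alignment are hypothetical, sites with gaps or missing data are not handled, and significance testing (for example by coalescent simulation) is not included.

```python
import math
from itertools import combinations

def tajimas_d(sequences):
    """Return (mean pairwise differences k, segregating sites S, Tajima's D).

    Assumes aligned sequences of equal length with at least one
    segregating site (D is undefined when S = 0).
    """
    n = len(sequences)
    seg_sites = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    pairs = list(combinations(sequences, 2))
    k = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    d = (k - seg_sites / a1) / math.sqrt(variance)
    return k, seg_sites, d

# Hypothetical toy alignment in which every mutation is a singleton,
# the pattern expected after rapid population growth.
star_sample = ["AAAA", "AAAT", "AATA", "ATAA"]
k_star, s_star, d_star = tajimas_d(star_sample)
```

An excess of rare variants, as in this star-like toy sample, inflates the number of segregating sites relative to the mean pairwise difference and therefore gives a negative D.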
 
The Fs and the R2 statistics have been reported to detect population expansions very 
successfully (Ramos-Onsins and Rozas, 2002; Pilkington et al., 2008). Fs is based on the 
probability of observing a number of haplotypes greater than or equal to the observed 
number in a sample drawn from a population of constant size. R2 is based on the difference 
between the average number of nucleotide differences and the number of singleton 
mutations. The R2 statistic is especially powerful when sample sizes are small (~10), while 
Fs has a greater ability to detect population expansions when sample sizes are large 
(~50) (Ramos-Onsins and Rozas, 2002). 
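
The idea behind R2 can be sketched by comparing, for every sequence, its number of singleton mutations with half the average number of pairwise differences. In the sketch below singletons are counted as alleles that occur in exactly one sequence at a segregating site, which is a simplifying assumption made for illustration; the function name and toy alignments are hypothetical, and the significance test is not included.

```python
import math
from collections import Counter
from itertools import combinations

def r2_statistic(sequences):
    """R2 = sqrt( sum_i (U_i - k/2)^2 / n ) / S.

    U_i: singleton mutations carried by sequence i (simplified count)
    k:   mean number of pairwise differences
    S:   number of segregating sites (must be greater than 0)
    """
    n = len(sequences)
    seg_columns = [col for col in zip(*sequences) if len(set(col)) > 1]
    s = len(seg_columns)

    pairs = list(combinations(sequences, 2))
    k = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)

    singletons = [0] * n
    for column in seg_columns:
        allele_counts = Counter(column)
        for i, allele in enumerate(column):
            if allele_counts[allele] == 1:
                singletons[i] += 1

    return math.sqrt(sum((u - k / 2.0) ** 2 for u in singletons) / n) / s

# Hypothetical toy alignments: a star-like sample (all mutations singletons)
# and a sample split into two deep lineages without singletons.
r2_star = r2_statistic(["AAAA", "AAAT", "AATA", "ATAA"])
r2_split = r2_statistic(["AAAA", "AAAA", "TTTT", "TTTT"])
```

The star-like sample gives the smaller value of the two, in line with the expectation that recent growth produces low R2 values.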
 
Fs was negative for all samples except HER, GUG and KWE. All three non-African 
populations (AFR, EUR, IND) had significantly negative values. The only other populations 
with significantly negative values were ZUX, NAM and COL. In addition to their significantly 
negative Fs values, IND, EUR and NAM had significant R2 values as well. AFR 
almost reached significance for R2 and would have been significant at a 10% level. HER would 
also have been significant at a 10% level. HER had a clearly non-significant Fs value; 
however, Fs does not perform well when sample sizes are small (Ramos-Onsins and 
Rozas, 2002). CNC is another group that would have reached significance for R2 at a 10% 
level; the CNC Fs p-value was also the lowest p-value that did not reach significance. 
Although ZUX and COL had very significant Fs values, they did not reach significant R2 
values. This might be because R2 performs better at smaller sample sizes (~10) and both 
ZUX (36) and COL (77) had larger sample sizes. 
  
3.4.1 Summary: Genetic Affinities between the Khoe-San and Coloured 
groups as inferred from mtDNA analysis 
 
Using the maternally transmitted mtDNA marker, the affinities between the different Khoe-San 
and Coloured groups were examined using haplogroup frequencies and the 
phylogenetic relationships of the haplogroups to one another. Since the haplogroups 
associated with the Bantu-speakers and non-African groups are very distinct from those 
commonly found in the Khoe-San, admixture from these populations has a great 
influence on the resultant tree that represents relationships between the different population 
groups. This can be seen in Figure 3.17, where the Bantu-speaking admixture in the Cape 
Coloured and Khwe groups causes them to group with the Bantu-speaking groups. The effect 
of the admixture can also be seen in the PCA plot (Figure 3.16), where the first two 
components represent African vs. non-African variation and Bantu-speaking vs. 
non-Bantu-speaking variation. While an inclusive comparison is representative of the current genetic 
composition of the groups studied, it should not be used to make inferences about Khoe-San 
history and Khoe-San group relations before the Bantu expansions and the influx of 
non-African colonists.  
 
In an attempt to infer group relations between the Khoe-San and Coloured groups that existed 
before the pastoralist influx, haplogroups previously associated with Bantu-speakers (and 
the Bantu expansions) as well as non-African haplogroups were removed from the Khoe-San 
and Coloured groups. The remaining Khoe-San associated haplogroups (L0d and L0k) 
were then used to infer relationships between the Khoe-San groups that might have 
existed in the past. The resultant PCA plot (Figure 3.19) showed that the southern groups 
were closely associated with each other, while the northern groups were separate from the 
southern groups. The Khwe were different from all the groups. The association between 
physical and genetic distance remained and was even stronger than in the situation where 
Bantu-speaking and non-African haplogroups were included. 
 
Due to the possibility that L0k was not part of the original Khoe-San haplogroup pool but 
was rather introduced by other hunter-gatherer groups that were displaced because of the 
Bantu expansions (such as previously discussed for the Khwe), L0k was also removed and 
only L0d-based group relations were tested (Figures 3.21 and 3.22). The Khwe group was 
still the most different from the other groups because it only contained the L0dx 
haplogroup, which is absent in all other groups except the !Xun, where it occurs at low 
frequency. Interestingly, the Ju/'hoansi group moved closer to the southern groups due to 
the higher frequency of L0d1b and the lower frequency of L0d1c. The correlation between 
physical and genetic distance remained for the L0d-based group comparison. 
 
Cluster and PCA observations were reaffirmed by AMOVA (Table 3.6). The largest part 
of the variation in the groups was explained by non-African vs. African variation and 
Bantu-speaking vs. Khoe-San variation. AMOVA, however, also supported that a 
considerable amount of variation could be summarised as variation between northern San 
groups and southern Khoe-San groups. The current group classifications of Khoe-San vs. 
Coloured and Khoe vs. San vs. Coloured explained very little of the variation. These 
classifications are thus not supported by a maternal-line genetic analogue. 
 
The following deductions can therefore be made about the maternal-line genetic 
composition of the groups included in the study. Firstly, various levels of admixture from 
both Bantu-speakers and non-African groups are present in the different groups. Secondly, 
the Khwe group is different from the other Khoe-San and Coloured groups and might 
represent remnants of another extinct hunter-gatherer group that was displaced by the 
Bantu expansions and became associated with the San. This may include the introduction 
of the Khoe linguistic group by them, gene flow of the L0k haplogroup from them to the 
!Xun and of the L0dx haplogroup from the !Xun to the Khwe. Thirdly, there is a 
distance-based genetic relationship between the remaining groups. Fourthly, the haplogroup 
distribution differs between the southern and northern groups, and the Nama cluster 
with, and are similar to, the southern groups. Fifthly, other factors such as the adoption of 
pastoralism might also have had an important role, and the rapid spread of haplogroups 
associated with populations that accepted pastoralism would have influenced original 
haplogroup distributions. 
 
 
 
 
 
 
 
 
 
 
 
 
4. Y-CHROMOSOME STUDIES 
 
Forty-six bi-allelic polymorphisms and 12 Y-STR markers on the Y-chromosome were 
examined in 353 unrelated males to investigate the paternal affinities between three 
Coloured (KAR, COL, CNC), one Khoe (NAM) and five San (KHO, GUG, JOH, XUN, KWE) 
groups. Their affinities to neighbouring Bantu-speaking (DRC, HER, SOT, ZUX) and 
non-African populations (IND, AFR and EUR) were also examined.  
 
4.1 Haplogroup allocation and geographic distribution 
 
The Y-chromosomes in the sample examined were assigned to 29 haplogroups using the 
bi-allelic polymorphisms, according to the nomenclature of Karafet et al. (2008) 
(Figure 4.1). Eleven major haplogroups were represented at differing frequencies in 
the groups studied (Figure 4.2). The haplogroup with the highest frequency in the total 
study group was haplogroup E-M2 (E1b1a*) (20%), followed by haplogroup A-M51 (A3b1) 
at 15% (Figure 4.1).  
 
By observing the haplogroup distributions in the different population groups in the form of 
bar-charts (Figure 4.2) a differential distribution in the different Khoe-San and Coloured 
population groups was noted. To further investigate the geographic distributions of the 
haplogroups and their sub-haplogroups, contour plots were constructed and are shown in 
Figure 4.3. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
   A2* A2a A2b A3b1 B2a1a B2b* B2b1 B2b4a C* E2* E2b1 E1b1a* E1b1a1 E1b1a4 E1b1a7 E1b1b1* E1b1b1a E1b1b1c1 H* I* J* J2 K2 L* P, Q* R* R1a1 R1b R2 
Group N Gd Haplogroup frequencies 
KAR 19 0.610 0 0 0 0.105 0.053 0 0 0 0 0 0.158 0.263 0 0 0.158 0 0.053 0 0 0.053 0 0 0 0 0 0 0.053 0.105 0 
COL 35 0.647 0 0 0 0.029 0.057 0 0.029 0 0 0 0.057 0.257 0.057 0 0.171 0.057 0 0 0 0.029 0.029 0.029 0 0 0 0.029 0 0.171 0 
CAC 3 0.778 0 0 0 0 0 0 0 0 0.333 0 0 0.333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.333 0 
KHO 37 0.681 0 0.027 0 0.243 0.027 0 0 0 0 0 0 0.081 0.027 0 0.027 0.270 0 0.027 0 0.027 0 0.054 0 0 0 0 0.027 0.162 0 
CNC 23 0.631 0 0 0 0.348 0 0 0 0 0 0 0 0.217 0 0 0.087 0.087 0.043 0 0 0.087 0 0 0 0 0 0 0 0.130 0 
XEG 3 0.417 0 0 0 0 0.333 0 0 0 0 0 0 0.667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
NAM 14 0.619 0 0 0.071 0.214 0 0 0 0 0 0 0.071 0 0 0 0.357 0.214 0 0 0 0 0 0 0 0 0 0 0 0.071 0 
GUG 19 0.403 0 0 0 0.053 0.474 0 0 0 0 0 0 0.421 0 0 0.053 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
NAR 2 0.917 0.500 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.500 0 
JOH 28 0.717 0.071 0.107 0.143 0.250 0 0 0.321 0.036 0 0 0 0.036 0 0 0 0.036 0 0 0 0 0 0 0 0 0 0 0 0 0 
XUN 48 0.651 0.021 0.042 0.042 0.354 0 0.021 0.042 0.042 0 0.021 0 0.188 0 0 0.083 0.146 0 0 0 0 0 0 0 0 0 0 0 0 0 
KWE 13 0.563 0 0 0 0.077 0 0 0 0 0 0 0.077 0.231 0 0.154 0 0.462 0 0 0 0 0 0 0 0 0 0 0 0 0 
DRC 14 0.463 0 0 0 0 0 0 0 0 0 0 0.071 0.500 0.071 0 0.357 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
HER 15 0.522 0 0 0 0 0 0 0.067 0 0 0 0.067 0.133 0.267 0 0.333 0 0 0 0 0 0 0 0 0 0 0 0.067 0.067 0 
SOT 21 0.541 0 0 0 0.095 0.095 0 0 0 0 0 0.048 0.238 0 0.095 0.381 0 0 0 0 0 0 0 0.048 0 0 0 0 0 0 
SWZ 2 1.000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
ZUX 30 0.528 0 0 0 0.033 0.100 0 0 0 0 0 0.200 0.333 0 0.100 0.233 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
AFR 13 0.488 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.077 0 0 0.077 0 0 0 0 0 0 0.077 0.769 0 
EUR 3 0.361 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.000 0 
IND 11 0.695 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.091 0 0 0.364 0 0.182 0.091 0 0.182 0 0.091 
TOT fq 1 
 0.011 0.017 0.020 0.147 0.054 0.003 0.037 0.008 0.003 0.003 0.045 0.198 0.023 0.020 0.139 0.088 0.008 0.003 0.003 0.017 0.003 0.020 0.003 0.006 0.003 0.003 0.017 0.096 0.003 
Seq/HG 353  4 6 7 52 19 1 13 3 1 1 16 70 8 7 49 31 3 1 1 6 1 7 1 2 1 1 6 34 1 
HT/HG 268  4 5 5 33 8 1 7 3 1 1 9 53 7 5 46 20 3 1 1 6 1 7 1 2 1 1 6 29 1 
Gd  0.657 0.500 0.189 0.179 0.577 0.145 0.000 0.465 0.361 0.000 0.000 0.214 0.360 0.286 0.246 0.414 0.291 0.278 0.000 0.000 0.550 0.000 0.472 0.000 0.250 0.000 0.000 0.306 0.362 0.000 
                                
[Tree residue omitted: in the original figure the branches of the haplogroup tree are defined by the markers M60, M91, M114, M14, M23, P28, M51, SRY10831.1, M168, M130, M40, M1, M182, M150, M152, M112, P6, P7, P8, M75, M85, P2, M2, M35, M58, M154, M191, M78, M123, M34, M89, M213, M170, p12f2, M172, M70, M20, M11, M69, M9, M74, M207, M124, M343, SRY10831.2, M17 and M198.]
Figure 4.1    Y-chromosome haplogroup tree with nomenclature according to Karafet et al. (2008), listing haplogroup frequencies in the 
different populations in the study group. The gene diversities (Gd) in the different haplogroups are also indicated. (Tot fq: Total 
frequency; Seq/HG: Sequences per haplogroup; HT/HG: Haplotypes per haplogroup; Gd: Gene diversity) 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 4.2  Graphical illustration of percentage Y-chromosome haplogroup assignment in the 
populations used in comparative population analysis 
[Figure residue omitted: stacked bar chart with a 0% to 100% axis for the populations KAR, COL, KHO, CNC, NAM, GUG, JOH, XUN, KWE, DRC, HER, SOT, ZUX, AFR, EUR and IND; legend: haplogroups A, B, E, H, I, J, K, L, Q and R.]

Figure 4.3  Contour plots indicating the frequency distributions of Y-chromosome haplogroups in the 
Khoe-San and Coloured populations 

The contour plots clearly showed that haplogroups had specific geographic patterns 
(Figure 4.3). A-M114, A-M14-P28 and B-M112-P6-P8 were limited to the northern groups, 
with their highest frequencies in the Tsumkwe area (JOH group). A-M51 seemed to have a 
wide geographic distribution, though its absence in the east of southern Africa should be 
confirmed by more extensive sampling, as no groups were sampled in this area. The 
pattern formed by B-M152 was caused by its high frequency in the GUG group and its 
absence in the XUN, JOH, KWE and NAM. E-M35 also displayed a northern distribution, 
with its highest frequency in the KWE followed by the KHO, but was completely absent in 
the JOH. It had lower frequencies in the southeast. E-M2 and its derived groups were 
widely distributed but displayed higher frequencies in the east than in the west. Eurasian 
haplogroups were distributed in the southern parts of the region, indicating the direction 
from which Eurasian settlers came. 
 
4.2 Haplogroup diversity 
 
The 29 haplogroups were further resolved into 268 Y-STR haplotypes. The full list of 
haplotypes is included in Appendix G. The genetic diversity of the whole study group was 
0.657 (Figure 4.1). The Khoe-San and Coloured groups (with the exception of GUG) 
generally had higher genetic diversities than BS groups and non-African groups (Figure 
4.1). AFR and EUR had low diversities compared to African groups. The GUG group had a 
lower genetic diversity than all of the groups except EUR. No instance of full haplotype 
homoplasy between haplogroups was observed in the study samples. The haplotype with 
the highest frequency was Ht051 (B-M152), which occurred a total of 12 times, of which eight 
instances were in the GUG group and one each in the XEG, ZUX and KAR groups. The high 
occurrence in the GUG might indicate a founder effect or bottleneck in this particular 
sample. This haplotype, however, was also found in a KAR and a XEG individual, 
indicating a wide geographic spread, and in a BS individual, indicating gene 
flow. Eight different haplotypes occurred four times each, representing the 
second highest haplotype frequency. 
 
 
4.3 African haplogroup analyses and discussion 
 
To further investigate the structure within the haplogroups, phylogenetic networks, δμ² 
distance-based NJ trees and MDS plots were constructed from STR profiles (see section 
2.3.4 for a full description of the methodologies employed to construct networks and trees). The 
ages of the different haplogroups were also determined from the networks. Dating of 
Y-chromosome haplogroups was done with μ = 6.9 x 10^-4 per locus per generation and a 
generation time of 25 years (Zhivotovsky et al., 2004). The following sections present 
these results and discuss them in conjunction with the results regarding haplogroup 
geographical distribution and haplotype diversities presented in the previous sections 
(sections 4.1 and 4.2). Furthermore, results are related to the published literature for each 
haplogroup. 
 
Haplogroup A - Internal structure 
The network and NJ tree for haplogroup A clearly separated A-M51 from the other three 
haplogroups (Figure 4.4 and 4.5). The other three haplogroups did not show the expected 
topology (see Figure 4.1) but individuals belonging to the same sub-haplogroup did cluster 
together. An exception is a Naro A-M14 individual that clustered within the A-M114 group.  
 
In the tree, A-M51 seemed to have KAR and COL individuals at its root that connected with 
the other A subgroups (Figure 4.4). The remaining A-M51 samples were split into two 
branches, one containing mostly KHO, XUN and JOH individuals and the other 
containing mostly KHO and NAM individuals with one GUG and one COL individual.  
 
In the network A-M51 also connected with the other subclades through COL and KAR 
individuals, however, the two branches within A-M51 seen in the NJ tree were not as 
apparent (Figure 4.5). Instead there was a reticulation of KHO individuals that formed the 
base of three branches. The clustering of population groups seen in the NJ tree (KHO + 
NAM; JOH + XUN) was still present.  
 
The position of the KWE individual was ambiguous, presenting as an early branch at the 
base of A-M51 in the tree, while being located at the tip of a deeper branch in the network. 
 
To further investigate these relationships the distance matrix used for the NJ tree was also 
visualised by doing a MDS plot  (Figure 4.6). The MDS plot clearly separated the three 
sub-haplogroups (A-M51, A-M114 and A-M14/A-P28). Furthermore it showed that the 
COL/KAR A-M51 haplotypes grouped closer to the other A-subgroups than the rest of the 
A-M51 haplotypes. The COL/KAR, KHO/CNC and NAM haplotypes were located centrally 
in the A-M51 cluster closest to the other subgroups while the XUN, JOH and KWE A-M51 
haplotypes were located on the periphery of the A-M51 cluster, further away from the other 
subgroups. The uniqueness of the KWE haplotype was also evident, since it was separated
by a large distance along both the X- and Y-axes from the other A-M51 sequences.
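
The step from a pairwise distance matrix to a two-dimensional plot can be sketched as
follows. This is a minimal illustration only: the text does not state which MDS variant or
software was used, so classical (Torgerson) MDS is assumed here, the function name
`classical_mds` is hypothetical, and the four-point distance matrix is invented toy data
rather than haplotype distances from the study.

```python
import math

def classical_mds(dist, k=2, iters=200):
    """Classical (Torgerson) MDS; returns one row of k coordinates per item.

    Assumes a symmetric distance matrix whose leading eigenvalues are positive
    (true for Euclidean-like distances); power iteration is used so that only
    the standard library is needed.
    """
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in sq]
    grand = sum(row_mean) / n
    # Double-centred matrix B = -1/2 * J * D^2 * J
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * k for _ in range(n)]
    for axis in range(k):
        v = [float(i + 1) for i in range(n)]          # arbitrary start vector
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:                          # nothing left to extract
                break
            v = [x / norm for x in w]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]                   # make sure v is a unit vector
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        lam = max(lam, 0.0)                           # clip tiny negative values
        for i in range(n):
            coords[i][axis] = math.sqrt(lam) * v[i]
        # Deflate: remove the axis just found before looking for the next one
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords

# Toy data (hypothetical): four items at the corners of a 3 x 4 rectangle.
toy = [[0.0, 3.0, 4.0, 5.0],
       [3.0, 0.0, 5.0, 4.0],
       [4.0, 5.0, 0.0, 3.0],
       [5.0, 4.0, 3.0, 0.0]]
coords = classical_mds(toy)
```

Because the toy distances are exactly Euclidean, the two recovered axes reproduce them;
real haplotype distances would generally be reproduced only approximately, and the axes
themselves carry no meaning beyond the distances they preserve.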
 
The TMRCA of haplogroup A was 65 857 (+/- 10 007) years BP. The oldest sub-haplogroup
was A-M51 (TMRCA = 54 000 +/- 9 593 years BP), which was roughly five times older than
A-M14+P28 (TMRCA = 11 473 +/- 3 960 years BP) and nearly seven times older than A-M114
(TMRCA = 8 052 +/- 3 102 years BP).
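
The relative ages can be checked directly from the point estimates quoted above. The
snippet below is only a worked piece of arithmetic on those three numbers; it ignores the
standard errors, and the variable names are illustrative.

```python
# TMRCA point estimates (years BP) as quoted in the text above.
tmrca = {"A-M51": 54000, "A-M14+P28": 11473, "A-M114": 8052}

# How many times older A-M51 is than each of the other two sub-haplogroups.
ratios = {name: tmrca["A-M51"] / age
          for name, age in tmrca.items() if name != "A-M51"}

for name, ratio in ratios.items():
    print(f"A-M51 is {ratio:.1f} times older than {name}")
```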
 
 
 
 
 
 
 
 
 
 
Figure 4.4  Neighbour Joining tree representing the substructure of Haplogroup A. Individuals are colour coded according to the key. The rectangular 
phylogram and unrooted radial phylogram have sub-haplogroup branches of corresponding colours. 
 
Figure 4.5  Median joining network representing Haplogroup A substructure in the different populations of the study group.  
 
Figure 4.6  MDS plot visualizing the (δμ)² distance matrix for haplogroup A (also used for the Neighbour Joining tree). Individuals are colour coded
according to the key.
Haplogroup A - Discussion 
The oldest haplogroup, haplogroup A, was found at its highest frequencies in the northern
San groups (Figure 4.1 and 4.2). Although Coloured and Khoe groups in general had lower
frequencies of this haplogroup than San groups, their frequencies were mostly higher than
those found in the Bantu-speakers. Also, groups with higher frequencies of haplogroup A were
groups that are known to have had less admixture with Bantu-speakers and non-African
populations. This could be seen in the lower haplogroup A frequencies in the Karretjie and
Colesberg-Coloured groups, who have had substantial paternal-line admixture from
Bantu-speakers and non-African populations. As discussed in the introduction, it is known from
historical records that male San individuals living in the Karoo area were severely
persecuted in the 1700-1800s, while females were relocated to farms. Subsequently, it was
mostly the local Xhosa males and white farm owners that contributed to the male-line
genetic variation of the resultant Coloured population. In fact, haplogroup A frequencies
in these groups were lower than those found in some of the Bantu-speaking groups. Higher
frequencies were seen in the ≠Khomani and Coloured-Northern Cape groups. The areas
that these groups occupied were not as severely targeted by colonists as the area
that the ancestors of the Karretjie and Colesberg-Coloured groups occupied.
 
Another group with a low frequency of haplogroup A was the /Gui + //Gana + Kgalagari
group. While the maternal lines of this group mostly belonged to the Khoe-San associated
haplogroup L0d, most of the paternal lines seemed to be from Bantu-speaking individuals.
This was a mixed group with /Gui and //Gana (San) and Kgalagari (Bantu-speaker)
ancestry. From the results it seemed that the Kgalagari contributed mostly to the male line,
while the female lines came from the San groups.
 
The sub-haplogroups within haplogroup A also had different representation patterns among
the groups (Figure 4.1 and 4.3). A-M51 was widespread, with representation in northern as
well as southern Khoe-San and Coloured groups. A-M14 and its derived haplogroups,
however, seemed to be concentrated in the northern San groups. Except for single males
in the ≠Khomani and Nama groups, the A-M14 derived haplogroups were completely absent
from population groups representative of southern Khoe-San (Figure 4.1 and 4.3). The two
haplotypes found in the ≠Khomani and Nama groups were either shared with, or close
neighbours of, Ju/'hoansi haplotypes, indicating that the gene flow to the southern
groups came from the Ju/'hoansi group (Figure 4.5).
 
Published studies have concentrated only on the northern groups: the !Xun, Khwe and
Ju/'hoansi (Table 1.4). The one study that included Nama did not report their haplogroup
frequencies separately from those of the !Xun, Khwe and Ju/'hoansi. Similar to the results
of the present study, the published studies found high frequencies of haplogroup A in the
!Xun and Ju/'hoansi but not in the Khwe. A-M51 was the most common A-haplogroup in both
the !Xun and the mixed Ju/'hoansi and !Xun group (Table 1.4).
 
From the network and MDS analyses it could be seen that A-M14 and its derived groups
clustered closely together and had a lower diversity than A-M51 (Figure 4.5 and 4.6).
The A-M51 group had high haplotype diversity, and internal structure could also
be observed within this haplogroup. Haplotypes from the southern groups had a central
position and were more closely related to the A-M14 derived haplotypes. Surprisingly, the
northern !Xun and Ju/'hoansi groups occupied the peripheral regions of the A-M51 cluster and
were more distantly related to the A-M14 derived haplotypes. This is unexpected, since A-M14
was largely restricted to northern groups and one would expect the A-M51 lineages in the
northern groups to be more closely related to the A-M14 lineages. This indicates that the
A-M14 lineage did not split from the A-M51 lineage while present in the northern San
populations. A better explanation would be that there was a very ancient split in haplogroup
A before populations had their current distribution and designations. Haplogroup A-M51
then developed its north-south distribution and A-M14 was subsequently incorporated into
the northern groups from somewhere else. Another possibility is that the A-M51/A-M14
split represents an ancient north-south split in haplogroup A, in which A-M51 was a
southern haplogroup and A-M14 a northern haplogroup. A subsequent cline then
developed within haplogroup A-M51 when some of the A-M51 haplotypes migrated
northwards to join A-M14 and differentiated from the southern A-M51 haplotypes.
 
 
Haplogroup B - Internal structure
Within haplogroup B, both the network and tree clearly separated B-M152 and B-M112
(and derived subgroups) into two separate groups (Figure 4.7 and 4.8). Although B-M152 was
predominantly made up of GUG individuals, they all had the same haplotype. Most of the
unique B-M152 haplotypes occurred in BS.
 
Further, the network and trees showed P8 and P6 as subgroups of B-M112. B-P8 formed a
monophyletic clade with three haplotypes (two XUN and one JOH). Interestingly, on both
the network and tree, two B-P6 haplotypes (one COL and one HER) did not form a
monophyletic clade with the rest of the B-P6 haplotypes (all XUN and JOH). On the
network these two haplotypes possibly grouped closer to the B-P8 haplotypes than to the
other B-P6 haplotypes, while on the trees the COL haplotype grouped as an early branch of
B-P6 and the HER haplotype grouped at the base of B-P8. To further investigate the
relationships between the haplogroup B subgroups, an MDS plot of the distance matrix used
to construct the NJ tree was created (Figure 4.9).
 
The MDS plot clearly separated B-M112, B-M152, B-P8 and two clusters in B-P6 from one
another. B-M152 and B-P8 each formed tight haplotype clusters. B-P6 was, however, divided
into two clusters, one containing the XUN and JOH haplotypes and the other the HER
and COL haplotypes.
 
Haplogroup B had a TMRCA of 54 432 (+/- 10 005) years BP and was therefore ~11 000 years
younger than haplogroup A. B-M112 was the oldest haplogroup B subgroup (TMRCA =
36 763 +/- 7 325 years BP), while B-M152 converged at 12 236 +/- 5 512 years BP. Of the two
B-M112 subgroups, B-P6 (TMRCA = 29 961 +/- 7 436 years BP) was older than B-P8
(TMRCA = 9 058 +/- 3 629 years BP).
 
 
Figure 4.7  Neighbour Joining tree representing the substructure of Haplogroup B. Individuals are colour coded according to the key. The rectangular phylogram 
and unrooted radial phylogram have sub-haplogroup branches of corresponding colours. 
 
Figure 4.8  Median joining network representing Haplogroup B substructure in the different 
populations of the study group.  
 
Figure 4.9  MDS plot visualizing the (δμ)² distance matrix for haplogroup B (also used for the Neighbour Joining tree). Individuals are colour coded
according to the key.
Haplogroup B - Discussion 
The Khoe-San associated B-haplogroup, B-M112 and its derived groups B-P6 and B-P7, 
was found at its highest frequencies in the northern San groups (Figure 4.1 and 4.2). 
Except for one individual in the Colesberg-Coloured group, this haplogroup was absent 
from southern Khoe-San and Coloured groups. It was one of the highest-frequency
haplogroups in the Ju/'hoansi, with frequencies similar to those of the A-M14 and A-M51 groups. In
the !Xun it had lower frequencies than the A haplogroups. B-M112 haplotypes have also
previously been identified in the Pygmy populations of central Africa and the Hadza from 
east Africa (Underhill et al., 2000; Cruciani et al., 2002; Semino et al., 2002; YCC, 2002). 
 
The fact that B-M112 is present in the Pygmy and Hadza populations and in the northern San
groups, but not in the southern San groups, indicates that B-M112 is not a general
Khoe-San haplogroup. Rather, it could have been a haplogroup present in another ancient
hunter-gatherer population north of the northern San groups. This group would have lived
south of the Pygmy and Hadza groups before the Bantu expansion. An ideal candidate
would be the Ba-Twa Pygmy group. Further support for this theory comes from the study of
rock art. The San rock art tradition ends at the Angola-Namibian border, where another
rock-art zone begins. This zone is termed the Schematic Art Zone and is significantly different
from San rock art (Smith, 2006). The zone stretches into the DRC and Tanzania and is
bordered in the north by the Saharan art zone. The Schematic Art Zone has been linked to
the Ba-Twa Pygmy group (Smith, 2006). The Ba-Twa groups could have had connections
with, and gene flow from, the San, Hadza and other Pygmy groups. B-M112 is common in the
Pygmies and Hadza; B-M112 might therefore also have been a Ba-Twa associated
haplogroup in the past that was incorporated into the northern San groups through
gene flow between the two groups that did not reach the southern San groups. Physical
anthropological studies on fossils from this region found no significant overlap with any one
specific modern group (Morris and Ribot, 2006). Instead, the morphological features of
fossils from this region were unique to LSA people from the region itself. The sample size
of Pygmy representatives was, however, very small (Morris and Ribot, 2006).
 
Most of the B-M112 haplotypes in the study group belonged to either B-P6 or B-P8; only
one haplotype belonged to the ancestral B-M112*. B-P6 and B-P8 have been reported
previously in the Ju/'hoansi (YCC, 2002) and were restricted to Khoe-San populations. In
the MDS plot (Figure 4.9) all the B sub-haplogroup haplotypes clustered closely together
except B-P6, which formed two clusters: a !Xun and Ju/'hoansi cluster, and a cluster of two
haplotypes, one belonging to a Herero individual and one to a Colesberg-Coloured
individual. From network and tree analysis it appeared that these two
haplotypes were located at the base of the B-P6 and B-P8 split (Figure 4.8). This situation
was similar to that in haplogroup A, where southern Colesberg-Coloured/Karretjie
haplotypes were at the base of a haplogroup split in the northern !Xun and Ju/'hoansi.
 
The B-M152 haplogroup is associated with Bantu-speakers and was found at
frequencies around 10% in the Bantu-speakers of the present study group (Figure 4.1). It
was, however, also found at very high frequencies in the /Gui + //Gana + Kgalagari. The
B-M152 representation in the /Gui + //Gana + Kgalagari consisted, however, of a single
haplotype and probably indicates a strong recent founder effect in this group (Figure 4.8).
 
While haplogroup A was found to be the oldest haplogroup, with a TMRCA of 65 857 years,
haplogroup B dated to 54 432 years. The TMRCAs of these two oldest haplogroups fall
within the range of the TMRCA for the Y-chromosome (46 000-91 000 years) as
determined by microsatellites (Wilson and Balding, 1998; Pritchard et al., 1999).
 
Haplogroup E - Internal structure
To investigate finer structure within the haplogroup E subclades three different networks 
and NJ trees were constructed for the following haplogroups and their derivative 
subclades: Haplogroup E-M75, Haplogroup E-M2 and Haplogroup E-M35. 
 
Haplogroup E-M75 
E-M75* was represented by only one XUN sequence; the rest of this haplogroup was
contained in the sub-group E-M85. E-M85 was mostly represented by BS and Coloured
groups (Figure 4.1). The E-M75 tree and network are shown in Figure 4.10 and 4.11.
 
 
Figure 4.10  Neighbour Joining tree representing the substructure of Haplogroup E-M75. Individuals are colour coded according to the key. 
The rectangular phylogram and unrooted radial phylogram have sub-haplogroup branches of corresponding colours. 
 
Figure 4.11  Median joining network representing Haplogroup E-M75 substructure in the different populations of the study group.  
Haplogroup E-M2 
An NJ tree and a network show the internal structure within haplogroup E-M2 (Figure 4.12
and Figure 4.13). The represented sub-haplogroups of E-M2 (E-M58, E-M154 and
E-M191) did not form monophyletic clades in the trees or the network. The network also
exhibited a high degree of reticulation. Overall there was a high level of partial homoplasy
between the haplotypes of these sub-haplogroups.
 
In the tree E-M191 formed one large monophyletic clade but also had two smaller clades 
within E-M2, while in the network there was one large clade and one smaller clade within 
E-M2. E-M58 grouped together in the tree and network with the exception of one sample. 
E-M154 formed one clade and two separate samples on the tree and two clades in the 
network. 
 
Members of the different population groups were spread throughout the network; however,
members of the same population group were often neighbours in the network.
 
Haplogroup E-M2 had a TMRCA of 39 701 +/- 8 263 years BP. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 4.12  Neighbour Joining tree representing the substructure of Haplogroup E-M2. Individuals are colour 
coded according to the key. The rectangular phylogram and unrooted radial phylogram have sub-haplogroup 
branches of corresponding colours. 
 
Figure 4.13  Median joining network representing Haplogroup E-M2 substructure in the different populations of the study group.  
Haplogroup E-M35 
The tree and network (Figure 4.14 and 4.15) separated the two subgroups of E-M35
(E-M78 and E-M34) from the rest of E-M35. There was, however, a clade of E-M35*
containing three XUN and one KWE individual that clustered closer to the E-M78 and
E-M34 samples. To visualise this relationship better, an MDS plot was constructed using the
distance matrix used for the NJ tree (Figure 4.16). The MDS plot also clearly showed that
this E-M35* cluster was separate from the rest of E-M35 and grouped closer to E-M34 and
E-M78. There were also COL/KAR haplotypes and a KWE haplotype that were separated by
large distances from the core E-M35* haplotypes.
 
E-M35* was associated only with Coloured, Khoe and San groups with no representation 
of BS individuals. The highest representation by far of E-M35 occurred in the KWE group 
(46%). KHO and NAM also had high frequencies (21-27%) and the COL, CNC and XUN 
had lower frequencies (5-15%). The GUG group, however, contained no individuals who 
were E-M35 and the JOH contained only one individual. 
 
The TMRCA for haplogroup E-M35 was 23 205 (+/- 6 303) years BP. The E-M35*
haplotypes converged at 17 921 (+/- 5 148) years BP, while the E-M78 subgroup was
younger, at 8 052 (+/- 3 183) years BP.
 
 
 
 
Figure 4.14  Neighbour Joining tree representing the substructure of Haplogroup E-M35. Individuals are colour coded according to the key. The rectangular 
phylogram and unrooted radial phylogram have sub-haplogroup branches of corresponding colours. 
 
Figure 4.15  Median joining network representing Haplogroup E-M35 substructure in the different 
populations of the study group.  
 
Figure 4.16  MDS plot visualizing the (δμ)² distance matrix for haplogroup E-M35 (also used for the Neighbour Joining tree). Individuals are colour coded
according to the key.
Haplogroup E - Discussion 
The E haplogroups, except E-M35, are associated with Bantu-speakers, and their presence in
Khoe-San and Coloured groups is indicative of Bantu-speaking admixture (see section
1.2.2.3). I will refer to these E haplogroups as BS-haplogroup E. The Bantu-speaking group
from the DRC belonged exclusively to BS-haplogroup E (Figure 4.1). The Bantu-speakers
from southern Africa had lower frequencies of BS-haplogroup E (76-87%), most likely
because of the incorporation of hunter-gatherer haplogroups during their migration from east
Africa to southern Africa. The non-Bantu-speaking groups with the highest frequencies of
BS-haplogroup E were the Colesberg-Coloured and Karretjie groups, followed by the /Gui +
//Gana + Kgalagari, Khwe, Nama, Coloured-Northern Cape, !Xun, ≠Khomani and
Ju/'hoansi. The high male-line Bantu-speaking admixture in the Karretjie,
Colesberg-Coloured and /Gui + //Gana + Kgalagari groups was discussed previously. The Ju/'hoansi
had only a single individual belonging to BS-haplogroup E. Since it is known that the
Ju/'hoansi group was very isolated from outside influence until recently, this further
supports the notion that BS-haplogroup E was introduced into the Khoe-San groups through
recent admixture.
 
None of the Bantu-speaking groups from the present study had E-M35 representation
(Figure 4.1). Most E-M35 haplotypes could not be classified into the E-M35 subclades and
were E-M35*. The few haplotypes that did belong to the E-M35 subclades (E-M78 and
E-M34) were from the Coloured groups and were most likely introduced through admixture
from Europeans.
 
The remaining E-M35* haplotypes were divided into two groups: one group was closely
associated with the E-M78 and E-M34 subgroups, and the other, far larger group was
separate from these (Figure 4.15 and 4.16). At the time that the experimental work
forming part of this thesis was done, the E-M293 marker (Henn et al., 2008) had not yet been
identified. As discussed in the Introduction (see section 1.2.2.3), E-M293 was found to
encompass all the !Xun and Khwe E-M35* haplotypes from that study (Henn et al., 2008).
Furthermore, closely related E-M293 haplotypes were identified in the Hadza and Sandawe
at high frequencies. The study linked the E-M293 marker to the introduction of pastoralism
to the southern parts of Africa (Henn et al., 2008). Without representation of more
Khoe-San groups the study, however, could not address the question of how pastoralism spread
after it reached south-central Africa (Henn et al., 2008). The present study offers haplotype
frequencies for additional population groups; however, these were not typed for E-M293
and are classified as E-M35*. It is uncertain whether all of the E-M35* haplotypes in the
present study belong to E-M293. In the MDS plot and network analysis E-M35* clustered into two
separate groups (Figure 4.15 and 4.16). Either one or both of these clusters may be
E-M293. In the smaller cluster only Khwe and !Xun were represented, while the larger cluster
included haplotypes from all of the Khoe-San and Coloured groups. It may be that both
these E-M35* clusters were introduced by a pastoralist group migrating from east Africa.
 
Henn et al. also typed STRs for their E-M293 samples (Henn et al., 2008). They noted that
most of the !Xun and Khwe had DYS389I-10, while the east African populations had a
range of repeat numbers at this locus (only one Khwe individual did not have DYS389I-10;
it was not stated which repeat number this individual had at DYS389I). In the large
haplotype cluster (Figure 4.15 and 4.16), all haplotypes contained DYS389I-10 (except one
!Xun individual who had DYS389I-11, which could have been a recent increase). In the
smaller cluster all the individuals had 14 or 13 repeats. This smaller cluster could have
been an accompanying haplotype of the M293(DYS389I-10) haplotype in its journey from
east Africa. It is unlikely that only one haplotype would have migrated south, and Henn et
al. acknowledged that it is possible that other male individuals who did not carry M293 were
also involved (Henn et al., 2008). Furthermore, the Sandawe, Hadza and Datog individuals who
had E-M35* had a range of haplotypes both with and without M293(DYS389I-10). It is thus
quite possible that this smaller E-M35* cluster also migrated from east Africa, whether it
contains M293 or not. Furthermore, the small cluster was located between E-M34 and
E-M78, and it is known that these two haplogroups originated in east Africa before spreading
to the Middle East and Europe (Semino et al., 2004).
 
The smaller cluster haplotypes did not spread to the south, while the larger cluster, the
DYS389I-10 haplotypes, did (Figure 4.15). It is, however, not likely that the spread of
pastoralism was a clear-cut demic or cultural diffusion towards the south. Rather, some
E-M35* male individuals probably integrated into the southern tribes and took with them the
pastoralist practice and possibly also their language. This could be deduced from the
distribution of the E-M35* haplogroup (Figure 4.1 and 4.3). The highest percentage was in
the Khwe (46%). The group that introduced pastoralism to the southern parts might well
have been the ancestral group to the Khwe population. Aside from their Bantu-speaking
admixture, the Khwe have a very different Y-chromosome as well as mtDNA profile
compared to the other Khoe-San groups. Furthermore, the Khwe speak a language from
the western Khoe division. It is very important to establish their genetic relationships with
the eastern Khoe-speaking San groups (Shua and Tshua), which they phenotypically
resemble. As discussed in the mtDNA result section, it is one of these eastern
Khoe-speaking San groups that harbours the linguistic link to Sandawe through the extinct Kwadi
language. The Khwe groups of today are not pastoralists; however, they live in a tsetse fly
infested area. The Shua and Tshua, however, are pastoralists and cultivators or live in
close trade relations with Bantu-speakers (Mafisa contracts) in which they tend to their
cattle.
 
Following the Khwe, the groups with the highest E-M35* frequencies were the ≠Khomani
(27%) and Nama (21%) (Figure 4.1 and 4.3). Their frequencies were not as high as in the
Khwe and, unlike the Khwe, they contained high frequencies of haplogroups A and B-M112.
This suggests not a full population diffusion of the pastoralists but rather their incorporation
into other resident hunter-gatherer populations. The Nama group adopted the pastoral practice
and also speak a Khoe language; however, they still retained a large proportion of the original
Khoe-San haplogroup A (29%) and had an mtDNA profile similar to the other southern
Khoe-San groups. It is difficult to know whether the ancestors of the ≠Khomani adopted the
pastoral tradition, since this grouping of people today has resulted from various disrupted
groups. Reports of older individuals, however, indicated that they were hunters, and it is
known that these groups spoke languages of the southern San (Tuu) division. There
might have been a movement of individuals between the southern San
hunter-gatherers and the Khoe pastoralists who occupied the same area.
 
The Colesberg-Coloured (6%) and Coloured-Northern Cape (9%) had lower percentages
of E-M35* and the haplogroup was absent from the Karretjie group. The
Colesberg-Coloured and, to a certain extent, the Karretjie and Coloured-Northern Cape would have had a
large input from the southern Khoe groups such as the Griqua and Cape KhoeKhoe.
Historically, it is known that these groups were pastoralists and spoke Khoe languages. In
the Colesberg-Coloured, Karretjie and Coloured-Northern Cape groups, E-M35
frequencies were, however, much lower than in the ≠Khomani, Nama and Khwe groups.
This could have been due to the purging of Khoe-San male lineages that happened in the
1700-1800s, as discussed previously. When this was taken into account and only Khoe-San
associated lineages (A, B-M112, E-M35) were considered (see Figure 4.24), lower
frequencies were still seen in these groups. It thus seems that, even though language and
pastoralism did transfer from the incoming pastoralists to the southern groups, the
male-line genetic input in the form of E-M35* declined from north to south and a female-line
genetic input was almost absent (only two L0k mtDNA haplotypes in the Nama, which likely
came from recent admixture with the Ju/'hoansi or !Xun).
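
The restriction to Khoe-San associated lineages amounts to renormalising frequencies within
a subset of haplogroups. A minimal sketch, assuming hypothetical counts: the numbers below
are invented for illustration and are not data from this study, and
`restricted_frequencies` is likewise a hypothetical helper.

```python
# Lineages treated as Khoe-San associated in the text above.
KHOE_SAN_LINEAGES = {"A", "B-M112", "E-M35"}

def restricted_frequencies(counts, keep=KHOE_SAN_LINEAGES):
    """Haplogroup frequencies recomputed over the kept lineages only."""
    kept = {hg: n for hg, n in counts.items() if hg in keep}
    total = sum(kept.values())
    if total == 0:
        return {}
    return {hg: n / total for hg, n in kept.items()}

# Hypothetical haplogroup counts for one admixed group (NOT thesis data).
example_counts = {"A": 4, "B-M112": 1, "E-M35": 1, "E-M2": 10, "R": 4}

overall_e_m35 = example_counts["E-M35"] / sum(example_counts.values())
restricted = restricted_frequencies(example_counts)

print(f"E-M35 among all lineages:      {overall_e_m35:.1%}")
print(f"E-M35 among Khoe-San lineages: {restricted['E-M35']:.1%}")
```

In this toy case E-M35 rises from 5% of all lineages to about 17% of the Khoe-San
associated lineages, which is the kind of adjustment that removes the effect of Bantu-speaking
and Eurasian male-line admixture before groups are compared.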
 
Incorporation of E-M35* into the !Xun (15%) was lower than into the ≠Khomani and Nama
(Figure 4.1). The !Xun also adopted neither the Khoe language nor the pastoralist tradition.
Even though the !Xun today have a pastoralist tradition, historical records indicate that
they adopted the tradition from neighbouring Bantu-speakers. The genetic relationship
between the !Xun and the Khwe (here considered to be the remnant of the east African
pastoralists) also differed from that between the Khwe and the southern Khoe-San
groups. The southern groups adopted language and pastoralism together with a small
male-only genetic contribution. The !Xun, on the other hand, did not adopt either the
language or the pastoralist culture, but genetically had more female-line (~30% L0k and L0x)
than male-line (15% E-M35*) contributions from the Khwe.
 
Similarly, the Ju/'hoansi, a group that never adopted pastoralism, had only one E-M35*
individual. This Ju/'hoansi E-M35* haplotype had ≠Khomani individuals as its closest
neighbours on the network and thus probably did not come from gene flow with the Khwe
directly. The Ju/'hoansi did, however, have a substantial (24%) contribution in the female line
in the form of L0k and, as discussed in the mtDNA result section, the Khwe L0k haplotypes
were ancestral to the !Xun and Ju/'hoansi L0k haplotypes.
 
Thus, to summarise the hypothesis deduced from the Y-chromosome and mtDNA results
discussed above: the east African pastoralists settled in present-day northern and perhaps
eastern Botswana. Today their remnant genetic variation can be seen in the Khwe and
possibly also in the eastern Khoe-speaking San groups. Males from the pastoralist group
were incorporated into the southern San groups and transferred their language and
pastoralist culture to some of these groups, which became known as the Khoe. Some
of the other Khoe-speaking San groups might also have been involved, and genetic studies
on more Botswana San groups are necessary to confirm exactly how the pastoralist culture
and language transfer between the Khwe and the Khoe groups took place. The /Gui +
//Gana + Kgalagari group from this study did not have E-M35*, even though they speak a
Kalahari-Khoe language. The E-M35* status of the other central and southern Botswana
San groups, such as the Naro, !Xóõ, ≠Hõã, Tshua and Shua, and of more /Gui and //Gana
groups, will help to resolve this question. With the neighbouring northern San groups, the
pastoralists exchanged mostly female and very little male genetic variation.
Traditionally the female adopts the culture of the group she relocates to; therefore neither
the language nor the pastoralist culture was transferred to the Ju/'hoansi and !Xun
(Barnard, 1992).
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
4.4 Eurasian haplogroups 
 
Haplogroup R - Internal structure
Despite the non-African nature of haplogroup R, this haplogroup had ample
representation in the groups studied, especially in the Coloured groups, owing to recent
admixture. To illustrate haplotype sharing and close neighbours, networks and NJ trees
were constructed (Figure 4.17 and 4.18).
 
The network and tree separated the various subgroups of haplogroup R. Haplogroup
R-M343 was the most common, with a high degree of reticulation in the network.
 
Most of the non-Eurasian representation in the R network was from the KHO/CNC and
KAR/COL Coloured groups. In addition, there were one CAC, one NAR, one NAM and two
HER individuals. Within R-M343 there was a high degree of haplotype sharing and close
neighbours between the KAR/COL and KHO/CNC individuals and the AFR individuals
(only three individuals in the network were EUR; the rest of the AFE representation was
AFR).
 
Eurasian haplogroups - Discussion 
The Eurasian haplogroups found in the study (H, J, I, K, L, P/Q, R) were mostly
incorporated into the southern Khoe-San groups through in-moving colonists and slaves from
Asia (Figure 4.1). The groups that are presently known as the Coloured population
resulted from these unions. The network of the most common Eurasian haplogroup,
haplogroup R, showed many instances of haplotype sharing between Afrikaner and
Coloured groups (Figure 4.18). The male- and female-line contributions in the Coloured
populations were very asymmetrical, with most of the female lines in the Coloured groups
coming from Khoe-San and most of the male lines from Eurasian and Bantu-speaking
input.
 
Figure 4.17  Neighbour Joining tree representing the substructure of Haplogroup R. Individuals are colour coded according to the key. The rectangular 
phylogram and unrooted radial phylogram have sub-haplogroup branches of corresponding colours. 
 
Figure 4.18  Median joining network representing Haplogroup R substructure in the different populations of the study group.  
4.5 Analyses of Y-chromosome genetic relationships between different 
Khoe, San, Coloured and neighbouring groups 
 
The genetic relationships between the 15 populations in the Y-chromosome study group
were assessed using exact tests of population differentiation in combination with Fst
genetic distances on haplogroup frequency data and Rst genetic distances on STR
haplotype data (Table 4.1). The two distance matrices correlated well when compared
using the Mantel test, which confirmed the relationship between haplogroup frequency
distances and STR haplotype distances (p < 0.00001): the correlation coefficient (r)
between the two matrices was 0.937, so that 88% of the variation in haplogroup frequency
distance was explained by STR haplotype distance.
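
A Mantel test of this kind can be sketched as below. The permutation scheme (999 label
permutations, one-sided) and the two 5 x 5 toy matrices are assumptions chosen for
illustration; the text does not specify the settings or software used, and the function
names are hypothetical. The last lines check that a correlation of r = 0.937 corresponds to
r squared of about 0.88, the 88% quoted above.

```python
import math
import random

def upper_triangle(m, order=None):
    """Entries above the diagonal, optionally after relabelling rows and columns."""
    n = len(m)
    idx = list(range(n)) if order is None else order
    return [m[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def mantel(a, b, permutations=999, seed=1):
    """Matrix correlation and one-sided permutation p-value for two symmetric matrices."""
    rng = random.Random(seed)
    flat_a = upper_triangle(a)
    observed = pearson(flat_a, upper_triangle(b))
    order = list(range(len(b)))
    at_least = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        rng.shuffle(order)  # relabel the populations of matrix b
        if pearson(flat_a, upper_triangle(b, order)) >= observed - 1e-12:
            at_least += 1
    return observed, at_least / (permutations + 1)

# Toy symmetric distance matrices (hypothetical values, NOT Table 4.1).
toy_a = [[0, 1, 2, 3, 4],
         [1, 0, 1, 2, 3],
         [2, 1, 0, 1, 2],
         [3, 2, 1, 0, 1],
         [4, 3, 2, 1, 0]]
toy_b = [[0, 2, 3, 7, 8],
         [2, 0, 2, 4, 7],
         [3, 2, 0, 2, 4],
         [7, 4, 2, 0, 2],
         [8, 7, 4, 2, 0]]
r, p_value = mantel(toy_a, toy_b)

# Variance explained for the correlation reported in the text.
r_squared = 0.937 ** 2
print(f"toy r = {r:.3f}, p = {p_value:.3f}; reported r^2 = {r_squared:.3f}")
```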
 
The two distance matrices were used to construct PCA plots (Figure 4.19 and 4.20) and 
trees based on cluster analysis (Figure 4.21 and Figure 4.22). 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Table 4.1  Pairwise genetic distances between the 15 study groups calculated from Y-chromosome data

a) Matrix of Fst genetic distances calculated from HG frequency data

     AFE      COL      DRC      CNC      GUG      HER      IND      JOH      KAR      KHO      KWE      NAM      SOT      XUN      ZUX
AFE  0.000
COL  0.251**  0.000
DRC  0.640*** 0.032    0.000
CNC  0.341*** 0.040    0.140**  0.000
GUG  0.653*** 0.115**  0.183**  0.195*** 0.000
HER  0.419*** 0.017    0.053    0.115*** 0.241*** 0.000
IND  0.480*** 0.129*** 0.285*** 0.176*** 0.318*** 0.155*** 0.000
JOH  0.445*** 0.131*** 0.270*** 0.090*** 0.286*** 0.160*** 0.178*** 0.000
KAR  0.317*** 0.000    0.027    0.014    0.112**  0.028    0.123*** 0.117*** 0.000
KHO  0.280*** 0.059**  0.212*** 0.012    0.228*** 0.129*** 0.135*** 0.099*** 0.062**  0.000
KWE  0.558*** 0.096*   0.218*** 0.107*   0.269*** 0.189*** 0.218*** 0.187*** 0.104*   0.038    0.000
NAM  0.437*** 0.056    0.156**  0.049    0.299*** 0.047*   0.176*** 0.120*** 0.048    0.036    0.106*   0.000
SOT  0.478*** 0.021    0.010    0.078**  0.147**  0.021    0.190*** 0.173*** 0.006    0.131*** 0.162**  0.031*   0.000
XUN  0.421*** 0.076*** 0.156**  0.000    0.203*** 0.131*** 0.181*** 0.061**  0.051*** 0.025**  0.086    0.045    0.091**  0.000
ZUX  0.460*** 0.017*   0.004    0.097*** 0.108**  0.055**  0.194*** 0.189*** 0.000    0.147*** 0.131*** 0.099*** 0.000    0.112*** 0.000

b) Matrix of Rst genetic distances calculated from STR haplotype data

     AFE      COL      DRC      CNC      GUG      HER      IND      JOH      KAR      KHO      KWE      NAM      SOT      XUN      ZUX
AFE  0.000
COL  0.236    0.000
DRC  0.593    0.047    0.000
CNC  0.250    0.022    0.065    0.000
GUG  0.632*** 0.091*** 0.233*** 0.150*** 0.000
HER  0.462    0.051    0.033    0.067    0.307*** 0.000
IND  0.220    0.092    0.265    0.104    0.353*** 0.206    0.000
JOH  0.264*** 0.067*** 0.181**  0.088*** 0.226*** 0.152**  0.071**  0.000
KAR  0.219*   0.008    0.065    0.013**  0.139*** 0.086*   0.113    0.091*** 0.000
KHO  0.180    0.044    0.155*   0.020    0.199*** 0.129*   0.081*   0.086*** 0.053*** 0.000
KWE  0.414    0.064    0.146    0.080*   0.233*** 0.164*   0.182    0.139*** 0.059**  0.049**  0.000
NAM  0.323    0.034    0.091    0.009    0.236*** 0.059    0.110    0.093**  0.059    0.009    0.056*   0.000
SOT  0.431    0.022    0.013    0.026    0.138*** 0.020    0.183    0.132*** 0.017    0.113**  0.098    0.062    0.000
XUN  0.296**  0.032*** 0.061*   0.037*** 0.210*** 0.034*** 0.090**  0.075*** 0.037*** 0.040*** 0.056**  0.020*   0.049**  0.000
ZUX  0.447    0.016    0.036    0.047*   0.129*** 0.051    0.209    0.144*** 0.026    0.120*** 0.108*   0.087    0.000    0.070*** 0.000

Abbreviations:
*   significant difference, P<0.05
**  significant difference, P<0.01
*** significant difference, P<0.001
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 4.19  A - Principal Component Analysis of Y-chromosome Fst values between different populations in the study group. A minimum spanning tree
connects populations. Component 1 = 68.7% of the variation, Component 2 = 16.2% of the variation, Component 3 = 7.6% of the variation. B - Loadings
for Component 1, C - loadings for Component 2, D - loadings for Component 3.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 4.20  A - Principal Component Analysis of Y-chromosome Rst values between different populations in the study group. A minimum spanning tree
connects populations. Component 1 = 66.9% of the variation, Component 2 = 22.9% of the variation, Component 3 = 5.7% of the variation. B - Loadings
for Component 1, C - loadings for Component 2, D - loadings for Component 3.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 4.21  Cluster analysis tree representing Y-chromosome Fst values between 
different populations in the study group. 
Figure 4.22  Cluster analysis tree representing Y-chromosome Rst values between 
different populations in the study group. 
The first component on both PCA plots (Figures 4.19 and 4.20) separated the non-African
groups from the African groups and represented 67-69% of the variation. The second
component in both cases separated the BS groups from the non-African and
Khoe-San-Coloured (KSC) groups. In the Fst PCA plot (Figure 4.19) this component represented 16%
of the total variation and in the Rst PCA plot (Figure 4.20) it represented 23%. Component
3 of the Fst PCA plot contained 8% of the variation and appeared to separate the
BS and non-African groups from the KSC groups. In the Rst PCA plot the third
component (6% of the variation) separated the GUG group from the other groups.
 
For the cluster analysis, with both haplogroup and haplotype data the AFE, IND, GUG and
JOH populations were separated from the other populations. These populations were also
significantly different from most other groups in the study group. The separation of the AFE
and IND groups was expected. In the GUG group the low level of haplogroup and
haplotype diversity, together with the relatively high frequency of haplogroup B-M152, resulted
in isolation from other groups. The JOH group consisted mostly of haplogroups A and B
with a very small contribution of haplogroup E. This separated them from the other groups,
which all had substantial contributions from haplogroup E.
 
The rest of the groups formed a monophyletic group with similar internal structures in both
datasets. The groups were divided into two branches: KAR and COL grouped
with the BS groups, while CNC, NAM, KHO, XUN and KWE grouped on the other
branch. KAR and COL had higher contributions from haplogroup E (excluding E-M35),
which grouped them closer to the BS groups. In addition, their contributions from
haplogroups A and B were similar to the proportions seen in the BS groups. Although the
CNC, NAM, KHO, XUN and KWE groups also had high haplogroup E frequencies, a large
proportion of their haplogroup E types were E-M35, which was absent from the BS
individuals. Furthermore, all of these groups except KWE had higher frequencies of
haplogroup A than the KAR, COL and BS groups. The next level of grouping in this branch
indeed excluded KWE in both datasets.
 
The way in which CNC, NAM, KHO and XUN were grouped further differed between the two sets
of data. While the haplogroup data first grouped CNC and XUN together and then
joined them with KHO and NAM, the haplotype data grouped CNC and NAM
and subsequently joined them with KHO and XUN. In the other branch for the haplotype
data, KAR and COL grouped together and then joined the southeastern BS groups, SOT
and ZUX. After that DRC and HER (which also grouped together) joined. In the haplogroup
data, HER grouped separately and DRC grouped with SOT and ZUX.
 
To test the resemblance of the STR-haplotype and haplogroup-frequency based distances
to physical distances between Coloured and Khoe-San groups, the genetic distance
matrices (Table 4.1) were correlated with a physical distance matrix (Appendix C). In
Figure 4.23 pairwise comparisons between physical distance (X-axis) and genetic distance
(Y-axis) were plotted. A linear regression was done to determine the line of
best fit through the points. Both the graphs of Fst and Rst vs. physical distance (Figure
4.23) had slightly negative gradients (-1.451e-05 and -1.572e-05, respectively); these
gradients were, however, found to be non-significant (p = 0.625367 and p = 0.533578). The
Mantel test likewise found that the association between each genetic distance and physical
distance did not differ significantly from the correlations between random datasets
generated through permutation tests (Fst: p = 0.625400 and Rst: p = 0.66770).
 
Thus both linear regressions and Mantel tests found no correlation between physical and 
genetic distance for Y-chromosome haplogroup frequencies and Y-chromosome STRs. 
 
To investigate the possibility that admixture with Eurasian and Bantu-speaking groups
erased a gradient of variation in Khoe-San people that existed in the past, genetic distance
matrices of groups were again compiled, excluding individuals with Eurasian and
Bantu-speaking associated haplogroups. In these matrices only the following haplogroups were
included: haplogroup A and subgroups, haplogroup B-M112 and subgroups, and
haplogroup E-M35* (excluding E-M34 and E-M78). The percentages of each of these
haplogroups in each of the population groups are represented in Figure 4.24. The PCA and
cluster analysis of the resultant Rst distance matrix are presented in Figures 4.25 and 4.26.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 4.23  Pairwise comparisons between physical geographic distance (X-axis) and Y-chromosome Fst
and Rst genetic distance (Y-axis) (panels A and B).

Figure 4.24  Graphical illustration of the percentage of Y-chromosome haplotypes for
Khoe-San associated haplogroups in the Khoe-San and Coloured groups. Haplogroups shown:
E-M35*, B-P8, B-P6, B-M112, A-M14/P28, A-M114 and A-M51. Sample sizes: KAR_COL N = 6,
KHO N = 20, CNC N = 10, NAM N = 7, JOH N = 27, XUN N = 34, KWE N = 7.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 4.25  A - Principal component analysis of Y-chromosome Rst values (excluding Eurasian and BS associated haplogroups) between Khoe-San and
Coloured groups. A minimum spanning tree connects populations. Component 1 = 97.4% of the variation, Component 2 = 1.5% of the variation,
Component 3 = 0.9% of the variation. B - Loadings for Component 1, C - loadings for Component 2, D - loadings for Component 3.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
For this distance matrix KHO, NAM and CNC clustered closely together (they contained
only E-M35 and haplogroup A haplotypes). These three groups were then joined by
KAR-COL to form a branch containing all the southern Khoe-San and Coloured groups. The
XUN subsequently joined this branch and thereafter the JOH group, which had smaller
E-M35 and larger haplogroup B contributions. The KWE group was separate from the other
groups and consisted mostly of E-M35 haplotypes with only one haplogroup A haplotype.
This is reflected in the PCA plot (Figure 4.25) in which the first component (97% of all
variation) separated KWE from the other groups. The subsequent components, 2 and 3,
contained only ~1% variation each and separated XUN and JOH from the southern
Khoe-San-Coloured groups.
 
Figure 4.26   Cluster analysis tree representing Y-chromosome 
Rst values (excluding Eurasian and BS associated 
haplogroups) between Khoe-San and Coloured groups. 
In this case the Mantel test still did not find a significant correlation between this Rst distance
matrix and the physical distance matrix. The p-value (p = 0.296), however, decreased
and the correlation coefficient had a slightly positive value (r = 0.008).
 
To test the apportionment of variation in the different population groups, AMOVA analysis
was done to see how much variation is contained firstly between defined groupings of
populations (see Table 4.2), secondly between the different populations in the study and
thirdly within the populations. Table 4.2 presents the results of the AMOVA analysis with
various different groupings at the first level.
 
Overall the AMOVA based on STR data had very high intra-population variances. Very little 
variation was ascribed to variation among individual populations (generally 2% - 4%) and 
none to inter-group variation. This illustrated the high variation in STR data and stressed 
the point that it should only be used for finer mapping within haplogroups.  
 
With the haplogroup frequency data much more variance was ascribed to inter-population
and inter-group differences. The inter-population variances varied between 6.6% and 10.6%. The
inter-group variance was highest (11%) when the non-African populations were split
from the African populations (Grouping A, Table 4.2). When the Eurasian, BS and
Khoe-San-Coloured (KSC) populations were placed into three separate groups, inter-group variance
explained 7% of the variation (Grouping B, Table 4.2). When only African groups were
considered, there was 5% variance between the KSC groups and the BS groups (Grouping E,
Table 4.2). If only the KSC groups were considered and split into northern and southern
groups, only 0.03% of the variance was explained by the groupings (Grouping F, Table 4.2).
The variance explained was less than 1% when the KSC groups were split first into a
Khoe-San and Coloured (2 groups) grouping and thereafter into a Khoe, San and Coloured
(3 groups) grouping (Groupings G and H, Table 4.2). Of all the different groupings, the only
case where inter-group variation was higher than inter-population variation was when the
non-African populations were split from the African populations (Grouping A, Table 4.2).
In the Eurasian, BS and KSC split the values, however, came close to one another
(Grouping B, Table 4.2).
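
The percentages in Table 4.2 can also be read as the fixation indices that are
conventionally reported with AMOVA. The worked example below is a sketch for the Fst-based
values of Grouping A. It assumes that the three percentages are the variance components
between groups, between populations within groups and within populations, expressed as
percentages of the total. The symbols follow common AMOVA notation and are not taken from
the text above.

```latex
% Variance components for Grouping A (Fst, Table 4.2), in % of total variation
\sigma_a^2 = 10.87, \qquad \sigma_b^2 = 9.14, \qquad \sigma_c^2 = 79.99, \qquad
\sigma_T^2 = \sigma_a^2 + \sigma_b^2 + \sigma_c^2 = 100.00

% Between groups, relative to the total
\Phi_{CT} = \frac{\sigma_a^2}{\sigma_T^2} = \frac{10.87}{100.00} \approx 0.109

% Between populations within groups
\Phi_{SC} = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_c^2} = \frac{9.14}{89.13} \approx 0.103

% Between populations, relative to the total
\Phi_{ST} = \frac{\sigma_a^2 + \sigma_b^2}{\sigma_T^2} = \frac{20.01}{100.00} \approx 0.200

% Consistency check
(1 - \Phi_{ST}) = (1 - \Phi_{SC})(1 - \Phi_{CT}) \approx 0.897 \times 0.891 \approx 0.800
```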
 
Table 4.2  Results from Y-chromosome AMOVA analysis using different groupings on the first level

            Rst                                         Fst
Grouping    Between   Between       Between       Between   Between       Between
            groups    populations   individuals   groups    populations   individuals
                      within        within                  within        within
                      groups        populations             groups        populations
A           -         2.73          97.92         10.87     9.14          79.99
B           -         2.85          97.59         7.03      7.90          85.06
C           -         2.82          97.44         4.76      8.09          87.15
D           -         2.71          97.48         2.40      10.58         87.02
E           -         2.98          97.34         4.93      6.59          88.48
F           -         3.87          96.35         0.03      8.15          91.82
G           -         3.81          96.90         0.23      7.70          92.07
H           -         4.20          97.03         0.19      7.70          92.10

Grouping of first level [Groups]:
A  [afe, ind] [col, cnc, kar, kho, nam, joh, xun, gug, kwe, drc, her, sot, zux]
B  [afr, eur, ind] [col, cnc, kar, kho, nam, joh, xun, gug, kwe, cac] [drc, her, sot, zux]
C  [afr, eur, ind] [col, cnc, kar, kho, nam] [drc, her, sot, zux] [gug, joh, xun, kwe]
D  [col, cnc, kar, kho, nam, joh, xun, gug, kwe] [drc, her, sot, zux, afr, eur, ind]
E  [col, cnc, kar, kho, nam, joh, xun, gug, kwe] [drc, her, sot, zux]
F  [col, cnc, kar, kho, nam] [joh, xun, gug, kwe]
G  [col, cnc, kar] [nam, joh, xun, gug, kho]
H  [col, cnc, kar] [nam] [joh, xun, gug, kho]

 
4.5.1 Discussion on the genetic affinities between Khoe-San and Coloured 
populations from southern Africa 
 
When genetic distances between groups as a whole are compared they depend on the
composite haplogroup profile of the group, and the relationships of these haplogroups to
one another. Distance trees between groups containing all haplogroups do represent the
present-day genetic profile of the group. However, as explained in section 3.4, when
inferences of the past relationship between Khoe-San groups are attempted, haplogroups
that resulted from recent admixture into the groups need to be removed. For instance,
when all haplogroups are included, the Karretjie and Colesberg-Coloured groups cluster
with Bantu-speakers (Figure 4.22); however, when only Khoe-San associated haplogroups
(A, B-M112 and E-M35*) are considered, they group with the other southern Khoe-San
groups (Figure 4.26).
 
The cluster analyses of Khoe-San associated haplogroups group all the southern
Khoe-San groups together, with the !Xun and Ju/'hoansi as separate outsiders. As was seen in the mtDNA
group distance analysis, the Khwe group is very different from the other groups, even when
all Bantu-speaking admixture is removed. The component of the PCA plot that separates
the Khwe from the other groups contained 97% of the variation, while the component that
separates the !Xun and Ju/'hoansi from the southern groups is very small (1%).
 
This small amount of variation between southern and northern groups apparent through
AMOVA analyses (Table 4.2) was also reflected in the genetic vs. physical distance
correlation (Figure 4.23). Neither in the case where all haplogroups were included nor
where admixed haplogroups were removed was there any indication that genetic and
physical distances are related. This is in contrast to the mtDNA results, where there was a
correlation between physical and genetic distances. In section 1.2.2.3 the contrasting
results from several studies regarding female vs. male gene-flow were discussed. Briefly,
what was found is that while in pastoralist groups Y-chromosome based genetic distance is
strongly correlated with physical distance, in hunter-gatherer societies Y-chromosome
genetic distances are not correlated with physical distance. The lack of correlation between
physical and genetic distance in the present study is thus not surprising.
 
AMOVA analysis also confirmed that, unlike for the maternal lineages, the northern and
southern Khoe-San groups are genetically more homogeneous. The distinction between
northern and southern Khoe-San contained less than 1% of the variation. Furthermore,
similar to what was found in the maternal lineages, the widely used groupings "San and
Khoe", and "Coloured, Khoe and San" contained <1% variation as well. Significant variation
between the male lineages of African and non-African groups was, however, observed in
AMOVA analysis (Table 4.2) where 11% of the variation can be ascribed to variation
between the continental groups (the same grouping for the female lineages explained 29%
variation). Also, when all the haplogroups and population groups are included for PCA the
largest component by far is between non-African and African groups (Figures 4.19 and
4.20). The grouping of Khoe-San+Coloured groups vs. Bantu-speakers contained 5%
variation in AMOVA analysis while in the female lineages this grouping explained 22% of
the variation (Table 4.2).
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
5. AUTOSOMAL DNA STUDIES 
 
The results of the typing of the 220 SNP loci are available in the Supplementary electronic 
data - File A. 
 
5.1 Results and discussion (Genotypes) 
 
For the genotypic part of the autosomal DNA study 100 datasets of 44 unlinked
polymorphisms each were compiled as outlined in Section 2.4. To summarise: the
whole dataset contained 220 autosomal SNPs. There were 10 SNPs per chromosome
(chromosomes 1 to 22), contained in two groups of 5 linked SNPs. The two SNP-groups
were completely unlinked from one another (Figure 2.4). To compile datasets for genotypic
analysis one SNP per SNP-group (5 linked SNPs) was randomly selected. By selecting
one SNP from each of the two SNP-groups per chromosome, a set of 44 SNPs was
generated. This process was repeated 100 times to generate 100 different SNP sets,
each with 44 unlinked SNPs. These 100 SNP sets with 44 SNPs each formed the dataset
for the autosomal genotypic analyses.
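
The resampling scheme summarised above can be sketched as follows. This is an
illustration only: the SNP labels, the function names and the random seed are
hypothetical, and the sketch is not the script used to compile the datasets of this study.

```python
import random

N_CHROMOSOMES = 22          # autosomes 1 to 22
GROUPS_PER_CHROMOSOME = 2   # two unlinked SNP-groups per chromosome
SNPS_PER_GROUP = 5          # five linked SNPs per SNP-group


def build_snp_groups():
    """Return the 44 SNP-groups, each a list of 5 hypothetical SNP labels."""
    groups = []
    for chrom in range(1, N_CHROMOSOMES + 1):
        for group in range(1, GROUPS_PER_CHROMOSOME + 1):
            groups.append([f"chr{chrom}_g{group}_snp{snp}"
                           for snp in range(1, SNPS_PER_GROUP + 1)])
    return groups


def sample_unlinked_sets(groups, n_sets=100, seed=1):
    """Draw n_sets datasets, each with one randomly chosen SNP per SNP-group."""
    rng = random.Random(seed)
    return [[rng.choice(group) for group in groups] for _ in range(n_sets)]


snp_groups = build_snp_groups()
snp_sets = sample_unlinked_sets(snp_groups)
print(len(snp_groups), "SNP-groups;", len(snp_sets), "datasets of",
      len(snp_sets[0]), "SNPs each")
```

Because every dataset takes exactly one SNP from each SNP-group, no two SNPs within a
dataset are linked, while different datasets may share some SNPs.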
 
5.1.1 Heterozygosity 
 
The proportion of polymorphic loci, heterozygosity and gene diversity for each of the 100 
different SNP datasets were calculated for the 14 populations analysed as well as for the 
total dataset. The averages for these three summary statistics were calculated across the 
100 datasets and are shown together with the standard deviation between the 100 
datasets in Table 5.1. The heterozygosities in the 14 populations and the total sample set 
for each of the 100 sample sets are shown as a scatter plot in Figure 5.1. 
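
The three summary statistics can be illustrated for a single SNP with a minimal sketch.
The estimator used for gene diversity is an assumption (the unbiased estimator with the
n/(n-1) correction that is commonly applied); the text above does not state which
estimator was used, and the genotypes are hypothetical.

```python
def observed_heterozygosity(genotypes):
    """Proportion of individuals carrying two different alleles."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def gene_diversity(genotypes):
    """Expected heterozygosity with the n/(n-1) sample-size correction (assumed)."""
    alleles = [allele for genotype in genotypes for allele in genotype]
    n = len(alleles)  # number of sampled chromosomes
    freqs = [alleles.count(a) / n for a in set(alleles)]
    return n / (n - 1) * (1 - sum(f * f for f in freqs))


def is_polymorphic(genotypes):
    """A locus is polymorphic if more than one allele is observed."""
    return len({allele for genotype in genotypes for allele in genotype}) > 1


# Hypothetical genotypes of four individuals at one SNP.
toy_genotypes = [("A", "A"), ("A", "G"), ("G", "G"), ("A", "G")]
print(observed_heterozygosity(toy_genotypes),
      round(gene_diversity(toy_genotypes), 3),
      is_polymorphic(toy_genotypes))
```

Averaging such per-locus values over the 44 SNPs of a dataset, and then over the 100
datasets, would give population-level summaries of the kind reported in Table 5.1.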
 
Higher gene diversities and heterozygosities have been demonstrated for African 
populations compared to non-African populations (Tishkoff et al., 2009). In the present 
study the non-African gene diversities and heterozygosities were also low compared to 
those observed for the African groups (Table 5.1). Unlike previous findings (Tishkoff et al., 
2009), however, the Khoe-San populations generally had lower gene diversities than the 
Bantu-speakers.  
 
To evaluate how heterozygosity correlated with the variation observed between the
datasets a scatter plot was generated with the average heterozygosity for each population
on the Y-axis and the standard deviation (SD) between the heterozygosities in the different
datasets on the X-axis. A linear regression was used to find the function that best
described the relationship between the points (Figure 5.2). The linear regression showed
that a straight line with a slope of -18.4 best explained the scatter (p = 0.015) (Figure 5.2).
This suggested that there is a negative relationship between the average heterozygosity
and heterozygosity differences between the datasets, i.e., the lower the heterozygosity in a
population the higher the differences in heterozygosities among the different SNP datasets.
This indicated that populations with lower heterozygosities (such as the non-African
populations) might require more loci to accurately determine gene diversity and
heterozygosity estimates. When the AFR, EUR and IND data (which are outliers) were
omitted, the slope of the regression line was not significant, which might indicate that the
differences in the standard deviation of heterozygosities between different datasets might
also be explained by the differences between non-African and African populations.
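
The regression described above can be retraced from the values in Table 5.1. The sketch
below uses an ordinary least-squares fit written out by hand. The function name is
hypothetical and the significance test is not reproduced. The inputs are the rounded
table values, so the result is approximate.

```python
# (population, average heterozygosity, SD of heterozygosity) from Table 5.1
TABLE_5_1 = [
    ("XUN", 0.383, 0.017), ("JOH", 0.355, 0.020), ("KWE", 0.383, 0.018),
    ("GUG", 0.397, 0.020), ("NAM", 0.417, 0.016), ("KAR", 0.370, 0.018),
    ("COL", 0.419, 0.014), ("CAC", 0.391, 0.017), ("SEB", 0.423, 0.019),
    ("HER", 0.406, 0.016), ("DRC", 0.407, 0.019), ("AFR", 0.256, 0.019),
    ("EUR", 0.262, 0.021), ("IND", 0.274, 0.021),
]


def least_squares(x, y):
    """Return (slope, intercept) of the ordinary least-squares line y = a*x + b."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# X-axis: SD between datasets; Y-axis: average heterozygosity (as in Figure 5.2).
sd_het = [row[2] for row in TABLE_5_1]
ave_het = [row[1] for row in TABLE_5_1]
slope, intercept = least_squares(sd_het, ave_het)
print(f"slope = {slope:.1f}")

# The same fit without the three non-African outliers.
african = [row for row in TABLE_5_1 if row[0] not in {"AFR", "EUR", "IND"}]
slope_african, _ = least_squares([r[2] for r in african], [r[1] for r in african])
```

Fitting all 14 populations gives a slope of about -18.4 from the rounded table values,
which agrees with the value reported above.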
 
Table 5.1  Average proportion of polymorphic loci, heterozygosities and gene diversities in each population 
over the 100 different SNP datasets 
Group N Ave P SD P Ave Gd SD Gd Ave Het SD Het 
XUN 45 0.998 0.007 0.384 0.015 0.383 0.017 
JOH 41 0.968 0.024 0.361 0.019 0.355 0.020 
KWE 19 0.990 0.014 0.407 0.014 0.383 0.018 
GUG 21 0.987 0.015 0.383 0.017 0.397 0.020 
NAM 28 1.000 0.000 0.413 0.013 0.417 0.016 
KAR 25 0.998 0.007 0.397 0.018 0.370 0.018 
COL 22 0.991 0.011 0.417 0.014 0.419 0.014 
CAC 20 1.000 0.000 0.409 0.014 0.391 0.017 
SEB 48 0.996 0.009 0.416 0.014 0.423 0.019 
HER 14 0.998 0.007 0.424 0.012 0.406 0.016 
DRC 14 0.992 0.014 0.416 0.013 0.407 0.019 
AFR 15 0.840 0.046 0.271 0.019 0.256 0.019 
EUR 15 0.795 0.042 0.253 0.020 0.262 0.021 
IND 25 0.873 0.030 0.276 0.019 0.274 0.021 
MEAN - 0.961 0.007 0.376 0.010 0.371 0.010

Ave - Average
SD - Standard deviation
P - Proportion of polymorphic loci 
Gd - Gene diversity 
Het - Heterozygosity 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 5.1  Scatter plot of heterozygosities in the 14 populations and the total sample set for each of the 
100 sample sets 
Figure 5.2  Correlation between heterozygosity and 
the variation observed between the 100 datasets 
5.1.2 STRUCTURE analyses 
 
The averaged results of the STRUCTURE runs for the 100 different SNP sets are shown in 
Figure 5.3 and Table 5.2. The iterations were done for K=1 to K=10. Iterations for K=2 to 
K=5 are shown.  
 
Similar to previous studies (Jakobsson et al., 2008; Li et al., 2008; Tishkoff et al., 2009), 
the clustering at K=2 separated African from non-African populations (Figure 5.3 and Table 
5.2). The first cluster (blue) predominated in the three non-African populations (AFR, EUR, 
IND) while the second cluster (yellow) occurred at highest frequencies in African 
populations. The mixed Coloured populations (CAC, COL) showed a combination of 
African (yellow) and non-African (blue) contribution. Different amounts of non-African 
admixture into the Khoe-San and Coloured populations could be observed at K=2 (Figure 
5.3 and Table 5.2). Representation from more than one cluster can be an indication of 
recent admixture or shared ancestry before divergence. The northern San populations 
(JOH, XUN, GUG, KWE) and Bantu-speakers had very low levels (<10%) from the non-African
cluster at K=2 (Table 5.2). This non-African cluster contribution in these groups was
most likely because of shared ancestry rather than admixture. This was also apparent if the 
non-African allocation in the Bantu-speakers was compared to the non-African allocation in 
the northern San groups. Due to a more recent shared ancestry with non-African groups, 
the two Bantu-speaking groups (DRC and SEB) had a contribution of around 6% from the 
non-African cluster while the San groups had a non-African cluster contribution around 3% 
(Table 5.2). The KWE had similar frequencies to the Bantu-speakers rather than the 
northern San groups. Similarly the European group (EUR) also had 2% contribution from 
the African cluster due to shared ancestry while the increased African cluster allocation in 
the Afrikaner group (AFR) was probably due to recent admixture with African groups (Table 
5.2). 
 
The southern Khoe-San and Coloured groups all had more input from the non-African 
cluster compared to the northern San and Bantu-speakers (Figure 5.3 and Table 5.2). This 
was consistent with history and with mtDNA and Y-chromosome results. The group with 
the least amount of admixture (11%) from non-African groups was the Karretjie group 
(KAR) (Table 5.2). The Colesberg-Coloured group (COL) that resides next to the Karretjie 
people had much higher contributions from the non-African cluster (36%). The Cape 
Coloured group (CAC) had the highest input from the non-African cluster (57%) (Table 
5.2). This was also consistent with history, since the CAC group was sampled at 
Wellington, which is within the region where the original Cape Colony started. It is well 
known that during the starting years of the colony very high incidences of mixed unions 
between colonists and local Khoe-San women occurred due to the shortages of female 
partners. 
 
As K increased (K>2) additional clusters were resolved in the African populations (cluster 
2) while the non-African cluster (cluster 1) remained. At K=3, cluster 2 (yellow) 
predominated in the Khoe-San populations while a third cluster (red) predominated in the 
BS populations. K=3 thus illustrated the amount of gene-flow between Bantu-speakers and 
Khoe-San (Figure 5.3 and Table 5.2). Except for the JOH group, the autosomal results
supported asymmetric gene-flow between the Bantu-speakers and Khoe-San groups with
more gene-flow from the Bantu-speakers into the Khoe-San than vice versa.
 
The isolated status of the Ju/'hoansi group (JOH) was confirmed by autosomal results with
a far lower contribution from the Bantu-speaking cluster (13%) than any of the other San
groups (Table 5.2). This finding supported the Y-chromosome and mtDNA results (Figures
3.3 and 4.1). After the JOH, the !Xun group (XUN) had the next highest contribution from
the Khoe-San cluster (Table 5.2). The contribution from the Bantu-speaking cluster into the
XUN was more than double that of the JOH group. As mentioned previously, the !Xun
adopted pastoralist practices from surrounding Bantu-speaking groups while the Ju/'hoansi
maintained their hunter-gatherer lifestyle, isolating them from pastoralist groups.
 
The Karretjie group (KAR) had the third highest contribution from the Khoe-San cluster. 
Only the JOH (85%) and XUN (67%) had larger inputs from the Khoe-San cluster than the 
KAR (55%) (Table 5.2). This finding supported historical records and local opinion that the
Karretjie people are descended from the San groups that once lived in the Karoo (see
section 1.1.1.5.3). Their Coloured neighbours (COL) had a much lower input from the
Khoe-San cluster (27%) (Table 5.2). Their allocation to the BS-cluster was similar but the 
non-African contribution represented by the blue cluster was much higher in the COL. The 
remaining Coloured group (CAC) had the largest input from the non-African cluster at K=3, 
while the inputs from the Khoe-San and Bantu-speaking clusters were similar.  
 
In addition to the Ju/'hoansi, !Xun and Karretjie, the Nama (NAM) was the only other group
where the Khoe-San cluster (49%) had a greater contribution than the other two clusters
(Table 5.2). The Bantu-speaking component in the Nama was larger than the non-African
component. This was expected: because of the pastoralist culture of the Nama, interaction
with the pastoralist Bantu-speakers would not have been uncommon.
 
The /Gui + //Gana + Kgalagari (GUG) had substantial inputs from the Bantu-speaking 
cluster (Figure 5.3 and Table 5.2). The Bantu-speaking cluster contributed marginally more 
than the Khoe-San cluster (Table 5.2). The autosomal results together with the mtDNA
(Figure 3.3) and Y-chromosome results (Figure 4.1) therefore illustrated extreme
gender-biased gene-flow into this mixed group. Autosomal results indicated approximately equal
contributions from Khoe-San and Bantu-speakers, while Y-chromosome and mtDNA
results illustrated that the male lineages were almost exclusively contributed by
Bantu-speakers and the female lineages exclusively by Khoe-San women.
 
The Khwe (KWE) had the largest input from the Bantu-speaking cluster of all the Khoe-San 
groups (Figure 5.3 and Table 5.2). This supported previous findings based on the classical 
blood group markers (See section 1.2.2.1). The KWE did, however, have a larger 
contribution (35%) from the Khoe-San cluster compared with the Khoe-San contribution 
into the southern Bantu-speakers (19%). This indicated that the Khwe is not merely a 
Bantu-speaking group that adopted the hunter-gatherer lifestyle and a Khoisan language. 
 
Higher input from the Khoe-San cluster (~18.6%) was seen in the southern Bantu-speakers
(SEB and HER) compared to the central African Bantu-speakers (DRC, 11.7%) (Table
5.2). This illustrated the gene-flow from resident San groups into the Bantu-speakers when
they moved into southern Africa. While the mitochondria indicated much higher gene-flow
from the Khoe-San into the SEB than into the HER (Figure 3.3), autosomal results 
indicated similar frequencies (Table 5.2). This might be an indication that the gene-flow into 
the Herero (HER) from the Khoe-San was less female biased. The HER, however, also
had fewer Khoe-San specific Y-chromosome haplogroups (Figure 4.1). It might be that the
HER sample size was too small. If not, another cause, such as a population bottleneck in
the Herero, might explain the pattern. There is evidence that the Herero went through a
recent population bottleneck (Excoffier and Schneider, 1999). The low haplotype diversity 
estimates for both the mtDNA and Y-chromosome results also indicated a possible 
bottleneck. The original Bantu-speakers that moved to the south might have initially 
intermixed with the Khoe-San groups. Thereafter, the Herero went through a reduction in 
population size, which would have caused the loss of many mtDNA and Y-chromosome 
haplotypes. Subsequently, when the population expanded, the Herero did not intermix with 
the Khoe-San again. Thus, many of the male and female lineages were lost but the 
autosomal contribution is still evident. 
 
At K=4, the BS cluster was subdivided into two clusters (3-red and 4-green). The red 
cluster had lower frequencies than the green cluster in all the Khoe-San and 
Coloured groups (except the KWE). In contrast, in the BS groups the red cluster had 
higher frequencies than the green cluster (Table 5.2), and this difference was the largest in the DRC 
group. Higher order clustering (K=5 to K=10) continued to resolve the BS cluster internally 
with no apparent substructure between different populations. 
 
The number of clusters that received the highest average posterior likelihood score across 
the 100 different SNP sets was K=3. The number of clusters with the best delta K score 
across the runs, however, was K=2 (Table 5.3). Although the SDs of the likelihoods of K=2, 
K=3 and K=4 over the 100 different runs overlapped, K=3 received the highest likelihood in 
every single run. 
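
The disagreement between the two criteria can be reproduced directly from the values reported in Table 5.3. The following is a minimal Python sketch and not part of the original analysis; the dictionary and function names are illustrative only.

```python
# Mean Ln likelihood and mean delta-K per K, copied from Table 5.3.
mean_ln_likelihood = {
    2: -16629.9, 3: -16520.4, 4: -16760.0, 5: -16940.9, 6: -17078.1,
    7: -17185.3, 8: -17302.8, 9: -17412.4, 10: -17496.5,
}
mean_delta_k = {
    2: 980.1158, 3: 93.40483, 4: 2.558441, 5: 2.205267, 6: 1.684401,
    7: 0.958339, 8: 0.797023, 9: 0.739302, 10: 0.573535,
}


def best_k(scores):
    """Return the K whose score is largest."""
    return max(scores, key=scores.get)


k_by_likelihood = best_k(mean_ln_likelihood)
k_by_delta_k = best_k(mean_delta_k)
print(k_by_likelihood, k_by_delta_k)
```

With these values the likelihood criterion selects K=3 and the delta K criterion selects K=2, as reported above.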
 
The individual cluster assignments at K=3 were also represented in a triangle plot (Figure 
5.4) with the Khoe-San, non-African and BS associated clusters on the three different 
corners of the triangle. In this plot AFR, EUR and IND clearly clustered at the K1 
corner, HER, DRC and SEB clustered at the K3 corner, and JOH and XUN at the 
K2 corner. GUG and KWE were positioned on the side of the triangle between the K2 
and K3 corners, while COL and CAC were in the middle of the triangle between the three 
corners. NAM also occurred in the middle of the triangle but clustered more towards 
the K2 side. SEB points were more drawn out towards the K2 corner than DRC points. 
 
When looking at individual assignments rather than average population assignments 
(Figure 5.3 and 5.4), it became clear that while certain individuals from admixed groups 
clearly resulted from admixture between different populations, other individuals had more 
exclusive cluster assignments. This could especially be seen for certain individuals from 
the KAR, GUG, NAM and to a lesser extent the KWE group, where some of these 
individuals clustered amidst the XUN and JOH individuals in the Khoe-San corner of the 
triangle representation (Figure 5.4). Very few Bantu-speakers clustered towards the 
Khoe-San corner, and the ones that did were southeastern Bantu-speaking individuals. Some of 
the KWE, GUG and COL and, to a lesser extent, XUN individuals clustered in the 
Bantu-speaker corner, but only one JOH individual was seen halfway towards the Bantu-speaker 
corner; the other JOH were in the Khoe-San corner (Figure 5.4). None of the CAC 
individuals clustered exclusively in the Bantu-speaker or Khoe-San corner and only a few 
clustered in the non-African corner. Mostly, CAC individuals occurred in the middle of the 
triangle together with some of the COL individuals, illustrating their individual admixed 
status (Figure 5.4). 
 
 
 
 
 
 
Figure 5.3   Averaged results of the STRUCTURE runs of the 100 different SNP sets. K2 to K5 are shown, from top to bottom. Individual assignments are on the left and population assignments 
on the right. 
Table 5.2   Averaged population cluster assignments of the STRUCTURE runs from the 100 different SNP 
sets 
K Pop K1 K2 K3 K4 K5 
2 XUN 0.038 0.962    
2 JOH 0.030 0.970    
2 KWE 0.067 0.933    
2 GUG 0.032 0.969    
2 NAM 0.149 0.851    
2 KAR 0.106 0.894    
2 COL 0.361 0.639    
2 CAC 0.571 0.429    
2 SEB 0.068 0.932    
2 HER 0.138 0.862    
2 DRC 0.060 0.940    
2 AFR 0.964 0.036    
2 EUR 0.979 0.021    
2 IND 0.963 0.037    
3 XUN 0.029 0.674 0.296   
3 JOH 0.022 0.846 0.132   
3 KWE 0.048 0.353 0.598   
3 GUG 0.023 0.478 0.498   
3 NAM 0.127 0.487 0.386   
3 KAR 0.091 0.551 0.358   
3 COL 0.332 0.267 0.402   
3 CAC 0.543 0.216 0.242   
3 SEB 0.042 0.185 0.773   
3 HER 0.097 0.187 0.716   
3 DRC 0.035 0.117 0.848   
3 AFR 0.949 0.027 0.024   
3 EUR 0.968 0.016 0.017   
3 IND 0.946 0.026 0.028   
4 XUN 0.022 0.497 0.222 0.259  
4 JOH 0.016 0.705 0.115 0.164  
4 KWE 0.037 0.222 0.380 0.362  
4 GUG 0.017 0.304 0.330 0.349  
4 NAM 0.105 0.317 0.263 0.315  
4 KAR 0.075 0.377 0.252 0.296  
4 COL 0.311 0.162 0.258 0.269  
4 CAC 0.524 0.142 0.163 0.172  
4 SEB 0.033 0.109 0.472 0.387  
4 HER 0.081 0.112 0.436 0.370  
4 DRC 0.028 0.070 0.517 0.385  
4 AFR 0.942 0.021 0.018 0.019  
4 EUR 0.962 0.012 0.013 0.013  
4 IND 0.938 0.020 0.021 0.021  
5 XUN 0.017 0.375 0.168 0.205 0.168 
5 JOH 0.013 0.565 0.095 0.138 0.095 
5 KWE 0.031 0.171 0.283 0.268 0.283 
5 GUG 0.014 0.227 0.236 0.259 0.236 
5 NAM 0.093 0.224 0.196 0.238 0.196 
5 KAR 0.065 0.273 0.188 0.226 0.188 
5 COL 0.298 0.119 0.193 0.200 0.193 
5 CAC 0.509 0.108 0.126 0.132 0.126 
5 SEB 0.027 0.089 0.367 0.235 0.282 
5 SEB 0.030 0.086 0.369 0.283 0.369 
5 HER 0.072 0.089 0.345 0.271 0.345 
5 DRC 0.024 0.062 0.415 0.281 0.415 
5 AFR 0.934 0.017 0.015 0.016 0.015 
5 EUR 0.955 0.010 0.011 0.012 0.011 
5 IND 0.928 0.016 0.018 0.018 0.018 
 
 
 
 
Table 5.3  Average likelihood and delta-K scores across the 100 runs 
K    Ln Likelihood   SD (Ln Likelihood)   Delta-K    SD (Delta-K)
2    -16629.9        365.044              980.1158   392.2799
3    -16520.4        362.0634             93.40483   42.02413
4    -16760          372.3144             2.558441   2.593335
5    -16940.9        426.0762             2.205267   2.334452
6    -17078.1        399.9683             1.684401   1.700221
7    -17185.3        417.852              0.958339   0.81949
8    -17302.8        431.8609             0.797023   0.721795
9    -17412.4        428.0192             0.739302   0.599983
10   -17496.5        440.4873             0.573535   0.490531
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 5.4  Triangle plot of individual cluster assignment at K=3 with the Khoe-San, non-African and BS 
associated clusters on the three different corners of the triangle 
5.1.3 Variation across STRUCTURE datasets 
 
Cluster assignments of populations and individuals differed among the 100 datasets. 
Figure 5.5 shows a graphical representation of the variation between the population 
cluster assignments of the 100 datasets. Each dot represents a population K-cluster 
assignment from one of the 100 datasets.  
 
At K=2 the datasets correlated relatively well and the dots form tighter clusters compared 
to the higher order cluster assignments. Cluster assignments in the different datasets of the 
non-African and Khoe-San populations were more homogeneous than in the BS and 
Coloured populations. In most populations clusters were well separated but were more 
closely associated in the COL and CAC. 
 
At K=3 the cluster assignments over different datasets were more heterogeneous than for 
K=2. The non-African populations (blue dots) still had more homogeneous cluster 
assignments over the different datasets than the African populations (red and yellow). The 
XUN and the JOH had more homogeneous results compared to the other African 
populations. The red (BS-associated cluster) and yellow (Khoe-San associated cluster) 
overlapped in many cases. The yellow cluster was well separated from the lower red 
cluster in the JOH and XUN. In the BS groups the yellow and red clusters were also well 
separated with the red cluster in this case being the highest assignment. In all the other 
populations the red and yellow cluster assignments between the different runs overlapped 
to different extents. This indicated that one might get a wrong picture of a population by just 
looking at one SNP set. However, by averaging across several SNP sets, as was done 
here (Figure 5.1 and Table 5.1), a much more confident deduction could be made for the 
population cluster assignments. 
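
The effect of averaging can be illustrated with a small Python sketch. The coefficients below are hypothetical values for one admixed population, not study data, and the cluster labels are assumed to have been aligned across SNP sets beforehand.

```python
from statistics import mean, stdev


def summarise(assignments):
    """Mean and SD of each cluster's coefficient across SNP sets.

    assignments: one list of K membership coefficients per SNP set, with
    cluster labels assumed to be aligned across sets already.
    """
    columns = list(zip(*assignments))
    return [mean(c) for c in columns], [stdev(c) for c in columns]


# Hypothetical K=3 coefficients for one admixed population in four SNP sets.
runs = [
    [0.10, 0.55, 0.35],
    [0.12, 0.40, 0.48],
    [0.08, 0.62, 0.30],
    [0.10, 0.43, 0.47],
]
avg, sd = summarise(runs)

# Which cluster looks largest in each single set, and on average?
top_per_run = [row.index(max(row)) for row in runs]
top_on_average = avg.index(max(avg))
print(avg, sd, top_per_run, top_on_average)
```

In this toy case two of the four single sets would suggest a different majority cluster than the average does, which is the kind of discrepancy described above.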
 
At the higher order clusters K=4 and K=5, heterogeneity across datasets remained high. 
The African clusters and the African populations were still more heterogeneous across 
datasets than the non-African cluster and populations. 
 
Pearson's correlation coefficient (r) was calculated for each pair of the 100 datasets for each 
of the clusters at K=3, and all correlations were significant (Supplementary electronic data - File B). 
Pairwise correlations (of individual cluster assignments for each cluster) between the 100 
different datasets varied between r = 0.60 and r = 0.91 with an average of r = 0.78. 
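
A pairwise comparison of this kind can be sketched in Python as follows. With 100 datasets there are 100 x 99 / 2 = 4,950 dataset pairs per cluster. The data below are synthetic stand-ins (a shared signal plus noise), and the function names are illustrative rather than those of the software actually used.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_pearson(datasets):
    """r for every pair of datasets (each a list of per-individual coefficients)."""
    return [pearson(a, b) for a, b in itertools.combinations(datasets, 2)]


# Synthetic stand-in: 5 datasets of 50 individuals sharing one underlying signal.
rng = random.Random(0)
signal = [rng.random() for _ in range(50)]
datasets = [
    [min(1.0, max(0.0, s + rng.gauss(0.0, 0.1))) for s in signal]
    for _ in range(5)
]

r_values = pairwise_pearson(datasets)
print(len(r_values), min(r_values), sum(r_values) / len(r_values), max(r_values))
```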
 
 
Figure 5.5  Graphical representation of the variation between the population cluster assignments across the 100 runs. Each dot represents a 
population K-cluster assignment of one of the 100 runs.  
5.1.4 Distance based analysis of unlinked SNP sets 
 
The same 100 SNP datasets used in the STRUCTURE analysis were also used in the 
distance-based analysis. The 100 population distance matrices were used to construct 100 NJ trees, 
which were then condensed into a consensus NJ tree (Figure 5.6a). The numbers on the 
branches indicate the number of times the particular node was supported after computing 
100 trees. This approach does not take into account the distances between groups in the 
form of differences in branch lengths. 
 
A second approach used was to first condense the distance information from the 100 SNP 
sets into one average population distance matrix (Table 5.4) from which a NJ tree was then 
constructed (Figure 5.6b). This approach does not indicate the number of times branches 
were supported by the 100 different datasets. However, the calculated means of the 100 
distance matrices are represented on the tree by variable branch lengths. 
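
The averaging step of the second approach amounts to an element-wise mean over the distance matrices. A minimal Python sketch with three hypothetical 3 x 3 matrices (not study data) is given below; the function name is illustrative.

```python
def average_distance_matrix(matrices):
    """Element-wise mean of equally sized square distance matrices."""
    count = len(matrices)
    n = len(matrices[0])
    return [
        [sum(m[i][j] for m in matrices) / count for j in range(n)]
        for i in range(n)
    ]


def symmetric(d01, d02, d12):
    """Build a 3 x 3 symmetric distance matrix from its three distances."""
    return [[0.0, d01, d02], [d01, 0.0, d12], [d02, d12, 0.0]]


# Hypothetical population distance matrices from three SNP sets.
matrices = [
    symmetric(0.10, 0.30, 0.20),
    symmetric(0.14, 0.26, 0.22),
    symmetric(0.12, 0.34, 0.18),
]
avg = average_distance_matrix(matrices)
print(avg)
```

The averaged matrix keeps the magnitude of the distances, which is what the branch lengths of the resulting tree reflect, but it carries no information on how often a grouping recurred across the individual datasets.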
 
The overall affinities of the populations using both approaches corresponded reasonably 
well. Both types of trees that summarised the distance data from the 100 different sample 
sets clearly divided non-African and African variation with the CAC and COL groups being 
placed at intermediate positions (Figure 5.6a and 5.6b).  Furthermore, Bantu-speaking 
groups (DRC, SEB, HER) and also northern Khoe-San groups (JOH, XUN, GUG, KWE) 
formed monophyletic clades on both trees. The KWE assignment to this cluster, however, 
was weakly supported by the NJ consensus tree (Figure 5.6a). XUN and JOH grouped 
together, with GUG and KWE being placed closer to the BS populations, most likely due to 
admixture from these groups (see Figure 5.3). The influence of the non-African component 
in the KAR and NAM positioned these populations more towards the non-African branch. 
The high Khoe-San input into these two populations apparent from the STRUCTURE 
results could not be deduced from the distance trees.  
 
Trees as representations of distance analysis are very sensitive to the influence of 
admixture in the groups. When data are visually represented in a tree, distances are only 
optimized in one dimension. This was especially apparent in the KAR and NAM groups, which 
were heavily influenced by the variation contributed by the non-African groups. This caused 
them to group between the African and non-African groups and not with the other Khoe-San 
groups (Figure 5.6a and 5.6b). When STRUCTURE results were considered, however, one 
could see that these two groups actually had a larger Khoe-San cluster representation than 
the GUG and KWE (Figure 5.3 and Table 5.2). Yet, the GUG and KWE grouped in a 
Khoe-San cluster with the JOH and XUN on the tree because of their small contribution of 
non-African variation (Figure 5.6a and 5.6b). It is thus useful to utilize techniques, such as PCA, 
that are able to optimize the distance matrix in more than one dimension (Figure 5.7 and 
5.8). To extract additional information from the distance matrices, PCA was therefore performed 
(Figure 5.7). 
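
How a distance matrix can be summarised in more than one dimension is sketched below in Python. The sketch uses classical multidimensional scaling (principal coordinates), which is a stand-in and may differ from the PCA procedure applied in this study. It relies on the standard library only, finds the leading axes by power iteration, and is run on toy points rather than study data; all function names are illustrative.

```python
import math


def double_centre(dist):
    """Gram matrix B = -1/2 * J * D^2 * J for a symmetric distance matrix."""
    n = len(dist)
    sq = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    return [[sq[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]


def top_eigenpairs(matrix, n_axes, n_iter=500):
    """Leading eigenpairs of a symmetric matrix (power iteration + deflation)."""
    n = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _axis in range(n_axes):
        vec = [float(i + 1) for i in range(n)]  # arbitrary start vector
        for _step in range(n_iter):
            nxt = [sum(work[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(v * v for v in nxt))
            if norm == 0.0:
                break
            vec = [v / norm for v in nxt]
        value = sum(vec[i] * sum(work[i][j] * vec[j] for j in range(n))
                    for i in range(n))
        pairs.append((value, vec))
        # Remove the axis just found so the next iteration finds the following one.
        work = [[work[i][j] - value * vec[i] * vec[j] for j in range(n)]
                for i in range(n)]
    return pairs


def principal_coordinates(dist, n_axes=2):
    """Coordinates per axis and each axis's share of the Gram-matrix trace."""
    gram = double_centre(dist)
    trace = sum(gram[i][i] for i in range(len(gram)))
    pairs = top_eigenpairs(gram, n_axes)
    coords = [[math.sqrt(max(val, 0.0)) * v for v in vec] for val, vec in pairs]
    explained = [val / trace for val, _vec in pairs]
    return coords, explained


# Toy example: four points on a line and one slightly off it.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (1.5, 0.5)]
dist = [[math.dist(p, q) for q in points] for p in points]
coords, explained = principal_coordinates(dist, n_axes=2)
print(explained)
```

For distances that are not exactly Euclidean some eigenvalues can be negative, in which case the shares computed from the trace are only approximate.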
 
 
Table 5.4  Average population distance matrix of autosomal genotypic data 
 CAC COL IND KAR KSB KSJR KSNA KSSK KSV SAB SEB SWB WA WE 
CAC 0.000              
COL 0.042 0.000             
IND 0.092 0.157 0.000            
KAR 0.104 0.049 0.273 0.000           
KSB 0.122 0.069 0.293 0.053 0.000          
KSJR 0.158 0.105 0.350 0.046 0.066 0.000         
KSNA 0.085 0.040 0.246 0.021 0.053 0.054 0.000        
KSSK 0.157 0.087 0.345 0.054 0.040 0.057 0.054 0.000       
KSV 0.144 0.083 0.331 0.034 0.047 0.028 0.041 0.037 0.000      
SAB 0.141 0.089 0.314 0.074 0.056 0.108 0.069 0.066 0.078 0.000     
SEB 0.116 0.059 0.290 0.046 0.036 0.074 0.047 0.039 0.049 0.027 0.000    
SWB 0.100 0.050 0.244 0.061 0.051 0.096 0.054 0.067 0.066 0.051 0.035 0.000   
WA 0.107 0.182 0.054 0.289 0.314 0.364 0.259 0.361 0.347 0.334 0.310 0.270 0.000  
WE 0.125 0.195 0.046 0.307 0.336 0.388 0.277 0.382 0.367 0.354 0.329 0.290 0.031 0.000 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 5.6a  The Majority Rule consensus tree constructed from 100 NJ trees. A rectangular phylogram 
shows the branch support (indicating the number of times the particular node was supported after computing 
100 trees). The root was placed between CAC and the non-African populations. The radial phylogram shows 
the unrooted version of the tree.  
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 5.6b The consensus tree constructed from the average of 100 distance matrices. For the rectangular 
phylogram the root was placed between CAC and the non-African populations. The radial phylogram shows 
the unrooted version of the tree.  
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 5.7  A and B: Principal component analysis of autosomal genotypic distances between different populations in the study group. Component 
1 = 92.6% of the variation, Component 2 = 5.7% of the variation, Component 3 = 1.1% of the variation (Rest of the components < 0.16 each).  
C: Loadings for Component 1, D: loadings for Component 2, E: loadings for Component 3. 
Loading values as labelled on the bars in panels C to E, with the x-axis population order CAC, COL, IND, KAR, KWE, JOH, NAM, GUG, XUN, DRC, SEB, HER, AFR, EUR: 
C: 0.02568, -0.1078, 0.2812, -0.2531, -0.2779, -0.3201, -0.2214, -0.3284, -0.3173, -0.2822, -0.278, -0.2186, 0.3062, 0.33 
D: -0.4137, -0.3552, -0.4345, -0.1917, -0.1397, -0.03603, -0.2259, -0.06448, -0.07017, -0.1508, -0.1692, -0.2427, -0.3819, -0.3743 
E: 0.07451, 0.02966, -0.01001, 0.2797, -0.1148, 0.5176, 0.2142, 0.0005433, 0.293, -0.5499, -0.3011, -0.3177, 0.06999, 0.05891 
As was observed in other studies (Li et al., 2008; Tishkoff et al., 2009) the first PCA 
component (93%) summarised the variation present between African and non-African 
populations (Figure 5.7). This axis separated the African from the non-African populations 
with varying degrees of admixture in the CAC, COL, SWB and NAM. The very high level of 
non-African admixture into the CAC group was seen in the first component (Figure 5.7c) 
and compared well to what was observed in the STRUCTURE result (Figure 5.3 and Table 
5.2). Accordingly the other southern groups (COL, NAM, KAR) and the HER also showed 
high non-African contributions while the JOH, XUN and GUG showed the lowest levels 
(Figure 5.7c). 
 
Interestingly, the second component (Figure 5.7) did not separate the Khoe-San groups 
from the Bantu-speaking groups as was expected based on the STRUCTURE results and 
seen in the mtDNA and Y-chromosome studies (Figure 3.16, 4.19 and 4.20). Rather the 
second component (5.7%) (Figure 5.7d) separated the northern San groups (JOH, GUG, 
XUN) from the southern Khoe-San and Coloured populations (COL, CAC, NAM). 
 
It is only in the third component (1.1%) (Figure 5.7e) that the Bantu-speakers were 
separated from the San. This might indicate a very ancient split between the northern and 
the southern Khoe-San groups. On the extremities of the second component (Figure 5.7d) 
the northern groups (XUN, GUG, JOH) were at the one end and the southern groups 
(CAC, COL) at the other. NAM was also located with the southern groups but more 
towards the northern groups. KAR was placed intermediate between the groups. This is 
interesting since, historically, the theory exists that the Karretjie (KAR) are descended 
from the /Xam San group while the CAC, COL and NAM are expected to have more Khoe 
input (see section 1.1.1.5.3). The following hypothesis was formulated from this: The San 
groups formed an earlier continuum from the northern San groups in the north to the /Xam 
group in the south. The Khoe ancestral groups that contributed to the CAC, COL and NAM 
originally occupied the southern parts of South Africa in the coastal regions where 
the Cape KhoeKhoe have lived. Upon acquiring the cultural practice of pastoralism from 
central groups such as the ancestors of the ≠Khomani, these southern groups expanded 
and moved northwards into the regions occupied by the other hunter-gatherers. To a 
degree they settled and intermixed with the local hunter-gatherer groups. Later, the Nama 
group moved further northwards and had more recent gene-flow with the northern groups. 
This theory would, however, be very difficult to prove given the disappearance of the cultural 
identities and languages of the southern Khoe and San groups. 
 
The third component (Figure 5.7e) represented the variation between Bantu-speakers and 
Khoe-San groups. At extreme ends of the continuum were the Bantu-speakers from the 
DRC with the lowest amount of Khoe-San input and the Ju/'hoansi (JOH) with the lowest 
amount of Bantu-speaking input. As was seen in the STRUCTURE results, the Herero 
(HER) and southeastern Bantu-speakers (SEB) had similar amounts of Khoe-San input 
(Table 5.2). For the rest of the groups the third component also reflected STRUCTURE 
results with the KWE showing more Bantu-speaking admixture than the other Khoe-San 
groups while the XUN and the KAR showed lower amounts than other groups (Figure 
5.7e). 
 
PCA plots of the separate individuals, rather than the groups, were also constructed to see 
whether the individual apportionment of variation corresponded to the composite apportionment 
(Figure 5.8). 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 5.8  Principal component analysis of the average individual distance matrix. 
Component 1 = 59.1% of the variation, Component 2 = 13.1% of the variation, 
Component 3 = 2.3% of the variation. Individuals are colour coded according to the key. 
The panels plot Component 1 against Component 2, Component 1 against Component 3, and 
Component 2 against Component 3. 
For the individual PCA the apportionment of variation to three axes was not as good as in 
the group-wise comparison. This was expected, as the group-wise comparison involves 
pairwise comparisons among only 14 populations, while the individual comparison involves 
pairwise comparisons among 352 individuals. This increased the multi-dimensional space 
substantially, making the reduction into three dimensions more difficult. Overall, the apportionment of variation 
compared well to the group based analysis. The axis that includes most of the variation 
(59%) was again the axis that separates African from non-African populations. Similar 
results were seen for the 650K SNP based study of worldwide variation (Li et al., 2008), 
where the first component (separating African and non-African variation) comprised 56% of 
variation. In the microsatellite and insertion/deletion based study of worldwide variation 
(Tishkoff et al., 2009) the first component only represented 19.5%. The lower 
representation in the first component of the microsatellite versus SNP studies can be 
explained by the higher mutation rate of microsatellites, which would lead to convergence 
of distantly related populations.  
 
Using the PCA plots of individuals one gets a clearer picture of how individual variation is 
apportioned and how closely individuals from the same group cluster together. While the 
first component clearly separated the African from the non-African variation with the 
Coloured groups in-between, the components representing variation within Africa were 
more continuous (Figure 5.8). Similar to the group-based results, the second component for 
the individual data (13.2%) separated northern and southern Khoe-San and Coloured 
groups while the third component (2.3%) separated the Bantu-speakers from the Khoe-San 
(Figure 5.8). Although one can infer that the second component contained the variation 
between northern and southern Khoe-San groups, the change was very continuous and 
more individuals were scattered than observed for the third component. The third 
component therefore showed a better clustering and separation of individuals from the two 
different groupings (Khoe-San and Bantu-speakers). Thus even though the second 
component contained more variation than the third component the second component 
showed more of a continuum. On this axis there was a gradual decrease in northern San 
individuals aligned with a gradual increase of southern Coloured and Khoe individuals. This 
indicated more of a clinal difference between northern and southern Khoe-San groups 
while the difference between Khoe-San and Bantu-speakers was more abrupt. The above 
explanation might be a part of the reason why STRUCTURE did not assign this second 
variation component as a separate cluster but assigned a cluster for the third component. 
 
To investigate the relationship between the physical geographic distance (km) and genetic 
distance using autosomal SNPs in the Khoe-San and Coloured populations, the composite 
distance matrix of the 100 datasets (Reynolds distance) was compared to a physical 
distance matrix (Appendix C). In Figure 5.9 pairwise comparisons between physical 
geographic distance (X-axis) and genetic distance (Y-axis) are plotted. A linear 
regression was done to determine the line with the best fit through the points.  
 
The best fit to the points on the graph was a straight line with a slope of 0.00003057 (p = 
0.0258) (Figure 5.9). A Mantel test also found a correlation between the two distances (r = 
0.421) that was significantly different from the correlation between random datasets generated 
through permutation tests (p = 0.0248). The physical distance explained 17.7% of the 
variation in genetic distance (r² = 0.177). 
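
The logic of a permutation-based Mantel test can be sketched in Python as follows. This is a generic illustration on toy data, not the implementation or the data used in this study; the site positions, the perturbation added to the toy genetic distances, and all function names are assumptions made for the example.

```python
import math
import random


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Upper-triangle entries after relabelling rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_test(dist_a, dist_b, n_perm=999, seed=0):
    """Return (r, p): matrix correlation and one-sided permutation p-value."""
    identity = list(range(len(dist_a)))
    b_values = upper_triangle(dist_b, identity)
    r_obs = pearson(upper_triangle(dist_a, identity), b_values)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        order = identity[:]
        rng.shuffle(order)  # permute population labels of one matrix
        if pearson(upper_triangle(dist_a, order), b_values) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


# Toy data: eight sites on a line (km) and a genetic distance that grows with it.
positions = [0, 120, 300, 450, 700, 900, 1100, 1500]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [
    [0.00003 * geo[i][j] + (0.002 * ((i * j) % 3) if i != j else 0.0)
     for j in range(len(positions))]
    for i in range(len(positions))
]

r, p = mantel_test(geo, gen)
print(r, p, r ** 2)
```

The squared correlation gives the proportion of variation explained; for the reported r = 0.421 this is 0.421 x 0.421 = 0.177, i.e. the 17.7% quoted above.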
 
The clinal distribution of genetic variation of northern versus southern Khoe-San groups 
was thus also illustrated by the correlation of autosomal genetic and physical distance 
(Figure 5.9). The correlation coefficient between physical distance and genetic distance for 
unlinked autosomal SNPs was slightly higher than the correlation found for the mtDNA 
variation (r = 0.402750, p = 0.027, see section 3.4), while the Y-chromosome studies 
indicated no significant correlation between physical and genetic distance for either Rst or 
Fst datasets. For the mtDNA analysis, however, the non-African and Bantu-speaking 
haplotypes were removed from groups, while this was not possible for the genotypic data. 
It is therefore expected that the genotypic correlation of physical versus genetic distance 
would be influenced by the larger input from non-African groups into the southern 
Khoe-San and Coloured groups. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
5.1.5 AMOVA analysis 
 
To test the apportionment of variation at different levels of grouping, AMOVA analysis was 
used. The degree of variation was tested firstly between defined groups, secondly between 
the different populations in the study and thirdly within the populations. From the 100 random 
unlinked autosomal datasets generated, 10 datasets were randomly picked and AMOVA 
analysis was performed on them. Table 5.5 gives the average results of the 10 sets of AMOVA 
analyses with various groupings at the first level. 
 
 
 
 
Figure 5.9  Pairwise comparisons between physical geographic 
distance (X-axis) and autosomal genotypic distance (Y-axis). 
Table 5.5  Results from autosomal genotypic AMOVA analysis using different groupings on the first level 
 
Grouping  Grouping of first level [Groups]                          Between     Between populations  Between individuals
                                                                    groups (%)  within groups (%)    within populations (%)
A         [afr, eur, ind]                                           21.00       3.41                 75.59
          [col, kar, cac, nam, joh, xun, gug, kwe, drc, her, seb]
B         [afr, eur, ind]                                           12.86       3.01                 84.13
          [col, kar, cac, nam, joh, xun, gug, kwe]
          [drc, her, seb]
C         [afr, eur, ind]                                           11.05       2.07                 86.88
          [col, kar, cac, nam]
          [drc, her, seb]
          [gug, joh, xun, kwe]
D         [col, kar, cac, nam, joh, xun, gug, kwe]                  3.87        8.88                 87.25
          [drc, her, seb, afr, eur, ind]
E         [col, kar, cac, nam, joh, xun, gug, kwe]                  2.22        3.53                 94.25
          [drc, her, seb]
F         [col, kar, cac, nam]                                      2.73        2.85                 94.42
          [joh, xun, gug, kwe]
G         [col, kar, cac]                                           3.10        3.03                 93.87
          [nam, joh, xun, gug, kho]
H         [col, kar, cac]                                           2.32        3.08                 94.60
          [nam]
          [joh, xun, gug, kho]
 
 
 
The largest between-group variation (21%) was found between the African and 
non-African groups (Grouping A - Table 5.5). When the African groups were split into 
Bantu-speakers and Khoe-San+Coloured, the variation contained by the first level grouping fell to 
13% (Grouping B - Table 5.5). The variation on the first level grouping only decreased 
slightly when the northern and southern Khoe-San+Coloured groups were separated 
(Grouping C - Table 5.5). Variation between Bantu-speakers and Khoe-San+Coloured 
groups was only 2.2% (Grouping E - Table 5.5) (when the KWE group was omitted it 
increased to 2.4% - data not shown). The variation between the southern 
Khoe-San+Coloured groups and northern Khoe-San groups (2.7%) (Grouping F - Table 5.5) was 
higher than the variation between Khoe-San+Coloured and Bantu-speaking groups (Grouping E - 
Table 5.5) (when the KWE group was omitted it increased to 3% - data not shown). This 
supported the findings from the PCA. It was only in the cases when non-African groups were 
included, however, that the group-based variation was more than the variation between 
individual populations (Grouping A, B, C - Table 5.5). 
 
The last two rows in the table show the classification used today, namely, the division 
between Coloured and Khoe-San (Grouping G - Table 5.5) and the division between Khoe, 
San and Coloured (Grouping H - Table 5.5). In both cases the variation between individual 
populations was almost equal to or greater than the variation between groups. 
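
The three percentages of each grouping in Table 5.5 can be converted into the fixation indices conventionally derived from a three-level AMOVA. The Python sketch below applies the standard definitions to the tabulated percentages; the derived indices are illustrative arithmetic and are not values reported in this study, and the function and variable names are hypothetical.

```python
def phi_statistics(among_groups, among_pops, within_pops):
    """Standard three-level AMOVA fixation indices from variance components.

    Phi_CT: among groups, relative to the total.
    Phi_SC: among populations within groups, relative to the within-group part.
    Phi_ST: among populations (groups included), relative to the total.
    """
    total = among_groups + among_pops + within_pops
    phi_ct = among_groups / total
    phi_sc = among_pops / (among_pops + within_pops)
    phi_st = (among_groups + among_pops) / total
    return phi_ct, phi_sc, phi_st


# Percentages of variation copied from Table 5.5:
# (between groups, between populations within groups, within populations).
table_5_5 = {
    "A": (21.00, 3.41, 75.59),
    "B": (12.86, 3.01, 84.13),
    "C": (11.05, 2.07, 86.88),
    "D": (3.87, 8.88, 87.25),
    "E": (2.22, 3.53, 94.25),
    "F": (2.73, 2.85, 94.42),
    "G": (3.10, 3.03, 93.87),
    "H": (2.32, 3.08, 94.60),
}

results = {name: phi_statistics(*row) for name, row in table_5_5.items()}
for name, (phi_ct, phi_sc, phi_st) in results.items():
    print(name, round(phi_ct, 4), round(phi_sc, 4), round(phi_st, 4))
```

The three indices are linked by (1 - Phi_ST) = (1 - Phi_SC)(1 - Phi_CT), which provides a simple consistency check on any row.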
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
5.2 Results and discussion (Haplotypes) 
 
For haplotype analysis of autosomal data, sets of five linked SNPs on the same haploblock were 
used to infer short haplotypes, consisting of 5 bp each, at 44 loci as described in section 2.4. The 
haplotypes were inferred separately for each population and each SNP set of 5. The full list 
of inferred haplotypes and their frequencies in the different populations is available in 
Supplementary Electronic Data - File C. 
 
5.2.1 Inferred haplotypes 
 
The inferred haplotypes for the 44 different loci yielded different results. There were 
differences in the number of haplotypes per locus, population frequencies and structuring 
between different populations. A selection of eight haplotype loci with their inferred 
haplotypes and their frequencies in each of the 14 populations is shown in Figure 5.10. The 
full set of bar charts of all 44 loci is included in Appendix H.  
 
The number of haplotypes per locus varied from five haplotypes (04-01 in Figure 5.10) to 29 
(14-01 in Figure 5.10). In most of the loci, a clear difference in population frequencies could 
be seen, while only a few loci failed to show structuring (19-02 in Figure 5.10). The 
haplotype frequencies differed between the African and non-African populations 
in most loci. The non-African populations tended to have smaller subsets of the 
African haplotypes, but one or two haplotypes were predominant in frequency. Some 
haplotypes showed clear differences between BS and Khoe-San populations (e.g. purple in 
13-02, pink in 01-02, yellow in 05-01 in Figure 5.10). These differences, however, were not 
as pronounced as the differences between African and non-African populations. 
 
The frequency distributions of inferred haplotypes thus clearly illustrated higher African 
haplotype diversities and showed that non-African variation represents a subset of African variation 
(Figure 5.10). This finding corroborates other studies supporting the out of Africa hypothesis 
(Bowcock et al., 1987; Nei and Livshits, 1989; Bowcock et al., 1991a; Bowcock et al., 
1991b; Bowcock et al., 1994; Tishkoff et al., 2009). 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 5.10  Bar charts of inferred haplotypes and their frequencies in each of the 14 populations. 
Loci shown: 04-01, 14-01, 01-02, 17-02, 13-02, 05-01, 06-02 and 19-02. 
5.2.2 Distance analysis 
 
To consolidate the information across the 44 separate haplotype loci, the 88 haplotypes 
generated for each individual were concatenated into two haplotypes per individual. 
Individuals with >50% missing data at any locus were excluded from further analysis. 
Following removal of missing data, 298 individuals were retained in the data yielding 596 
haplotypes. The individual haplotypes were then used to construct distance matrices. Both 
population and individual distance matrices were constructed (the population distance 
matrix is shown in Table 5.6 and the individual distance matrix is included in 
Supplementary Electronic Data - File D). These distance matrices were then used for PCA 
of the population matrix (Figure 5.11) and the individual matrix (Figure 5.12). 
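
The concatenation step can be sketched in Python as follows. With 44 loci of 5 bp each, every concatenated haplotype is 44 x 5 = 220 bp long, and 298 individuals yield 298 x 2 = 596 haplotypes. The toy individual below has three loci only; the function name is illustrative, and the assignment of each locus's two short haplotypes to the first or second long haplotype is simply taken as given.

```python
def concatenate_haplotypes(per_locus_pairs):
    """Join per-locus haplotype pairs into two long haplotypes.

    per_locus_pairs: list of (hap1, hap2) tuples, one per locus, in a fixed
    locus order shared by all individuals.
    """
    first = "".join(pair[0] for pair in per_locus_pairs)
    second = "".join(pair[1] for pair in per_locus_pairs)
    return first, second


# Toy individual with three loci of 5 bp each (the study used 44 loci).
individual = [("ACGTA", "ACGTT"), ("GGCTA", "GACTA"), ("TTACG", "TTACG")]
hap_a, hap_b = concatenate_haplotypes(individual)
print(hap_a, hap_b)
```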
 
The concatenation of the different short haplotypes into one long haplotype resulted in high 
diversities between the individual haplotypes. Since some of the loci were very polymorphic 
and contained many different haplotypes, the combination of several such loci led to high 
haplotype diversities. Concatenating the haplotypes in individuals led to 594 unique 
haplotypes in the total of 596 haplotypes.  
 
 
Table 5.6  Maximum composite likelihood population distances of individual haplotypes 
 AFR CAC COL DRC EUR GUG HER IND JOH KAR KWE NAM SEB XUN 
AFR 0.000              
CAC 0.599 0.000             
COL 0.737 0.764 0.000            
DRC 1.095 0.927 0.845 0.000           
EUR 0.377 0.612 0.719 1.044 0.000          
GUG 1.094 0.893 0.786 0.745 1.084 0.000         
HER 0.932 0.851 0.781 0.765 0.912 0.726 0.000        
IND 0.386 0.597 0.709 1.031 0.405 1.048 0.880 0.000       
JOH 1.014 0.828 0.762 0.768 0.973 0.634 0.730 0.969 0.000      
KAR 0.904 0.821 0.748 0.779 0.870 0.685 0.746 0.886 0.631 0.000     
KWE 1.005 0.867 0.792 0.753 0.987 0.676 0.735 0.952 0.678 0.715 0.000    
NAM 0.861 0.818 0.772 0.819 0.832 0.729 0.781 0.845 0.688 0.704 0.759 0.000   
SEB 1.032 0.890 0.817 0.759 0.991 0.716 0.756 0.986 0.726 0.744 0.749 0.786 0.000  
XUN 1.035 0.868 0.780 0.761 1.000 0.648 0.731 1.011 0.600 0.653 0.700 0.705 0.733 0.000 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 5.11  A - Principal Component Analysis of autosomal haplotype distance values between different populations in the study group.
Component 1 = 54.6% of the variation, Component 2 = 7.1% of the variation. B - Loadings for Component 1, C - loadings for Component 2.
[Panel A: PCA plot. Panels B and C: bar charts of loadings; the values printed on the bars are listed below.]
Population  B (Component 1)  C (Component 2)
AFR  -0.4498  -0.07752
CAC  -0.2227  -0.09397
COL  -0.06066  -0.1533
DRC  0.235  0.5485
EUR  -0.4218  -0.07457
GUG  0.2986  -0.1473
HER  0.1314  0.1711
IND  -0.4126  -0.02782
JOH  0.2354  -0.3895
KAR  0.144  -0.3716
KWE  0.2101  -0.01925
NAM  0.07946  -0.3885
SEB  0.2081  0.2327
XUN  0.2546  -0.3289
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 5.12  Principal Component Analysis of autosomal haplotype distance values between different
individuals in the study group. Component 1 = 47.85% of the variation, Component 2 = 27.21% of the
variation, Component 3 = 3.5% of the variation (remaining components contain < 1.4% of the variation each).
[Two scatter plots: Comp 1 (x-axis) versus Comp 2 (y-axis), and Comp 2 (x-axis) versus Comp 3 (y-axis).]
Group-based PCA based on haplotypes again assigned the largest part of the variation
(PC1 = 55%) to the African versus non-African division (Figure 5.11). Similar to the genotypic results,
the first component illustrated more non-African admixture in the Coloured groups and a
relatively small non-African component in the northern San groups (Figure 5.11b). Contrary
to the genotypic PCA, the PCA based on haplotypes assigned the second component (7%)
to the division between Khoe-San groups and Bantu-speakers and not to the division between northern
and southern Khoe-San groups. The JOH, NAM, KAR and XUN were separated from the
SEB, DRC and HER, with the other groups located in between. The remaining components
were not informative and also did not differentiate between northern and southern Khoe-San
groups (components 3 to 6 each contained between 3% and 4% of the variation, components 7 to 9
each contained 2-3% and components 11 to 13 each contained 1-2%).
 
Similar results were obtained through the individual-based PCA (Figure 5.12). The first axis
of the PCA plot, however, separated the two inferred haplotypes of each individual
from one another. This was an artefact of the methodology employed
when haplotypes were inferred and thereafter concatenated into one haplotype. When
haplotypes were inferred, the short haplotypes of 5 bp were sorted alphabetically within each
individual by the software program used. For instance, the two haplotypes of individual 1
at a certain locus would be sorted as AACCC first and then AAGCC, while for
individual 2 the haplotypes would be sorted as AACCT and then AAGCC. Thus haplotype 1
of individuals 1 and 2 would group together, as would haplotype 2 of individuals 1 and 2. This bias
was then carried over when the haplotypes were concatenated. In a population comparison this
effect would be neutralized.
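
The sorting artefact described above can be illustrated with a minimal Python sketch. The two individuals, the second locus and the helper names (`concatenate_sorted`, `hamming`) are hypothetical and serve only as an illustration; the software actually used for haplotype inference is not reproduced here. Because the two haplotypes are sorted alphabetically at every locus before concatenation, the "first" long haplotypes of different individuals tend to resemble one another more than the two long haplotypes within one individual do.

```python
# Hypothetical phased haplotype pairs at two loci for two individuals.
# Locus 1 follows the example in the text; locus 2 is invented for illustration.
individuals = {
    "ind1": [("AAGCC", "AACCC"), ("TTGCA", "TTACA")],
    "ind2": [("AAGCC", "AACCT"), ("TTACA", "TTGCA")],
}


def concatenate_sorted(loci):
    """Sort the two haplotypes at each locus alphabetically, then join them per slot."""
    first, second = [], []
    for pair in loci:
        lo, hi = sorted(pair)
        first.append(lo)
        second.append(hi)
    return "".join(first), "".join(second)


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


long_haps = {name: concatenate_sorted(loci) for name, loci in individuals.items()}

# Distance between the "first" long haplotypes of the two individuals
between_first = hamming(long_haps["ind1"][0], long_haps["ind2"][0])
# Distance between the two long haplotypes within individual 1
within_ind1 = hamming(long_haps["ind1"][0], long_haps["ind1"][1])
print(between_first, within_ind1)
```

In this toy case the slot-1 haplotypes of the two individuals differ at fewer positions than the two haplotypes within individual 1, which is the kind of grouping that the first PCA axis picked up.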
 
Thus in the PCA plot in Figure 5.12 only the second and third axes contained useful
information. On the second axis non-African individuals were separated from African
individuals, with CAC and COL in between. The third axis separated the BS and the Khoe-San
individuals. The rest of the axes contained little variation, with each representing <1.4%
of the variation. The individual PCA plot was useful for observing the variation in each
individual. For populations such as GUG, the population as a whole did not associate
strongly with the other Khoe-San groups; however, there were specific GUG individuals
that did group with the Khoe-San individuals in the individual PCA plot.
 
To explain the difference between the genotype- and haplotype-based PCA, the following
hypothesis is proposed. There may have been more continuous gene flow over
thousands of years between the different Khoe-San groups, leading to a clinal distribution of
genetic variation with a distance-based trend. In contrast, the Bantu-speaking and
Khoe-San gene pools were isolated for many years before recent admixture. The older
continuous gene flow between Khoe-San groups may have broken up many more
haplotypes than the recent admixture between Bantu-speakers and Khoe-San. Thus, by
inferring and concatenating haplotypes, the genotypic signature of the distance-based
cline between Khoe-San groups was erased. Conversely, many of the haplotypes
remained intact when comparing haplotypic variation between Bantu-speakers and Khoe-San.
 
To alleviate the problem, the most common haplotype at each locus in each population was selected
and used as a population representative haplotype. In this approach only the haplotypes
with the highest frequency in each specific population at each of the 44 loci were selected.
These were then taken as the 44 representative short haplotypes of each population. The 44
representative short haplotypes for each population were then concatenated into one
sequence (long haplotype) for each population. These 14 population representative
sequences were then used to construct a distance matrix (Table 5.7). The distance matrix
was used to perform PCA (Figure 5.13) and cluster analysis (Figure 5.14). This approach was expected to partially
overcome the effect of recent admixture between the groups and to level out the difference
between the recent and ancient admixture. When this was done, a signature of divergence
between the northern and southern Khoe-San groups again emerged (Figure 5.13).
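
The construction of a population representative haplotype can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure as implemented in the study: the function name `representative_haplotype`, the toy data (two loci rather than 44) and the alphabetical tie-breaking rule are all hypothetical, since the text does not say how ties in frequency were resolved.

```python
from collections import Counter


def representative_haplotype(haplotypes_per_locus):
    """Concatenate the most frequent short haplotype at each locus.

    haplotypes_per_locus: list with one entry per locus; each entry is the
    list of short haplotypes observed in the population at that locus.
    Ties are broken alphabetically (an assumption made for determinism).
    """
    parts = []
    for locus_haps in haplotypes_per_locus:
        counts = Counter(locus_haps)
        # max() returns the first maximum, so iterating over sorted keys
        # picks the alphabetically first haplotype among equally common ones.
        best = max(sorted(counts), key=counts.__getitem__)
        parts.append(best)
    return "".join(parts)


# Hypothetical population with two loci and three sampled chromosomes per locus
toy_population = [
    ["AACCC", "AACCC", "AAGCC"],  # locus 1: AACCC is the most common
    ["TTACA", "TTGCA", "TTGCA"],  # locus 2: TTGCA is the most common
]
long_haplotype = representative_haplotype(toy_population)
print(long_haplotype)
```

Applied to each of the 14 populations in turn, a routine of this kind would yield one long sequence per population, which could then be passed to whichever distance method is used.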
 
 
 
 
 
 
 
 
Table 5.7  Maximum composite likelihood population distances of population representative haplotypes 
 AFR CAC COL DRC EUR GUG HER IND JOH KAR KWE NAM SEB XUN 
AFR 0.000              
CAC 0.132 0.000             
COL 0.454 0.317 0.000            
DRC 1.268 1.152 0.750 0.000           
EUR 0.093 0.184 0.419 1.626 0.000          
GUG 1.323 1.230 0.790 0.402 1.706 0.000         
HER 1.067 0.940 0.606 0.209 1.185 0.437 0.000        
IND 0.070 0.108 0.454 1.455 0.071 1.421 0.986 0.000       
JOH 1.269 0.963 0.700 0.538 1.239 0.318 0.517 1.202 0.000      
KAR 0.845 0.770 0.427 0.560 1.004 0.277 0.535 0.949 0.323 0.000     
KWE 1.384 1.091 0.814 0.392 1.574 0.262 0.393 1.293 0.430 0.517 0.000    
NAM 0.741 0.712 0.427 0.542 0.781 0.361 0.517 0.858 0.366 0.195 0.440 0.000   
SEB 1.177 1.093 0.721 0.183 1.490 0.235 0.258 1.264 0.397 0.488 0.247 0.463 0.000  
XUN 1.322 1.108 0.756 0.442 1.322 0.247 0.375 1.257 0.205 0.232 0.329 0.320 0.290 0.000 
 
 
PCA again dedicated the first component to non-African versus African variation (91%) and,
although both the second and third components were dedicated to Khoe-San versus
Bantu-speaking variation, a differentiation between northern and southern Khoe-San
groups could be made (Figure 5.13). While the second component (5%) maximized the
variation component between Bantu-speakers and southern Khoe-San groups (KAR, NAM,
COL), the third component (2%) maximized the variation component between Bantu-speakers
and northern Khoe-San groups (XUN, JOH) (Figure 5.13). This indicated that the
genetic variation of the southern and northern Khoe-San groups differed and had to be
optimized against Bantu-speaking variation in different dimensions. Other components
contained less than 0.7% of the variation each.
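
The text does not state how PCA was applied to the distance matrix. One common way of extracting principal axes from pairwise distances is classical multidimensional scaling (principal coordinates analysis), and the sketch below uses that technique as an assumption, not as a description of the software used in the study. The function name `pcoa` is hypothetical, and only a five-population sub-matrix of Table 5.7 is used, so the proportions it returns are not expected to match the figures reported above.

```python
import numpy as np

labels = ["EUR", "IND", "DRC", "JOH", "NAM"]
# Sub-matrix of Table 5.7 (population representative haplotype distances)
D = np.array([
    [0.000, 0.071, 1.626, 1.239, 0.781],
    [0.071, 0.000, 1.455, 1.202, 0.858],
    [1.626, 1.455, 0.000, 0.538, 0.542],
    [1.239, 1.202, 0.538, 0.000, 0.366],
    [0.781, 0.858, 0.542, 0.366, 0.000],
])


def pcoa(dist):
    """Classical MDS: double-centre the squared distances and eigendecompose."""
    n = dist.shape[0]
    centring = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centring @ (dist ** 2) @ centring
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1e-12                     # drop zero and negative axes
    coords = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    explained = eigvals[keep] / eigvals[keep].sum()
    return coords, explained


coords, explained = pcoa(D)
for name, row in zip(labels, coords):
    print(name, np.round(row[:2], 3))
print("proportion per retained axis:", np.round(explained, 3))
```

Negative eigenvalues can arise when the distances are not Euclidean; they are simply discarded here, which is one of several possible conventions.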
 
The cluster analysis (Figure 5.14) reflected results from PCA plots. Non-African 
populations were separated from African populations and the non-African admixture in the 
CAC and COL caused them to group with the non-African group. The three BS groups 
grouped together and the KWE and GUG grouped together on an adjacent branch. The 
KAR grouped with the Khoe-San groups (NAM, JOH, XUN) on one branch. Furthermore 
the two northern San groups JOH and XUN grouped together while the two southern 
groups (KAR and NAM) grouped together. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 5.13  A and B - Principal Component Analysis of autosomal representative haplotype distance values between different populations in the
study group. Component 1 = 90.9% of the variation, Component 2 = 4.8% of the variation, Component 3 = 2.2% of the variation (remaining components
contain < 1.4% of the variation each). C - Loadings for Component 1, D - loadings for Component 2, E - loadings for Component 3.
[Panels A and B: PCA plots. Panels C, D and E: bar charts of loadings; the values printed on the bars are listed below.]
Population  C (Component 1)  D (Component 2)  E (Component 3)
AFR  -0.3241  -0.1822  -0.2289
CAC  -0.2696  -0.2091  -0.1867
COL  -0.09492  -0.3492  -0.3213
DRC  0.2928  0.03766  -0.563
EUR  -0.3802  -0.3512  -0.1053
GUG  0.3266  -0.2452  0.008627
HER  0.1948  0.04573  -0.4842
IND  -0.3356  -0.1639  -0.1612
JOH  0.2305  -0.322  0.2577
KAR  0.1404  -0.4824  0.1145
KWE  0.2924  -0.05925  -0.1136
NAM  0.1025  -0.4087  0.07374
SEB  0.2839  0.008608  -0.3099
XUN  0.2761  -0.2833  0.1644
 
 
 
 
 
 
 
 
 
 
 
 
 
The 44 separate haplotypes that were concatenated into one haplotype would have 
different evolutionary histories, and a single unique tree would not best characterize the 
phylogenetic representation of the haplotype. To overcome this problem an approach was 
followed where the data was not forced into a single tree, rather a Neighbour-Net splits 
decomposition tree was compiled (Figure 5.15). This method gave a good indication of how 
tree-like the dataset was.  
 
The splits decomposition network clearly showed that there were several trees that
explained the relationships between the representative composite haplotypes of the
different populations. However, when only trees with 95% confidence were used, the
network was reduced to only a few reticulations, mainly at the base of the branches
supporting BS groups. The African and non-African variation was the most pronounced,
with the admixed Coloured groups in between. Furthermore, the Bantu-speakers grouped
together and the KWE and GUG grouped with them because of the high amounts of
admixture. For the remaining Khoe-San groups, the XUN and JOH grouped together, while
the NAM and KAR were located more towards the non-African side of the network due to
the higher amount of admixture. There was, however, evidence in the reticulations that
there were trees that grouped the NAM and KAR together and also the GUG and KWE with
the JOH and XUN.
Figure 5.14  Cluster analysis tree illustrating autosomal representative haplotype
distance values between different populations in the study group.
[Tree tip labels as printed: JOH, XUN, KAR, NAM, DRC, SEB, HER, GUG, KWE, EUR, IND, AFR, CAC, COL]
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 5.15  A - Splits decomposition network showing the different trees that explain the relationships 
between the representative composite haplotypes of the different populations.  
B - Network resulting when using only trees with 95% confidence.
The relationship of physical distance and genetic distance using the autosomal inferred 
haplotypes in the Khoe-San and Coloured populations was tested by comparing the 
genetic distance matrices of both approaches described above to a physical geographic 
distance matrix. Pairwise comparisons between physical geographic distance (X-axis) and 
genetic distance based on the individual inferred haplotypes (Figure 5.16 - A) and genetic 
distance based on the top frequency population representative haplotypes (Figure 5.16 - B) 
on the Y-axis were plotted on graphs. A linear regression was performed to determine the 
line with the best fit through the points on both plots.  
 
The best fit to the points on the individual haplotypes graph was a straight line with a slope
of 0.000057 (p = 0.0404) (Figure 5.16 - A). A Mantel test also found a correlation between
the physical and individual-based genetic distance (r = 0.390) that was significantly
different from the correlation between random datasets generated through permutation tests (p
= 0.0215). The physical distance explained 15.2% of the genetic distance.
 
For the top frequency haplotypes the line that fitted the points best was a straight line with a
slope of 0.00029 (p = 0.0072). The Mantel test also found a significant correlation (p =
0.0124) between the two distance matrices (r = 0.497). In this case the physical distance
explained 24.7% of the genetic distance.
 
A reduction of the distance based cline seen in the individual based haplotypes versus the 
population representative haplotypes could also be seen in the comparison of physical 
versus genetic distance (Figure 5.16) and Mantel test results. There was a stronger 
correlation between genetic and physical distance (indicating a distance-based cline) in the 
population representative haplotypes than the individual haplotypes. This is, as explained 
previously, because the most common haplotypes in a population will not be affected as 
much by recent admixture.  
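
The Mantel test reported above can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not the software used in the study: the function name `mantel`, the number of permutations, the one-sided p-value and the simulated site coordinates are all hypothetical. The percentages of genetic distance explained by physical distance agree with the squared Mantel correlations (0.390 squared is about 0.152 and 0.497 squared is about 0.247).

```python
import numpy as np


def mantel(dist_a, dist_b, n_perm=999, seed=0):
    """Permutation-based Mantel test (one-sided, Pearson correlation).

    Rows and columns of dist_b are permuted together; the p-value is the
    share of permutations whose correlation is at least the observed one.
    """
    a = np.asarray(dist_a, dtype=float)
    b = np.asarray(dist_b, dtype=float)
    upper = np.triu_indices_from(a, k=1)       # each pair counted once
    r_obs = np.corrcoef(a[upper], b[upper])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(a.shape[0])
        b_perm = b[np.ix_(perm, perm)]
        if np.corrcoef(a[upper], b_perm[upper])[0, 1] >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


# Simulated example: eight hypothetical sampling sites on a 1000 x 1000 km plane
rng = np.random.default_rng(1)
sites = rng.random((8, 2)) * 1000.0
geo = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
# Genetic distance increasing with geographic distance, plus symmetric noise
gen = 0.0003 * geo + rng.normal(0.0, 0.02, geo.shape)
gen = (gen + gen.T) / 2.0
np.fill_diagonal(gen, 0.0)

r, p = mantel(geo, gen)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}, p = {p:.4f}")
```

Permuting whole rows and columns, rather than individual matrix cells, keeps the dependence between pairwise distances that share a population, which is the reason a Mantel test is used instead of an ordinary correlation test on the pairwise values.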
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 5.16  Pairwise comparisons between physical geographic distance (X-axis) and autosomal haplotype genetic distance (Y-axis).
A - Using individual haplotypes in genetic distance.
B - Using top frequency representative haplotypes.
5.3 Summary of autosomal results 
 
STRUCTURE results illustrated different amounts of non-African and Bantu-speaking
admixture in the various Khoe-San and Coloured populations. Results supported low
levels of contribution from non-Africans to the northern San populations (Ju|'hoansi, !Xun,
|Gui + ||Gana, Khwe) and Bantu-speakers. Conversely, the southern Khoe-San and
Coloured groups showed evidence of higher non-African admixture. This is consistent with
history and with mitochondrial and Y-chromosome results. Furthermore, the southern
Bantu-speakers had higher Khoe-San admixture compared to the central African Bantu-speakers,
which illustrates the gene flow from resident San groups into the Bantu-speakers
when they moved into southern Africa. Excluding the Ju|'hoansi, asymmetric gene flow
between the Bantu-speakers and Khoe-San groups was observed, with more gene flow
from the Bantu-speakers into the Khoe-San than vice versa. In support of previous findings
based on the classical blood group markers, the Khwe had the highest Bantu-speaker
admixture of all the Khoe-San groups. Yet the Khwe showed a much larger contribution
from the Khoe-San compared to the Khoe-San component seen in Bantu-speakers,
indicating again that the Khwe are not merely a Bantu-speaking group that adopted the
hunter-gatherer lifestyle and a Khoisan language.
 
While STRUCTURE results could not illustrate a divide between northern and southern
Khoe-San groups, the ability of PCA to optimize and reduce different distance components
to minimal dimensions illustrated a north-south divide in the Khoe-San. For the genotypic
analysis the component summarising the distance between northern and southern groups
was in fact larger than the distance component summarising the variation between the
Khoe-San and Bantu-speakers. When using population representative haplotypes in the
haplotype analyses, the variation between BS vs. northern Khoe-San and BS vs. southern
Khoe-San was optimised on different components, confirming the north-south
differentiation within the Khoe-San. This north-south divide was also illustrated by the
strong association that exists between geographic distance and genetic distance in both the
genotypic and haplotypic analyses.
 
6. GENERAL DISCUSSION 
 
Having had the opportunity to examine three different types of data (mtDNA, Y-chromosome
DNA and autosomal DNA) in some Khoe-San and Coloured populations from
southern Africa in conjunction with other sub-Saharan African populations using various
analytical methods, it is now possible to address some of the specific objectives raised in
section 1.3.
 
The genetic affinities within and between Khoe-San 
Linguistic groupings have been used widely to classify the different Khoe-San groups.
These studies have suggested that the Ju, Tuu and Khoe speakers ought to be
assigned to three different linguistic families (Table 1.1). These linguistic families are either
unrelated or have genealogical relationships that can be traced back in excess of 10 000
years (Güldemann, In Press). The question arising from these linguistic assignments is
whether these observations could be corroborated by genetic data. Serogenetic studies
conducted by Jenkins and colleagues (Jenkins et al., 1971; Jenkins, 1986) did not find
unambiguous correlations between linguistic groupings and genetic clusters (Figure 1.3).
 
Genetic studies to date that have included "Khoisan" groups have been based on a few
groups, notably the two Ju-speaking groups, the Ju|'hoansi and the !Xun, and one
Kalahari Khoe group, namely the Khwe. By including more groups, even though the
sample sizes were not ideal in some populations, we found a clinal difference
between northern and southern Khoe-San groups in the present study (Figures 3.18, 5.9
and 5.16). The Nama Khoe group has a similar genetic signature to the southern Khoe-San
and Coloured groups (Figures 3.16, 3.17, 3.21, 3.22, 4.21, 4.22, 4.26, 5.7, 5.13, 5.14).
Haplogroup frequencies differ between northern and southern groups. In both the mtDNA
and Y-chromosome studies the northern groups contain haplogroups that are exclusive to
them (Figures 3.7, 3.8, 3.11, 4.1, 4.2, 4.3). It is probable that the northern groups had
gene flow with other ancient hunter-gatherer groups north of them, which introduced genetic
material that is not found in the southern groups. Thus the L0k mtDNA
haplogroup, which was previously defined as a Khoe-San haplogroup, is not present in the
southern groups. Similarly, the previously Khoe-San associated Y-chromosome haplogroups A-M14
and B-M112 mostly occur in northern groups. The pan-Khoe-San associated
haplogroups, L0d for the mitochondria and A-M51 for the Y-chromosome, have a larger
diversity in the southern groups.
 
Genetic studies based on autosomal and mitochondrial DNA thus did find a difference 
between Ju speakers and descendants of Tuu speakers. Also, a greater genetic diversity 
was seen in the pan-Khoe-San associated haplogroups of the Tuu speaker descendants. 
This mirrors the linguistic profile where the Tuu languages were historically more diverse 
than the Ju languages. The KhoeKhoe speakers (Nama) clustered with southern Tuu 
groups, the Kalahari Khoe group (/Gui + //Gana + Kgalagari) clustered with northern 
groups, while the Khwe (Kalahari Khoe) have some similarity to northern groups but seem 
to have a unique genetic profile aside from its Bantu-speaking admixture. It therefore 
seems that the emerging genetic profile reflects the deep division between the Ju and Tuu 
speakers but that the Khoe language group was introduced later on to some of the Ju and 
Tuu speakers with some gene flow.  
 
However, to more conclusively establish the genetic relationships between the different 
linguistically classified Khoe-San groups, bigger sample sizes and the inclusion of 
additional groups such as the !Xóõ (a Tuu speaking group), and more representation from
the Kalahari Khoe group, such as the Naro, the Shua and Tshua and a less admixed group 
of /Gui and //Gana are needed.  
 
The relationship between geographic and genetic distance in Khoe-San groups
Previous studies found that while in food producers the gene flow between groups was
female biased because of patrilocality, hunter-gatherer populations had a male biased
gene flow (Seielstad et al., 1998; Hammer et al., 2001a; Destro-Bisol et al., 2004; Wood et
al., 2005). In hunter-gatherers this was observed through the stronger association of geographic distance with
mtDNA genetic distance than with Y-chromosome genetic distance.
 
The present study found similar results to previous studies (Figures 3.18 and 4.23). Khoe-San
hunter-gatherer populations had a significant correlation between mtDNA genetic
distance and geographic distance, while there was no correlation between Y-chromosome
genetic distance and geographic distance. These results indicate that male movement
between groups in the Khoe-San is more prominent than female movement.
 
The genetic affinities of the Khwe population 
Because the Khwe phenotypically resemble Bantu-speakers but speak a Khoisan language,
it was not certain whether this group genetically resembles Khoe-San groups. Theories put
forward were that the Khwe are Khoe-San groups with extensive Bantu-speaking admixture,
Bantu-speakers that lost their cattle and language, another pastoralist population closely
related to Bantu-speakers who occupied the region before the Bantu expansions, or a
mixture of various refugee groups driven from the grazing grounds into the Okavango
swamps (Cashdan, 1986). Serogenetic studies supported the theory that the Khwe are
Bantu-speakers that lost their cattle. Published mtDNA studies showed high amounts of
Bantu-speaking admixture (Chen et al., 2000; Tishkoff et al., 2007). They, however, also
showed appreciable frequencies of northern San associated haplogroups, L0d and L0k.
Henn et al. theorized that the Khwe are a descendant group of the east African pastoralists
that introduced sheep into southern Africa (Henn et al., 2008).
 
Autosomal results from the present study also support high amounts of Bantu-speaking
admixture into the Khwe (Figure 5.3 and Table 5.2). The Khwe, however, also contain a
large proportion of Khoe-San genetic variation. This Khoe-San genetic component is much
larger than the Khoe-San genetic component introduced into other southern Bantu-speaking
groups. Their Y-chromosome genetic profile contains high amounts of the east
African pastoralist associated marker, supporting the study of Henn et al. (Henn et al.,
2008). Besides the Bantu-speaking associated haplogroups, their mtDNA profile contains
primarily haplogroup L0k1 and also a newly identified haplogroup L0dx. From network
analysis it was deduced that the L0k1 haplogroup was introduced to the northern San
groups by the Khwe, while the L0dx haplogroup was more likely transferred from the !Xun
to the Khwe. The L0k1 haplogroup is exclusive to the northern San groups. If the Khwe
introduced the L0k1 haplogroup into the northern groups, it will be interesting to see whether any
other African hunter-gatherer group contains the L0k1 haplogroup. Thus far L0k1 has not been found in
the Pygmy, Hadza and Sandawe groups. A related haplogroup, L0k2, was, however,
identified in an individual from Yemen (Behar et al., 2008). This suggests that the L0k
haplogroups might have had an extensive spread in prehistoric Africa but that remnants of the
haplogroup in other populations have been lost due to drift or have not been detected due to
insufficient sampling.
 
It is therefore likely that the Khwe came from a location north of the traditional San
territory and introduced new mtDNA and Y-chromosome haplogroups into the San groups.
 
The eastern Khoe-speaking San groups, Tshua and Shua, phenotypically resemble the 
Khwe and it would be interesting to include them in future genetic studies. Groups that 
occupied the region between east and southern Africa before the Bantu-expansions might 
be related to the Khwe group. It will therefore also be interesting to include groups such as 
the Ba-Twa Pygmy group in future genetic studies. Furthermore, comparing east African 
pastoralist groups containing high frequencies of Y-chromosome haplogroup E-M35 to the 
Khwe will also be crucial towards pin-pointing their origin. 
 
The spread of pastoralism in southern Africa 
Henn et al. suggested that pastoralism was introduced ~2 000 years BP by a group from
east Africa to the northern Botswana area (Henn et al., 2008). This group was possibly
ancestral to the present-day Khwe group, since the E-M293 marker associated with the
introduction of pastoralism occurs at high frequencies in the Khwe. The Hadza and
Sandawe groups of east Africa also carry this E-M293 marker. Without representation of
more Khoe-San groups in their study, Henn et al. could not address the question of how
pastoralism spread after it reached the northern Botswana area (Henn et al., 2008).
 
The Henn et al. (2008) study was published after the laboratory work for this
thesis was completed and therefore the E-M293 marker was not typed. However, analysis
of the E-M35* (DYS389I-10) profile, which most likely is the equivalent of E-M293, was performed.
The results showed that it is not likely that the spread of pastoralism was a clear-cut demic
or cultural diffusion towards the south. Rather, some E-M35* (DYS389I-10) male individuals
integrated into the southern tribes and took with them the pastoralist practice and likely also
their Khoe language. The southern San groups that adopted the pastoralist culture and
Khoe language had population expansions and became the Khoe (KhoeKhoe speakers).
This theory is supported by the Y-chromosome and mtDNA profile of the representative
Khoe group, the Nama. Although the Nama do contain high proportions of E-M35*
(DYS389I-10), they still retained a larger proportion of the original Khoe-San haplogroup A.
Furthermore, their mtDNA and remaining Y-chromosome haplogroup profile is similar to
that of the other southern Khoe-San and Coloured groups.
 
The present study also identified another E-M35* profile that most likely does not contain
the E-M293 marker but possibly also arrived with the group from east Africa. It is unlikely
that only one haplotype would have migrated south, and Henn et al. admit that it is
possible that other male individuals who did not carry M293 were also involved (Henn et
al., 2008). Fewer !Xun and Khwe individuals carry this E-M35* profile and this profile did
not spread to the southern groups.
 
A demic diffusion of a few male individuals coupled to cultural diffusion would also explain
why there is no ceramic stylistic chain in the archaeological record that reflects the
spread of pastoralism by a Khoe group (Sadr, 1998). Since only male individuals dispersed,
the ceramic styles would not have accompanied the pastoralist tradition. Furthermore, a previous
study on rock paintings suggested a similar hypothesis (Kinahan, 1995). This theory is
based on paintings that depict male figures that are distinct from the traditional San
monochrome trance scenes. They were identified as specialist shamans with higher status.
The hypothesis put forward was that these individuals acquired higher status through the
acquisition of sheep (Kinahan, 1995). It could also have been that these figures were the
immigrant males from east Africa or their descendants. Due to their high status in the
communities they could have transferred their language to the resident San groups as well.
 
Future research 
The typing of the E-M293 marker in the present study group is crucial. Furthermore, genetic
characterization of the eastern Khoe-speaking San groups, the Tshua and Shua, of eastern
Botswana is important since they phenotypically resemble the Khwe and also might be
descendants of the east African pastoralists. Moreover, one of the eastern Khoe-speaking
San groups, the Hietshware, is the linguistic link to the extinct language, Kwadi, which in
turn links to the east African Sandawe language. An E-M293 characterization in the
remnant hunter-gatherer groups intermediate to the Khwe and the east African groups will
also be interesting. The Ba-Twa Pygmy group especially might harbor interesting genetic
commonalities with both the east African and the Khoe-San groups.
 
Do genetic data support population expansions as suggested in the archaeological 
record? 
Archaeological records indicate that certain sites showed increases in population sizes but 
only for truncated periods during the MSA to LSA transition (30 000 - 20 000 years BP).
Around the LGM (~18 000 years BP) population contractions and localized extinctions are 
recorded. Population densities only increased noticeably from 13 500 years BP after the 
LGM. Population increases are recorded especially in the last 4 000 years with various 
technological innovations. Pastoralism was introduced 2 000 years BP and gave rise to 
further population increases (Deacon and Deacon, 1999; Mitchell, 2002). 
 
MtDNA genetic data are important for seeing whether local population history corresponds to
paleoenvironmental history, since female associated markers are more sedentary than
male associated markers in hunter-gatherer communities. By looking at expansion signals
in the mtDNA genetic data through various methods, evidence of the expansions recorded
in the archaeological record was found. Different mtDNA haplogroups have different
associated expansion signals coupled to different geographic distributions. The localized
population increases of the MSA to LSA transition were seen for one haplogroup. Another
haplogroup showed strong signals of the post-LGM population increase. None of the
haplogroups showed signals of the population contractions associated with the LGM.
Several haplogroups showed further increases in the last 4 000 years. All haplogroups,
except one, reacted with a steep population increase upon the introduction of pastoralism.
 
Many theories in the anthropological and archaeological fields hypothesize that in-moving
pastoralists adversely affected hunter-gatherers. The pastoralists occupy resources and
marginalize hunter-gatherers. This hypothesis was also used previously to explain
mismatch distributions in hunter-gatherers. It was theorized that populations that did not go
through the Neolithic transition experienced reductions in effective population size
because competing Neolithic farmers caused fragmentation of the hunter-gatherer
habitat. These reductions in population size obscured previous population expansion
signals (Excoffier and Schneider, 1999).
 
Most L0d sub-haplogroups showed recent expansion signals associated with the
introduction of pastoralism. Only one haplogroup, present at low frequency (in both the
Khoe and San groups), experienced a population contraction. This contraction is most likely
due to drift effects coupled to the steep Ne increase of the other haplogroups. It
therefore seems that most extant Khoe-San associated haplogroups benefited from the
introduction of pastoralism into southern Africa. This might not always have been a direct
benefit through the adoption of pastoralist practices, but could have been an indirect benefit through
trade relations with pastoralists.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
7. CONCLUSION 
 
The inclusion of three types of genetic markers with different modes of inheritance (mtDNA,
maternally inherited; Y-chromosome DNA, paternally inherited; autosomal DNA, bi-parentally
inherited), properties (no recombination in mtDNA and Y-chromosome DNA;
recombination in autosomal DNA; single locus history for mtDNA and Y-chromosome DNA;
multiple unlinked markers for the autosomal DNA) and differences in mutation rate (fast
rate in the Y-chromosome; slower rates in the mtDNA and autosomal SNPs) afforded this
study the unique opportunity of robustly examining patterns of genetic variation in Khoe-San
populations from southern Africa. These data were used to assess the evolutionary
history of the Khoe-San in Africa.
 
Both mtDNA and Y-chromosome studies revealed that the mtDNA lineages (L0d and L0k,
found at frequencies of 74% and 14%, respectively) and Y-chromosome haplogroups
(haplogroup A, found at a frequency of 34%) in the Khoe-San are among the oldest
lineages that have survived in the human population and have been retained in this group at
appreciable frequencies. However, differences in the frequencies and distribution of sub-haplogroups
of the major mtDNA and Y-chromosome haplogroups suggest that the
different Khoe-San groups have over the years diverged from an ancestral parental group
and acquired their own unique histories. Consequently, these findings caution against a
haphazard grouping of populations or a pooling of groups into a single group.
 
Although language as a tool for historical reconstruction has a shallow depth of resolution
(~10 000 years) relative to genetic data (~60 000 – 200 000 years using Y-chromosome DNA
and mtDNA), the results from this study were concordant with linguistic data that suggest
a deep and ancient divide between northern and southern Khoe-San groups (Güldemann,
Forthcoming-a; Güldemann, In Press). This divide was more pronounced in the maternal
gene pool (mtDNA data), where genetic distances between groups correlated strongly with
geographic distances. Conversely, no significant correlation was seen between
Y-chromosome genetic distances and geographic distances. This pattern could be attributed
to female stationarity and male migration between groups.
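
The correlation between a matrix of genetic distances and a matrix of geographic distances,
as described above, is commonly assessed with a Mantel test. The Python sketch below is a
minimal illustration of that idea only. It is not the software or the exact procedure used in
this study, and the four populations and all distance values are hypothetical placeholders,
not data from this thesis.

```python
import math
import random


def _upper_triangle(matrix):
    """Return the entries above the diagonal of a square matrix as a flat list."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mantel_test(genetic, geographic, permutations=999, seed=0):
    """One-sided Mantel test for two symmetric distance matrices.

    Population labels of `genetic` are permuted (rows and columns together)
    to build a null distribution. The p-value is the share of arrangements,
    including the observed one, whose correlation is at least as large as
    the observed correlation.
    """
    rng = random.Random(seed)
    n = len(genetic)
    geo_flat = _upper_triangle(geographic)
    observed = _pearson(_upper_triangle(genetic), geo_flat)
    at_least_as_large = 1  # the observed arrangement counts once
    order = list(range(n))
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [[genetic[order[i]][order[j]] for j in range(n)]
                    for i in range(n)]
        if _pearson(_upper_triangle(permuted), geo_flat) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / (permutations + 1)


# Hypothetical example: four populations, made-up distances.
geographic_km = [
    [0, 100, 400, 900],
    [100, 0, 300, 800],
    [400, 300, 0, 500],
    [900, 800, 500, 0],
]
genetic_dist = [
    [0.00, 0.02, 0.05, 0.10],
    [0.02, 0.00, 0.04, 0.09],
    [0.05, 0.04, 0.00, 0.06],
    [0.10, 0.09, 0.06, 0.00],
]

r, p = mantel_test(genetic_dist, geographic_km)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

With only four populations there are just 24 possible label orderings, so the p-value from
such a toy example cannot become very small. A real analysis would use the full set of
sampled groups and the distance measures defined in the relevant methods chapters.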
 
Y-chromosome data, more specifically the distribution and frequency of the E-M35
haplogroup, seem to parallel archaeological data with respect to the spread of pastoralism
in sub-Saharan Africa (Elphick, 1977; Smith, 1983; Smith, 1992; Sadr, 1998; Mitchell,
2002). Y-chromosome data obtained in the present study and in that of Henn et al., (2008)
suggest that the present-day group who self-identify as Khwe were responsible for the
introduction of pastoralism from east Africa into the region of northern Botswana. These
data were also used to address how pastoralism was introduced to the south. The data
tend to favor a coupled cultural-demic model, in which a few male individuals moved south,
integrated with the existing San tribes there and took with them the pastoralist
practice and likely also their Khoe language. This pattern is reflected in the frequency and
distribution of E-M35, which has its highest frequency (46%) in the Khwe and decreases in
frequency towards the south, reaching low frequencies (<10%) in the Karoo Coloured
groups. Conversely, none of the mtDNA (female) L0k and L0d lineages observed in the
Khwe group was observed in the southern Khoe-San and Coloured groups, suggesting
limited or no female movement.
 
Many of the hypotheses discussed in this thesis were based on the interpretation of results
from genetic data examined in the present study. Several of these may be refined,
modified or even disproved as more data become available in the future. While the focus of
this study was to evaluate the use of various types of genetic markers in reconstructing the
history of Khoisan-speaking populations, a more comprehensive comparative analysis of
these data with archaeological data was outside the scope of this study and could be the
focus of future studies.
 
This study has highlighted the place of the Khoe-San in the evolutionary history of African 
populations. Presently, many Khoe-San groups are still not being respected as individuals 
with a democratic right to speak for themselves and decide their own destiny. In the span 
of a few hundred years this group of people has lost so much; they have been massacred, 
victimized, discriminated against and marginalized by other migrant groups to their 
homeland region in southern Africa. They are increasingly being affected by social ills such 
as economic dependency, alcoholism, malnutrition, and societal breakdown. The constant 
discrimination and humiliation (especially among the younger generations) has had a 
profound effect on the way individuals prefer to identify themselves, with a stronger tendency
to self-identify as Bantu-speakers or Coloured rather than Khoe or San. However, some
groups are rediscovering their identity and taking pride in the uniqueness of their ancestry.
We had the opportunity in this study to take genetic ancestry test results back to many
individuals and to share with them the genetic findings from this study. It is hoped that this
dialogue with individuals and communities will contribute to spreading the 'word' about their
unique place in the history of the world and to documenting their own history.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
8. REFERENCES 
-  Allard MW, Polanskey D, Miller K, Wilson MR, Monson KL and Budowle B (2005). Characterization of 
human control region sequences of the African American SWGDAM forensic mtDNA data set. Forensic Sci 
Int 148: 169-79 
-  Ambrose SH (1982). Berkeley, University of California Press: 104-157.
-  Amo T and Brand MD (2007). Were inefficient mitochondrial haplogroups selected during migrations of 
modern humans? A test using modular kinetic analysis of coupling in mitochondria from cybrid cell lines. 
Biochem J 404: 345-51 
-  Anderson S, Bankier AT, Barrell BG, de Bruijn MH, Coulson AR, Drouin J, Eperon IC, et al., (1981). 
Sequence and organization of the human mitochondrial genome. Nature 290: 457-65 
-  Andrews RM, Kubacka I, Chinnery PF, Lightowlers RN, Turnbull DM and Howell N (1999). Reanalysis and 
revision of the Cambridge reference sequence for human mitochondrial DNA. Nat Genet 23: 147 
-  Anisimova M and Gascuel O (2006). Approximate likelihood-ratio test for branches: A fast, accurate, and 
powerful alternative. Syst Biol 55: 539-52 
-  Atkinson QD, Gray RD and Drummond AJ (2008). mtDNA variation predicts population size in humans and 
reveals a major Southern Asian chapter in human prehistory. Mol Biol Evol 25: 468-74 
-  Atkinson QD, Gray RD and Drummond AJ (2009). Bayesian coalescent inference of major human 
mitochondrial DNA haplogroup expansions in Africa. Proc Biol Sci 276: 367-73 
-  Balloux F, Handley LJ, Jombart T, Liu H and Manica A (2009). Climate shaped the worldwide distribution of 
human mitochondrial DNA sequence variation. Proc Biol Sci 276: 3447-55 
-  Bamshad M, Wooding S, Salisbury BA and Stephens JC (2004). Deconstructing the relationship between 
genetics and race. Nat Rev Genet 5: 598-609 
-  Bandelt HJ, Forster P and Rohl A (1999). Median-joining networks for inferring intraspecific phylogenies. 
Mol Biol Evol 16: 37-48 
-  Barbujani G, Magagni A, Minch E and Cavalli-Sforza LL (1997). An apportionment of human DNA diversity. 
Proc Natl Acad Sci U S A 94: 4516-9 
-  Barnard A (1988). Kinship, language and production: a conjectural history of Khoisan social structure. 
Africa 58: 29-50 
-  Barnard A (1992). Hunters and herders of southern Africa - A comparative ethnography of the Khoisan
peoples. Cambridge, Cambridge University Press 
-  Beaumont PB (1980). On the age of Border Cave hominids 1-5. Palaeontologia Afr 23: 131-143 
-  Behar DM, Rosset S, Blue-Smith J, Balanovsky O, Tzur S, Comas D, Mitchell RJ, et al., (2007). The 
Genographic Project public participation mitochondrial DNA database. PLoS Genet 3: e104 
-  Behar DM, Villems R, Soodyall H, Blue-Smith J, Pereira L, Metspalu E, Scozzari R, et al., (2008). The 
dawn of human matrilineal diversity. Am J Hum Genet 82: 1130-40 
-  Bennun N (2004). The Broken String - The last words of an extinct people. London, Penguin Books 
-  Bergen AW, Wang CY, Tsai J, Jefferson K, Dey C, Smith KD, Park SC, et al., (1999). An Asian-Native 
American paternal lineage identified by RPS4Y resequencing and by microsatellite haplotyping. Ann Hum 
Genet 63: 63-80 
-  Biesele M and Royal K (1999). Africa; Mbuti. The Cambridge encyclopedia of hunters and gatherers. 
Richard B and Daly R. Cambridge, Cambridge University Press: 210-214.
-  Bleek DF (1928). Bushmen of central Angola. Bantu Studies 3: 105-125 
-  Bleek WHI (1862). A comparative grammar of South African languages. Part I. Phonology. London 
-  Boonzaier E, Malherbe C, Smith A and Berens P (1996). The Cape Herders: A History of the Khoikhoi of 
Southern Africa. Cape Town and Johannesburg, David Philip 
-  Bosch E, Calafell F, Comas D, Oefner PJ, Underhill PA and Bertranpetit J (2001). High-resolution analysis 
of human Y-chromosome variation shows a sharp discontinuity and limited gene flow between northwestern 
Africa and the Iberian Peninsula. Am J Hum Genet 68: 1019-29 
-  Bouzouggar A, Barton N, Vanhaeren M, d'Errico F, Collcutt S, Higham T, Hodge E, et al., (2007).
82,000-year-old shell beads from North Africa and implications for the origins of modern human behavior. Proc Natl
Acad Sci U S A 104: 9964-9 
-  Bowcock AM, Bucci C, Hebert JM, Kidd JR, Kidd KK, Friedlaender JS and Cavalli-Sforza LL (1987). Study 
of 47 DNA markers in five populations from four continents. Gene Geogr 1: 47-64 
-  Bowcock AM, Hebert JM, Mountain JL, Kidd JR, Rogers J, Kidd KK and Cavalli-Sforza LL (1991a). Study of 
an additional 58 DNA markers in five human populations from four continents. Gene Geogr 5: 151-73 
-  Bowcock AM, Kidd JR, Mountain JL, Hebert JM, Carotenuto L, Kidd KK and Cavalli-Sforza LL (1991b). 
Drift, admixture, and selection in human evolution: a study with DNA polymorphisms. Proc Natl Acad Sci U S 
A 88: 839-43 
-  Bowcock AM, Ruiz-Linares A, Tomfohrde J, Minch E, Kidd JR and Cavalli-Sforza LL (1994). High resolution 
of human evolutionary trees with polymorphic microsatellites. Nature 368: 455-7 
-  Broyhill K, Hitchcock R and Biesele M (Current). Current situations facing the San peoples of southern
Africa, Review on Current San Economic and Social Situations for the University of Free State.
http://www.kalaharipeoples.org/downloads/Current%20Situations%20of%20the%20San.pdf. 
-  Bryant D and Moulton V (2002). NeighborNet: An agglomerative method for the construction of planar
phylogenetic networks. Algorithms in Bioinformatics. Guigó R and Gusfield D, WABI 2002. LNCS 2452:
375-391.
-  Campbell AC (1990). Comment on: Foragers, genuine or spurious? by J.S. Solway and R.B. Lee. Curr 
Anthropol 31: 123-124 
-  Campbell MC and Tishkoff SA (2008). African genetic diversity: implications for human demographic 
history, modern human origins, and complex disease mapping. Annu Rev Genomics Hum Genet 9: 403-33 
-  Cann HM, de Toma C, Cazes L, Legrand MF, Morel V, Piouffre L, Bodmer J, et al., (2002). A human 
genome diversity cell line panel. Science 296: 261-2 
-  Cann RL, Stoneking M and Wilson AC (1987). Mitochondrial DNA and human evolution. Nature 325: 31-6 
-  Casanova M, Leroy P, Boucekkine C, Weissenbach J, Bishop C, Fellous M, Purrello M, et al., (1985). A 
human Y-linked DNA polymorphism and its potential for estimating genetic and evolutionary distance. 
Science 230: 1403-6 
-  Cashdan E (1986). Hunter-gatherers of the northern Kalahari. Contemporary Studies on Khoisan. Vossen 
R and Keuthmann K. Hamburg, Helmut Buske Verlag. 1: 145-180. 
-  Cavalli-Sforza LL (1986). African Pygmies. Orlando (FL), Academic Press 
-  Cavalli-Sforza LL (1998). The DNA revolution in population genetics. Trends Genet 14: 60-5 
-  Cavalli-Sforza LL, Menozzi P and Piazza A (1994). The History and Geography of Human Genes. 
Princeton, Princeton University Press 
-  Chen YS, Olckers A, Schurr TG, Kogelnik AM, Huoponen K and Wallace DC (2000). mtDNA variation in 
the South African Kung and Khwe-and their genetic relationships to other African populations. Am J Hum 
Genet 66: 1362-83 
-  Clark AG, Weiss KM, Nickerson DA, Taylor SL, Buchanan A, Stengard J, Salomaa V, et al., (1998). 
Haplotype structure and population genetic inferences from nucleotide-sequence variation in human 
lipoprotein lipase. Am J Hum Genet 63: 595-612 
-  Cooke CK (1965). Evidence of human migrations from the rock art of Southern Rhodesia. Africa 5: 263-285 
-  Corander J, Waldmann P and Sillanpaa MJ (2003). Bayesian analysis of genetic differentiation between 
populations. Genetics 163: 367-74 
-  Crawhall N (2003). The rediscovery of N|u and the ǂKhomani Land Claim Process, South Africa.
Maintaining the Links: Language Identity and the Land: Proceedings of the Seventh Foundation for 
Endangered Languages Conference, Broome, Western Australia, Bristol: Foundation for Endangered 
Languages. 
-  Crawhall N (2006). Languages, genetics and archaeology: problems and the possibilities in Africa. The
prehistory of Africa. Soodyall H. Johannesburg & Cape Town, Jonathan Ball Publishers: 109-124. 
-  Cruciani F, La Fratta R, Santolamazza P, Sellitto D, Pascone R, Moral P, Watson E, et al., (2004). 
Phylogeographic analysis of haplogroup E3b (E-M215) Y chromosomes reveals multiple migratory events
within and out of Africa. Am J Hum Genet 74: 1014-22 
-  Cruciani F, La Fratta R, Trombetta B, Santolamazza P, Sellitto D, Colomb EB, Dugoujon JM, et al., (2007). 
Tracing past human male movements in northern/eastern Africa and western Eurasia: new clues from
Y-chromosomal haplogroups E-M78 and J-M12. Mol Biol Evol 24: 1300-11
-  Cruciani F, Santolamazza P, Shen P, Macaulay V, Moral P, Olckers A, Modiano D, et al., (2002). A back 
migration from Asia to sub-Saharan Africa is supported by high-resolution analysis of human Y-chromosome 
haplotypes. Am J Hum Genet 70: 1197-214 
-  De Almeida A (1965). Bushmen and other non-Bantu peoples of Angola. Johannesburg, Witwatersrand 
University Press for the Institute for the Study of Man in Africa 
-  De Jongh M (2002). No fixed abode: the poorest of the poor and elusive identities in rural South Africa. 
Journal of Southern African Studies 28: 441-460 
-  Deacon HJ and Deacon J (1999). Human Beginnings in South Africa. Uncovering the Secrets of the Stone 
Age. Cape Town and Johannesburg, David Philip Publishers 
-  Deacon HJ, Deacon J, Brooker M and Wilson ML (1978). The evidence for herding at Boomplaas Cave in 
the southern Cape, South Africa. South African Archaeological Bulletin 33: 39-65 
-  Deacon J (1984). Later Stone Age people and their descendants in southern Africa. Southern African 
Prehistory and Paleoenvironments. Klein R G. Rotterdam, A. A. Balkema: 221-328. 
-  Deacon J (1996). A Tale of Two Families: Wilhelm Bleek, Lucy Lloyd and the /Xam San of the Northern 
Cape. Miscast. Negotiating the Presence of the Bushmen. Skotnes P. Cape Town, UCT Press: 93-113. 
-  Denbow JR and Wilmsen EN (1986). Advent and course of pastoralism in the Kalahari. Science 234:
1509-15
-  Destro-Bisol G, Donati F, Coia V, Boschi I, Verginelli F, Caglia A, Tofanelli S, et al., (2004). Variation of 
female and male lineages in sub-saharan populations: the importance of sociocultural factors. Mol Biol Evol 
21: 1673-82 
-  Dornan SS (1975). Pygmies and Bushmen of the Kalahari. Cape Town, C. Struik (PTY) LTD. 
-  Drummond AJ and Rambaut A (2007). BEAST: Bayesian evolutionary analysis by sampling trees. BMC 
Evol Biol 7: 214 
-  Drummond AJ, Rambaut A, Shapiro B and Pybus OG (2005). Bayesian coalescent inference of past 
population dynamics from molecular sequences. Mol Biol Evol 22: 1185-92 
-  Ehret C (1982). The first spread of food production in southern Africa. The archaeological and linguistic
reconstruction of African history. Ehret C and Posnansky M. Berkeley, University of California Press:
158-181.
-  Ehret C and Posnansky M (1982). The archaeological and linguistic reconstruction of African history. 
California, University of California Press 
-  Elphick R (1977). Kraal and castle. New Haven, Yale University Press 
-  Elphick R (1985). Khoikhoi and the founding of White South Africa. Johannesburg, Raven Press 
-  Elson JL, Turnbull DM and Howell N (2004). Comparative genomics and the evolution of human 
mitochondrial DNA: assessing the effects of selection. Am J Hum Genet 74: 229-38 
-  Engelbrecht JA (1936). The Korana: an account of their customs and their history. Cape Town, Miller 
-  Estermann C, Ed. (1976). The ethnography of southwestern Angola, Volume 1: The non-Bantu peoples; 
the Ambo ethnic group. New York, Africana Publishing Company. 
-  Evanno G, Regnaut S and Goudet J (2005). Detecting the number of clusters of individuals using the 
software STRUCTURE: a simulation study. Mol Ecol 14: 2611-20 
-  Excoffier L, Laval G and Schneider S (2005). Arlequin ver. 3.0: An integrated software package for 
population genetics data analysis. Evol Bioinfor Online 1: 47-50 
-  Excoffier L and Schneider S (1999). Why hunter-gatherer populations do not show signs of pleistocene 
demographic expansions. Proc Natl Acad Sci U S A 96: 10597-602 
-  Excoffier L and Slatkin M (1995). Maximum-likelihood estimation of molecular haplotype frequencies in a 
diploid population. Mol Biol Evol 12: 921-7 
-  Excoffier L and Yang Z (1999). Substitution rate variation among sites in mitochondrial hypervariable region 
I of humans and chimpanzees. Mol Biol Evol 16: 1357-68 
-  Falush D, Stephens M and Pritchard JK (2003). Inference of population structure using multilocus genotype 
data: linked loci and correlated allele frequencies. Genetics 164: 1567-87 
-  Falush D, Stephens M and Pritchard JK (2007). Inference of population structure using multilocus genotype 
data: dominant markers and null alleles. Mol Ecol Notes 7: 574-578 
-  Felsenstein J (2004). PHYLIP (Phylogeny Inference Package) version 3.6. Distributed by the author. 
Department of Genome Sciences, University of Washington, Seattle 
-  Fluxus-engineering (2008). Fluxus Technology Ltd. 2008-2009. 
-  Forster P (2004). Ice Ages and the mitochondrial DNA chronology of human dispersals: a review. Philos 
Trans R Soc Lond B Biol Sci 359: 255-64; discussion 264 
-  Forster P, Harding R, Torroni A and Bandelt HJ (1996). Origin and evolution of Native American mtDNA 
variation: a reappraisal. Am J Hum Genet 59: 935-45 
-  Francois O, Ancelet S and Guillot G (2006). Bayesian clustering using hidden Markov random fields in 
spatial population genetics. Genetics 174: 805-16 
-  Fu YX (1997). Statistical tests of neutrality of mutations against population growth, hitchhiking and 
background selection. Genetics 147: 915-25 
-  Garrigan D and Hammer MF (2006). Reconstructing human origins in the genomic era. Nat Rev Genet 7: 
669-80 
-  Gifford-Gonzalez D (2000). Animal disease challenges to the emergence of pastoralism in sub-Saharan 
Africa. Afr Archaeol Rev 17: 95-139 
-  Golden-Software I (2006). Surfer Demo, Golden Software Inc. 2007-2009. 
-  Goldstein DB, Ruiz Linares A, Cavalli-Sforza LL and Feldman MW (1995). Genetic absolute dating based 
on microsatellites and the origin of modern humans. Proc Natl Acad Sci U S A 92: 6723-7 
-  Gonder MK, Mortensen HM, Reed FA, de Sousa A and Tishkoff SA (2007). Whole-mtDNA genome 
sequence analysis of ancient African lineages. Mol Biol Evol 24: 757-68 
-  Gordon R (1984). The !Kung in the Kalahari exchange: an ethnohistorical perspective. Past and present in 
hunter-gatherer studies. Schrire C. Orlando, FL, Academic Press: 195-224. 
-  Gordon R (1986). Once again: How many Bushmen are there? The past and future of !Kung ethnography: 
critical reflections and symbolic perspectives, essays in honour of Lorna Marshall. Biesele M, Gordon R and 
Lee R. Hamburg, Helmut Buske Verlag: 53-68. 
-  Gordon RG, Ed. (2005). Ethnologue: Languages of the World. Online version: http://www.ethnologue.com/. 
Dallas, Texas, SIL International. 
-  Gorissen P (2008). Google Maps Latitude, Longitude Popup. 2008. 
-  Green RE, Malaspinas AS, Krause J, Briggs AW, Johnson PL, Uhler C, Meyer M, et al., (2008). A complete 
Neandertal mitochondrial genome sequence determined by high-throughput sequencing. Cell 134: 416-26 
-  Greenberg JH (1963). The languages of Africa. Bloomington, Indiana, Indiana University Press 
-  Greenberg JH (1972). Linguistic evidence concerning Bantu origins. J Afr Hist 13: 189-216 
-  Griffiths RC and Tavare S (1994). Simulating probability distributions in the coalescent. Theor Popul Biol 
46: 131-159
-  Grine FE, Bailey RM, Harvati K, Nathan RP, Morris AG, Henderson GM, Ribot I, et al., (2007). Late 
Pleistocene human skull from Hofmeyr, South Africa, and modern human origins. Science 315: 226-9 
-  Grun R, Shackleton NJ and Deacon HJ (1990). Electron-Spin-Resonance Dating of Tooth Enamel From 
Klasies River Mouth Cave. Curr Anthropol 31: 427-432  
-  Guenther M (1996). From 'Lords of the Desert' to 'Rubbish People': The Colonial and Contemporary State
of the Nharo of Botswana. Miscast. Negotiating the Presence of the Bushmen. Skotnes P. Cape Town, UCT 
Press. 
-  Guenther MG (1986). Acculturation and assimilation of the Bushmen of Botswana and Namibia. 
Contemporary Studies on Khoisan. Vossen R and Keuthmann K. Hamburg, Helmut Buske Verlag. 1:
347-373.
-  Guindon S, Lethiec F, Duroux P and Gascuel O (2005). PHYML Online--a web server for fast maximum 
likelihood-based phylogenetic inference. Nucleic Acids Res 33: W557-9 
-  Güldemann T (2006a). The San languages of southern Namibia: linguistic appraisal with special reference
to J. G. Krönlein's N|uusaa data. Anthropological Linguistics 48: 369-395
-  Güldemann T (2006b). Structural isoglosses between Khoekhoe and Tuu: the Cape as a linguistic area.
Linguistic areas: convergence in historical and typological perspective. Matras Y, McMahon A and Vincent N.
Hampshire, Palgrave Macmillan: 99-134.
-  Güldemann T (2007). Clicks, genetics, and 'proto-world' from a linguistic perspective. University of Leipzig
Papers on Africa. Leipzig, Institut für Afrikanistik, Universität Leipzig.
-  Güldemann T (Forthcoming-a). Greenberg's "case" for Khoisan: the morphological evidence. Problems of
linguistic-historical reconstruction in Africa. Vossen R and Ibriszimow D. Köln, Rüdiger Köppe.
-  Güldemann T (Forthcoming-b). Person-gender-number marking from Proto-Khoe-Kwadi to its descendents:
a rejoinder with particular reference to language contact. Festschrift for Bernd Heine. König C and Vossen R.
London, Routledge.
-  Güldemann T (In Press). Changing profile when encroaching on hunter-gatherer territory: towards a history
of the Khoe-Kwadi family in southern Africa. Hunter-gatherers and linguistic history: a global perspective.
Güldemann T, McConvell P and Rhodes R. Cambridge, Cambridge University Press.
-  Güldemann T and Elderkin ED (Forthcoming). On external genealogical relationships of the Khoe family.
Khoisan Languages and Linguistics: the Riezlern Symposium 2003. Brenzinger M and König C. Köln,
Rüdiger Köppe.
-  Hall TA (1999). BioEdit: a user-friendly biological sequence alignment editor and analysis program for 
Windows 95/98/NT. Nucl. Acids. Symp. Ser. 41: 95-98 
-  Hammer MF (1994). A recent insertion of an alu element on the Y chromosome is a useful marker for 
human population studies. Mol Biol Evol 11: 749-61 
-  Hammer MF and Horai S (1995). Y chromosomal DNA variation and the peopling of Japan. Am J Hum 
Genet 56: 951-62 
-  Hammer MF, Karafet T, Rasanayagam A, Wood ET, Altheide TK, Jenkins T, Griffiths RC, et al., (1998). Out 
of Africa and back again: nested cladistic analysis of human Y chromosome variation. Mol Biol Evol 15:
427-41
-  Hammer MF, Karafet TM, Redd AJ, Jarjanazi H, Santachiara-Benerecetti S, Soodyall H and Zegura SL 
(2001a). Hierarchical patterns of global human Y-chromosome diversity. Mol Biol Evol 18: 1189-203 
-  Hammer MF, Spurdle AB, Karafet T, Bonner MR, Wood ET, Novelletto A, Malaspina P, et al., (1997). The 
geographic distribution of human Y chromosome variation. Genetics 145: 787-805 
-  Hammer O, Harper DAT and Ryan PD (2001b). PAST: Palaeontological Statistics software package for 
education and data analysis. Palaeontologia Electronica 4: 9 
-  Harding RM, Fullerton SM, Griffiths RC, Bond J, Cox MJ, Schneider JA, Moulin DS, et al., (1997). Archaic 
African and Asian lineages in the genetic ancestry of modern humans. Am J Hum Genet 60: 772-89 
-  Harding RM, Healy E, Ray AJ, Ellis NS, Flanagan N, Todd C, Dixon C, et al., (2000). Evidence for variable 
selective pressures at MC1R. Am J Hum Genet 66: 1351-61 
-  Harpending H and Rogers A (2000). Genetic perspectives on human origins and differentiation. Annu Rev 
Genomics Hum Genet 1: 361-85 
-  Harpending HC, Sherry ST, Rogers AR and Stoneking M (1993). The genetic structure of ancient human 
populations. Curr Anthropol 34: 483-496
-  Harris EE and Hey J (1999). X chromosome evidence for ancient human histories. Proc Natl Acad Sci U S 
A 96: 3320-4 
-  Henn BM, Gignoux C, Lin AA, Oefner PJ, Shen P, Scozzari R, Cruciani F, et al., (2008). Y-chromosomal 
evidence of a pastoralist migration through Tanzania to southern Africa. Proc Natl Acad Sci U S A 105: 
10693-8 
-  Henshilwood CS (1996). A revised chronology for pastoralism in southernmost Africa: New evidence of 
sheep at ca. 2000 B.P. from Blombos Cave, South Africa. Antiquity 70: 945-949 
-  Henshilwood CS, d'Errico F, Yates R, Jacobs Z, Tribolo C, Duller GA, Mercier N, et al., (2002). Emergence 
of modern human behavior: Middle Stone Age engravings from South Africa. Science 295: 1278-80 
-  Hoernle AW, Ed. (1985). The social organization of the Nama and other essays. Johannesburg, 
Witwatersrand University Press. 
-  Horai S (1995). Evolution and the origins of man: clues from complete sequences of hominoid 
mitochondrial DNA. Southeast Asian J Trop Med Public Health 26 Suppl 1: 146-54 
-  Horai S, Hayasaka K, Kondo R, Tsugane K and Takahata N (1995). Recent African origin of modern 
humans revealed by complete sequences of hominoid mitochondrial DNAs. Proc Natl Acad Sci U S A 92: 
532-6 
-  Hudson RR (1990). Gene genealogies and the coalescent process. Oxf Surv Evol Biol 7: 1-14
-  Huffman TN (1982). Archaeology and the ethnohistory of the African Iron Age. Ann Rev Anthropol 11:
133-150
-  Huffman TN (1983). The trance hypothesis and the rock art of Zimbabwe. New approaches to southern 
African rock art. Lewis-Williams J D, South African Archaeological Society: 49-53. 
-  Huson DH and Bryant D (2006). Application of phylogenetic networks in evolutionary studies. Mol Biol Evol 
23: 254-67 
-  Huson DH, Richter DC, Rausch C, Dezulian T, Franz M and Rupp R (2007). Dendroscope: An interactive 
viewer for large phylogenetic trees. BMC Bioinformatics 8: 460 
-  Ingman M and Gyllensten U (2007). Rate variation between mitochondrial domains and adaptive evolution 
in humans. Hum Mol Genet 16: 2281-7 
-  Ingman M, Kaessmann H, Paabo S and Gyllensten U (2000). Mitochondrial genome variation and the 
origin of modern humans. Nature 408: 708-13 
-  Jakobsson M and Rosenberg NA (2007). CLUMPP: a cluster matching and permutation program for 
dealing with label switching and multimodality in analysis of population structure. Bioinformatics 23: 1801-6 
-  Jakobsson M, Scholz SW, Scheet P, Gibbs JR, VanLiere JM, Fung HC, Szpiech ZA, et al., (2008). 
Genotype, haplotype and copy-number variation in worldwide human populations. Nature 451: 998-1003 
-  Jenkins T (1974). Blood group Abantu population and family studies. Vox Sang 26: 537-50 
-  Jenkins T (1982). Human evolution in southern Africa. The Unfolding Genome. Bonne-Tamir B. New York, 
Alan R. Liss Inc.: 227-253. 
-  Jenkins T (1986). The prehistory of the San and Khoikhoi as recorded in their blood. Contemporary Studies 
on Khoisan. Vossen R and Keuthmann K. Hamburg, Helmut Buske Verlag. 2: 51-77. 
-  Jenkins T (1988). The peoples of southern Africa. Studies in diversity and disease. Raymond Dart 
Lectures. Lecture 24. Pines N J. Johannesburg, Institute for the Study of Man in Africa, Witwatersrand 
University Press. 
-  Jenkins T and Corfield V (1972). The red cell acid phosphatase polymorphism in Southern Africa: 
population data and studies on the R, RA and RB phenotypes. Ann Hum Genet 35: 379-91 
-  Jenkins T and Dunn DS (1981). Haematological genetics in the tropics. Part 1: Tropical Africa. Clin 
Haematol 10: 1029-50 
-  Jenkins T, Harpending HC, Gordon H, Keraan MM and Johnston S (1971). Red-cell-enzyme 
polymorphisms in the Khoisan peoples of Southern Africa. Am J Hum Genet 23: 513-32 
-  Jenkins T and Nurse GT (1972). Blood group gene frequencies. S Afr Med J 46: 560 
-  Jenkins T, Zoutendyk A and Steinberg AG (1970). Gammaglobulin groups (Gm and Inv) of various 
Southern African populations. Am J Phys Anthropol 32: 197-218 
-  Jobling MA, Hurles ME and Tyler-Smith C (2004a). Human Evolutionary Genetics. Origins, Peoples & 
Disease. New York, Garland Publishing 
-  Jobling MA, Hurles ME and Tyler-Smith C (2004b). Making inferences from diversity. Human Evolutionary 
Genetics. Origins, Peoples & Disease. New York, Garland Publishing: 164. 
-  Jobling MA, Hurles ME and Tyler-Smith C (2004c). Measuring and summarizing genetic variation. Human
Evolutionary Genetics. Origins, Peoples & Disease. New York, Garland Publishing: 155. 
-  Jobling MA and Tyler-Smith C (2000). New uses for new haplotypes the human Y chromosome, disease 
and selection. Trends Genet 16: 356-62 
-  Jobling MA and Tyler-Smith C (2003). The human Y chromosome: an evolutionary marker comes of age. 
Nat Rev Genet 4: 598-612 
-  Johnston HH (1913). A survey of the ethnography of Africa: and the former racial and tribal migrations of 
that continent. Journal of the Royal Anthropological Institute XLIII: 391-392 
-  Jorde LB, Watkins WS and Bamshad MJ (2001). Population genomics: a bridge from evolutionary history to 
genetic medicine. Hum Mol Genet 10: 2199-207 
-  Jorde LB, Watkins WS, Bamshad MJ, Dixon ME, Ricker CE, Seielstad MT and Batzer MA (2000). The 
distribution of human genetic diversity: a comparison of mitochondrial, autosomal, and Y-chromosome data. 
Am J Hum Genet 66: 979-88 
-  Kaessmann H, Heissig F, von Haeseler A and Paabo S (1999). DNA sequence variation in a non-coding 
region of low recombination on the human X chromosome. Nat Genet 22: 78-81 
-  Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer MF (2008). New binary 
polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 
18: 830-8 
-  Kayser M, Brauer S, Weiss G, Underhill PA, Roewer L, Schiefenhovel W and Stoneking M (2000). 
Melanesian origin of Polynesian Y chromosomes. Curr Biol 10: 1237-46 
-  Kinahan J (1995). A new archaeological perspective on nomadic pastoralist expansion in south-western 
Africa. Azania 29/30: 211-226 
-  Kingman JFC (1982). On the genealogy of large populations. J Appl Probab 19A: 27-43
-  Kivisild T, Shen P, Wall DP, Do B, Sung R, Davis K, Passarino G, et al., (2006). The role of selection in the 
evolution of human mitochondrial genomes. Genetics 172: 373-87 
-  Klein RG (1986). The prehistory of stone age herders in the Cape Province of South Africa. South African 
Archaeological Society, Goodwin Series 5: 5-12 
-  Klein RG (2000). The human career: Human biological and cultural origins. Chicago, University of Chicago 
Press 
-  Klein RG, Avery G, Cruz-Uribe K, Halkett D, Parkington JE, Steele T, Volman TP, et al., (2004). The 
Ysterfontein 1 Middle Stone Age site, South Africa, and early human exploitation of coastal resources. Proc 
Natl Acad Sci U S A 101: 5708-15 
-  Knight A, Underhill PA, Mortensen HM, Zhivotovsky LA, Lin AA, Henn BM, Louis D, et al., (2003). African Y 
chromosome and mtDNA divergence provides insight into the history of click languages. Curr Biol 13: 464-73 
-  Korsman SA and Plug I (1992). Archeological evidence and ethnographic analogy - interpreting prehistoric 
social behaviour at Honingklip in the eastern Transvaal. S Afr J Ethnol 15: 120-126 
-  Lahr MM and Foley RA (1998). Towards a theory of modern human origins: geography, demography, and 
diversity in recent human evolution. Am J Phys Anthropol Suppl 27: 137-76 
-  Landsteiner K (1901). Uber Agglutinationserscheinungen normalen menschlichen. Wiener Klin. 
Wochenschr. 14: 1132-1134 
-  Langella O (2002). Populations v.1.2.30. 2008. 
-  le Roux W and White A, Eds. (2004). Voices of the San. Cape Town, Kwela Books. 
-  Lee RB (1979). The !Kung San: men, women, and work in a foraging society. Cambridge, Cambridge 
University Press. 
-  Lewis-Williams JD (1986). Beyond style and portrait: A comparison of Tanzanian and southern African rock 
art. Contemporary Studies on Khoisan. Vossen R and Keuthmann K. Hamburg, Helmut Buske Verlag. 2:
95-122.
-  Lewis PO and Zaykin DV (2001). Genetic Data Analysis: Computer program for the analysis of allelic data.
Version 1.0 (d16c). Free program distributed by the authors over the internet. 
-  Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, Cann HM, et al., (2008). 
Worldwide human relationships inferred from genome-wide patterns of variation. Science 319: 1100-4 
-  Liu K and Muse SV (2005). PowerMarker: an integrated analysis environment for genetic marker analysis. 
Bioinformatics 21: 2128-9 
-  Lomax A (1968). Folk Song Style and Culture. Washington, DC, National Association for the Advancement 
of Science: 16-18, 26, 91-92. 
-  Lombard M (2008). From testing times to high resolution: The Late Pleistocene Middle Stone Age of South 
Africa and beyond. Goodwin Series 10: 180-188 
-  Low BS (1988). Measures of polygyny in humans. Curr Anthropol 29: 189-194 
-  Maca-Meyer N, Gonzalez AM, Larruga JM, Flores C and Cabrera VM (2001). Major genomic mitochondrial 
lineages delineate early human expansions. BMC Genet 2: 13 
-  Macaulay V, Richards M, Hickey E, Vega E, Cruciani F, Guida V, Scozzari R, et al., (1999). The emerging 
tree of West Eurasian mtDNAs: a synthesis of control-region sequences and RFLPs. Am J Hum Genet 64: 
232-49 
-  Marlowe FW (2004). Is human ovulation concealed? Evidence from conception beliefs in a hunter-gatherer 
society. Arch Sex Behav 33: 427-432 
-  Marshall J and Ritchie C (1984). Where are the Ju/wasi of Nyae Nyae? Changes in a Bushman society: 
1958-1981. Cape Town, Centre for African Studies, University of Cape Town (Communications No.9) 
-  Marshall L (1960). !Kung Bushmen Bands. Africa 30: 325-355 
-  Marshall L (1976). The !Kung of Nyae Nyae. Cambridge, Harvard University Press 
-  Mazel A (1996). In pursuit of San Pre-colonial History in the Natal Drakensberg: A Historical Overview. 
Miscast. Negotiating the Presence of the Bushmen. Skotnes P. Cape Town, UCT Press: 191-195. 
-  McDougall I, Brown FH and Fleagle JG (2005). Stratigraphic placement and age of modern humans from 
Kibish, Ethiopia. Nature 433: 733-6 
-  Merriwether DA, Clark AG, Ballinger SW, Schurr TG, Soodyall H, Jenkins T, Sherry ST, et al., (1991). The 
structure of human mitochondrial DNA variation. J Mol Evol 33: 543-55 
-  Meyer S, Weiss G and von Haeseler A (1999). Pattern of nucleotide substitution and rate heterogeneity in 
the hypervariable regions I and II of human mtDNA. Genetics 152: 1103-10 
-  Michels C (1997). Latitude/Longitude Distance Calculation. 2008. 
-  Miller-Ockhuizen A and Sands BE (1999). !Kung as a linguistic construct. Language & Communication 19: 
401-413 
-  Miller SA, Dykes DD and Polesky HF (1988). A simple salting out procedure for extracting DNA from 
human nucleated cells. Nucleic Acids Res 16: 1215 
-  Mishmar D, Ruiz-Pesini E, Golik P, Macaulay V, Clark AG, Hosseini S, Brandon M, et al., (2003). Natural 
selection shaped regional mtDNA variation in humans. Proc Natl Acad Sci U S A 100: 171-6 
-  Mitchell PJ (2002). The Archaeology of Southern Africa. Cambridge, Cambridge University Press 
-  Mitchell PJ (2008). Developing the archaeology of Marine Isotope Stage 3. Goodwin Series 10: 52-65 
-  Morris AG (1992). Biological relationships between Upper Pleistocene and Holocene populations in 
southern Africa. Continuity or Replacement: Controversies in Homo sapiens evolution. Brauer G and Smith F 
H. Rotterdam, Balkema: 131-143. 
-  Morris AG (2002). Isolation and the Origin of the Khoisan: Late Pleistocene and Early Holocene Human 
Evolution at the Southern End of Africa. Hum Evol 17: 231-240 
-  Morris AG (2003). The Myth of the East African 'Bushmen'. S Afr Arch Bull 58: 85-90 
-  Morris AG (2005). Prehistory in blood and bone: An essay on the reconstruction of the past from genetics 
and morphology. Transactions of the Royal Society of South Africa 60: 111-114 
-  Morris AG (2008). Searching for 'real' Hottentots: the Khoekhoe in the history of South African physical 
anthropology. Southern African Humanities 20: 221-233 
-  Morris AG and Ribot I (2006). Morphometric cranial identity of prehistoric Malawians in the light of 
sub-Saharan African diversity. Am J Phys Anthropol 130: 10-25 
-  Murdock GP (1967). Ethnographic atlas. Pittsburgh (PA), University of Pittsburgh Press 
-  Murdock GP (1981). Atlas of World Cultures. Pittsburgh (PA), University of Pittsburgh Press 
-  Naidoo T, Schlebusch CM, Makkan H, Patel P, Mahabeer R, Erasmus JC and Soodyall H (Unpublished). 
Development of a single base extension method to resolve Y chromosome haplogroups in sub-Saharan 
African populations.  
-  Nebel A, Filon D, Brinkmann B, Majumder PP, Faerman M and Oppenheim A (2001). The Y chromosome 
pool of Jews as part of the genetic landscape of the Middle East. Am J Hum Genet 69: 1095-112 
-  Nei M (1987). Molecular Evolutionary Genetics. New York, USA, Columbia University Press 
-  Nei M and Livshits G (1989). Genetic relationships of Europeans, Asians and Africans and the origin of 
modern Homo sapiens. Hum Hered 39: 276-81 
-  Nelson RM (2006). S-Compare. 2006-2008. 
-  Newman JL (1995). The peopling of Africa. New Haven, CT, Yale University Press 
-  Niu T, Qin ZS, Xu X and Liu JS (2002). Bayesian haplotype inference for multiple linked single-nucleotide 
polymorphisms. Am J Hum Genet 70: 157-69 
-  Nurse GT (1983). Population movement around the northern Kalahari. African Studies 42: 153-63 
-  Nurse GT and Jenkins T (1977). Serogenetic studies on the Kavango peoples of South West Africa. Ann 
Hum Biol 4: 465-78 
-  Nurse GT, Lane AB and Jenkins T (1976). Sero-genetic studies on the Dama of South West Africa. Ann 
Hum Biol 3: 33-50 
-  Nurse GT, Weiner JS and Jenkins T (1985). The Peoples of Southern Africa and their Affinities. New York, 
Oxford University Press 
-  Oliver MA and Webster R (1990). Kriging: a method of interpolation for geographical information systems. 
International Journal of Geographical Information Systems 4: 313 
-  Parkington JE (1984). Soaqua and Bushmen: hunters and robbers. Past and present in hunter-gatherer 
studies. Schrire C. New York, Academic Press: 151-174. 
-  Parkington JE, Yates R, Manhire A and Halkett D (1986). The social impact of pastoralism in the 
southwestern Cape. Journal of Anthropological Archaeology 5: 313-329 
-  Passarino G, Semino O, Quintana-Murci L, Excoffier L, Hammer M and Santachiara-Benerecetti AS (1998). 
Different genetic components in the Ethiopian population, identified by mtDNA and Y-chromosome 
polymorphisms. Am J Hum Genet 62: 420-34 
-  Penn N (1996). "Fated to Perish": The Destruction of the Cape San. Miscast. Negotiating the Presence of 
the Bushmen. Skotnes P. Cape Town, UCT Press: 81-91. 
-  Pereira L, Macaulay V, Torroni A, Scozzari R, Prata MJ and Amorim A (2001). Prehistoric and historic 
traces in the mtDNA of Mozambique: insights into the Bantu expansions and the slave trade. Ann Hum Genet 
65: 439-58 
-  Phillipson D (1993). African Archaeology. Cambridge, UK, Cambridge Univ Press 
-  Pijper A (1932). Blood-groups of Bushmen. S Afr Med J 6: 35-37 
-  Pijper A (1935). Blood groups in the Hottentots. S Afr Med J 9: 192-195 
-  Pilkington MM, Wilder JA, Mendez FL, Cox MP, Woerner A, Angui T, Kingan S, et al., (2008). Contrasting 
signatures of population growth for mitochondrial DNA and Y chromosomes among human populations in 
Africa. Mol Biol Evol 25: 517-25 
-  Polzin T and Daneschmand SV (2003). On Steiner trees and minimum spanning trees in hypergraphs. 
Operations Res Lett 31: 12-20 
-  Posada D and Crandall KA (1998). MODELTEST: testing the model of DNA substitution. Bioinformatics 14: 
817-8 
-  Potgieter EF (1955). The disappearing Bushmen of Lake Chrissie: a preliminary survey. Pretoria, J.L. van 
Schaick 
-  Prins F (Unknown). A glimpse into Bushman presence in the Anglo Boer War. 
http://www.chrissiesmeer.co.za/the_sun.html 
-  Pritchard JK, Seielstad MT, Perez-Lezaun A and Feldman MW (1999). Population growth of human Y 
chromosomes: a study of Y chromosome microsatellites. Mol Biol Evol 16: 1791-8 
-  Pritchard JK, Stephens M and Donnelly P (2000). Inference of population structure using multilocus 
genotype data. Genetics 155: 945-59 
-  Przeworski M, Hudson RR and Di Rienzo A (2000). Adjusting the focus on human variation. Trends Genet 
16: 296-302 
-  Qamar R, Ayub Q, Khaliq S, Mansoor A, Karafet T, Mehdi SQ and Hammer MF (1999). African and 
Levantine origins of Pakistani YAP+ Y chromosomes. Hum Biol 71: 745-55 
-  Qamar R, Ayub Q, Mohyuddin A, Helgason A, Mazhar K, Mansoor A, Zerjal T, et al., (2002). 
Y-chromosomal DNA variation in Pakistan. Am J Hum Genet 70: 1107-24 
-  Quintana-Murci L, Quach H, Harmant C, Luca F, Massonnet B, Patin E, Sica L, et al., (2008). Maternal 
traces of deep common ancestry and asymmetric gene flow between Pygmy hunter-gatherers and 
Bantu-speaking farmers. Proc Natl Acad Sci U S A 105: 1596-601 
-  R-Project (2006). The R-Project for statistical computing, CRAN project. 2006-2009. 
-  Rambaut A and Drummond AJ (2007). Tracer v1.4. 
-  Ramirez-Soriano A, Ramos-Onsins SE, Rozas J, Calafell F and Navarro A (2008). Statistical power 
analysis of neutrality tests under demographic expansions, contractions and bottlenecks with recombination. 
Genetics 179: 555-67 
-  Ramos-Onsins SE and Rozas J (2002). Statistical properties of new neutrality tests against population 
growth. Mol Biol Evol 19: 2092-100 
-  Raymond M and Rousset F (1995). An exact test for population differentiation. Evolution Int J Org Evolution 
49: 1280-1283 
-  Reed FA and Tishkoff SA (2006). African human diversity, origins and migrations. Curr Opin Genet Dev 16: 
597-605 
-  Reynolds J, Weir BS and Cockerham CC (1983). Estimation of the Coancestry Coefficient: Basis for a 
Short-Term Genetic Distance. Genetics 105: 767-779 
-  Richards M, Corte-Real H, Forster P, Macaulay V, Wilkinson-Herbots H, Demaine A, Papiha S, et al., 
(1996). Paleolithic and neolithic lineages in the European mitochondrial gene pool. Am J Hum Genet 59: 
185-203 
-  Risch N, Burchard E, Ziv E and Tang H (2002). Categorization of humans in biomedical research: genes, 
race and disease. Genome Biol 3: comment2007 
-  Rogers AR and Harpending H (1992). Population growth makes waves in the distribution of pairwise 
genetic differences. Mol Biol Evol 9: 552-69 
-  Romualdi C, Balding D, Nasidze IS, Risch G, Robichaux M, Sherry ST, Stoneking M, et al., (2002). 
Patterns of human diversity, within and among continents, inferred from biallelic DNA polymorphisms. 
Genome Res 12: 602-12 
-  Rosenberg NA (2002). Distruct: a program for the graphical display of structure results. 
-  Rosenberg NA, Mahajan S, Ramachandran S, Zhao C, Pritchard JK and Feldman MW (2005). Clines, 
Clusters, and the Effect of Study Design on the Inference of Human Population Structure. PLoS Genet 1: e70 
-  Rosenberg NA, Pritchard JK, Weber JL, Cann HM, Kidd KK, Zhivotovsky LA and Feldman MW (2002). 
Genetic structure of human populations. Science 298: 2381-5 
-  Rosser ZH, Zerjal T, Hurles ME, Adojaan M, Alavantic D, Amorim A, Amos W, et al., (2000). 
Y-chromosomal diversity in Europe is clinal and influenced primarily by geography, rather than by language. 
Am J Hum Genet 67: 1526-43 
-  Rozas J, Sanchez-DelBarrio JC, Messeguer X and Rozas R (2003). DnaSP, DNA polymorphism analyses 
by the coalescent and other methods. Bioinformatics 19: 2496-7 
-  Rozen S and Skaletsky H (2000). Primer3 on the WWW for general users and for biologist programmers. 
Methods Mol Biol 132: 365-86 
-  Ruiz-Pesini E, Mishmar D, Brandon M, Procaccio V and Wallace DC (2004). Effects of purifying and 
adaptive selection on regional variation in human mtDNA. Science 303: 223-6 
-  Sadr K (1997). Archaeology and the Bushman Debate. Curr Anthropol 38: 104-112 
-  Sadr K (1998). The First Herders at the Cape of Good Hope. Afr Archaeol Rev 15: 101-132 
-  Saillard J, Forster P, Lynnerup N, Bandelt HJ and Norby S (2000). mtDNA variation among Greenland 
Eskimos: the edge of the Beringian expansion. Am J Hum Genet 67: 718-26 
-  Salas A, Richards M, De la Fe T, Lareu MV, Sobrino B, Sanchez-Diz P, Macaulay V, et al., (2002). The 
making of the African mtDNA landscape. Am J Hum Genet 71: 1082-111 
-  Sands B (1998). Language, Identity and Conceptualization Among the Khoisan. Köln, Rüdiger Köppe. Bd 
15: 266-283. 
-  Sands BE, Miller AL and Brugman J (2007). The Lexicon in Language Attrition: The Case of N|uu. Selected 
Proceedings of the 37th Annual Conference on African Linguistics. Payne D L and Peña J. Somerville, MA, 
Cascadilla Proceedings Project: 55-65. 
-  Santos FR, Pandya A, Tyler-Smith C, Pena SD, Schanfield M, Leonard WR, Osipova L, et al., (1999). The 
central Siberian origin for native American Y chromosomes. Am J Hum Genet 64: 619-28 
-  Schapera I (1930). The Khoisan Peoples of South Africa: Bushmen and Hottentots. London, George 
Routledge and Sons 
-  Scheet P and Stephens M (2006). A fast and flexible statistical model for large-scale population genotype 
data: applications to inferring missing genotypes and haplotypic phase. Am J Hum Genet 78: 629-44 
-  Schlebusch CM, Naidoo T and Soodyall H (2009). SNaPshot minisequencing to resolve mitochondrial 
macro-haplogroups found in Africa. Electrophoresis 30: 3657-64 
-  Schneider S and Excoffier L (1999). Estimation of past demographic parameters from the distribution of 
pairwise differences when the mutation rates vary among sites: application to human mitochondrial DNA. 
Genetics 152: 1079-89 
-  Schultze L, Ed. (1928). Zur Kenntnis des Körpers der Hottentotten und Buschmänner. Zoologische und 
Anthropologische Ergebnisse einer Forschungsreise im westlichen und zentralen Südafrika. 
-  Scozzari R, Cruciani F, Malaspina P, Santolamazza P, Ciminelli BM, Torroni A, Modiano D, et al., (1997). 
Differential structuring of human populations for homologous X and Y microsatellite loci. Am J Hum Genet 61: 
719-33 
-  Scozzari R, Cruciani F, Pangrazio A, Santolamazza P, Vona G, Moral P, Latini V, et al., (2001). Human 
Y-chromosome variation in the western Mediterranean area: implications for the peopling of the region. Hum 
Immunol 62: 871-84 
-  Scozzari R, Cruciani F, Santolamazza P, Malaspina P, Torroni A, Sellitto D, Arredi B, et al., (1999). 
Combined use of biallelic and microsatellite Y-chromosome polymorphisms to infer affinities among African 
populations. Am J Hum Genet 65: 829-46 
-  Sealy J and Yates R (1994). The chronology of the introduction of pastoralism to the Cape, South Africa. 
Antiquity 68: 58-67 
-  Seielstad MT, Hebert JM, Lin AA, Underhill PA, Ibrahim M, Vollrath D and Cavalli-Sforza LL (1994). 
Construction of human Y-chromosomal haplotypes using a new polymorphic A to G transition. Hum Mol 
Genet 3: 2159-61 
-  Seielstad MT, Minch E and Cavalli-Sforza LL (1998). Genetic evidence for a higher female migration rate in 
humans. Nat Genet 20: 278-80 
-  Semino O, Magri C, Benuzzi G, Lin AA, Al-Zahery N, Battaglia V, Maccioni L, et al., (2004). Origin, 
diffusion, and differentiation of Y-chromosome haplogroups E and J: inferences on the neolithization of 
Europe and later migratory events in the Mediterranean area. Am J Hum Genet 74: 1023-34 
-  Semino O, Passarino G, Oefner PJ, Lin AA, Arbuzova S, Beckman LE, De Benedictis G, et al., (2000). The 
genetic legacy of Paleolithic Homo sapiens sapiens in extant Europeans: a Y chromosome perspective. 
Science 290: 1155-9 
-  Semino O, Santachiara-Benerecetti AS, Falaschi F, Cavalli-Sforza LL and Underhill PA (2002). Ethiopians 
and Khoisan share the deepest clades of the human Y-chromosome phylogeny. Am J Hum Genet 70: 265-8 
-  Semino O, Torroni A, Scozzari R, Brega A and Santachiara Benerecetti AS (1991). Mitochondrial DNA 
polymorphisms among Hindus: a comparison with the Tharus of Nepal. Ann Hum Genet 55 ( Pt 2): 123-36 
-  Shapiro B, Drummond AJ, Rambaut A, Wilson MC, Matheus PE, Sher AV, Pybus OG, et al., (2004). Rise 
and fall of the Beringian steppe bison. Science 306: 1561-5 
-  Sharp J and Douglas S (1996). Prisoners of their Reputation? The Veterans of the 'Bushman' Battalions in 
South Africa. Miscast. Negotiating the Presence of the Bushmen. Skotnes P. Cape Town, UCT Press: 
323-329. 
-  Shen P, Wang F, Underhill PA, Franco C, Yang WH, Roxas A, Sung R, et al., (2000). Population genetic 
implications from sequence variation in four Y chromosome genes. Proc Natl Acad Sci U S A 97: 7354-9 
-  Sherry ST, Rogers AR, Harpending H, Soodyall H, Jenkins T and Stoneking M (1994). Mismatch 
distributions of mtDNA reveal recent human population expansions. Hum Biol 66: 761-75 
-  Silberbauer GB (1965). Report to the Government of Bechuanaland on the Bushman Survey. Gabarone, 
Bechuanaland Government 
-  Slatkin M (1995). Hitchhiking and associative overdominance at a microsatellite locus. Mol Biol Evol 12: 
473-80 
-  Smith A (2005). The concepts of 'Neolithic' and 'Neolithisation' for Africa? Before Farming 1: 1-6 
-  Smith A, Malherbe C, Guenther M and Berens P (2000). The Bushmen of Southern Africa. Cape Town, 
David Philips Publishers 
-  Smith AB (1983). Prehistoric Pastoralism in the Southwestern Cape, South Africa. World Archaeology 15: 
79-89 
-  Smith AB (1986). Competition, Conflict and Clientship: Khoi and San Relationships in the Western Cape. 
Goodwin Series 5: 36-41 
-  Smith AB (1992). Origins and Spread of Pastoralism in Africa. Annual Review of Anthropology 21: 125-141 
-  Smith AB (1995). Einiqualand: Studies of the Orange River Frontier. Cape Town, University of Cape Town 
Press 
-  Smith AB, Sadr K, Gribble J and Yates R (1991). Excavations in the South-Western Cape, South Africa, 
and the Archaeological Identity of Prehistoric Hunter-Gatherers within the Last 2000 Years. The South 
African Archaeological Bulletin 46: 71-91 
-  Smith BW (2006). Reading rock art and writing genetic history. The Prehistory of Africa - Tracing the 
lineage of modern man. Soodyall H. Johannesburg & Cape Town, Jonathan Ball Publishers: 76-96. 
-  Soodyall H and Jenkins T (1992). Mitochondrial DNA polymorphisms in Khoisan populations from southern 
Africa. Ann Hum Genet 56 ( Pt 4): 315-24 
-  Soodyall H, Vigilant L, Hill AV, Stoneking M and Jenkins T (1996). mtDNA control-region sequence 
variation suggests multiple independent origins of an "Asian-specific" 9-bp deletion in sub-Saharan Africans. 
Am J Hum Genet 58: 595-608 
-  Stephens M, Smith NJ and Donnelly P (2001). A new statistical method for haplotype reconstruction from 
population data. Am J Hum Genet 68: 978-89 
-  Steyn HP (1984). Southern Kalahari San Subsistence Ecology: A Reconstruction. The South African 
Archaeological Bulletin 39: 117-124 
-  Stoneking M (2000). Hypervariable sites in the mtDNA control region are mutational hotspots. Am J Hum 
Genet 67: 1029-32 
-  Stoneking M and Soodyall H (1996). Human evolution and the mitochondrial genome. Curr Opin Genet Dev 
6: 731-6 
-  Stow GW (1905). The native races of South Africa. London, Swan Sonnenschein 
-  Stynder DD (2009). Craniometric evidence for South African Later Stone Age herders and hunter-gatherers 
being a single biological population. Journal of Archaeological Science 36: 798-806 
-  Stynder DD, Ackermann RR and Sealy JC (2007a). Craniofacial variation and population continuity during 
the South African Holocene. Am J Phys Anthropol 134: 489-500 
-  Stynder DD, Ackermann RR and Sealy JC (2007b). Early to mid-Holocene South African Later Stone Age 
human crania exhibit a distinctly Khoesan morphological pattern. S Afr J Sci 103: 349-352 
-  Swofford DL (1998). PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4. 
Sunderland, Massachusetts, Sinauer Associates 
-  Tajima F (1989). Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. 
Genetics 123: 585-95 
-  Tajima F (1996). The amount of DNA polymorphism maintained in a finite population when the neutral 
mutation rate varies among sites. Genetics 143: 1457-65 
-  Tamura K, Dudley J, Nei M and Kumar S (2007). MEGA4: Molecular Evolutionary Genetics Analysis 
(MEGA) software version 4.0. Mol Biol Evol 24: 1596-9 
-  Tamura K and Nei M (1993). Estimation of the number of nucleotide substitutions in the control region of 
mitochondrial DNA in humans and chimpanzees. Mol Biol Evol 10: 512-26 
-  Ten Raa R (1970). The couth and the uncouth: ethnic, social and linguistic division among the Sandawe of 
central Tanzania. Anthropos 65: 127-153 
-  Thomas MG, Bradman N and Flinn HM (1999). High throughput analysis of 10 microsatellite and 11 
diallelic polymorphisms on the human Y-chromosome. Hum Genet 105: 577-81 
-  Thompson JD, Higgins DG and Gibson TJ (1994). CLUSTAL W: improving the sensitivity of progressive 
multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix 
choice. Nucleic Acids Res 22: 4673-80 
-  Thomson R, Pritchard JK, Shen P, Oefner PJ and Feldman MW (2000). Recent common ancestry of 
human Y chromosomes: evidence from DNA sequence data. Proc Natl Acad Sci U S A 97: 7360-5 
-  Tishkoff SA, Gonder MK, Henn BM, Mortensen H, Knight A, Gignoux C, Fernandopulle N, et al., (2007). 
History of click-speaking populations of Africa inferred from mtDNA and Y chromosome genetic variation. Mol 
Biol Evol 24: 2180-95 
-  Tishkoff SA, Reed FA, Friedlaender FR, Ehret C, Ranciaro A, Froment A, Hirbo JB, et al., (2009). The 
genetic structure and history of Africans and African Americans. Science 324: 1035-44 
-  Tobias PV (1985). History of physical anthropology in Southern Africa. Am J Phys Anthropol 28: 1-52 
-  Torroni A, Achilli A, Macaulay V, Richards M and Bandelt HJ (2006). Harvesting the fruit of the human 
mtDNA tree. Trends Genet 22: 339-45 
-  Torroni A, Bandelt HJ, D'Urbano L, Lahermo P, Moral P, Sellitto D, Rengo C, et al., (1998). mtDNA analysis 
reveals a major late Paleolithic population expansion from southwestern to northeastern Europe. Am J Hum 
Genet 62: 1137-52 
-  Torroni A, Huoponen K, Francalacci P, Petrozzi M, Morelli L, Scozzari R, Obinu D, et al., (1996). 
Classification of European mtDNAs from an analysis of three European populations. Genetics 144: 1835-50 
-  Torroni A, Rengo C, Guida V, Cruciani F, Sellitto D, Coppa A, Calderon FL, et al., (2001). Do the four 
clades of the mtDNA haplogroup L2 evolve at different rates? Am J Hum Genet 69: 1348-56 
-  Traill A (1973). 'N4 or S7': another Bushman language. African Studies 32: 25-32 
-  Traill A (1996). !Khwa-Ka Hhouiten Hhouiten - "The Rush of the Storm" : The linguistic death of /Xam. 
Miscast. Negotiating the Presence of the Bushmen. Skotnes P. Cape Town, UCT Press: 171-183. 
-  Traunmüller H (2003). Clicks and the idea of a human protolanguage. Phonum 9: 1-4 
-  Underhill PA, Jin L, Lin AA, Mehdi SQ, Jenkins T, Vollrath D, Davis RW, et al., (1997). Detection of 
numerous Y chromosome biallelic polymorphisms by denaturing high-performance liquid chromatography. 
Genome Res 7: 996-1005 
-  Underhill PA and Kivisild T (2007). Use of y chromosome and mitochondrial DNA population structure in 
tracing human migrations. Annu Rev Genet 41: 539-64 
-  Underhill PA, Passarino G, Lin AA, Shen P, Mirazon Lahr M, Foley RA, Oefner PJ, et al., (2001). The 
phylogeography of Y chromosome binary haplotypes and the origins of modern human populations. Ann Hum 
Genet 65: 43-62 
-  Underhill PA, Shen P, Lin AA, Jin L, Passarino G, Yang WH, Kauffman E, et al., (2000). Y chromosome 
sequence variation and the history of human populations. Nat Genet 26: 358-61 
-  Wadley L (2007). The Middle Stone Age and Later Stone Age. A Search for Origins: Science, History and 
South Africa's 'Cradle of Humankind'. Bonner P, Esterhuysen A and Jenkins T. Johannesburg, Wits 
University Press: 122-135. 
-  Walker NJ (1995). The archaeology of the San: the Late stone age of Botswana. Speaking for the 
Bushmen. Sanders A J G M. Gabarone, The Botswana Society: 54-87. 
-  Wallace DC (1995). 1994 William Allan Award Address. Mitochondrial DNA variation in human evolution, 
degenerative disease, and aging. Am J Hum Genet 57: 201-23 
-  Vallone PM and Butler JM (2004). AutoDimer: a screening tool for primer-dimer and hairpin structures. 
Biotechniques 37: 226-31 
-  Walter RC, Buffler RT, Bruggemann JH, Guillaume MM, Berhe SM, Negassi B, Libsekal Y, et al., (2000). 
Early human occupation of the Red Sea coast of Eritrea during the last interglacial. Nature 405: 65-9 
-  Vansina JC (1990). Paths in the rainforest. Towards a history of political tradition in equatorial Africa. 
London, Currey 
-  Ward RH, Frazier BL, Dew-Jager K and Paabo S (1991). Extensive mitochondrial diversity within a single 
Amerindian tribe. Proc Natl Acad Sci U S A 88: 8720-4 
-  Watson E, Forster P, Richards M and Bandelt HJ (1997). Mitochondrial footprints of human expansions in 
Africa. Am J Hum Genet 61: 691-704 
-  Weir BS (1996a). Genetic data analysis II. Sunderland, MA, Sinauer Associates, Inc: 141-150. 
-  Weir BS (1996b). Genetic data analysis II. Sunderland, MA, Sinauer Associates, Inc 
-  Westphal EOJ (1963). The Linguistic Prehistory of Southern Africa: Bush, Kwadi, Hottentot, and Bantu 
Linguistic Relationships. Africa: Journal of the International African Institute 33: 237-265 
-  Westphal EOJ (1971). The click languages of southern and eastern Africa. Current trends in linguistics, 7: 
Linguistics in Sub-Saharan Africa. Sebeok T A. The Hague, Mouton: 367-420. 
-  Westphal EOJ (1974). Notes on A. 'Traill: N4 or S7?' (with a reply by A. Traill). African Studies 33: 243-255 
-  White TD, Asfaw B, DeGusta D, Gilbert H, Richards GD, Suwa G and Howell FC (2003). Pleistocene Homo 
sapiens from Middle Awash, Ethiopia. Nature 423: 742-7 
-  Whitfield LS, Sulston JE and Goodfellow PN (1995). Sequence variation of the human Y chromosome. 
Nature 378: 379-80 
-  Vigilant L, Pennington R, Harpending H, Kocher TD and Wilson AC (1989). Mitochondrial DNA sequences 
in single hairs from a southern African population. Proc Natl Acad Sci U S A 86: 9350-4 
-  Vigilant L, Stoneking M, Harpending H, Hawkes K and Wilson AC (1991). African populations and the 
evolution of human mitochondrial DNA. Science 253: 1503-7 
-  Wilder JA, Mobasher Z and Hammer MF (2004). Genetic evidence for unequal effective population sizes of 
human females and males. Mol Biol Evol 21: 2047-57 
-  Wilmsen EN (1989). Land filled with flies: A political economy of the Kalahari. Chicago, Chicago University 
Press 
-  Wilmsen EN, Denbow JR, Bicchieri MG, Binford LR, Gordon R, Guenther M, Lee RB, et al., (1990). 
Paradigmatic History of San-Speaking Peoples and Current Attempts at Revision [and Comments and 
Replies]. Curr Anthropol 31: 489-524 
-  Wilson IJ and Balding DJ (1998). Genealogical inference from microsatellite data. Genetics 150: 499-510 
-  Vinnicombe P (1976). People of the eland: rockpaintings of the Drakensberg Bushmen as a reflection of 
their life and thought. Pietermaritzburg, University of Natal Press 
-  Vogel JO (1994). Eastern and south-central African Iron Age. Encyclopedia of precolonial Africa. Vogel J O. 
Walnut Creek, Alta-Mira Press: 439-444. 
-  Wood ET, Stover DA, Ehret C, Destro-Bisol G, Spedini G, McLeod H, Louie L, et al., (2005). Contrasting 
patterns of Y chromosome and mtDNA variation in Africa: evidence for sex-biased demographic processes. 
Eur J Hum Genet 13: 867-76 
-  Vossen R (1998). Historical Classification of Khoe (Central Khoisan) Languages of Southern Africa. African 
Studies 57: 93-106 
-  Wright JB (1971). Bushman raiders of the Drakensberg, 1840-1870. Pietermaritzburg, University of Natal 
Press 
-  Xue FZ, Wang JZ, Hu P and Li GR (2005). The "Kriging" model of spatial genetic structure in human 
population genetics. Yi Chuan Xue Bao 32: 219-33 
-  Yao YG, Kong QP, Man XY, Bandelt HJ and Zhang YP (2003). Reconstructing the evolutionary history of 
China: a caveat about inferences drawn from ancient DNA. Mol Biol Evol 20: 214-9 
-  YCC (2002). A nomenclature system for the tree of human Y-chromosomal binary haplogroups. Genome 
Res 12: 339-48 
-  Zhivotovsky LA, Underhill PA, Cinnioglu C, Kayser M, Morar B, Kivisild T, Scozzari R, et al., (2004). The 
effective mutation rate at Y chromosome short tandem repeats, with application to human 
population-divergence time. Am J Hum Genet 74: 50-61 
-  Ziervogel D (1955). Notes on the language of the Eastern Transvaal Bushmen. The disappearing Bushmen 
of Lake Chrissie: a preliminary survey. Potgieter E F. Pretoria, J.L. van Schaick. 
-  Zoutendyk A, Kopec AC and Mourant AE (1955). The blood groups of the Hottentot. Am J Phys Anthropol 
13: 691-698 
 
9. APPENDICES 
 
Appendix A: Ethics approval 
 
Appendix B: Recipes for reagents and solutions used 
 
 
Sucrose-Triton X Lysing buffer 
10 ml 1 M Tris-HCl pH8 
5 ml 1 M MgCl2 
10 ml Triton-X 100 
Make up to 1 L with dH2O and autoclave 
Add 109.5 g sucrose just before use 
Keep chilled at 4°C 
 
1 M Tris-HCl 
121.1 g Tris 
1 L dH2O 
Autoclave 
 
1 M MgCl2 
101.66 g MgCl2 
500 ml dH2O 
Autoclave 
 
T20E5 
20 ml 1M Tris-HCl 
10 ml 0.5M EDTA pH8 
Make up to 1 L with dH2O and autoclave 
 
0.5 M EDTA 
93.06 g EDTA 
500 ml dH2O 
pH to 8.0 with NaOH and autoclave 
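
As a rough cross-check of the stock recipes above, the weighed masses can be converted to molarities. The sketch below is illustrative only. The molar masses are assumptions (Tris base, MgCl2 hexahydrate and disodium EDTA dihydrate), because the recipes do not state which salt forms were weighed out, and the function name `molarity` is hypothetical.

```python
def molarity(mass_g, molar_mass_g_per_mol, volume_l):
    """Return the molar concentration (mol/L) of a solute dissolved to volume_l."""
    return mass_g / molar_mass_g_per_mol / volume_l

# Assumed molar masses in g/mol (not stated in the recipes).
TRIS_BASE = 121.14
MGCL2_HEXAHYDRATE = 203.30
NA2_EDTA_DIHYDRATE = 372.24

# Masses and volumes as given in the recipes above.
tris_m = molarity(121.1, TRIS_BASE, 1.0)
mgcl2_m = molarity(101.66, MGCL2_HEXAHYDRATE, 0.5)
edta_m = molarity(93.06, NA2_EDTA_DIHYDRATE, 0.5)

for name, value in [("Tris-HCl", tris_m), ("MgCl2", mgcl2_m), ("EDTA", edta_m)]:
    print(f"{name}: {value:.3f} M")
```

Under these assumed salt forms the three stocks come out close to their nominal 1 M, 1 M and 0.5 M concentrations.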
 
10% SDS 
10 g SDS 
100 ml dH2O 
Autoclave 
 
Proteinase K (10 mg/ml) 
100 mg Proteinase K stock (100 mg/ml)* 
10 ml ddH2O 
*Available from Roche Diagnostics 
 
Proteinase-K mix 
For 16 extractions: 
400 µl 10% SDS 
16 µl 0.5 M EDTA 
2.8 ml autoclaved dH2O 
Add 800 µl Proteinase K (10 mg/ml stock) just before use 
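
The Proteinase-K mix above is specified for 16 extractions. A minimal sketch of how the same recipe could be scaled to another batch size, and how the final concentration of a component in the mix follows from the volumes, is given below. The names `MIX_FOR_16`, `scale_mix` and `final_concentration` are hypothetical helpers, and linear scaling of all components is an assumption.

```python
# Volumes in microlitres for 16 extractions, taken from the recipe above.
MIX_FOR_16 = {
    "10% SDS": 400.0,
    "0.5 M EDTA": 16.0,
    "autoclaved dH2O": 2800.0,
    "Proteinase K (10 mg/ml)": 800.0,
}

def scale_mix(n_extractions, base=MIX_FOR_16, base_n=16):
    """Scale every component volume linearly from base_n to n_extractions."""
    factor = n_extractions / base_n
    return {name: volume * factor for name, volume in base.items()}

def final_concentration(stock_conc, component, mix=MIX_FOR_16):
    """Concentration of one component after dilution into the whole mix."""
    total_volume = sum(mix.values())
    return stock_conc * mix[component] / total_volume

mix_for_8 = scale_mix(8)
total_ul = sum(MIX_FOR_16.values())
prot_k_mg_per_ml = final_concentration(10.0, "Proteinase K (10 mg/ml)")

print(mix_for_8)
print(f"total volume for 16 extractions: {total_ul:.0f} µl")
print(f"Proteinase K in the mix: {prot_k_mg_per_ml:.2f} mg/ml")
```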
 
Saturated NaCl 
100 ml autoclaved dH2O 
Slowly add 40 g NaCl until absolutely saturated (some NaCl will precipitate out) 
Before use, agitate and let NaCl precipitate out 
 
1 X TE buffer 
10 ml 1 M Tris-HCl pH8 
2 ml 0.5 M EDTA 
Make up to 1 L with dH2O and autoclave 
 
10 X TBE buffer 
108 g Tris 
55 g Boric acid 
7.44 g EDTA  
Make up to 1 L with dH2O and autoclave 
 
1 X TBE (1:10 dilution) 
40 ml 10 X TBE 
Make up to 200 ml with ddH2O 
 
Bromophenol blue Ficoll dye 
50 ml dH2O 
50 g sucrose 
1.86 g EDTA 
0.1 g bromophenol blue 
10 g Ficoll  
Dissolve 
Adjust volume to 100 ml with dH2O, stir overnight 
pH to 8.0   
Filter through Whatman filter paper 
Store at room temperature 
 
10 mg/ml Ethidium bromide (EtBr) 
Add 1 g of ethidium bromide to  
100 ml of ddH2O 
Stir for several hours until completely dissolved 
Store wrapped in aluminum foil at 4°C 
 
1kb size standard 
285 µl 1kb ladder (GibcoBRL)  
143 µl Ficoll dye  
2 400 µl 1 X TE 
 
10 mg/ml BSA 
1 g BSA  
10 ml ddH2O 
Aliquot into 1 ml amounts and store at 20°C 
 
2.5mM dNTPs 
Use 100 mM premade stocks of dATP, dGTP, dCTP and dTTP (GibcoBRL) 
10 µl of each stock dNTP + 360 µl sterile ddH2O = 400 µl of 2.5 mM dNTPs 
 
2.5 mM Spermidine (Sigma) 
Add 6.887 ml ddH2O to 1 g of Spermidine to make a 1 M stock  
Dilute 1 in 400 to 2.5 mM for use 
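
Several of the working solutions above are simple dilutions of a stock, so their final concentrations follow from C1 x V1 = C2 x V2. The sketch below applies that relation to the dNTP, spermidine, T20E5 and 1 X TE recipes. The function name `diluted_concentration` is hypothetical, and the code is only an arithmetic check, not part of the laboratory protocol.

```python
def diluted_concentration(stock_conc, stock_volume, final_volume):
    """Apply C1 * V1 = C2 * V2 and return C2 (same units as stock_conc)."""
    return stock_conc * stock_volume / final_volume

# dNTPs: 10 µl of each 100 mM stock in a final volume of 400 µl.
dntp_mm = diluted_concentration(100.0, 10.0, 400.0)

# Spermidine: 1 M (1000 mM) stock diluted 1 in 400.
spermidine_mm = diluted_concentration(1000.0, 1.0, 400.0)

# T20E5: 20 ml of 1 M Tris-HCl and 10 ml of 0.5 M EDTA made up to 1 L.
t20e5_tris_mm = diluted_concentration(1000.0, 20.0, 1000.0)
t20e5_edta_mm = diluted_concentration(500.0, 10.0, 1000.0)

# 1 X TE: 10 ml of 1 M Tris-HCl and 2 ml of 0.5 M EDTA made up to 1 L.
te_tris_mm = diluted_concentration(1000.0, 10.0, 1000.0)
te_edta_mm = diluted_concentration(500.0, 2.0, 1000.0)

print(dntp_mm, spermidine_mm, t20e5_tris_mm, t20e5_edta_mm, te_tris_mm, te_edta_mm)
```

The T20E5 values (20 mM Tris, 5 mM EDTA) match the name of that buffer, which is a useful sanity check on the recipe.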
 
Appendix C: Physical distance matrix (in km) between Khoe-San and 
Coloured groups 
 
 
      CAC       COL       CNC       GUG       JOH       KAR       KHO       KWE       NAM       XUN
CAC   0.000
COL   659.7942  0.000
CNC   761.8532  591.0919  0.000
GUG   1241.469  787.2487  537.4522  0.000
JOH   1570.437  1321.059  821.7263  624.5947  0.000
KAR   659.7942  0.000     591.0919  787.2487  1321.059  0.000
KHO   761.8532  591.0919  0.000     537.4522  821.7263  591.0919  0.000
KWE   1854.39   1501.823  1092.547  722.4106  359.1735  1501.823  1092.547  0.000
NAM   1248.44   1208.281  618.8586  787.1903  484.7312  1208.281  618.8586  843.9047  0.000
XUN   2120.967  1944.929  1412.054  1244.793  629.6031  1944.929  1412.054  629.6031  884.9136  0.000
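
A lower-triangular distance matrix of this kind can be derived from sampling coordinates. One common approach is the haversine great-circle formula, sketched below. Whether this is the formula used for the matrix above is an assumption, the Earth radius is an assumed mean value, and the site names and coordinates are hypothetical placeholders because the sampling coordinates are not listed in this appendix.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, not a value from the thesis

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def lower_triangle(sites):
    """Return {site: [distances to all sites up to and including itself]}."""
    names = list(sites)
    return {
        a: [haversine_km(*sites[a], *sites[b]) for b in names[: i + 1]]
        for i, a in enumerate(names)
    }

# Hypothetical coordinates (latitude, longitude) for illustration only.
example_sites = {
    "SITE_A": (-26.0, 20.0),
    "SITE_B": (-26.0, 20.0),  # same location as SITE_A
    "SITE_C": (-19.0, 20.5),
}
matrix = lower_triangle(example_sites)

for name, row in matrix.items():
    print(name, " ".join(f"{d:.3f}" for d in row))
```

Two groups sampled at the same location give a distance of zero, which is the pattern seen for COL and KAR and for CNC and KHO in the matrix above.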
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Appendix D: Details of SNPs used in autosomal analyses
 
 
 
Name in thesis  Chromosome  Group on chromosome  SNP ID (hCV)  SNP ID (rs)  Base position  Distance from previous marker  Yoruba MAF  African American MAF
 chr01-1-1 1 1 hCV29985869 rs7523071 185581438  32 N/A 
* chr01-1-2 1 1 hCV8349921 rs1445667 185582609 1171 32 N/A 
 chr01-1-3 1 1 hCV8352908 rs1445670 185587536 4927 42 N/A 
 chr01-1-4 1 1 hCV30688593 rs6660605 185594475 6939 32 N/A 
 chr01-1-5 1 1 hCV30688596 rs6666285 185594691 216 44 N/A 
 chr01-2-1 1 2 hCV26908697 rs6702432 243839090  26 N/A 
 chr01-2-2 1 2 hCV28021091 rs7366424 243845655 6565 27 N/A 
 chr01-2-3 1 2 hCV30447094 rs7555211 243848391 2736 36 N/A 
 chr01-2-4 1 2 hCV12075636 rs1954187 243851004 2613 9 N/A 
* chr01-2-5 1 2 hCV30382617 rs10399826 243861576 10572 38 N/A 
 chr02-1-1 2 1 hCV15781272 rs2373901 40769550  50 N/A 
 chr02-1-2 2 1 hCV8809743 rs882007 40781807 12257 20 26 
 chr02-1-3 2 1 hCV29048038 rs6755751 40782253 446 47 N/A 
 chr02-1-4 2 1 hCV1296264 rs3851315 40785189 2936 37 N/A 
 chr02-1-5 2 1 hCV26252361 rs11124754 40787764 2575 32 N/A 
 chr02-2-1 2 2 hCV29410368 rs6743609 78370721  27 N/A 
 chr02-2-2 2 2 hCV29410366 rs6715934 78379067 8346 23 N/A 
 chr02-2-3 2 2 hCV16127289 rs2839828 78382969 3902 39 N/A 
 chr02-2-4 2 2 hCV11464467 rs1837144 78383601 632 50 N/A 
 chr02-2-5 2 2 hCV11464466 rs1816652 78388857 5256 17 N/A 
 chr03-1-1 3 1 hCV11749718 rs1987888 4053654  24 N/A 
 chr03-1-2 3 1 hCV8827003 rs1087817 4063576 9922 33 N/A 
 chr03-1-3 3 1 hCV626367 rs317575 4063809 233 N/A N/A 
 chr03-1-4 3 1 hCV626362 rs317530 4069293 5484 34 40 
 chr03-1-5 3 1 hCV626353 rs317534 4074043 4750 49 N/A 
 chr03-2-1 3 2 hCV27956340 rs4624549 189144204  48 N/A 
 chr03-2-2 3 2 hCV3244174 rs2590451 189147479 3275 42 N/A 
 chr03-2-3 3 2 hCV1058808 rs567713 189151423 3944 47 N/A 
 chr03-2-4 3 2 hCV15917716 rs2679506 189154725 3302 28 N/A 
 chr03-2-5 3 2 hCV3244161 rs522833 189160082 5357 27 N/A 
 chr04-1-1 4 1 hCV2967242 rs9998475 13325188  26 N/A 
 chr04-1-2 4 1 hCV2967234 rs1352786 13326354 1166 26 42 
 chr04-1-3 4 1 hCV1192506 rs1948354 13334081 7727 26 N/A 
 chr04-1-4 4 1 hCV1192503 rs6837122 13335534 1453 24 41 
 chr04-1-5 4 1 hCV7562999 rs1032358 13338502 2968 19 N/A 
 chr04-2-1 4 2 hCV29608728 rs10084822 172054953  29 N/A 
 chr04-2-2 4 2 hCV30204213 rs9312493 172061519 6566 31 N/A 
 chr04-2-3 4 2 hCV30114165 rs10004230 172066255 4736 17 N/A 
 chr04-2-4 4 2 hCV8242322 rs1403213 172075840 9585 43 N/A 
* chr04-2-5 4 2 hCV30600558 rs10002204 172096780 20940 21 N/A 
 chr05-1-1 5 1 hCV7447360 rs1366370 66593667  39 N/A 
 chr05-1-2 5 1 hCV27915872 rs755877 66593979 312 45 N/A 
 chr05-1-3 5 1 hCV7447351 rs1593948 66594316 337 47 N/A 
 chr05-1-4 5 1 hCV11824955 rs7715561 66598715 4399 37 N/A 
 chr05-1-5 5 1 hCV2937282 rs919308 66604140 5425 17 N/A 
* chr05-2-1 5 2 hCV26117944 rs165073 163963822  31 N/A 
 chr05-2-2 5 2 hCV7522487 rs1363174 163978188 14366 N/A N/A 
 chr05-2-3 5 2 hCV1393057 rs250597 163980289 2101 30 N/A 
 chr05-2-4 5 2 hCV30220715 rs10515884 163985604 5315 41 N/A 
 chr05-2-5 5 2 hCV7522494 rs1421905 163990354 4750 38 N/A 
 chr06-1-1 6 1 hCV30355724 rs9505359 809219  22 N/A 
 chr06-1-2 6 1 hCV1819928 rs884126 815244 6025 27 N/A 
 chr06-1-3 6 1 hCV8773399 rs885450 815563 319 N/A N/A 
 chr06-1-4 6 1 hCV1819934 rs873560 820559 4996 24 N/A 
 chr06-1-5 6 1 hCV1819941 rs6916756 825467 4908 23 N/A 
 chr06-2-1 6 2 hCV30164637 rs6912046 79193277  45 N/A 
 chr06-2-2 6 2 hCV15868784 rs2223722 79197714 4437 46 N/A 
 chr06-2-3 6 2 hCV7546896 rs926654 79202638 4924 36 43 
 chr06-2-4 6 2 hCV30416745 rs9361404 79205477 2839 21 N/A 
 chr06-2-5 6 2 hCV29496547 rs9448411 79208314 2837 32 N/A 
 chr07-1-1 7 1 hCV3253650 rs2592859 35206935  31 N/A 
 chr07-1-2 7 1 hCV1071172 rs731015 35212110 5175 25 N/A 
 chr07-1-3 7 1 hCV16249550 rs2541911 35216715 4605 37 N/A 
 chr07-1-4 7 1 hCV16249554 rs2250212 35221258 4543 7 N/A 
 chr07-1-5 7 1 hCV3253622 rs2592848 35230892 9634 22 N/A 
 chr07-2-1 7 2 hCV30792597 rs7806350 144859843  49 N/A 
 chr07-2-2 7 2 hCV7434566 rs1523729 144867554 7711 27 37 
 chr07-2-3 7 2 hCV15843844 rs2888245 144871885 4331 24 N/A 
 chr07-2-4 7 2 hCV7435229 rs1523723 144877013 5128 20 N/A 
* chr07-2-5 7 2 hCV30792607 rs6954212 144880096 3083 28 N/A 
 chr08-1-1 8 1 hCV8947909 rs871565 18152103  39 N/A 
 chr08-1-2 8 1 hCV8947923 rs1493029 18165651 13548 29 N/A 
 chr08-1-3 8 1 hCV8947937 rs902960 18168085 2434 38 N/A 
* chr08-1-4 8 1 hCV29066331 rs7846103 18170309 2224 22 N/A 
 chr08-1-5 8 1 hCV16075982 rs2131422 18178912 8603 23 N/A 
 chr08-2-1 8 2 hCV11456221 rs2385226 126751178  17 N/A 
 chr08-2-2 8 2 hCV2761265 rs4871628 126752121 943 27 N/A 
 chr08-2-3 8 2 hCV8449160 rs7838054 126753324 1203 27 N/A 
 chr08-2-4 8 2 hCV2761254 rs1159478 126757397 4073 N/A N/A 
 chr08-2-5 8 2 hCV2761245 rs7460157 126761038 3641 22 N/A 
 chr09-1-1 9 1 hCV1617703 rs10966574 24919668  42 N/A 
 chr09-1-2 9 1 hCV3157880 rs7025715 24924491 4823 37 N/A 
 chr09-1-3 9 1 hCV1617701 rs7871011 24925087 596 47 N/A 
 chr09-1-4 9 1 hCV26305217 rs4085752 24931125 6038 14 N/A 
 chr09-1-5 9 1 hCV8767627 rs1461333 24936349 5224 42 N/A 
 chr09-2-1 9 2 hCV11489339 rs1927239 123675437  21 N/A 
 chr09-2-2 9 2 hCV16242136 rs2489161 123678034 2597 28 N/A 
 chr09-2-3 9 2 hCV995477 rs562239 123679804 1770 21 N/A 
 chr09-2-4 9 2 hCV29392986 rs4836945 123689332 9528 21 N/A 
 chr09-2-5 9 2 hCV16069779 rs2768818 123690135 803 28 N/A 
 chr10-1-1 10 1 hCV29522539 rs9663972 60527538  20 N/A 
 chr10-1-2 10 1 hCV31345052 rs6481457 60531364 3826 42 N/A 
 chr10-1-3 10 1 hCV908092 rs733341 60533393 2029 46 N/A 
 chr10-1-4 10 1 hCV31345143 rs11006373 60539023 5630 45 N/A 
 chr10-1-5 10 1 hCV31345171 rs7921026 60541895 2872 27 N/A 
 chr10-2-1 10 2 hCV11207816 rs7094944 109799612  37 N/A 
 chr10-2-2 10 2 hCV1798848 rs10509859 109803462 3850 23 30 
 chr10-2-3 10 2 hCV1798849 rs1125798 109808286 4824 25 31 
 chr10-2-4 10 2 hCV1798851 rs7073564 109813235 4949 23 N/A 
 chr10-2-5 10 2 hCV1798854 rs1556592 109819760 6525 35 42 
 chr11-1-1 11 1 hCV29137013 rs7124156 13198502  42 N/A 
 chr11-1-2 11 1 hCV9600088 rs900141 13204100 5598 20 N/A 
 chr11-1-3 11 1 hCV1870543 rs900142 13204831 731 22 N/A 
 chr11-1-4 11 1 hCV30567849 rs7117211 13205223 392 32 N/A 
 chr11-1-5 11 1 hCV7667097 rs7107711 13212114 6891 43 N/A 
 chr11-2-1 11 2 hCV11481013 rs2042599 127235817  34 N/A 
 chr11-2-2 11 2 hCV11481007 rs1812931 127240375 4558 30 N/A 
 chr11-2-3 11 2 hCV7504970 rs1364777 127242208 1833 27 N/A 
 chr11-2-4 11 2 hCV2890056 rs1107869 127249002 6794 27 N/A 
 chr11-2-5 11 2 hCV31697360 rs10893778 127253038 4036 27 N/A 
 chr12-1-1 12 1 hCV7562390 rs917589 3412660  17 N/A 
 chr12-1-2 12 1 hCV7562396 rs917587 3412936 276 16 N/A 
 chr12-1-3 12 1 hCV2649193 rs2878578 3413587 651 47 N/A 
 chr12-1-4 12 1 hCV29394818 rs6489468 3421275 7688 34 N/A 
 chr12-1-5 12 1 hCV2649182 rs7961141 3424976 3701 45 N/A 
 chr12-2-1 12 2 hCV2801082 rs855228 101400231  29 N/A 
 chr12-2-2 12 2 hCV7570428 rs855224 101405390 5159 35 N/A 
 chr12-2-3 12 2 hCV7570434 rs855218 101409109 3719 35 N/A 
 chr12-2-4 12 2 hCV7570449 rs855211 101413277 4168 32 N/A 
 chr12-2-5 12 2 hCV3061163 rs35746 101417107 3830 47 N/A 
 chr13-1-1 13 1 hCV1620102 rs4769191 21547069  35 N/A 
 chr13-1-2 13 1 hCV7556053 rs1323170 21547219 150 19 N/A 
 chr13-1-3 13 1 hCV30332355 rs4770238 21548179 960 37 N/A 
 chr13-1-4 13 1 hCV1620098 rs9316743 21548512 333 45 N/A 
 chr13-1-5 13 1 hCV7556051 rs1323172 21550247 1735 44 N/A 
 chr13-2-1 13 2 hCV509921 rs978089 85554112  21 N/A 
* chr13-2-2 13 2 hCV509923 rs4910994 85559270 5158 41 N/A 
 chr13-2-3 13 2 hCV7508241 rs1029143 85563006 3736 41 36 
 chr13-2-4 13 2 hCV30569079 rs9594117 85578891 15885 20 N/A 
 chr13-2-5 13 2 hCV9462203 rs1413441 85580898 2007 19 N/A 
 chr14-1-1 14 1 hCV15790014 rs2383584 33849679  21 N/A 
 chr14-1-2 14 1 hCV29357552 rs7143582 33852799 3120 33 N/A 
 chr14-1-3 14 1 hCV1453684 rs1958572 33858595 5796 47 N/A 
 chr14-1-4 14 1 hCV1453694 rs1958574 33867066 8471 15 N/A 
 chr14-1-5 14 1 hCV1453706 rs1958579 33870654 3588 13 26 
 chr14-2-1 14 2 hCV3244666 rs1241743 91751928  40 N/A 
 chr14-2-2 14 2 hCV3244664 rs1241745 91752315 387 36 N/A 
 chr14-2-3 14 2 hCV3244656 rs1956413 91753943 1628 44 N/A 
 chr14-2-4 14 2 hCV11666013 rs1956414 91758924 4981 40 N/A 
 chr14-2-5 14 2 hCV7585435 rs1741443 91774327 15403 47 N/A 
 chr15-1-1 15 1 hCV8926261 rs722150 31201795  N/A 42 
 chr15-1-2 15 1 hCV9960323 rs4780082 31202774 979 23 N/A 
 chr15-1-3 15 1 hCV11671510 rs1988447 31204618 1844 14 N/A 
 chr15-1-4 15 1 hCV29223603 rs7181962 31204650 32 46 N/A 
 chr15-1-5 15 1 hCV9960256 rs8023846 31211066 6416 35 N/A 
 chr15-2-1 15 2 hCV9708740 rs920921 66573339  41 N/A 
 chr15-2-2 15 2 hCV9708750 rs1373697 66577067 3728 35 N/A 
 chr15-2-3 15 2 hCV9708758 rs895133 66580703 3636 34 N/A 
 chr15-2-4 15 2 hCV15809641 rs2084032 66582870 2167 37 N/A 
 chr15-2-5 15 2 hCV9708767 rs895131 66583554 684 27 N/A 
 chr16-1-1 16 1 hCV11624551 rs1848824 61630443  44 N/A 
 chr16-1-2 16 1 hCV2281952 rs153322 61631942 1499 50 N/A 
 chr16-1-3 16 1 hCV2281956 rs153341 61644707 12765 23 N/A 
 chr16-1-4 16 1 hCV29048177 rs1605960 61655814 11107 27 N/A 
 chr16-1-5 16 1 hCV2281807 rs198007 61678146 22332 28 N/A 
 chr16-2-1 16 2 hCV1446720 rs1510205 84851316  17 N/A 
 chr16-2-2 16 2 hCV26612563 rs2883250 84859632 8316 22 N/A 
 chr16-2-3 16 2 hCV1521231 rs2696815 84859844 212 40 N/A 
 chr16-2-4 16 2 hCV31422718 rs717482 84862498 2654 16 N/A 
 chr16-2-5 16 2 hCV8898710 rs1027910 84866445 3947 17 N/A 
 chr17-1-1 17 1 hCV7596153 rs2007643 52084308  38 N/A 
 chr17-1-2 17 1 hCV2641904 rs6503752 52088714 4406 38 N/A 
 chr17-1-3 17 1 hCV2297056 rs714832 52093256 4542 29 N/A 
 chr17-1-4 17 1 hCV29726775 rs10491158 52099116 5860 17 N/A 
 chr17-1-5 17 1 hCV7596175 rs1019117 52103128 4012 27 N/A 
 chr17-2-1 17 2 hCV29084850 rs7222022 66763060  38 N/A 
 chr17-2-2 17 2 hCV2574303 rs2158906 66769428 6368 21 N/A 
 chr17-2-3 17 2 hCV6785 rs724856 66776439 7011 N/A N/A 
 chr17-2-4 17 2 hCV16151352 rs2190461 66787482 11043 48 N/A 
 chr17-2-5 17 2 hCV26366590 rs6501466 66789877 2395 36 N/A 
 chr18-1-1 18 1 hCV15866228 rs2940757 34847593  32 N/A 
 chr18-1-2 18 1 hCV15873100 rs2958610 34848055 462 45 N/A 
 chr18-1-3 18 1 hCV7458925 rs1509219 34852830 4775 32 N/A 
 chr18-1-4 18 1 hCV30437001 rs9304198 34854813 1983 17 N/A 
 chr18-1-5 18 1 hCV28986615 rs8083419 34856469 1656 47 N/A 
 chr18-2-1 18 2 hCV703794 rs165130 73464384  37 35 
 chr18-2-2 18 2 hCV3033039 rs905443 73464575 191 40 N/A 
 chr18-2-3 18 2 hCV738436 rs165128 73464782 207 18 N/A 
 chr18-2-4 18 2 hCV3033031 rs9952646 73470415 5633 37 N/A 
 chr18-2-5 18 2 hCV11740294 rs2407139 73472582 2167 21 N/A 
 chr19-1-1 19 1 hCV29353745 rs7256520 36812013  42 N/A 
 chr19-1-2 19 1 hCV29353740 rs8100570 36814702 2689 42 N/A 
 chr19-1-3 19 1 hCV9608785 rs892210 36817849 3147 N/A N/A 
 chr19-1-4 19 1 hCV31999332 rs8112540 36818052 203 31 N/A 
 chr19-1-5 19 1 hCV29353735 rs8101359 36822617 4565 33 N/A 
 chr19-2-1 19 2 hCV8710254 rs1654338 43228193  24 N/A 
 chr19-2-2 19 2 hCV2380087 rs734204 43231828 3635 25 30 
 chr19-2-3 19 2 hCV8710218 rs941037 43235466 3638 25 N/A 
 chr19-2-4 19 2 hCV2825747 rs1725467 43235743 277 25 N/A 
 chr19-2-5 19 2 hCV8710210 rs1725504 43238719 2976 35 N/A 
 chr20-1-1 20 1 hCV29643199 rs6085916 7112725  21 N/A 
 chr20-1-2 20 1 hCV8954459 rs1033604 7126839 14114 22 N/A 
 chr20-1-3 20 1 hCV8954461 rs1016264 7128675 1836 N/A N/A 
 chr20-1-4 20 1 hCV30166365 rs6133401 7129330 655 22 N/A 
 chr20-1-5 20 1 hCV29751510 rs6117693 7135874 6544 22 N/A 
 chr20-2-1 20 2 hCV3249308 rs2424383 21514717  12 N/A 
 chr20-2-2 20 2 hCV2808411 rs1014889 21518837 4120 31 45 
 chr20-2-3 20 2 hCV8890830 rs1014890 21519183 346 N/A N/A 
 chr20-2-4 20 2 hCV2808414 rs1074606 21521722 2539 30 N/A 
 chr20-2-5 20 2 hCV30615207 rs6035902 21530338 8616 36 N/A 
 chr21-1-1 21 1 hCV534386 rs150210 18284183  37 N/A 
 chr21-1-2 21 1 hCV569469 rs197562 18285788 1605 26 N/A 
 chr21-1-3 21 1 hCV2959026 rs2824593 18290483 4695 22 N/A 
 chr21-1-4 21 1 hCV534395 rs158077 18294083 3600 49 N/A 
 chr21-1-5 21 1 hCV9488991 rs1505265 18296572 2489 40 N/A 
 chr21-2-1 21 2 hCV31154097 rs8131079 24467571  37 N/A 
 chr21-2-2 21 2 hCV29107061 rs7280999 24469539 1968 37 N/A 
 chr21-2-3 21 2 hCV3236789 rs1024318 24475083 5544 22 29 
 chr21-2-4 21 2 hCV3236790 rs1910605 24480779 5696 49 40 
 chr21-2-5 21 2 hCV2787192 rs1910622 24483502 2723 49 N/A 
 chr22-1-1 22 1 hCV2463578 rs137462 31940397  N/A 23 
 chr22-1-2 22 1 hCV29637994 rs9306274 31940642 245 32 N/A 
 chr22-1-3 22 1 hCV2221204 rs137472 31945136 4494 32 28 
 chr22-1-4 22 1 hCV2221205 rs118033 31945610 474 18 N/A 
 chr22-1-5 22 1 hCV2221209 rs137475 31951775 6165 27 41 
 chr22-2-1 22 2 hCV15796647 rs2413378 34796843  17 N/A 
 chr22-2-2 22 2 hCV1088397 rs715550 34797853 1010 25 35 
 chr22-2-3 22 2 hCV1088400 rs715546 34798045 192 24 N/A 
 chr22-2-4 22 2 hCV30226396 rs7286844 34804171 6126 20 N/A 
 chr22-2-5 22 2 hCV1088401 rs739203 34805381 1210 42 N/A 
Distance from previous marker: AVE 4347.0; STD 3730.8; Min 192; Max 22332
* Excluded - Poor Quality
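
The "Distance from previous marker" column can be reproduced from the base positions within each group of five SNPs. The sketch below does this for group chr01-1 using positions copied from the table; the function name is hypothetical, and the summary statistics are computed for this one group only, not for the full table.

```python
from statistics import mean

# (name in thesis, base position) for group 1 on chromosome 1.
markers = [
    ("chr01-1-1", 185581438),
    ("chr01-1-2", 185582609),
    ("chr01-1-3", 185587536),
    ("chr01-1-4", 185594475),
    ("chr01-1-5", 185594691),
]


def distances_from_previous(positions):
    """Return the gap between each marker and the one before it.

    The first marker of a group has no predecessor, so the result has
    one entry fewer than the input.
    """
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


positions = [position for _, position in markers]
gaps = distances_from_previous(positions)

print(gaps)
print(mean(gaps), min(gaps), max(gaps))
```

For this group the gaps agree with the tabulated values of 1171, 4927, 6939 and 216.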
    
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Appendix E: Haplotype list of HVR I and HVR II variation 
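
The variant sites in the list below are written as a position followed by a substitution (for example 16129 G-A), an insertion (16263 insA) or a deletion (523 delAC), separated by semicolons. The following is a minimal sketch of how such a cell could be parsed for downstream use; the function names and the returned dictionary layout are illustrative assumptions, not part of the thesis.

```python
import re

# position, then either "REF-ALT" or "ins"/"del" followed by the affected bases
VARIANT_RE = re.compile(
    r"^(?P<pos>\d+)\s*"
    r"(?:(?P<ref>[ACGT])-(?P<alt>[ACGT])|(?P<kind>ins|del)(?P<bases>[ACGT]+))$"
)


def parse_variant(text):
    """Parse one variant such as '16129 G-A', '16263 insA' or '523 delAC'."""
    match = VARIANT_RE.match(text.strip())
    if match is None:
        raise ValueError(f"Unrecognised variant: {text!r}")
    position = int(match.group("pos"))
    if match.group("kind"):
        return {
            "position": position,
            "type": match.group("kind"),
            "bases": match.group("bases"),
        }
    return {
        "position": position,
        "type": "sub",
        "ref": match.group("ref"),
        "alt": match.group("alt"),
    }


def parse_variant_list(cell):
    """Split a semicolon-separated cell and parse every non-empty entry."""
    return [parse_variant(item) for item in cell.split(";") if item.strip()]


variants = parse_variant_list("16129 G-A; 16263 insA; 523 delAC")
for variant in variants:
    print(variant)
```

Entries that do not fit either pattern raise a `ValueError`, so unexpected notation is reported rather than silently dropped.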
 
 
HT  N  HG  HVR I variant sites  HVR II variant sites  KAR COL CAC KHO CNC XEG DUM NAM GUG NAR JOH XUN KWE DRC HER SOT SWZ ZUX AFR EUR IND
Ht_001 1 CRS                                               
Ht_002 1 NEAN 16037 A-G; 16078 A-G; 16129 G-A; 16139 A-T; 
16148 C-T; 16154 T-C; 16169 C-T; 16182 A-C; 
16183 A-C; 16189 T-C; 16209 T-C; 16223 C-T; 
16230 A-G; 16234 C-T; 16244 G-A; 16256 C-A; 
16258 A-G; 16262 C-T; 16263 insA; 16278 C-T; 
16299 A-G; 16311 T-C; 16320 C-T; 16362 T-C; 
16400 C-T; 16519 T-C 
73 A-G; 146 T-C; 150 C-T; 
152 T-C; 189 A-G; 200 A-G; 
243 A-G; 245 T-C; 247 G-A; 
262 C-T; 263 A-G; 417 G-A; 
438 C-T; 520 delCACAC; 
547 A-G 
                                          
Ht_003 2 L0a1b 16093 T-C; 16129 G-A; 16148 C-T; 16168 C-T; 
16172 T-C; 16187 C-T; 16188 C-G; 16189 T-C; 
16223 C-T; 16230 A-G; 16278 C-T; 16293 A-G; 
16311 T-C; 16320 C-T  
93 A-G; 95 A-C; 152 T-C; 
185 G-A; 189 A-G; 236 T-C; 
247 G-A; 263 A-G; 523 delAC
                                    2     
Ht_004 1 L0a1b 16129 G-A; 16148 C-T; 16168 C-T; 16172 T-C; 
16187 C-T; 16188 C-G; 16189 T-C; 16223 C-T; 
16230 A-G; 16278 C-T; 16293 A-C; 16311 T-C; 
16320 C-T  
93 A-G; 95 A-C; 185 G-A; 
189 A-G; 236 T-C; 247 G-A; 
263 A-G; 523 delAC 
  1                                       
Ht_005 1 L0a1b 16129 G-A; 16148 C-T; 16168 C-T; 16172 T-C; 
16187 C-T; 16188 C-G; 16189 T-C; 16223 C-T; 
16230 A-G; 16278 C-T; 16293 A-G; 16311 T-C; 
16320 C-T; 16368 T-C 
93 A-G; 95 A-C; 185 G-A; 
189 A-G; 236 T-C; 247 G-A; 
263 A-G; 523 delAC 
      1                                   
Ht_006 1 L0a1b 16129 G-A; 16148 C-T; 16168 C-T; 16172 T-C; 
16187 C-T; 16188 C-G; 16189 T-C; 16223 C-T; 
16230 A-G; 16278 C-T; 16293 A-G; 16311 T-C; 
16320 C-T  
89 T-C; 93 A-G; 95 A-C; 185 G-A;
189 A-G; 236 T-C; 247 G-A;
263 A-G; 507 T-C; 523 delAC
                                  1       
Ht_007 1 L0a1b 16129 G-A; 16148 C-T; 16168 C-T; 16172 T-C; 
16187 C-T; 16188 C-G; 16189 T-C; 16223 C-T; 
16230 A-G; 16278 C-T; 16293 A-G; 16311 T-C; 
16320 C-T  
89 T-C; 93 A-G; 95 A-C; 185 G-A;
189 A-G; 236 T-C; 247 G-A;
263 A-G; 523 delAC
  1                                       
Ht_008 1 L0a1b 16129 G-A; 16148 C-T; 16168 C-T; 16172 T-C; 
16187 C-T; 16188 C-G; 16189 T-C; 16223 C-T; 
16230 A-G; 16278 C-T; 16293 A-G; 16311 T-C; 
16320 C-T  
93 A-G; 95 A-C; 152 T-C; 
185 G-A; 189 A-G; 236 T-C; 
247 G-A; 263 A-G; 523 delAC
  1                                       
Ht_009 6 L0a1b 16129 G-A; 16148 C-T; 16168 C-T; 16172 T-C; 
16187 C-T; 16188 C-G; 16189 T-C; 16223 C-T; 
16230 A-G; 16278 C-T; 16293 A-G; 16311 T-C; 
16320 C-T  
93 A-G; 95 A-C; 185 G-A; 
189 A-G; 236 T-C; 247 G-A; 
263 A-G; 523 delAC 
    2     1                   2   1       
Ht_010 1 L0a1b 16129 G-A; 16148 C-T; 16168 C-T; 16172 T-C; 
16187 C-T; 16188 C-G; 16189 T-C; 16223 C-T; 
16230 A-G; 16278 C-T; 16293 A-G; 16311 T-C; 
16320 C-T; 16344 C-T; 16519 T-C 
93 A-G; 95 A-C; 185 G-A; 
189 A-G; 236 T-C; 247 G-A; 
263 A-G; 523 delAC 
                        1                 
Ht_011 2 L0a2 16148 C-T; 16172 T-C; 16187 C-T; 16188 C-A; 
16189 T-C; 16223 C-T; 16230 A-G; 16311 T-C; 
16320 C-T; 16390 G-A; 16519 T-C 
64 C-T; 93 A-G; 152 T-C; 
195 T-C; 236 T-C; 247 G-A; 
263 A-G; 455 insT 
  2                                       
Ht_012 3 L0a2 16148 C-T; 16172 T-C; 16187 C-T; 16188 C-G; 
16189 T-C; 16223 C-T; 16230 A-G; 16311 T-C; 
16320 C-T; 16519 T-C 
64 C-T; 93 A-G; 152 T-C; 
189 A-G; 204 T-C; 207 G-A; 
236 T-C; 247 G-A; 263 A-G; 
523 delAC 
    1                     1       1       
Ht_013 1 L0a2 16148 C-T; 16172 T-C; 16187 C-T; 16188 C-G; 
16189 T-C; 16223 C-T; 16230 A-G; 16311 T-C; 
16320 C-T; 16519 T-C 
64 C-T; 93 A-G; 150 C-T; 
152 T-C; 189 A-G; 204 T-C; 
207 G-A; 236 T-C; 247 G-A; 
263 A-G; 523 delAC 
  1                                       
Ht_014 1 L0a2a1 16093 T-C; 16148 C-T; 16172 T-C; 16187 C-T; 
16188 C-A; 16189 T-C; 16223 C-T; 16230 A-G; 
16311 T-C; 16320 C-T; 16519 T-C 
64 C-T; 93 A-G; 152 T-C; 
189 A-G; 236 T-C; 247 G-A; 
263 A-G; 523 delAC 
                              1           
Ht_015 2 L0d1a 16086 T-C; 16111 C-T; 16129 G-A; 16187 C-T; 
16189 T-C; 16230 A-G; 16234 C-T; 16243 T-C; 
16266 C-G; 16294 C-T; 16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 195 T-C; 
247 G-A; 498 delC; 573 insC 
        2                                 
Ht_016 2 L0d1a 16086 T-C; 16111 C-T; 16129 G-A; 16187 C-T; 
16189 T-C; 16230 A-G; 16234 C-T; 16243 T-C; 
16266 C-G; 16294 C-T; 16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 189 A-G; 
195 T-C; 247 G-A; 498 delC; 
573 insC 
        2                                 
Ht_017 1 L0d1a 16086 T-C; 16111 C-T; 16129 G-A; 16187 C-T; 
16189 T-C; 16230 A-G; 16234 C-T; 16243 T-C; 
16266 C-G; 16294 C-T; 16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 189 A-G; 
247 G-A; 498 delC; 573 insC 
1                                         
Ht_018 1 L0d1a 16093 T-C; 16129 G-A; 16187 C-T; 16189 T-C; 
16209 T-C; 16230 A-G; 16234 C-T; 16243 T-C; 
16266 C-A; 16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 153 A-G; 
195 T-C; 199 T-C; 247 G-A; 
498 delC 
              1                           
Ht_019 3 L0d1a 16093 T-C; 16129 G-A; 16187 C-T; 16189 T-C; 
16230 A-G; 16234 C-T; 16243 T-C; 16266 C-G; 
16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 199 T-C; 247 G-A; 
498 delC 
                    3                     
Ht_020 1 L0d1a 16093 T-C; 16129 G-A; 16187 C-T; 16189 T-C; 
16214 C-T; 16230 A-G; 16234 C-T; 16243 T-C; 
16266 C-G; 16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 199 T-C; 247 G-A; 
498 delC 
                    1                     
Ht_021 4 L0d1a 16129 G-A; 16187 C-T; 16189 T-C; 16209 T-C; 
16230 A-G; 16234 C-T; 16243 T-C; 16266 C-A; 
16311 T-C; 16362 T-C; 16519 T-C 
73 A-G; 146 T-C; 153 A-G; 
195 T-C; 199 T-C; 247 G-A; 
498 delC 
      4                                   
Ht_022 1 L0d1a 16129 G-A; 16187 C-T; 16189 T-C; 16209 T-C; 
16230 A-G; 16234 C-T; 16243 T-C; 16266 C-A; 
16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 153 A-G; 
195 T-C; 199 T-C; 247 G-A; 
498 delC 
  1                                       
Ht_023 2 L0d1a 16129 G-A; 16187 C-T; 16189 T-C; 16209 T-C; 
16230 A-G; 16234 C-T; 16243 T-C; 16266 C-A; 
16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 153 A-G; 
199 T-C; 247 G-A; 498 delC 
  1   1                                   
Ht_024 1 L0d1a 16129 G-A; 16187 C-T; 16189 T-C; 16209 T-C; 
16230 A-G; 16234 C-T; 16243 T-C; 16266 C-A; 
16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 153 A-G; 
189 A-G; 195 T-C; 199 T-C; 
247 G-A; 498 delC 
  1                                       
Ht_025 2 L0d1a 16129 G-A; 16187 C-T; 16189 T-C; 16230 A-G; 
16234 C-T; 16243 T-C; 16266 C-A; 16311 T-C; 
16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
199 T-C; 247 G-A; 498 delC; 
524 insAC 
1 1                                       
Ht_026 1 L0d1a 16129 G-A; 16187 C-T; 16189 T-C; 16230 A-G; 
16234 C-T; 16243 T-C; 16266 C-A; 16311 T-C; 
16519 T-C 
73 A-G; 146 T-C; 195 T-C; 
199 T-C; 247 G-A; 318 T-C; 
498 delC 
  1                                       
Ht_027 4 L0d1a 16129 G-A; 16187 C-T; 16189 T-C; 16230 A-G; 
16234 C-T; 16243 T-C; 16266 C-A; 16311 T-C; 
16519 T-C 
73 A-G; 146 T-C; 195 T-C; 
199 T-C; 247 G-A; 498 delC 
  2   2                                   
Ht_028 1 L0d1a 16129 G-A; 16187 C-T; 16189 T-C; 16230 A-G; 
16234 C-T; 16243 T-C; 16266 C-A; 16311 T-C; 
16519 T-C 
73 A-G; 146 T-C; 199 T-C; 
247 G-A; 498 delC 
  1                                       
Ht_029 1 L0d1a 16129 G-A; 16187 C-T; 16189 T-C; 16230 A-G; 
16234 C-T; 16243 T-C; 16266 C-A; 16311 T-C; 
16519 T-C 
73 A-G; 146 T-C; 188 A-G; 
195 T-C; 199 T-C; 206 T-C; 
247 G-A; 498 delC 
      1                                   
Ht_030 1 L0d1a 16129 G-A; 16187 C-T; 16189 T-C; 16230 A-G; 
16234 C-T; 16243 T-C; 16266 C-A; 16301 C-T; 
16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 195 T-C; 
247 G-A; 498 delC 
                                  1       
Ht_031 1 L0d1a 16129 G-A; 16187 C-T; 16189 T-C; 16230 A-G; 
16234 C-T; 16243 T-C; 16264 C-T; 16266 C-G; 
16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 199 T-C; 247 G-A; 
498 delC 
                      1                   
Ht_032 1 L0d1a 16129 G-A; 16187 C-T; 16189 T-C; 16192 C-T; 
16230 A-G; 16234 C-T; 16243 T-C; 16266 C-G; 
16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 195 T-C; 
198 C-T; 199 T-C; 247 G-A; 
498 delC 
        1                                 
Ht_033 2 L0d1a 16129 G-A; 16146 A-G; 16187 C-T; 16189 T-C; 
16230 A-G; 16234 C-T; 16243 T-C; 16245 C-T; 
16266 C-A; 16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 195 T-C; 
199 T-C; 247 G-A; 498 delC; 
524 insAC 
  2                                       
Ht_034 2 L0d1a 16051 A-G; 16129 G-A; 16187 C-T; 16189 T-C; 
16230 A-G; 16234 C-T; 16243 T-C; 16266 C-G; 
16311 T-C; 16320 C-T; 16519 T-C 
73 A-G; 146 T-C; 195 T-C; 
199 T-C; 247 G-A; 498 delC; 
524 C-T  
                2                         
Ht_035 3 L0d1a 16051 A-G; 16129 G-A; 16187 C-T; 16189 T-C; 
16230 A-G; 16234 C-T; 16243 T-C; 16266 C-G; 
16291 C-T; 16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 195 T-C; 
199 T-C; 207 G-A; 247 G-A; 
498 delC 
      2 1                                 
Ht_036 1 L0d1b 16093 T-C; 16129 G-A; 16187 C-T; 16189 T-C; 
16223 C-T; 16230 A-G; 16239 C-T; 16243 T-C; 
16294 C-T; 16311 T-C; 16325 T-C; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 247 G-A; 498 delC 
          1                               
Ht_037 4 L0d1b 16129 G-A; 16140 T-C; 16187 C-T; 16189 T-C; 
16223 C-T; 16239 C-T; 16243 T-C; 16294 C-T; 
16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 247 G-A; 498 delC; 
523 delAC; 573 insC 
1 1   1       1                           
Ht_038 2 L0d1b 16129 G-A; 16172 T-C; 16187 C-T; 16189 T-C; 
16223 C-T; 16230 A-G; 16239 C-T; 16243 T-C; 
16294 C-T; 16311 T-C; 16320 C-T; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 247 G-A; 498 delC 
                2                         
Ht_039 1 L0d1b 16129 G-A; 16189 T-C; 16223 C-T; 16230 A-G; 
16239 C-T; 16243 T-C; 16294 C-T; 16311 T-C; 
16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 247 G-A; 498 delC 
                      1                   
Ht_040 2 L0d1b 16129 G-A; 16187 C-T; 16189 T-C; 16223 C-T; 
16243 T-C; 16257 C-T; 16294 C-T; 16311 T-C; 
16482 A-G; 16519 T-C; 16527 C-T  
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 236 T-C; 247 G-A; 
498 delC; 573 insC 
      2                                   
Ht_041 3 L0d1b 16129 G-A; 16187 C-T; 16189 T-C; 16223 C-T; 
16243 T-C; 16257 C-T; 16294 C-T; 16311 T-C; 
16482 A-G; 16519 T-C; 16527 C-T  
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 247 G-A; 498 delC; 
573 insC 
      3                                   
Ht_042 1 L0d1b 16129 G-A; 16187 C-T; 16189 T-C; 16223 C-T; 
16239 C-T; 16243 T-C; 16271 T-C; 16292 C-T; 
16294 C-T; 16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 247 G-A; 498 delC; 
523 delAC; 573 insC 
  1                                       
Ht_043 3 L0d1b 16129 G-A; 16187 C-T; 16189 T-C; 16223 C-T; 
16239 C-T; 16243 T-C; 16294 C-T; 16311 T-C; 
16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 247 G-A; 498 delC; 
523 delAC; 573 insC 
                    2             1       
Ht_044 1 L0d1b 16129 G-A; 16187 C-T; 16189 T-C; 16223 C-T; 
16239 C-T; 16243 T-C; 16294 C-T; 16311 T-C; 
16519 T-C; 16527 C-T  
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 247 G-A; 498 delC; 
573 insC 
              1                           
Ht_045 5 L0d1b 16129 G-A; 16187 C-T; 16189 T-C; 16223 C-T; 
16230 A-G; 16239 C-T; 16243 T-C; 16294 C-T; 
16311 T-C; 16325 T-C; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 247 G-A; 498 delC 
      1 4                                 
Ht_046 1 L0d1b 16129 G-A; 16187 C-T; 16189 T-C; 16223 C-T; 
16230 A-G; 16239 C-T; 16243 T-C; 16294 C-T; 
16311 T-C; 16325 T-C; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 241 A-G; 247 G-A; 
498 delC 
                              1           
Ht_047 18 L0d1b 16129 G-A; 16187 C-T; 16189 T-C; 16223 C-T; 
16230 A-G; 16239 C-T; 16243 T-C; 16294 C-T; 
16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 247 G-A; 498 delC 
2 5 3 3 1             1       1 1 1       
Ht_048 1 L0d1b 16129 G-A; 16187 C-T; 16189 T-C; 16223 C-T; 
16230 A-G; 16239 C-T; 16243 T-C; 16294 C-T; 
16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 198 C-T; 247 G-A; 
498 delC 
1                                         
Ht_049 2 L0d1b 16129 G-A; 16187 C-T; 16189 T-C; 16223 C-T; 
16230 A-G; 16239 C-T; 16243 T-C; 16294 C-T; 
16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
247 G-A; 498 delC 
  1           1                           
Ht_050 3 L0d1b 16129 G-A; 16187 C-T; 16189 T-C; 16223 C-T; 
16230 A-G; 16239 C-T; 16243 T-C; 16294 C-T; 
16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 195 T-C; 
247 G-A; 498 delC 
                    3                     
Ht_051 1 L0d1b 16129 G-A; 16187 C-T; 16189 T-C; 16223 C-T; 
16230 A-G; 16239 C-T; 16243 T-C; 16294 C-T; 
16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 189 A-G; 
195 T-C; 247 G-A; 498 delC 
                    1                     
Ht_052 1 L0d1b 16129 G-A; 16187 C-T; 16189 T-C; 16223 C-T; 
16230 A-G; 16239 C-T; 16243 T-C; 16294 C-T; 
16311 T-C; 16320 C-T; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 247 G-A; 498 delC 
                                  1       
Ht_053 1 L0d1b 16129 G-A; 16187 C-T; 16189 T-C; 16223 C-T; 
16230 A-G; 16239 C-T; 16243 T-C; 16266 C-T; 
16294 C-T; 16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 247 G-A; 498 delC 
  1                                       
Ht_054 6 L0d1b 16129 G-A; 16187 C-T; 16189 T-C; 16218 C-T; 
16223 C-T; 16239 C-T; 16243 T-C; 16294 C-T; 
16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 247 G-A; 498 delC; 
523 delAC; 573 insC 
    1 2 1     2                           
Ht_055 3 L0d1b 16129 G-A; 16187 C-T; 16189 T-C; 16218 C-T; 
16223 C-T; 16227 A-G; 16239 C-T; 16243 T-C; 
16294 C-T; 16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 247 G-A; 498 delC; 
523 delAC; 573 insC 
2           1                             
Ht_056 6 L0d1b 16129 G-A; 16187 C-T; 16189 T-C; 16192 C-T; 
16223 C-T; 16239 C-T; 16243 T-C; 16294 C-T; 
16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 247 G-A; 498 delC; 
523 delAC; 573 insC 
  1 1 2 1                   1             
Ht_057 1 L0d1b 16129 G-A; 16187 C-T; 16189 T-C; 16192 C-T; 
16223 C-T; 16239 C-T; 16271 T-C; 16294 C-T; 
16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 247 G-A; 498 delC; 
523 delAC; 573 insC 
        1                                 
Ht_058 1 L0d1b 16187 C-T; 16189 T-C; 16223 C-T; 16239 C-T; 
16243 T-C; 16294 C-T; 16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 247 G-A; 498 delC; 
523 delAC; 573 insC 
  1                                       
Ht_059 1 L0d1b 16037 A-G; 16129 G-A; 16187 C-T; 16189 T-C; 
16223 C-T; 16239 C-T; 16243 T-C; 16294 C-T; 
16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 247 G-A; 498 delC; 
523 delAC; 573 insC 
                                  1       
Ht_060 5 L0d1b_x 16129 G-A; 16153 G-A; 16187 C-T; 16189 T-C; 
16223 C-T; 16230 A-G; 16243 T-C; 16294 C-T; 
16311 T-C; 16474 G-T; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 207 G-A; 247 G-A; 
498 delC 
                    5                     
Ht_061 1 L0d1b_x 16129 G-A; 16153 G-A; 16187 C-T; 16189 T-C; 
16223 C-T; 16230 A-G; 16243 T-C; 16294 C-T; 
16311 T-C; 16474 G-T; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 247 G-A; 498 delC 
              1                           
Ht_062 1 L0d1b_x 16129 G-A; 16153 G-A; 16187 C-T; 16189 T-C; 
16223 C-T; 16230 A-G; 16243 T-C; 16294 C-T; 
16311 T-C; 16474 G-T; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 247 G-A; 456 C-T; 
498 delC 
                      1                   
Ht_063 2 L0d1b_x 16129 G-A; 16153 G-A; 16187 C-T; 16189 T-C; 
16223 C-T; 16230 A-G; 16243 T-C; 16294 C-T; 
16311 T-C; 16474 G-T; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 247 delG; 498 delC 
                    2                     
Ht_064 1 L0d1b_x 16129 G-A; 16153 G-A; 16187 C-T; 16189 T-C; 
16223 C-T; 16230 A-G; 16243 T-C; 16294 C-T; 
16311 T-C; 16474 G-T; 16519 T-C 
73 A-G; 146 T-C; 195 T-C; 
247 G-A; 498 delC 
      1                                   
Ht_065 8 L0d1c 16093 T-C; 16187 C-T; 16189 T-C; 16223 C-T; 
16230 A-G; 16234 C-T; 16243 T-C; 16311 T-C; 
16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 247 G-A; 265 T-C; 
456 C-T; 498 delC; 523 delAC
                    6 2                   
Ht_066 1 L0d1c 16187 C-T; 16189 T-C; 16223 C-T; 16230 A-G; 
16234 C-T; 16243 T-C; 16244 G-C; 16294 C-G; 
16311 T-C; 16354 C-T  
73 A-G; 146 T-C; 195 T-C; 
247 G-A; 456 C-T; 498 delC; 
523 delAC 
                    1                     
Ht_067 2 L0d1c 16187 C-T; 16189 T-C; 16223 C-T; 16230 A-G; 
16234 C-T; 16243 T-C; 16249 T-C; 16311 T-C; 
16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 247 G-A; 456 C-T; 
498 delC 
  1             1                         
Ht_068 6 L0d1c 16187 C-T; 16189 T-C; 16223 C-T; 16230 A-G; 
16234 C-T; 16242 C-T; 16243 T-C; 16311 T-C; 
16519 T-C 
73 A-G; 114 C-A; 146 T-C; 
152 T-C; 195 T-C; 247 G-A; 
456 C-T; 498 delC; 523 delAC
                6                         
Ht_069 1 L0d1c 16183 A-C; 16189 T-C; 16223 C-T; 16230 A-G; 
16234 C-T; 16243 T-C; 16249 T-C; 16311 T-C; 
16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 247 G-A; 456 C-T; 
498 delC 
        1                                 
Ht_070 5 L0d1c 16148 C-T; 16187 C-T; 16189 T-C; 16223 C-T; 
16230 A-G; 16234 C-T; 16242 C-T; 16243 T-C; 
16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 247 G-A; 456 C-T; 
498 delC; 523 delAC 
      2 3                                 
Ht_071 2 L0d1c1 16167 C-T; 16187 C-T; 16189 T-C; 16223 C-T; 
16230 A-G; 16234 C-T; 16242 C-T; 16243 T-C; 
16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 198 C-T; 247 G-A; 
294 T-C; 456 C-T; 498 delC; 
523 delAC 
                2                         
Ht_072 3 L0d1c1 16167 C-T; 16187 C-T; 16189 T-C; 16223 C-T; 
16230 A-G; 16234 C-T; 16242 C-T; 16243 T-C; 
16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 198 C-T; 247 G-A; 
456 C-T; 498 delC; 523 delAC; 593 T-C
                    1 2                   
Ht_073 27 L0d1c1 16167 C-T; 16187 C-T; 16189 T-C; 16223 C-T; 
16230 A-G; 16234 C-T; 16242 C-T; 16243 T-C; 
16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 198 C-T; 247 G-A; 
456 C-T; 498 delC; 523 delAC
      2 1       6 1 3 14                   
Ht_074 1 L0d1c1 16167 C-T; 16187 C-T; 16189 T-C; 16223 C-T; 
16230 A-G; 16234 C-T; 16242 C-T; 16243 T-C; 
16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 198 C-T; 247 G-A; 
456 C-T; 498 delC; 502 C-T; 
523 delAC 
              1                           
Ht_075 2 L0d1c1 16167 C-T; 16187 C-T; 16189 T-C; 16223 C-T; 
16230 A-G; 16234 C-T; 16242 C-T; 16243 T-C; 
16311 T-C; 16497 A-G; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 198 C-T; 247 G-A; 
456 C-T; 498 delC; 523 delAC
                      1           1       
Ht_076 1 L0d1c1 16167 C-T; 16187 C-T; 16189 T-C; 16223 C-T; 
16230 A-G; 16234 C-T; 16240 A-C; 16242 C-T; 
16243 T-C; 16311 T-C; 16497 A-G; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 198 C-T; 247 G-A; 
456 C-T; 498 delC; 523 delAC
                      1                   
Ht_077 1 L0d2a 16093 T-C; 16129 G-A; 16187 C-T; 16189 T-C; 
16223 C-T; 16230 A-G; 16243 T-C; 16311 T-C; 
16390 G-A; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 198 C-T; 247 G-A; 
498 delC; 523 delAC; 597 C-T
                                  1       
Ht_078 4 L0d2a 16093 T-C; 16129 G-A; 16187 C-T; 16189 T-C; 
16212 A-G; 16223 C-T; 16230 A-G; 16243 T-C; 
16311 T-C; 16390 G-A; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 198 C-T; 247 G-A; 
498 delC; 523 delAC; 597 C-T
  2   1 1                                 
Ht_079 1 L0d2a 16129 G-A; 16145 G-A; 16187 C-T; 16189 T-C; 
16212 A-G; 16223 C-T; 16230 A-G; 16243 T-C; 
16311 T-C; 16390 G-A; 16519 T-C; 16524 A-G 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 198 C-T; 247 G-A; 
498 delC; 523 delAC; 597 C-T
                              1           
Ht_080 1 L0d2a 16129 G-A; 16145 G-A; 16187 C-T; 16189 T-C; 
16212 A-G; 16223 C-T; 16230 A-G; 16243 T-C; 
16264 C-T; 16311 T-C; 16390 G-A; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 198 C-T; 247 G-A; 
498 delC; 523 delAC; 597 C-T
              1                           
Ht_081 2 L0d2a 16129 G-A; 16172 T-C; 16187 C-T; 16189 T-C; 
16212 A-G; 16223 C-T; 16230 A-G; 16243 T-C; 
16311 T-C; 16390 G-A; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 198 C-T; 247 G-A; 
498 delC; 523 delAC; 597 C-T
  1                   1                   
Ht_082 3 L0d2a 16129 G-A; 16189 T-C; 16212 A-G; 16223 C-T; 
16230 A-G; 16243 T-C; 16311 T-C; 16390 G-A; 
16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 198 C-T; 247 G-A; 
498 delC; 523 delAC; 597 C-T
          1                     1 1       
Ht_083 1 L0d2a 16129 G-A; 16189 T-C; 16212 A-G; 16223 C-T; 
16230 A-G; 16243 T-C; 16311 T-C; 16390 G-A; 
16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 198 C-T; 247 G-A; 
463 C-T; 498 delC; 523 delAC; 597 C-T
        1                                 
Ht_084 1 L0d2a 16129 G-A; 16187 C-T; 16189 T-C; 16212 A-G; 
16223 C-T; 16243 T-C; 16311 T-C; 16390 G-A; 
16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 198 C-T; 247 G-A; 
390 A-G; 498 delC; 523 delAC; 597 C-T
              1                           
Ht_085 8 L0d2a 16129 G-A; 16187 C-T; 16189 T-C; 16212 A-G; 
16223 C-T; 16230 A-G; 16243 T-C; 16311 T-C; 
16362 T-C; 16390 G-A; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 198 C-T; 247 G-A; 
498 delC; 523 delAC; 597 C-T
      8                                   
Ht_086 1 L0d2a 16129 G-A; 16187 C-T; 16189 T-C; 16212 A-G; 
16223 C-T; 16230 A-G; 16243 T-C; 16311 T-C; 
16390 G-A; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 247 G-A; 498 delC; 
523 delAC; 597 C-T  
1                                         
Ht_087 1 L0d2a 16129 G-A; 16187 C-T; 16189 T-C; 16212 A-G; 
16223 C-T; 16230 A-G; 16243 T-C; 16311 T-C; 
16390 G-A; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 198 C-T; 247 G-A; 
291 insA; 498 delC; 523 delAC; 597 C-T
        1                                 
Ht_088 4 L0d2a 16129 G-A; 16187 C-T; 16189 T-C; 16212 A-G; 
16223 C-T; 16230 A-G; 16243 T-C; 16311 T-C; 
16390 G-A; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 198 C-T; 247 G-A; 
498 delC; 597 C-T  
2 1 1                                     
Ht_089 41 L0d2a 16129 G-A; 16187 C-T; 16189 T-C; 16212 A-G; 
16223 C-T; 16230 A-G; 16243 T-C; 16311 T-C; 
16390 G-A; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 198 C-T; 247 G-A; 
498 delC; 523 delAC; 597 C-T
8 9 2 5 9     2   1   2           3       
Ht_090 5 L0d2a 16129 G-A; 16187 C-T; 16189 T-C; 16212 A-G; 
16223 C-T; 16230 A-G; 16243 T-C; 16311 T-C; 
16390 G-A; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 198 C-T; 247 G-A; 
498 delC; 515 A-G; 523 delAC; 597 C-T
4 1                                       
Ht_091 1 L0d2a 16129 G-A; 16187 C-T; 16189 T-C; 16212 A-G; 
16223 C-T; 16230 A-G; 16243 T-C; 16311 T-C; 
16390 G-A; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 198 C-T; 247 G-A; 
456 C-T; 498 delC; 597 C-T  
1                                         
Ht_092 1 L0d2a 16129 G-A; 16187 C-T; 16189 T-C; 16212 A-G; 
16223 C-T; 16230 A-G; 16243 T-C; 16311 T-C; 
16390 G-A; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 198 C-T; 247 delG; 
498 delC; 523 delAC; 597 C-T
  1                                       
Ht_093 1 L0d2a 16129 G-A; 16187 C-T; 16189 T-C; 16212 A-G; 
16223 C-T; 16230 A-G; 16243 T-C; 16311 T-C; 
16390 G-A; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
188 A-G; 195 T-C; 198 C-T; 
247 G-A; 498 delC; 523 delAC; 597 C-T
                                  1       
Ht_094 4 L0d2a 16129 G-A; 16187 C-T; 16189 T-C; 16212 A-G; 
16223 C-T; 16230 A-G; 16243 T-C; 16311 T-C; 
16320 C-T; 16390 G-A; 16519 T-C 
73 A-G; 146 T-C; 195 T-C; 
198 C-T; 247 G-A; 498 delC; 
523 delAC; 597 C-T  
      4                                   
Ht_095 1 L0d2a 16129 G-A; 16187 C-T; 16189 T-C; 16212 A-G; 
16223 C-T; 16230 A-G; 16243 T-C; 16266 C-T; 
16311 T-C; 16390 G-A; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 198 C-T; 247 G-A; 
498 delC; 523 delAC; 597 C-T
                      1                   
Ht_096 1 L0d2a 16129 G-A; 16187 C-T; 16189 T-C; 16212 A-G; 
16223 C-T; 16230 A-G; 16243 T-C; 16260 C-T; 
16311 T-C; 16390 G-A; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 198 C-T; 247 G-A; 
498 delC; 523 delAC; 597 C-T
              1                           
Ht_097 2 L0d2a 16129 G-A; 16187 C-T; 16188 C-T; 16189 T-C; 
16212 A-G; 16223 C-T; 16230 A-G; 16243 T-C; 
16311 T-C; 16390 G-A; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 198 C-T; 247 G-A; 
498 delC; 523 delAC; 597 C-T
1     1                                   
Ht_098 1 L0d2a 16129 G-A; 16179 C-T; 16187 C-T; 16189 T-C; 
16212 A-G; 16223 C-T; 16230 A-G; 16243 T-C; 
16311 T-C; 16390 G-A; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 198 C-T; 247 G-A; 
498 delC; 523 delAC; 597 C-T
                                  1       
Ht_099 1 L0d2a 16129 G-A; 16148 C-T; 16187 C-T; 16189 T-C; 
16212 A-G; 16223 C-T; 16230 A-G; 16243 T-C; 
16311 T-C; 16390 G-A; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 198 C-T; 247 G-A; 
498 delC; 597 C-T  
                              1           
Ht_100 1 L0d2a 16129 G-A; 16148 C-T; 16173 C-T; 16187 C-T; 
16189 T-C; 16212 A-G; 16223 C-T; 16230 A-G; 
16243 T-C; 16311 T-C; 16390 G-A; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 198 C-T; 247 G-A; 
498 delC; 523 delAC; 597 C-T
                                  1       
Ht_101 1 L0d2a 16187 C-T; 16189 T-C; 16212 A-G; 16223 C-T; 
16230 A-G; 16243 T-C; 16311 T-C; 16390 G-A; 
16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 198 C-T; 247 G-A; 
498 delC; 523 delAC; 597 C-T
  1                                       
Ht_102 1 L0d2a 16187 C-T; 16189 T-C; 16212 A-G; 16223 C-T; 
16230 A-G; 16243 T-C; 16311 T-C; 16390 G-A; 
16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 198 C-T; 247 G-A; 
498 delC; 523 delAC; 573 insC; 597 C-T
              1                           
Ht_103 1 L0d2a 16182 A-C; 16183 A-C; 16189 T-C; 16212 A-G; 
16223 C-T; 16230 A-G; 16243 T-C; 16311 T-C; 
16390 G-A; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 198 C-T; 247 G-A; 
463 C-T; 498 delC; 523 delAC; 597 C-T
1                                         
Ht_104 1 L0d2b 16069 C-T; 16126 T-C; 16129 G-A; 16169 C-T; 
16182 A-C; 16183 A-C; 16189 T-C; 16212 A-G; 
16223 C-T; 16230 A-G; 16243 T-C; 16258 A-C; 
16291 C-T; 16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 195 T-C; 
247 G-A; 265 T-C; 498 delC; 
523 delAC; 573 insC 
        1                                 
Ht_105 1 L0d2b 16069 C-T; 16126 T-C; 16129 G-A; 16169 C-T; 
16182 A-C; 16183 A-C; 16189 T-C; 16212 A-G; 
16223 C-T; 16230 A-G; 16243 T-C; 16258 A-C; 
16291 C-T; 16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 247 G-A; 
265 T-C; 498 delC; 523 delAC; 573 insC
                1                         
Ht_106 2 L0d2b 16069 C-T; 16126 T-C; 16129 G-A; 16169 C-T; 
16182 A-C; 16183 A-C; 16188 C-T; 16189 T-C; 
16212 A-G; 16223 C-T; 16230 A-G; 16243 T-C; 
16258 A-C; 16291 C-T; 16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 195 T-C; 
247 G-A; 265 T-C; 498 delC; 
523 delAC; 573 insC 
  2                                       
Ht_107 2 L0d2b 16069 C-T; 16129 G-A; 16169 C-T; 16187 C-T; 
16189 T-C; 16212 A-G; 16223 C-T; 16230 A-G; 
16243 T-C; 16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 247 G-A; 498 delC; 
523 delAC 
              1               1           
Ht_108 1 L0d2c 16086 T-C; 16129 G-A; 16187 C-T; 16189 T-C; 
16223 C-T; 16230 A-G; 16243 T-C; 16261 C-G; 
16311 T-C; 16355 C-T; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 247 G-A; 294 T-A; 
408 T-A; 498 delC; 523 delAC
                      1                   
Ht_109 1 L0d2c 16129 G-A; 16187 C-T; 16189 T-C; 16230 A-G; 
16243 T-C; 16311 T-C; 16519 T-C 
73 A-G; 94 G-A; 146 T-C; 
195 T-C; 247 G-A; 294 T-A; 
498 delC; 523 delAC 
              1                           
Ht_110 11 L0d2c 16129 G-A; 16187 C-T; 16189 T-C; 16223 C-T; 
16230 A-G; 16243 T-C; 16311 T-C; 16519 T-C 
73 A-G; 94 G-A; 146 T-C; 
195 T-C; 247 G-A; 294 T-A; 
498 delC; 523 delAC 
  1 1 7       1                     1     
Ht_111 1 L0d2c 16129 G-A; 16187 C-T; 16189 T-C; 16223 C-T; 
16230 A-G; 16243 T-C; 16311 T-C; 16519 T-C 
73 A-G; 94 G-A; 140 C-T; 
146 T-C; 195 T-C; 247 G-A; 
294 T-A; 498 delC; 523 delAC
  1                                       
Ht_112 2 L0d2c 16129 G-A; 16187 C-T; 16189 T-C; 16223 C-T; 
16230 A-G; 16243 T-C; 16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 247 G-A; 294 T-A; 
498 delC 
              2                           
Ht_113 1 L0d2c 16081 A-G; 16129 G-A; 16187 C-T; 16189 T-C; 
16223 C-T; 16230 A-G; 16243 T-C; 16311 T-C; 
16519 T-C 
73 A-G; 94 G-A; 146 T-C; 
195 T-C; 247 G-A; 294 T-A; 
498 delC; 523 delAC 
        1                                 
Ht_114 2 L0d2d 16129 G-A; 16187 C-T; 16189 T-C; 16212 A-G; 
16223 C-T; 16230 A-G; 16243 T-C; 16311 T-C; 
16390 G-T; 16519 T-C 
73 A-G; 125 T-C; 127 T-C; 
146 T-C; 150 C-T; 152 T-C; 
188 A-G; 195 T-C; 247 G-A; 
498 delC; 523 delAC; 573 insC
                    2                     
Ht_115 1 L0d2d 16129 G-A; 16187 C-T; 16189 T-C; 16212 A-G; 
16223 C-T; 16230 A-G; 16243 T-C; 16291 C-T; 
16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
188 A-G; 195 T-C; 247 G-A; 
498 delC; 523 delAC; 593 T-C
                                        1 
Ht_116 1 L0d3 16172 T-C; 16187 C-T; 16189 T-C; 16223 C-T; 
16230 A-G; 16243 T-C; 16266 C-T; 16274 G-A; 
16278 C-T; 16290 C-T; 16300 A-G; 16311 T-C; 
16519 T-C 
73 A-G; 146 T-C; 150 C-T; 
195 T-C; 247 G-A; 316 G-A  
  1                                       
Ht_117 2 L0d3 16187 C-T; 16189 T-C; 16223 C-T; 16230 A-G; 
16243 T-C; 16274 G-A; 16278 C-T; 16290 C-T; 
16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 150 C-T; 
152 T-C; 195 T-C; 247 G-A; 
316 G-A; 523 delAC 
                      1           1       
Ht_118 3 L0d3 16187 C-T; 16189 T-C; 16223 C-T; 16230 A-G; 
16243 T-C; 16274 G-A; 16278 C-T; 16290 C-T; 
16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 150 C-T; 
195 T-C; 247 G-A; 316 G-A; 
523 delAC 
      1 1     1                           
Ht_119 2 L0d3 16187 C-T; 16189 T-C; 16223 C-T; 16230 A-G; 
16243 T-C; 16274 G-A; 16278 C-T; 16290 C-T; 
16300 A-G; 16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 150 C-T; 
195 T-C; 247 G-A; 316 G-A  
  2                                       
Ht_120 10 L0d3 16187 C-T; 16189 T-C; 16223 C-T; 16230 A-G; 
16243 T-C; 16266 C-T; 16274 G-A; 16278 C-T; 
16290 C-T; 16300 A-G; 16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 150 C-T; 
195 T-C; 247 G-A; 316 G-A  
4 4     2                                 
Ht_121 1 L0d3 16187 C-T; 16189 T-C; 16223 C-T; 16230 A-G; 
16243 T-C; 16266 C-T; 16274 G-A; 16278 C-T; 
16290 C-T; 16300 A-G; 16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 150 C-T; 
195 T-C; 247 G-A; 316 G-A; 
523 delAC 
                                  1       
Ht_122 2 L0d3 16187 C-T; 16189 T-C; 16214 C-T; 16223 C-T; 
16230 A-G; 16243 T-C; 16274 G-A; 16278 C-T; 
16290 C-T; 16300 A-G; 16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 150 C-T; 
195 T-C; 247 G-A; 316 G-A  
  1     1                                 
Ht_123 1 L0dx 16129 G-A; 16179 C-T; 16187 C-T; 16189 T-C; 
16223 C-T; 16230 A-G; 16243 T-C; 16311 T-C; 
16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
188 A-G; 195 T-C; 247 G-A; 
498 delC 
                      1                   
Ht_124 1 L0dx 16129 G-A; 16179 C-T; 16187 C-T; 16189 T-C; 
16223 C-T; 16230 A-G; 16243 T-C; 16311 T-C; 
16519 T-C 
73 A-G; 146 T-C; 189 A-G; 
195 T-C; 199 T-C; 247 G-A; 
498 delC; 523 delAC 
                      1                   
Ht_125 2 L0dx 16129 G-A; 16179 C-T; 16187 C-T; 16189 T-C; 
16223 C-T; 16230 A-G; 16243 T-C; 16311 T-C; 
16399 A-G; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
188 A-G; 195 T-C; 247 G-A; 
498 delC; 573 insC 
                        2                 
Ht_126 8 L0k1 16166 A-C; 16172 T-C; 16189 T-C; 16209 T-C; 
16214 C-T; 16223 C-T; 16230 A-G; 16278 C-T; 
16291 C-G; 16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
189 A-G; 195 T-C; 198 C-T; 
204 T-C; 207 G-A; 247 G-A; 
523 delAC 
                    3 5                   
Ht_127 1 L0k1 16166 A-C; 16172 T-C; 16187 C-T; 16189 T-C; 
16209 T-C; 16214 C-T; 16223 C-T; 16230 A-G; 
16278 C-T; 16291 C-A; 16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
189 A-G; 195 T-C; 198 C-T; 
207 G-A; 247 G-A; 523 delAC
                      1                   
Ht_128 4 L0k1 16166 A-C; 16172 T-C; 16187 C-T; 16189 T-C; 
16209 T-C; 16214 C-T; 16223 C-T; 16230 A-G; 
16278 C-T; 16291 C-G; 16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
189 A-G; 195 T-C; 198 C-T; 
204 T-C; 207 G-A; 247 G-A; 
523 delAC 
              1     3                     
Ht_129 13 L0k1 16166 A-C; 16172 T-C; 16187 C-T; 16189 T-C; 
16209 T-C; 16214 C-T; 16223 C-T; 16230 A-G; 
16278 C-T; 16291 C-G; 16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
189 A-G; 195 T-C; 198 C-T; 
207 G-A; 247 G-A; 523 delAC
              1     4 7           1       
Ht_130 5 L0k1 16166 A-C; 16172 T-C; 16187 C-T; 16189 T-C; 
16209 T-C; 16214 C-T; 16223 C-T; 16230 A-G; 
16278 C-T; 16291 C-G; 16311 T-C; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
189 A-G; 195 T-C; 198 C-T; 
247 G-A; 523 delAC 
                        5                 
Ht_131 3 L1b 16126 T-C; 16187 C-T; 16189 T-C; 16223 C-T; 
16264 C-T; 16270 C-T; 16278 C-T; 16311 T-C; 
16519 T-C 
73 A-G; 152 T-C; 182 C-T; 
185 G-T; 195 T-C; 247 G-A; 
263 A-G; 357 A-G; 523 delAC
  1                       2               
Ht_132 1 L1c1 16129 G-A; 16172 T-C; 16173 C-T; 16188 C-A; 
16189 T-C; 16223 C-T; 16256 C-T; 16278 C-T; 
16293 A-G; 16294 C-T; 16311 T-C; 16360 C-T; 
16368 T-C; 16519 T-C 
73 A-G; 151 C-T; 152 T-C; 
182 C-T; 186 C-A; 189 A-G; 
195 T-C; 198 C-T; 247 delG; 
263 A-G; 297 A-G; 316 G-A; 
523 delAC 
                                  1       
Ht_133 2 L1c2 16129 G-A; 16187 C-T; 16189 T-C; 16214 C-T; 
16223 C-T; 16265 A-C; 16278 C-T; 16286 C-A; 
16291 C-T; 16294 C-T; 16311 T-C; 16360 C-T; 
16519 T-C; 16527 C-T  
73 A-G; 151 C-T; 152 T-C; 
182 C-T; 186 C-A; 189 A-C; 
195 T-C; 198 C-T; 247 G-A; 
263 A-G; 297 A-G; 316 G-A; 
513 G-A  
                2                         
Ht_134 1 L1c2 16172 T-C; 16187 C-T; 16189 T-C; 16223 C-T; 
16265 A-C; 16278 C-T; 16286 C-G; 16294 C-T; 
16311 T-C; 16360 C-T; 16519 T-C; 16527 C-T  
73 A-G; 151 C-T; 152 T-C; 
182 C-T; 186 C-A; 189 A-C; 
195 T-C; 198 C-T; 247 G-A; 
263 A-G; 297 A-G; 316 G-A; 
385 A-G; 471 T-C; 523 delAC
                              1           
Ht_135 1 L1c2 16108 C-T; 16129 G-A; 16187 C-T; 16189 T-C; 
16260 C-T; 16265 A-C; 16278 C-T; 16286 C-A; 
16294 C-T; 16311 T-C; 16360 C-T; 16519 T-C; 
16527 C-T  
73 A-G; 151 C-T; 152 T-C; 
182 C-T; 186 C-A; 189 A-C; 
195 T-C; 198 C-T; 247 G-A; 
263 A-G; 297 A-G; 316 G-A  
    1                                     
Ht_136 1 L1c2b1a 16071 C-T; 16129 G-A; 16145 G-A; 16187 C-T; 
16189 T-C; 16213 G-A; 16223 C-T; 16234 C-T; 
16265 A-C; 16278 C-T; 16286 C-G; 16294 C-T; 
16311 T-C; 16360 C-T; 16365 C-T; 16527 C-T  
73 A-G; 151 C-T; 152 T-C; 
182 C-T; 186 C-A; 189 A-C; 
195 T-C; 198 C-T; 247 G-A; 
263 A-G; 297 A-G; 316 G-A  
                        1                 
Ht_137 1 L1c3a 16129 G-A; 16183 A-C; 16189 T-C; 16215 A-G; 
16223 C-T; 16278 C-T; 16294 C-T; 16311 T-C; 
16360 C-T; 16368 T-C; 16519 T-C 
73 A-G; 152 T-C; 182 C-T; 
186 C-A; 189 A-C; 247 G-A; 
263 A-G; 316 G-A; 523 delAC
                            1             
Ht_138 1 L1c3a 16129 G-A; 16183 A-C; 16189 T-C; 16215 A-G; 
16223 C-T; 16278 C-T; 16294 C-T; 16311 T-C; 
16355 C-T; 16360 C-T; 16390 G-A  
73 A-G; 151 C-T; 152 T-C; 
182 C-T; 183 A-G; 186 C-A; 
189 A-C; 247 G-A; 263 A-G; 
316 G-A; 523 delAC 
                                  1       
Ht_139 1 L1c3a 16129 G-A; 16182 A-C; 16183 A-C; 16189 T-C; 
16215 A-G; 16223 C-T; 16278 C-T; 16294 C-T; 
16311 T-C; 16360 C-T; 16519 T-C 
73 A-G; 152 T-C; 182 C-T; 
186 C-A; 189 A-C; 247 G-A; 
263 A-G; 316 G-A; 523 delAC; 573 insC
              1                           
Ht_140 1 L2a 16093 T-C; 16223 C-T; 16278 C-T; 16294 C-T; 
16311 T-C; 16390 G-A; 16399 A-G; 16519 T-C 
73 A-G; 143 G-A; 146 T-C; 
152 T-C; 182 C-T; 195 T-C; 
263 A-G; 523 delAC 
                                  1       
Ht_141 1 L2a1 16093 T-C; 16189 T-C; 16192 C-T; 16223 C-T; 
16278 C-T; 16294 C-T; 16309 A-G; 16390 G-A; 
16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 263 A-G 
                          1               
Ht_142 3 L2a1 16189 T-C; 16223 C-T; 16278 C-T; 16294 C-T; 
16309 A-G; 16390 G-A; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 263 A-G 
  1 1                         1           
Ht_143 5 L2a1 16189 T-C; 16192 C-T; 16223 C-T; 16278 C-T; 
16294 C-T; 16309 A-G; 16390 G-A; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 263 A-G 
                      1 3         1       
Ht_144 1 L2a1 16223 C-T; 16278 C-T; 16294 C-T; 16309 A-G; 
16390 G-A; 16519 T-C 
64 C-T; 73 A-G; 146 T-C; 
152 T-C; 195 T-C; 263 A-G 
                              1           
Ht_145 5 L2a1 16223 C-T; 16278 C-T; 16286 C-T; 16294 C-T; 
16309 A-G; 16390 G-A; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 263 A-G 
  4                     1                 
Ht_146 2 L2a1f 16182 A-C; 16183 A-C; 16189 T-C; 16223 C-T; 
16278 C-T; 16290 C-T; 16294 C-T; 16309 A-G; 
16390 G-A; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 263 A-G 
  1                   1                   
Ht_147 9 L2a1f 16182 A-C; 16183 A-C; 16189 T-C; 16223 C-T; 
16278 C-T; 16290 C-T; 16294 C-T; 16309 A-G; 
16390 G-A  
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 263 A-G 
  2                           4   3       
Ht_148 1 L2a1f 16182 A-C; 16183 A-C; 16189 T-C; 16223 C-T; 
16278 C-T; 16290 C-T; 16294 C-T; 16309 A-G; 
16390 G-A  
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 263 A-G; 498 delC; 
523 delAC; 597 C-T  
  1                                       
Ht_149 1 L2a1f 16182 A-C; 16183 A-C; 16189 T-C; 16223 C-T; 
16266 C-T; 16278 C-T; 16290 C-T; 16294 C-T; 
16309 A-G; 16390 G-A  
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 263 A-G 
  1                                       
Ht_150 1 L2b 16114 C-A; 16129 G-A; 16213 G-A; 16223 C-T; 
16278 C-T; 16355 C-T; 16390 G-A  
73 A-G; 150 C-T; 152 T-C; 
182 C-T; 195 T-C; 198 C-T; 
204 T-C; 263 A-G; 418 C-T; 
523 delAC 
  1                                       
Ht_151 2 L2b1 16114 C-A; 16129 G-A; 16153 G-A; 16213 G-A; 
16223 C-T; 16278 C-T; 16311 T-C; 16362 T-C; 
16390 G-A  
73 A-G; 146 T-C; 150 C-T; 
152 T-C; 182 C-T; 183 A-G; 
195 T-C; 198 C-T; 204 T-C; 
263 A-G; 385 A-G; 418 C-T; 
523 delAC 
                      1       1           
Ht_152 1 L2b1 16114 C-A; 16129 G-A; 16213 G-A; 16223 C-T; 
16278 C-T; 16284 A-G; 16355 C-T; 16362 T-C; 
16390 G-A  
73 A-G; 150 C-T; 151 C-T; 
152 T-C; 182 C-T; 186 C-A; 
195 T-C; 198 C-T; 204 T-C; 
263 A-G; 418 C-T; 523 delAC
                        1                 
Ht_153 1 L2b2 16114 C-A; 16129 G-A; 16213 G-A; 16223 C-T; 
16274 G-A; 16278 C-T; 16390 G-A  
73 A-G; 146 T-C; 150 C-T; 
152 T-C; 182 C-T; 183 A-G; 
195 T-C; 198 C-T; 204 T-C; 
263 A-G 
  1                                       
Ht_154 1 L2c1 16223 C-T; 16264 C-T; 16278 C-T; 16390 G-A  73 A-G; 93 A-G; 146 T-C; 
150 C-T; 152 T-C; 182 C-T; 
195 T-C; 198 C-T; 263 A-G; 
325 C-T; 523 delAC 
                          1               
Ht_155 1 L2c1 16223 C-T; 16264 C-T; 16265 A-G; 16278 C-T; 
16311 T-C; 16390 G-A; 16527 C-T  
73 A-G; 93 A-G; 146 T-C; 
150 C-T; 152 T-C; 182 C-T; 
183 A-G; 195 T-C; 198 C-T; 
263 A-G; 325 C-T; 523 delAC
                                  1       
Ht_156 1 L3b 16124 T-C; 16223 C-T; 16278 C-T; 16362 T-C 73 A-G; 185 G-A; 189 A-G; 
249 delA; 263 A-G; 523 delAC
                                    1     
Ht_157 1 L3b1 16124 T-C; 16223 C-T; 16278 C-T; 16311 T-C; 
16362 T-C; 16519 T-C 
73 A-G; 263 A-G; 523 delAC                           1               
Ht_158 1 L3c 16129 G-A; 16172 T-C; 16174 C-T; 16218 C-T; 
16223 C-T; 16256 C-A; 16311 T-C; 16325 T-C; 
16362 T-C; 16519 T-C 
73 A-G; 151 C-T; 152 T-C; 
189 A-C; 195 T-C; 263 A-G; 
294 T-C; 523 delAC 
                            1             
Ht_159 4 L3d1a 16124 T-C; 16223 C-T; 16319 G-A  73 A-G; 150 C-T; 152 T-C; 
263 A-G; 523 delAC 
  1                               2     1 
Ht_160 1 L3d1a 16124 T-C; 16223 C-T; 16319 G-A  73 A-G; 150 C-T; 263 A-G; 
523 delAC 
                              1           
Ht_161 1 L3d3 16124 T-C; 16183 A-C; 16189 T-C; 16223 C-T; 
16278 C-T; 16304 T-C; 16311 T-C; 16519 T-C 
73 A-G; 152 T-C; 195 T-C; 
263 A-G; 523 delAC 
                      1                   
Ht_162 1 L3d3 16124 T-C; 16183 A-C; 16189 T-C; 16223 C-T; 
16278 C-T; 16304 T-C; 16311 T-C 
73 A-G; 152 T-C; 195 T-C; 
263 A-G; 523 delAC 
                                  1       
Ht_163 12 L3d3 16124 T-C; 16183 A-C; 16189 T-C; 16223 C-T; 
16278 C-T; 16304 T-C; 16311 T-C 
73 A-G; 152 T-C; 263 A-G; 
523 delAC 
              3           1 8             
Ht_164 1 L3d3 16124 T-C; 16183 A-C; 16189 T-C; 16223 C-T; 
16278 C-T; 16304 T-C; 16311 T-C 
73 A-G; 152 T-C; 263 A-G; 
523 delAC; 573 insC 
                            1             
Ht_165 1 L3e1 16189 T-C; 16223 C-T; 16311 T-C; 16327 C-T  73 A-G; 150 C-T; 189 A-G; 
200 A-G; 204 T-C; 263 A-G 
                          1               
Ht_166 1 L3e1 16223 C-T; 16327 C-T; 16519 T-C 73 A-G; 150 C-T; 189 A-G; 
200 A-G; 263 A-G 
                              1           
Ht_167 1 L3e1 16223 C-T; 16327 C-T  73 A-G; 150 C-T; 263 A-G                           1               
Ht_168 1 L3e1 16223 C-T; 16327 C-T  73 A-G; 150 C-T; 189 A-G; 
200 A-G; 263 A-G 
    1                                     
Ht_169 1 L3e1 16176 C-T; 16223 C-T; 16327 C-T  73 A-G; 150 C-T; 200 A-G; 
263 A-G 
                          1               
Ht_170 1 L3e1a 16185 C-T; 16223 C-T; 16311 T-C; 16519 T-C 73 A-G; 150 C-T; 185 G-A; 
189 A-G; 263 A-G 
                                  1       
Ht_171 4 L3e1b 16185 C-T; 16209 T-C; 16223 C-T; 16327 C-T  73 A-G; 150 C-T; 152 T-C; 
189 A-G; 195 T-C; 200 A-G; 
207 G-A; 263 A-G 
                        3   1             
Ht_172 1 L3e1e 16185 C-T; 16223 C-T; 16234 C-T; 16390 G-A; 
16519 T-C 
73 A-G; 150 C-T; 152 T-C; 
189 A-G; 200 A-G; 263 A-G 
                              1           
Ht_173 2 L3e1g 16223 C-T; 16325 delT; 16327 C-T  73 A-G; 150 C-T; 185 G-A; 
189 A-G; 263 A-G 
                              1   1       
Ht_174 3 L3e1g 16223 C-T; 16239 C-T; 16325 delT  73 A-G; 150 C-T; 185 G-A; 
189 A-G; 263 A-G 
  1                           1 1         
Ht_175 1 L3e1g 16188 C-T; 16223 C-T; 16239 C-T; 16325 delT  73 A-G; 150 C-T; 185 G-A; 
189 A-G; 263 A-G 
                                  1       
Ht_176 1 L3e2b 16172 T-C; 16189 T-C; 16223 C-T; 16320 C-T; 
16519 T-C 
73 A-G; 150 C-T; 152 T-C; 
195 T-C; 263 A-G 
                        1                 
Ht_177 1 L3e2b 16172 T-C; 16183 A-C; 16189 T-C; 16223 C-T; 
16320 C-T; 16519 T-C 
73 A-G; 150 C-T; 195 T-C; 
263 A-G 
                                1         
Ht_178 1 L3e2b 16172 T-C; 16183 A-C; 16189 T-C; 16223 C-T; 
16320 C-T  
73 A-G; 150 C-T; 195 T-C; 
263 A-G 
                          1               
Ht_179 1 L3e3 16223 C-T; 16265 A-C; 16519 T-C 73 A-G; 150 C-T; 195 T-C; 
263 A-G; 523 delAC; 573 insC
    1                                     
Ht_180 1 L3e3 16223 C-T; 16265 A-T; 16519 T-C 73 A-G; 150 C-T; 195 T-C; 
263 A-G; 523 delAC; 573 insC
                                1         
Ht_181 1 L3f 16209 T-C; 16223 C-T; 16311 T-C; 16519 T-C 73 A-G; 150 C-T; 189 A-G; 
200 A-G; 207 G-A; 263 A-G 
              1                           
Ht_182 5 L3f 16209 T-C; 16223 C-T; 16311 T-C; 16519 T-C 73 A-G; 150 C-T; 189 A-G; 
200 A-G; 263 A-G 
                          2 2     1       
Ht_183 1 L3f1b1 16129 G-A; 16209 T-C; 16223 C-T; 16291 C-T; 
16292 C-T; 16295 C-T; 16311 T-C; 16519 T-C 
73 A-G; 152 T-C; 189 A-G; 
200 A-G; 263 A-G; 272 A-G 
                          1               
Ht_184 1 L4b2 16051 A-G; 16114 C-T; 16189 T-C; 16192 C-T; 
16223 C-T; 16293 A-T; 16311 T-C; 16316 A-G; 
16355 C-T; 16362 T-C; 16399 A-G; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
195 T-C; 244 A-G; 263 A-G; 
340 C-T; 523 delAC 
                              1           
Ht_185 1 L4b2a2 16172 T-C; 16223 C-T; 16293 A-T; 16311 T-C; 
16327 C-T; 16355 C-T; 16362 T-C; 16399 A-G; 
16519 T-C 
73 A-G; 146 T-C; 189 A-G; 
244 A-G; 263 A-G; 391 T-C 
        1                                 
Ht_186 1 L4b2a2 16162 A-G; 16172 T-C; 16223 C-T; 16293 A-T; 
16311 T-C; 16327 C-T; 16355 C-T; 16356 T-C; 
16362 T-C; 16399 A-G; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
244 A-G; 263 A-G; 391 T-C 
                    1                     
Ht_187 1 L4b2a2 16162 A-G; 16169 C-T; 16172 T-C; 16223 C-T; 
16293 A-T; 16311 T-C; 16327 C-T; 16355 C-T; 
16356 T-C; 16362 T-C; 16399 A-G; 16519 T-C 
73 A-G; 146 T-C; 152 T-C; 
244 A-G; 263 A-G; 391 T-C 
                    1                     
Ht_188 1 L5a 16111 C-T; 16129 G-A; 16148 C-T; 16166 A-G; 
16187 C-T; 16189 T-C; 16223 C-T; 16254 A-G; 
16278 C-T; 16311 T-C; 16360 C-T  
73 A-G; 152 T-C; 182 C-T; 
195 T-C; 247 G-A; 263 A-G; 
455 insC; 523 delAC 
              1                           
Ht_189 1 L5b2 16129 G-A; 16148 C-T; 16166 A-G; 16183 delA; 
16187 C-T; 16189 T-C; 16192 C-T; 16223 C-T; 
16278 C-T; 16311 T-C; 16355 C-T; 16362 T-C 
73 A-G; 152 T-C; 182 C-T; 
247 G-A; 263 A-G; 444 A-G; 
455 insTC; 523 delAC; 527 C-T
                                  1       
Ht_190 2 M 16093 T-C; 16223 C-T; 16519 T-C 73 A-G; 199 T-C; 263 A-G; 
482 T-C; 489 T-C 
                                    1   1 
Ht_191 1 M 16126 T-C; 16223 C-T; 16519 T-C 73 A-G; 263 A-G; 482 T-C; 
489 T-C 
                                        1 
Ht_192 1 M 16126 T-C; 16223 C-T; 16290 C-T; 16519 T-C 73 A-G; 263 A-G; 489 T-C     1                                     
Ht_193 1 M 16145 G-A; 16174 C-T; 16223 C-T; 16343 A-G; 
16463 A-G; 16519 T-C 
73 A-G; 263 A-G; 489 T-C; 
523 delAC 
                                        1 
Ht_194 1 M 16153 G-A; 16223 C-T; 16292 C-T; 16519 T-C 73 A-G; 199 T-C; 263 A-G; 
489 T-C 
                                        1 
Ht_195 1 M 16179 delC; 16223 C-T; 16519 T-C 73 A-G; 195 T-A; 263 A-G; 
489 T-C; 523 delAC 
                                        1 
Ht_196 1 M 16223 C-T; 16519 T-C 73 A-G; 263 A-G; 489 T-C; 
523 delAC 
                                        1 
Ht_197 1 M 16223 C-T; 16519 T-C 73 A-G; 151 C-T; 152 T-C; 
239 T-C; 249 delA; 263 A-G; 
489 T-C 
        1                                 
Ht_198 1 M 16212 A-G; 16223 C-T; 16266 C-T; 16318 A-T; 
16519 T-C 
73 A-G; 93 A-G; 246 T-C; 
263 A-G; 489 T-C 
                                        1 
Ht_199 1 M 16185 C-T; 16223 C-T; 16519 T-C 73 A-G; 195 T-A; 204 T-C; 
263 A-G; 489 T-C; 523 delAC
                                        1 
Ht_200 1 M 16184 C-T; 16223 C-T; 16234 C-G; 16519 T-C 73 A-G; 195 T-C; 198 C-T; 
204 T-C; 263 A-G; 489 T-C 
  1                                       
Ht_201 1 M 16183 A-G; 16223 C-T; 16320 C-T; 16325 T-C; 
16519 T-C 
73 A-G; 194 C-T; 195 T-A; 
263 A-G; 489 T-C; 523 delAC
                                        1 
Ht_202 1 M_D 16223 C-T; 16291 C-T; 16362 T-C; 16390 G-A; 
16519 T-C 
73 A-G; 119 T-C; 121 G-A; 
263 A-G; 489 T-C 
    1                                     
Ht_203 1 M_G2 16086 T-C; 16172 T-C; 16189 T-C; 16223 C-T; 
16227 A-G; 16278 C-T; 16362 T-C 
73 A-G; 263 A-G; 489 T-C                                         1 
Ht_204 1 M_M2a 16223 C-T; 16270 C-T; 16319 G-A; 16352 T-C; 
16519 T-C 
73 A-G; 195 T-C; 204 T-C; 
207 G-A; 263 A-G; 447 C-G; 
489 T-C 
  1                                       
Ht_205 1 M_M4a 16145 G-A; 16176 C-T; 16223 C-T; 16261 C-T; 
16311 T-C; 16519 T-C 
73 A-G; 263 A-G; 489 T-C; 
508 A-G 
                                        1 
Ht_206 1 M_M7c 16223 C-T; 16295 C-T; 16519 T-C 73 A-G; 194 C-T; 263 A-G; 
489 T-C 
                                        1 
Ht_207 1 M_M7c / D
16223 C-T; 16295 C-T; 16362 T-C; 16519 T-C 73 A-G; 146 T-C; 199 T-C; 
263 A-G; 489 T-C 
                                    1     
Ht_208 2 N 16223 C-T; 16263 T-C; 16274 G-A; 16311 T-C; 
16318 A-C; 16343 A-G; 16357 T-C; 16519 T-C 
73 A-G; 152 T-C; 263 A-G   2                                       
Ht_209 1 N_N1a 16086 T-C; 16147 C-A; 16223 C-T; 16248 C-T; 
16320 C-T; 16355 C-T; 16519 T-C 
73 A-G; 152 T-C; 199 T-C; 
204 T-C; 207 G-A; 263 A-G; 
573 insC 
                                    1     
Ht_210 1 N_W 16145 G-A; 16189 T-C; 16223 C-T; 16292 C-T; 
16320 C-T; 16519 T-C 
73 A-G; 143 G-A; 189 A-G; 
194 C-T; 195 T-C; 196 T-C; 
204 T-C; 207 G-A; 263 A-G 
                                        1 
Ht_211 1 N_W 16223 C-T; 16292 C-T; 16295 C-T; 16324 T-C; 
16519 T-C 
73 A-G; 189 A-G; 195 T-C; 
204 T-C; 207 G-A; 263 A-G 
                                    1     
Ht_212 1 N_W 16223 C-T; 16259 C-T; 16288 T-C; 16292 C-T; 
16519 T-C 
73 A-G; 152 T-C; 189 A-G; 
195 T-C; 204 T-C; 207 G-A; 
263 A-G 
    1                                     
Ht_213 1 R 16356 T-C 150 C-T; 189 A-G; 263 A-G; 
298 C-T; 337 A-G; 594 C-T  
                                        1 
Ht_214 1 R 16147 C-T; 16183 A-C; 16184 C-A; 16189 T-C; 
16217 T-C; 16235 A-G; 16519 T-C 
73 A-G; 263 A-G                                         1 
Ht_215 1 R_H 16093 T-C; 16221 C-T; 16519 T-C 263 A-G                                       1   
Ht_216 1 R_H 16129 G-A; 16519 T-C 263 A-G                                     1     
Ht_217 2 R_H 16189 T-C; 16311 T-C; 16519 T-C 263 A-G; 327 C-T                                      1 1   
Ht_218 3 R_H 16311 T-C; 16519 T-C 263 A-G                                     3     
Ht_219 1 R_H 16311 T-C; 16519 T-C 93 A-G; 263 A-G                                       1   
Ht_220 1 R_H 16356 T-C; 16519 T-C 263 A-G                                     1     
Ht_221 1 R_H 16519 T-C 263 A-G                                       1   
Ht_222 1 R_H 16256 C-T; 16519 T-C 263 A-G                                     1     
Ht_223 1 R_H 16239 C-T; 16519 T-C 263 A-G                                       1   
Ht_224 1 R_H 16183 A-C; 16189 T-C; 16319 G-A; 16356 T-C; 
16519 T-C 
263 A-G                                       1   
Ht_225 1 R_J 16069 C-T; 16093 T-C; 16126 T-C 73 A-G; 263 A-G; 295 C-T; 
462 C-T; 489 T-C; 524 insAC 
                                      1   
Ht_226 1 R_J 16069 C-T; 16126 T-C; 16519 T-C 73 A-G; 146 T-C; 185 G-A; 
188 A-G; 198 C-T; 263 A-G; 
295 C-T; 462 C-T; 489 T-C 
                                      1   
Ht_227 1 R_J 16069 C-T; 16126 T-C; 16193 C-T; 16217 T-C 73 A-G; 150 C-T; 152 T-C; 
263 A-G; 295 C-T; 489 T-C 
                                    1     
Ht_228 1 R_J 16069 C-T; 16126 T-C; 16138 A-C; 16519 T-C 73 A-G; 185 G-A; 188 A-G; 
228 G-A; 263 A-G; 295 C-T; 
462 C-T; 489 T-C 
                                      1   
Ht_229 1 R_K 16224 T-C; 16311 T-C; 16519 T-C 73 A-G; 195 T-C; 263 A-G; 
417 G-A; 497 C-T; 525 insGC
                                    1     
Ht_230 1 R_K 16224 T-C; 16293 A-G; 16311 T-C; 16519 T-C 73 A-G; 146 T-C; 152 T-C; 
263 A-G 
                                    1     
Ht_231 1 R_R5 16266 C-T; 16297 T-C; 16304 T-C; 16311 T-C; 
16355 C-T; 16356 T-C; 16524 A-G 
73 A-G; 152 T-C; 263 A-G; 
523 delAC 
                                        1 
Ht_232 1 R_R9a 16220 A-C; 16265 A-G; 16298 T-C; 16362 T-C 73 A-G; 150 C-T; 152 T-C; 
200 A-G; 249 delA; 263 A-G 
  1                                       
Ht_233 1 R_U 16126 T-C; 16181 A-G; 16209 T-C 73 A-G; 222 C-T; 228 G-A; 
263 A-G 
                                        1 
Ht_234 1 R_U 16311 T-C; 16390 G-A; 16519 T-C 73 A-G; 146 T-C; 195 T-C; 
263 A-G 
    1                                     
Ht_235 1 R_U CRS 73 A-G; 263 A-G; 296 C-T; 
523 delAC 
                                    1     
Ht_236 1 R_U 16292 C-T; 16497 A-G; 16519 T-C 73 A-G; 152 T-C; 263 A-G; 
373 A-G 
                                        1 
Ht_237 1 R_U 16292 C-T; 16497 A-G; 16519 T-C 73 A-G; 263 A-G; 373 A-G                                         1 
Ht_238 1 R_U 16242 C-T; 16292 C-T; 16497 A-G; 16519 T-C 73 A-G; 263 A-G; 373 A-G                                         1 
Ht_239 1 R_U 16192 C-T; 16291 C-T; 16294 C-T; 16311 T-C; 16390 G-A; 16519 T-C 73 A-G; 263 A-G         1                                 
Ht_240 1 R_U2 16051 A-G; 16093 T-G; 16154 T-C; 16206 A-C; 16230 A-G; 16311 T-C 73 A-G; 263 A-G                                         1 
Ht_241 1 R_U2 16051 A-G; 16154 T-C; 16206 A-C; 16230 A-G; 16311 T-C; 16519 T-C 73 A-G; 263 A-G                                         1 
Ht_242 1 R_U2a 16051 A-G; 16093 T-A; 16154 T-C; 16206 A-C; 16230 A-G; 16311 T-C 73 A-G; 263 A-G; 472 A-G                                         1 
Ht_243 1 R_U2b 16051 A-G; 16209 T-C; 16239 C-T; 16352 T-C; 16353 C-T 73 A-G; 146 T-C; 152 T-C; 234 A-G; 263 A-G   1                                       
Ht_244 1 R_U5a1a 16256 C-T; 16270 C-T; 16362 T-C; 16399 A-G 73 A-G; 185 G-A; 189 A-G; 204 T-C; 207 G-A; 263 A-G                                     1     
Ht_245 1 R_U5a1a 16256 C-T; 16270 C-T; 16399 A-G 73 A-G; 263 A-G                                     1     
Ht_246 1 R_U5a1a 16192 C-T; 16256 C-T; 16270 C-T; 16362 T-C; 16399 A-G; 16428 G-A 73 A-G; 263 A-G                                       1   
Ht_247 1 R_V 16298 T-C; 16311 T-C 72 T-C; 195 T-C; 263 A-G                                       1   
    
                                                
Total 540       30 77 20 57 40 3 1 28 22 2 42 49 18 14 15 22 5 36 21 11 25 
 
Appendix F: Graphs - Physical vs. Genetic distance (L0d/k sequences and L0d sequences) 
 
 
 
 
 
 
 
Appendix G: Haplotype list of 12 marker Y-STR panel 
 
HT HG DYS19 DYS390 DYS391 DYS392 DYS393 DYS385a DYS385b DYS389I DYS389II DYS437 DYS438 DYS439 KAR COL CAC KHO CNC XEG NAM GUG NAR JOH XUN KWE DRC HER SOT SWZ ZUX AFR EUR IND TOT 
Ht001 A-M114 12 21 10 14 11 15 17 13 27 16 10 14    1                  1 
Ht002 A-M114 12 21 10 14 11 15 18 13 27 16 10 13          2            2 
Ht003 A-M114 12 21 10 14 11 15 18 13 27 16 10 14          1            1 
Ht004 A-M114 12 21 10 14 11 17 17 13 27 16 10 11           1           1 
Ht005 A-M114 12 21 10 15 11 17 17 13 27 16 10 11           1           1 
Ht006 A-M14 12 21 10 13 11 14 17 14 29 15 10 11           1           1 
Ht007 A-M14 12 21 10 13 11 15 17 14 29 16 11 11          1            1 
Ht008 A-M14 12 21 10 13 11 16 16 14 29 16 10 10          1            1 
Ht009 A-M14 12 22 10 14 11 17 18 13 27 16 9 13         1             1 
Ht010 A-M51 14 21 10 10 13 14 18 13 30 15 11 12 1                     1 
Ht011 A-M51 14 23 10 11 13 14.2 15 14 30 14 9 11            1          1 
Ht012 A-M51 15 18 10 10 13 14 16 13 29 15 11 13    1                  1 
Ht013 A-M51 15 18 10 10 13 15 16 12 30 15 10 11    2                  2 
Ht014 A-M51 15 19 10 10 13 15 15 13 30 17 11 11     1                 1 
Ht015 A-M51 15 19 10 10 13 15 16 12 29 16 11 11     1                 1 
Ht016 A-M51 15 19 10 10 13 15 16 13 30 15 11 11           3           3 
Ht017 A-M51 15 20 10 10 13 15 16 12 29 16 11 11     1                 1 
Ht018 A-M51 15 21 10 10 13 14 17 12 29 15 11 12  1                    1 
Ht019 A-M51 15 22 10 10 13 14 19 14 30 15 12 12                 1     1 
Ht020 A-M51 15 22 10 10 13 16 17 14 31 15 12 13 1                     1 
Ht021 A-M51 16 18 10 10 13 14 16 12 30 14 11 10           1           1 
Ht022 A-M51 16 18 11 10 13 14 18 12 28 14 11 10          1            1 
Ht023 A-M51 16 19 10 10 13 15 16 13 30 14 11 11     1                 1 
Ht024 A-M51 16 19 10 9 13 14 20 12 28 14 11 10          1            1 
Ht025 A-M51 16 19 11 10 13 14 15 13 29 14 9 11               1       1 
Ht026 A-M51 16 20 11 10 13 15 17 12 30 17 11 11    1                  1 
Ht027 A-M51 16 22 10 10 13 15 17 13 30 14 12 13    1   1               2 
Ht028 A-M51 16 22 11 10 13 17 17 12 29 14 11 12     1   1              2 
Ht029 A-M51 16 22 11 10 13 17 18 12 29 14 11 12               1       1 
Ht030 A-M51 16 22 11 11 14 13.2 14.2 13 30 14 11 11           4           4 
Ht031 A-M51 16 22 11 11 14 13.2 14.2 13 30 14 11 12           4           4 
Ht032 A-M51 16 22 11 11 14 13.2 14.2 13 31 14 11 12           1           1 
Ht033 A-M51 17 18 11 10 13 14 18 12 28 14 11 10          3            3 
Ht034 A-M51 17 19 10 9 13 14 19 12 28 14 11 10          2            2 
Ht035 A-M51 17 19 10 9 13 14 19 12 28 14 9 11           1           1 
Ht036 A-M51 17 20 10 10 13 14 15 13 29 14 11 12           2           2 
Ht037 A-M51 17 22 10 10 13 15 17 13 30 14 11 12       1               1 
Ht038 A-M51 17 22 10 10 13 15 17 13 30 14 12 11       1               1 
Ht039 A-M51 17 22 10 10 13 15 17 13 30 14 12 12    1                  1 
Ht040 A-M51 17 22 10 10 13 15 17 13 30 14 12 13     2                 2 
Ht041 A-M51 17 22 10 10 13 15 18 13 30 14 12 13    3 1                 4 
Ht042 A-M51 17 22 11 11 14 13.2 14.2 13 30 14 11 11           1           1 
Ht043 A-P28 13 21 10 13 11 14 17 12 26 16 10 12           1           1 
Ht044 A-P28 13 21 10 13 11 14 18 12 26 16 10 12           1           1 
Ht045 A-P28 13 21 10 13 11 15 16 13 28 16 10 12          1            1 
Ht046 A-P28 13 21 10 13 11 15 17 13 28 16 10 12       1   2            3 
Ht047 A-P28 13 21 10 13 11 16 17 13 28 16 10 12          1            1 
Ht048 B-M112 16 24 10 11 13 11 13 12 28 15 10 12           1           1 
Ht049 B-M152 15 23 10 11 13 11 11 14 32 14 10 13                 1     1 
Ht050 B-M152 15 24 10 11 13 11 11 13 31 14 10 12  1                    1 
Ht051 B-M152 15 24 10 11 13 11 11 14 32 14 10 12 1     1  9         1     12 
Ht052 B-M152 15 24 10 11 13 11 11 14 33 14 10 12               1       1 
Ht053 B-M152 15 24 10 11 13 11 12 14 32 14 10 12  1                    1 
Ht054 B-M152 15 24 10 12 13 11 12 14 33 14 10 11    1                  1 
Ht055 B-M152 15 25 10 11 13 11 11 13 31 14 10 12               1       1 
Ht056 B-M152 16 25 10 11 13 11 11 13 31 15 10 12                 1     1 
Ht057 B-P6 14 24 10 11 13 14 14 13 29 15 10 13  1                    1 
Ht058 B-P6 15 23 11 11 12 11 14 13 25 15 11 10          2            2 
Ht059 B-P6 15 24 10 11 12 11 14 14 26 15 11 10          1            1 
Ht060 B-P6 15 24 10 11 12 12 14 13 25 15 11 10           2           2 
Ht061 B-P6 15 24 11 11 12 11 15 13 25 15 11 10          2            2 
Ht062 B-P6 15 25 10 11 13 14 15 13 29 14 11 12              1        1 
Ht063 B-P6 16 24 10 11 12 12 14 14 26 14 10 12          4            4 
Ht064 B-P8 16 20 10 10 13 15 16 14 29 15 10 miss           1           1 
Ht065 B-P8 16 20 9 11 13 15 16 14 30 15 10 miss          1            1 
Ht066 B-P8 17 21 8 11 13 15 16 13 29 15 10 miss           1           1 
Ht067 C* 15 24 10 11 13 13 16 12 29 14 9 11   1                   1 
Ht068 E-M154 15 21 10 11 13 16 16 12 29 14 11 11               1       1 
Ht069 E-M154 15 21 10 11 13 17 17 12 30 14 11 11            2          2 
Ht070 E-M154 16 21 10 11 13 16 17 12 29 14 11 12                 1     1 
Ht071 E-M154 16 21 10 11 13 17 17 12 29 14 11 12               1       1 
Ht072 E-M154 16 21 11 11 13 16 17 12 29 14 11 12                 2     2 
Ht073 E-M191 14 21 10 11 14 17 19 13 29 14 10 11       1               1 
Ht074 E-M191 14 21 10 11 15 17 20 14 31 14 11 12               1       1 
Ht075 E-M191 15 21 10 10 14 17 21 13 29 14 11 11              1        1 
Ht076 E-M191 15 21 10 11 14 17 18 12 29 14 11 11             1         1 
Ht077 E-M191 15 21 10 11 14 17 19 13 29 14 10 11       1               1 
Ht078 E-M191 15 21 10 11 14 17 19 13 30 13 11 11              1        1 
Ht079 E-M191 15 21 10 11 14 17 20 13 29 14 11 11       1               1 
Ht080 E-M191 15 21 10 11 15 17 18 13 30 14 11 13       1    1           2 
Ht081 E-M191 15 21 10 11 15 17 19 13 31 14 11 12                 1     1 
Ht082 E-M191 15 21 10 11 15 17 20 13 30 14 11 12               1       1 
Ht083 E-M191 15 21 10 11 16 17 17 13 30 14 11 12                 1     1 
Ht084 E-M191 15 22 10 11 14 17 18 13 30 14 11 12               1       1 
Ht085 E-M191 15 22 10 11 15 17 17 13 30 14 11 11  1                    1 
Ht086 E-M191 16 21 10 11 14 16 18 13 30 14 11 12               1       1 
Ht087 E-M191 16 21 10 11 14 16 18 13 30 14 11 13       1               1 
Ht088 E-M191 16 21 10 11 14 17 20 13 29 14 11 11              1        1 
Ht089 E-M191 16 21 10 11 14 19 21 13 30 14 11 13             1         1 
Ht090 E-M191 16 21 10 11 15 16 18 13 30 14 11 11               1  1     2 
Ht091 E-M191 16 21 10 11 15 16 20 14 31 14 11 12 1 1                    2 
Ht092 E-M191 16 21 10 11 15 17 18 13 30 14 11 12 1                     1 
Ht093 E-M191 16 21 10 11 15 17 19 12 28 14 11 12     1                 1 
Ht094 E-M191 16 21 10 11 15 17 19 13 29 14 11 10               1       1 
Ht095 E-M191 16 21 10 11 15 17 19 13 29 14 11 11        1              1 
Ht096 E-M191 16 21 10 11 15 17 20 14 31 14 11 12  1                    1 
Ht097 E-M191 16 21 10 11 15 17 20 15 32 14 11 12                 1     1 
Ht098 E-M191 16 21 10 11 15 17 21 14 31 14 11 12  1                    1 
Ht099 E-M191 16 21 10 11 15 18 19 12 29 14 10 12           1           1 
Ht100 E-M191 16 21 10 11 15 18 19 13 29 14 11 12              1        1 
Ht101 E-M191 16 21 10 12 14 16 18 13 30 14 11 12                 1     1 
Ht102 E-M191 16 21 10 12 15 16 18 13 30 14 11 12               1       1 
Ht103 E-M191 16 21 10 12 15 16 18 13 31 14 11 12                 1     1 
Ht104 E-M191 16 21 10 12 15 16 19 13 30 14 11 11                1      1 
Ht105 E-M191 16 21 10 12 15 16 19 13 30 14 11 12                1      1 
Ht106 E-M191 16 21 10 12 15 18 18 13 30 14 11 12                 1     1 
Ht107 E-M191 16 22 10 11 14 17 17 12 29 14 11 12  1                    1 
Ht108 E-M191 17 20 10 11 15 17 18 13 29 14 11 12           1           1 
Ht109 E-M191 17 21 10 11 14 17 17 13 30 13 11 12             1         1 
Ht110 E-M191 17 21 10 11 15 16 17 13 30 13 11 12             1         1 
Ht111 E-M191 17 21 10 11 15 17 19 14 31 14 11 11  1                    1 
Ht112 E-M191 17 21 10 11 15 17 19 14 31 14 11 12           1           1 
Ht113 E-M191 17 21 10 11 15 17 20 12 28 14 11 13              1        1 
Ht114 E-M191 17 21 10 11 15 18 18 12 28 14 11 12     1                 1 
Ht115 E-M191 17 21 10 11 15 18 19 13 30 13 11 12             1         1 
Ht116 E-M191 17 21 10 11 16 17 19 13 30 14 11 11               1       1 
Ht117 E-M191 17 21 10 9 13 17 17 13 30 14 11 12 1                     1 
Ht118 E-M191 17 21 10 9 14 16 17 13 31 14 11 12    1                  1 
Ht119 E-M2 15 21 10 11 13 14 19 13 32 14 11 12  1                    1 
Ht120 E-M2 15 21 10 11 13 15 16 14 32 14 11 12  2  2                  4 
Ht121 E-M2 15 21 10 11 13 15 17 13 31 14 11 11                 1     1 
Ht122 E-M2 15 21 10 11 13 15 17 14 30 14 11 12  1                    1 
Ht123 E-M2 15 21 10 11 13 15 17 14 31 14 11 12        3     1         4 
Ht124 E-M2 15 21 10 11 13 15 17 14 31 14 11 13   1                   1 
Ht125 E-M2 15 21 10 11 13 15 18 12 29 14 11 12               1       1 
Ht126 E-M2 15 21 10 11 13 15 18 13 31 14 11 12             1         1 
Ht127 E-M2 15 21 10 11 13 15 18 14 30 14 11 12                 1     1 
Ht128 E-M2 15 21 10 11 13 15 18 14 31 14 11 12  1                    1 
Ht129 E-M2 15 21 10 11 13 15 20 13 30 14 11 13            1          1 
Ht130 E-M2 15 21 10 11 13 16 17 13 30 14 11 12    1                  1 
Ht131 E-M2 15 21 10 11 13 16 17 13 31 14 11 11             1         1 
Ht132 E-M2 15 21 10 11 13 16 17 13 31 14 11 12      1                1 
Ht133 E-M2 15 21 10 11 13 16 17 13 31 14 12 12               1       1 
Ht134 E-M2 15 21 10 11 13 16 17 14 32 14 11 11           1           1 
Ht135 E-M2 15 21 10 11 13 16 19 13 32 14 11 11     1                 1 
Ht136 E-M2 15 21 10 11 13 17 17 13 31 14 11 11        1              1 
Ht137 E-M2 15 21 10 11 13 17 17 13 31 14 11 12     2                 2 
Ht138 E-M2 15 21 10 11 13 17 17 13 32 14 11 12                 1     1 
Ht139 E-M2 15 21 10 11 13 17 18 12 30 14 11 12        2              2 
Ht140 E-M2 15 21 10 11 13 17 18 13 31 14 11 12     1        1         2 
Ht141 E-M2 15 21 10 11 13 18 18 13 30 14 11 12 1                     1 
Ht142 E-M2 15 21 10 11 14 15 18 12 30 14 11 12              1        1 
Ht143 E-M2 15 21 10 11 14 15 19 13 30 14 11 13        2              2 
Ht144 E-M2 15 21 10 11 14 16 16 13 29 14 11 12                 1     1 
Ht145 E-M2 15 21 10 11 14 16 16 13 30 14 11 12                 1     1 
Ht146 E-M2 15 21 10 11 14 16 17 13 31 14 11 11           1           1 
Ht147 E-M2 15 21 10 11 14 16 18 13 30 14 11 13  1                    1 
Ht148 E-M2 15 21 11 11 13 15 17 12 30 14 11 11           1           1 
Ht149 E-M2 15 21 11 11 13 16 16 13 30 14 11 11           2  1         3 
Ht150 E-M2 15 21 11 11 13 16 17 13 30 14 11 11               1       1 
Ht151 E-M2 15 21 11 11 13 16 17 13 30 14 11 12          1            1 
Ht152 E-M2 15 21 11 11 13 16 17 13 31 14 11 11           1  1         2 
Ht153 E-M2 15 21 11 11 13 16 17 13 31 14 11 12                 1     1 
Ht154 E-M2 15 21 11 11 13 16 18 13 31 14 10 11               1       1 
Ht155 E-M2 15 21 11 11 13 17 17 13 31 14 11 11             1         1 
Ht156 E-M2 15 21 11 11 13 17 18 13 31 14 11 12            1          1 
Ht157 E-M2 15 22 10 11 13 16 17 13 32 14 11 11 3                     3 
Ht158 E-M2 16 21 10 11 13 15 17 14 31 13 11 12  1                    1 
Ht159 E-M2 16 21 10 11 13 15 20 14 31 14 11 12                 1     1 
Ht160 E-M2 16 21 10 11 13 16 16 13 30 14 11 11     1                 1 
Ht161 E-M2 16 21 10 11 13 16 17 13 31 14 12 12 1     1           1     3 
Ht162 E-M2 16 21 10 11 14 15 20 14 31 14 11 12  1                    1 
Ht163 E-M2 16 21 10 11 14 15 20 14 31 14 11 13                 1     1 
Ht164 E-M2 16 21 10 12 13 15 18 13 30 14 11 12            1          1 
Ht165 E-M2 16 21 11 11 13 16 16 13 31 14 11 11           1           1 
Ht166 E-M2 16 21 11 11 13 16 17 13 31 14 11 11           1           1 
Ht167 E-M2 16 21 11 11 13 16 17 13 31 14 12 12                 1     1 
Ht168 E-M2 16 21 11 11 13 17 17 13 31 14 11 11           1           1 
Ht169 E-M2 17 21 10 11 13 14 19 13 32 14 11 12               1       1 
Ht170 E-M2 17 21 10 11 13 16 17 13 31 14 12 13  1                    1 
Ht171 E-M2 17 21 11 11 13 16 16 13 30 14 11 11              1        1 
Ht172 E-M34 13 25 10 11 13 16 16 13 31 14 10 12    1                  1 
Ht173 E-M35 13 23 10 11 14 16 16 10 27 14 10 11           1           1 
Ht174 E-M35 13 23 11 11 14 16 16 10 27 14 10 11    1       1           2 
Ht175 E-M35 13 23 11 11 14 16 17 10 27 14 10 11    1                  1 
Ht176 E-M35 13 24 10 11 14 15 17 10 27 14 10 11    1                  1 
Ht177 E-M35 13 24 10 11 14 16 16 10 27 14 10 11    2 1  1               4 
Ht178 E-M35 13 24 10 11 14 16 17 10 27 14 10 11     1  1    1           3 
Ht179 E-M35 13 24 11 11 14 16 16 10 27 14 10 10  1                    1 
Ht180 E-M35 13 24 11 11 14 16 16 10 27 14 10 11            1          1 
Ht181 E-M35 13 24 11 11 14 16 16 10 27 14 10 13    2                  2 
Ht182 E-M35 13 24 11 11 14 16 17 10 27 14 10 11       1               1 
Ht183 E-M35 13 24 11 11 14 16 17 10 27 14 10 12    1                  1 
Ht184 E-M35 13 24 11 11 14 16 21 10 27 14 10 11  1                    1 
Ht185 E-M35 13 24 11 12 14 16 16 13 30 14 10 13           1           1 
Ht186 E-M35 13 24 12 12 14 16 16 10 27 14 10 11    1                  1 
Ht187 E-M35 13 24 8 11 13 16 16 10 28 14 10 11            4          4 
Ht188 E-M35 13 25 11 11 14 16 16 11 28 14 10 11           1           1 
Ht189 E-M35 14 24 10 11 14 16 16 14 31 14 10 12           2           2 
Ht190 E-M35 14 24 11 11 14 16 17 14 31 14 10 12            1          1 
Ht191 E-M35 14 24 11 11 14 17 17 10 27 14 10 13          1            1 
Ht192 E-M35 14 25 11 11 14 16 17 10 27 14 10 11    1                  1 
Ht193 E-M58 15 21 11 10 14 15 16 13 30 14 11 12             1         1 
Ht194 E-M58 15 21 11 11 14 16 16 12 29 14 11 12    1                  1 
Ht195 E-M58 15 21 11 11 14 16 16 13 30 14 11 13  1                    1 
Ht196 E-M58 15 21 11 11 14 16 17 12 29 14 11 12  1                    1 
Ht197 E-M58 15 21 11 12 14 16 16 13 30 14 11 12              1        1 
Ht198 E-M58 16 21 10 11 14 16 16 13 30 14 11 12              2        2 
Ht199 E-M58 16 21 11 11 14 14 16 13 30 14 11 13              1        1 
Ht200 E-M75 14 23 11 11 14 14 21 12 28 14 11 11           1           1 
Ht201 E-M78 13 24 10 11 13 16 18 12 30 14 10 14                  1    1 
Ht202 E-M78 13 24 11 11 13 16 18 13 30 14 10 12 1                     1 
Ht203 E-M78 14 25 10 11 13 16 18 13 30 14 10 12     1                 1 
Ht204 E-M85 14 24 10 11 13 14 20 12 28 14 11 11             1         1 
Ht205 E-M85 14 24 11 11 13 14 19 12 29 14 11 11            1          1 
Ht206 E-M85 14 25 10 11 13 14 20 12 28 14 11 11              1 1  1     3 
Ht207 E-M85 14 25 10 11 13 16 20 12 28 14 10 11                 2     2 
Ht208 E-M85 14 25 11 11 13 14 19 12 28 14 11 11 3                     3 
Ht209 E-M85 14 26 10 11 13 13 14 12 28 14 11 11                 1     1 
Ht210 E-M85 14 26 10 11 13 14 20 12 28 14 11 11  1                    1 
Ht211 E-M85 14 26 10 11 13 15 20 12 28 14 11 11  1               2     3 
Ht212 E-M85 14 27 10 11 13 15 20 12 28 14 11 11       1               1 
Ht213 H-M69 15 25 11 11 13 12 17 12 27 14 10 8                    1 1 
Ht214 I-M170 14 22 10 11 13 13 15 12 29 16 10 12     1                 1 
Ht215 I-M170 14 22 10 11 13 13 16 12 29 16 10 11                  1    1 
Ht216 I-M170 15 22 10 11 13 13 13 12 28 16 10 11    1                  1 
Ht217 I-M170 15 23 10 11 14 15 15 13 30 14 10 11     1                 1 
Ht218 I-M170 15 23 10 11 14 15 15 14 31 14 10 11  1                    1 
Ht219 I-M170 16 24 11 11 13 14 15 13 31 15 10 13 1                     1 
Ht220 J-M172 14 23 10 11 13 13 16 14 30 15 9 11    1                  1 
Ht221 J-M172 14 23 10 11 13 13 17 14 30 15 9 11    1                  1 
Ht222 J-M172 14 23 9 11 12 13 16 13 29 15 9 10                    1 1 
Ht223 J-M172 14 24 10 11 12 13 14 13 29 15 9 11                    1 1 
Ht224 J-M172 15 24 10 11 12 12 17 12 28 15 9 11                    1 1 
Ht225 J-M172 16 24 10 11 12 13 17 12 29 16 9 12                    1 1 
Ht226 J-M172 17 24 10 11 12 13 17 12 28 16 9 12  1                    1 
Ht227 J-p12f2 14 21 10 11 14 15 17 13 29 14 10 12  1                    1 
Ht228 K2 15 25 9 13 13 12 17 12 29 14 10 12               1       1 
Ht229 L-M11 14 22 10 14 11 13 17 12 29 15 10 14                    1 1 
Ht230 L-M11 14 22 10 14 11 13 18 12 27 15 10 12                    1 1 
Ht231 P, Q-M74 15 24 10 14 14 13 17 12 29 14 11 11                    1 1 
Ht232 R-M124 14 23 10 10 14 13 18 13 29 16 11 13                    1 1 
Ht233 R-M17 16 24 11 11 13 11 15 13 30 14 11 10                    1 1 
Ht234 R-M17 16 25 10 11 13 11 14 13 29 14 11 11    1                  1 
Ht235 R-M17 17 25 11 11 13 11 14 13 31 14 11 10                    1 1 
Ht236 R-M198 15 25 10 11 13 11 14 13 29 14 11 10 1                     1 
Ht237 R-M198 15 25 10 11 13 11 14 13 30 14 11 10                  1    1 
Ht238 R-M198 16 25 10 11 14 12 14 13 30 14 11 10              1        1 
Ht239 R-M207* 14 23 10 10 15 14 20 13 32 16 11 10  1                    1
Ht240 R-M343 13 24 11 13 13 11 14 13 29 14 12 14  1                    1
Ht241 R-M343 14 23 10 13 12 11 14 13 29 16 12 13                   1   1
Ht242 R-M343 14 23 10 13 13 10 14 13 29 15 12 12    2                  2
Ht243 R-M343 14 23 10 13 13 11 14 13 28 15 12 11                  1    1
Ht244 R-M343 14 23 10 13 13 11 14 13 29 15 12 12              1     1   2
Ht245 R-M343 14 23 11 13 13 11 14 13 29 15 12 11  1                1    2
Ht246 R-M343 14 23 11 13 13 11 15 13 28 15 12 12  1                    1
Ht247 R-M343 14 23 11 13 13 11 15 13 29 15 12 12   1                   1
Ht248 R-M343 14 24 10 13 13 11 12 12 30 15 12 12                  1    1
Ht249 R-M343 14 24 10 13 13 11 13 13 30 15 12 11                  1    1
Ht250 R-M343 14 24 10 13 13 11 14 13 29 14 12 12                  1    1
Ht251 R-M343 14 24 10 13 13 12 15 13 29 15 12 12     1                 1
Ht252 R-M343 14 24 10 13 14 11 15 13 29 15 12 12    1                  1
Ht253 R-M343 14 24 10 13 14 11 15 13 29 15 13 12                   1   1
Ht254 R-M343 14 24 11 13 12 11 13 13 29 14 11 11  1                    1
Ht255 R-M343 14 24 11 13 13 11 11 13 29 15 11 13       1               1
Ht256 R-M343 14 24 11 13 13 11 13 13 29 16 12 11    1                  1
Ht257 R-M343 14 24 11 13 13 11 14 12 28 15 12 12                  1    1
Ht258 R-M343 14 24 11 13 13 11 14 13 29 14 11 12                  1    1
Ht259 R-M343 14 24 11 13 13 11 14 13 29 15 12 11 1                 1    2
Ht260 R-M343 14 24 11 13 13 11 14 13 29 15 12 12  1  1                  2
Ht261 R-M343 14 24 11 13 13 11 14 14 30 15 12 12 1                     1
Ht262 R-M343 14 24 11 13 13 12 14 13 29 15 12 11  1                    1
Ht263 R-M343 14 24 11 13 13 12 15 13 30 15 12 11                  1    1
Ht264 R-M343 14 25 11 13 13 11 14 14 30 15 12 11                  1    1
Ht265 R-M343 15 23 11 13 13 11 14 13 30 14 12 11         1             1
Ht266 R-M343 15 24 11 13 13 11 13 16 32 15 12 12     1                 1
Ht267 R-M343 15 24 11 13 13 11 14 13 29 14 12 11    1                  1
Ht268 R-M343 15 25 10 13 13 11 14 13 29 15 11 13         1             1
TOTAL              19 35 3 37 23 3 14 19 2 28 48 13 14 15 21 2 30 13 3 11 353 
Appendix H: Bar charts showing haplotype frequencies for 44 inferred short haplotypes

[Bar charts not reproduced. Panel labels: 01-01, 01-02, 02-01, 02-02, ..., 22-01, 22-02 (44 panels).]