Subramoney, Kathleen
2025-10-27
2024
Subramoney, Kathleen. (2024). Molecular epidemiology and characteristics of immune adaptations across the SARS-CoV-2 spike glycoproteins from Gauteng, South Africa, 2020 to 2022 [PhD thesis, University of the Witwatersrand, Johannesburg]. WIReDSpace. https://hdl.handle.net/10539/47249
A research report submitted in fulfillment of the requirements for the Doctor of Philosophy, in the Faculty of Health Sciences, School of Pathology, University of the Witwatersrand, Johannesburg, 2024
The SARS-CoV-2 global pandemic has been fueled by several variants of concern (VOCs) that have gained more efficient transmission or immune evasion properties over time. To better understand the diversity and evolutionary characteristics of SARS-CoV-2 lineages in South Africa, we analysed the SARS-CoV-2 lineages and VOCs circulating during 2020 to 2022, as well as the impact of the S protein and its potential to act as a candidate vaccine. The first objective of this study was to rapidly identify emerging VOCs based on key SARS-CoV-2 S protein mutations. The second objective was to describe the impact of intra-host immune adaptations on the evolution of SARS-CoV-2 S protein genes among individuals with SARS-CoV-2 infections. Thirdly, by timing the emergence of dominant SARS-CoV-2 variants, we aimed to unravel the significance and abundance of low-frequency lineages that emerged during five COVID-19 waves in South Africa. The final objective was to assess whether accounting for diversity among SARS-CoV-2 S proteins improved the predicted epitope coverage of a derived immunogen. Single nucleotide polymorphism (SNP) PCR-based genotyping assays targeting specific mutations were used to detect VOCs that circulated in 2021. Allele frequencies (AF), determined by SNP PCR analysis and by variant calling from FASTQ reads using galaxy.eu, were used to describe intra-host SARS-CoV-2 S protein variants.
Whole genome sequencing was performed to identify and analyse SARS-CoV-2 strains circulating in South Africa from 2020 to 2022 and to detect low-frequency lineages. Mosaic vaccine suite tools were used to design an optimal S protein construct from sequences generated in this study. The construct was further tested for antigenicity, toxicity, N- and O-linked glycosylation sites and CTL predictions. A combination of the P681R and L452R SNPs was detected in 73.6% (538/731) of the samples classified as Delta, while the N501Y and del69/70 SNPs were detected in 3.6% (26/731) of samples classified as Alpha. Detection of del69/70 and K417N coupled with S-gene target failure (SGTF) is an efficient way to exclude Alpha and Beta variants and rapidly detect Omicron BA.1. SNP assays detected 5.3% of cases with Delta that displayed heterogeneity at delY144, E484Q, N501Y and P681H. However, heterogeneity was confirmed by sequencing only for the E484Q and delY144 mutations. Variant calling from FASTQ reads identified intra-host diversity in the S protein among 9% of cases that were infected with Beta, Delta, Omicron BA.1, BA.2.15, and BA.4 lineages. Heterogeneity was primarily identified at positions 19 (1.4%) with T19IR, 371 (92.3%) with S371FP, and 484 (1.9%) with E484AK, E484AQ and E484KQ. In 2020, 24 lineages were detected, with B.1 (3%; 8/278), B.1.1 (16%; 45/278), B.1.1.348 (3%; 8/278), B.1.1.52 (5%; 13/278), C.1 (13%; 37/278) and C.2 (2%; 6/278) circulating during the first wave. Beta dominated the second wave of infection in 2020. B.1 and B.1.1 continued to circulate at low frequencies in 2021 and B.1.1 re-emerged in 2022. Beta was outcompeted by Delta in 2021, which was thereafter outcompeted by Omicron sub-lineages during the 4th and 5th waves in 2022. Several significant mutations (del69-70, delY144, E484K, N501Y and D614G) identified in VOCs were also detected in low-frequency lineages.
During the 5 waves of infection, B.1 and C.1/C.2 lineages co-circulated with a dominant VOC. Following our findings of co-circulation of VOCs and other lineages and evidence of quasispecies, we investigated whether accounting for the diversity of SARS-CoV-2 strains would yield an improved S immunogen. The optimal mosaic S protein generated had predicted CTL epitope coverage of ~95% to 98% and was classified as an antigen based on a prediction score of 0.47. Reverse translation was used to generate the novel S gene for the expression construct SC2M2. The NTD and RBD regions were non-toxic, and the derived novel S protein comprised 10 additional N-linked glycosylation sites and 4 O-linked glycosylation sites when compared to the Wuhan Hu-1 strain. Our study findings have shown that (i) rapid detection of emerging VOCs was possible using SNP genotyping assays, which can be used by low- to middle-income countries to detect Alpha, Beta, Delta and Omicron BA.1; (ii) heterogeneity within the S protein encourages escape from neutralising antibodies and the evolution of SARS-CoV-2, which may contribute to the ongoing emergence of new variants associated with continued outbreaks globally; (iii) low-frequency lineages that share mutations with VOCs could lead to convergence and recombination events that result in the next novel lineages or variants that may further increase transmissibility, infectivity and immune escape; and lastly (iv) the novel S expression construct designed, based on previous and currently circulating VOCs and lineages, could potentially be used to develop improved SARS-CoV-2 vaccines.
en
© 2024 University of the Witwatersrand, Johannesburg. All rights reserved. The copyright in this work vests in the University of the Witwatersrand, Johannesburg.
No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of University of the Witwatersrand, Johannesburg.
UCTD
SARS-CoV-2
spike protein
variants of concern
Molecular epidemiology and characteristics of immune adaptations across the SARS-CoV-2 spike glycoproteins from Gauteng, South Africa, 2020 to 2022
Thesis
University of the Witwatersrand, Johannesburg
SDG-3: Good health and well-being
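
The SNP combinations reported in the abstract can be read as a small decision rule. The following is a minimal illustrative sketch only, not the assay logic used in the thesis: the function name `classify_voc`, the mutation label strings, and the "Unassigned" fallback are hypothetical, and the rule order is an assumption (the Omicron BA.1 check comes first because that rule shares del69/70 with the Alpha rule).

```python
from typing import Iterable


def classify_voc(detected_snps: Iterable[str], sgtf: bool = False) -> str:
    """Assign a provisional variant label from SNP genotyping results.

    Hypothetical helper: rules follow the marker combinations named in the
    abstract; anything else is left unassigned rather than guessed.
    """
    snps = set(detected_snps)

    # del69/70 + K417N together with S-gene target failure: Omicron BA.1.
    # Checked first (assumption) because del69/70 also appears in the Alpha rule.
    if sgtf and {"del69/70", "K417N"} <= snps:
        return "Omicron BA.1"

    # P681R + L452R: Delta.
    if {"P681R", "L452R"} <= snps:
        return "Delta"

    # N501Y + del69/70: Alpha.
    if {"N501Y", "del69/70"} <= snps:
        return "Alpha"

    # The abstract gives no further rules, so do not infer a lineage.
    return "Unassigned"


if __name__ == "__main__":
    print(classify_voc(["P681R", "L452R"]))
    print(classify_voc(["N501Y", "del69/70"]))
    print(classify_voc(["del69/70", "K417N", "N501Y"], sgtf=True))
```

A sample that matches none of the listed combinations is returned as "Unassigned"; in practice such samples would need confirmation by sequencing, as the abstract describes for heterogeneous calls.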