Investigating the utility of different methods to detect SMN DNA copy number and RNA expression in black South African patients with spinal muscular atrophy

Kgomotso Odirile Peggy Tabane
0609571G

Supervisor: Prof Amanda Krause
Co-Supervisor: Mrs Elana Vorster

Division of Human Genetics, National Health Laboratory Service and School of Pathology, Faculty of Health Sciences, University of the Witwatersrand, South Africa

Johannesburg, 2021

DECLARATION

I, Kgomotso Odirile Tabane, declare that this dissertation is my own work. It is being submitted for the degree of Masters of Science in Medicine in the University of the Witwatersrand, Johannesburg. It has not been submitted before for any degree or examination at this or any other University.

(Signature of candidate)
29th day of October 2021 in Johannesburg

ACKNOWLEDGMENTS

I would like to thank my supervisors Prof Amanda Krause and Mrs Elana Vorster for their continuous support through this study. They have been encouraging and patient throughout. They have guided and assisted me with the academic and administrative parts of my Master’s.

I would also like to acknowledge my manager Ms Fahmida Essop who has been supportive and understanding during my studies. She has been kind and has allowed me to take study leave when I have needed it.

I would also like to thank the molecular diagnostic team; they have been encouraging and have cheered me on during the most challenging times.

I would like to thank Alukhethi Singo and Marija Kvas from Whitehead Scientific for their technical assistance with qPCR and RT-qPCR. Thank you, Cassandra Soo, for assisting me with the QuantStudio instruments.

Finally, I want to thank my family for their utmost support. Thank you to my mother and father Ruth and Walter Tabane for encouraging me to pursue this Master’s degree. Thank you to my brother Obakeng and sister Omolemo Tabane. Special thank you to my sister, Omolemo, who has been my support during this Master’s study.
This dissertation is dedicated to my mother Ruth and father Walter Tabane, ke a leboga.

Funders:
National Health Laboratory Service Research Trust (NHLSRT)
Faculty of Health Science Research Committee (FRC)

PRESENTATIONS ARISING FROM THIS RESEARCH PROJECT

Southern African Society for Human Genetics (SASHG), 03 Aug 2019 - 06 Aug 2019
Location: Century City Conference Centre, Cape Town, South Africa.
Title: Determining RNA expression levels of the SMN protein in black South African patients clinically affected with spinal muscular atrophy who tested negative for the homozygous deletion of SMN1, exon 7.

ABSTRACT

Spinal muscular atrophy (SMA) is a common neuromuscular disorder occurring as frequently as albinism in the black South African (SA) population. SMA is a neurodegenerative disease characterised by motor neuron loss in the spinal cord causing muscle weakness and atrophy. SMA is caused predominantly by mutations in the survival motor neuron 1 gene (SMN1). A homozygous deletion of SMN1, exon 7, accounts for ~95% of SMA cases worldwide but is found in only 51% of black SA patients. Mutations within SMN2, a gene copy, are not thought to cause SMA directly, but to modify disease severity. Black SA individuals have been shown to have copy number variations of the SMN1 and SMN2 genes which could potentially mask pathogenic mutations. The aim of the study was to determine the clinical utility of different DNA and RNA methods in testing clinically suggestive SMA patients without an SMN1 deletion. Three new methods, AmplideX® PCR/CE SMN1/2, qPCR and RT-qPCR, were optimised and validated as part of this study. Ninety-two subjects were tested using these methods and results were compared to those obtained with multiplex ligation-dependent probe amplification (MLPA). Comparative analysis of the three methods indicated that they were all adequate for diagnostic and carrier testing of SMA subjects.
The RT-qPCR showed that DNA copy number does not necessarily correlate directly with RNA copy number. Interpreting expression of the SMNΔ7 transcript must be done in the context of the SMN1 and SMN2 DNA and RNA copy numbers. DNA copy number detection by qPCR was the most affordable method and AmplideX® PCR/CE SMN1/2 was the only method able to detect gene conversions, although the functional significance of these is uncertain. Diagnosis in subjects with clinical features suggestive of SMA could not be confirmed with RT-qPCR. We recommend further clinical assessment of these patients and testing using newer technologies. Further research into the molecular mechanism underlying SMA in patients with African ancestry is required.

TABLE OF CONTENTS

DECLARATION
ACKNOWLEDGMENTS
PRESENTATIONS ARISING FROM THIS RESEARCH PROJECT
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS AND SYMBOLS
1. INTRODUCTION
1.1 Clinical epidemiology
1.1. Clinical symptoms of SMA
1.2. The SMN genes
1.2.1. SMN gene copies
1.2.2. Gene conversions and the evolution of the SMN region
1.3. SMN transcripts
1.4. SMN Protein
1.5. SMA disease modifiers
1.5.1. SMN2 copy number as a modifier of SMA disease.
1.5.2. DNA methylation
1.5.3. Proteins influencing SMA expression.
1.6. Therapy for SMA - Nusinersen (Spinraza) treatment
1.7. SMA research in Africa
1.8. Diagnostic testing at the Division of Human Genetics, Johannesburg
1.9. Aim
1.10. Objectives
2. SUBJECTS AND METHODS
2.1 Subjects and controls
2.2 Methods
2.2.1 Genomic DNA Copy Number Detection
2.2.1.1 DNA Extraction
2.2.1.2 DNA yield, purity, integrity determination and routine diagnostic testing
2.2.1.3 Homozygous SMN1/SMN2 deletion method using RFLP.
2.2.1.4 Multiplex ligation-dependent probe amplification (MLPA)
2.2.1.5 AmplideX® PCR/CE SMN1/2 copy number detection.
2.2.1.6 Real-time PCR (qPCR)
2.2.2 SMN1/SMN2 RNA expression analysis
2.2.2.1 RNA extraction
2.2.2.2 Reverse transcription of RNA to cDNA
2.2.2.3 RT-qPCR experiment design
2.2.3 Comparative analysis of DNA and transcript levels of SMN1 and SMN2 in subjects and controls
2.2.4 Ethics
2.2.5 Summary of methods
3 RESULTS
3.1 DNA copy number detection: MLPA, AmplideX and qPCR
3.1.1 DNA copy number detection by MLPA
3.1.2 DNA copy number detection by AmplideX
3.1.3 Gene conversion and DNA copy number
3.1.4 DNA copy number detection by qPCR
3.1.5 Comparison of MLPA, AmplideX kit and qPCR methods in the detection of SMN1 DNA copy numbers
3.1.6 Summary of comparison of SMN1 and SMN2 DNA copy number
3.1.7 Costing of methods for determination of SMN DNA copy number
3.2 Gene expression: RNA copy number detection
3.2.1 RNA expression results of M1/M1 subjects
3.2.2 RNA expression results of M2/M2 subjects
3.2.3 RNA expression results of N/M1 subjects: DNA SMN1 copy number versus FL-SMN1 expression
3.2.4 RNA expression results of N/N subjects
3.2.5 RNA expression results of N/N subjects: DNA copy number of SMN2 versus FL-SMN2 transcript
3.3 Summary of results
4 DISCUSSION
4.1 DNA copy number methods – Comparison and their clinical utility
4.1.1 Comparison of MLPA DNA copy numbers
4.1.2 SMN1 gene detection by MLPA vs AmplideX vs qPCR
4.1.3 Possible reasons for SMN1 copy number discrepancies.
4.1.4 SMN1 gene copy number in U/U subjects
4.1.5 Comparative analysis of N/N subjects - SMN1 copy number in black population compared to other populations.
4.1.6 SMN2 gene detection by MLPA vs AmplideX vs qPCR
4.1.7 Comparative analysis of N/N subjects - SMN2 copy number in black population compared
4.1.8 Gene conversion and DNA copy number
4.2. RNA expression
4.2.1 Comparison of SMN1 DNA copy number versus FL-SMN1 transcript
4.2.2 DNA and RNA analysis of M1/M1 subjects
4.2.3 DNA and RNA analysis of M2/M2 subjects
4.2.4 DNA and RNA analysis of N/M1 subjects
4.2.5 DNA and RNA analysis of N/N subjects
4.2.6 DNA and RNA analysis of U/U subjects
4.2.7 Comparison of SMN1/SMN2 DNA and RNA results
4.3 Clinical utility of SMN1/SMN1 DNA and RNA testing
4.4 Future studies
4.4.1 SMA modifiers of disease
4.4.2 Beyond the copy number
4.4.3 Long-read SMN gene sequencing
4.5 Advantages and limitations of the study
4.5.1 Advantages and limitations of MLPA
4.5.2 Advantages and limitations of AmplideX
4.5.3 Advantages and limitations of qPCR
4.5.4 Advantages and limitations of RT-qPCR
5 CONCLUSION
REFERENCES
APPENDICES

LIST OF FIGURES

Figure 1.1. Genetic map of the spinal muscular atrophy locus
Figure 1.2. The SMN genes.
Figure 1.3. Model of alleles present in the normal population and in patients affected with SMA.
Figure 1.4. Six possible ways of gene conversion between SMN1 and SMN2 in exons 7 and 8
Figure 1.5. Difference between SMN1 and SMN2 transcription
Figure 1.6. Modifiers of SMA disease
Figure 1.7. Epigenetic changes can modify expression.
Figure 1.8. Pedigree of a family affected with SMA.
Figure 2.1. Strategy for testing of different methods
Figure 2.2. Flow diagram showing DNA copy number detection.
Figure 2.3. Homozygous deletion SMN1/SMN2 method.
Figure 2.4. Steps in the MLPA test process.
Figure 2.5. Results of MLPA analysis using the SALSA MLPA probe-mix P021.
Figure 2.6. Peak profile of a sample with 3 copies of SMN1, exon 7 and 1 copy of SMN2
Figure 2.7. Example of SMN1/2 AmplideX Excel report of
Figure 2.8. Quantitative PCR by use of TaqMan™ probes
Figure 2.9. Annotation of SMN primers
Figure 2.10. Annotation of CFTR primers
Figure 2.11. An example of a qPCR amplification plot visualised on the QuantStudio 3
Figure 2.12. Example of a gene copy number plot visualised on the QuantStudio 3
Figure 2.13. Summary of RNA extraction protocol using Tempus™ Blood RNA Tube
Figure 2.14. Summary of reverse transcription using the ImProm-II™ Reverse Transcription System (Promega, Madison, Wisconsin, USA).
Figure 2.15. Primer and probe sequences of FL-SMN1 and FL-SMN2
Figure 2.16. Predicted comparative analysis of DNA SMN1 and SMN2 copy number versus RNA transcription.
Figure 3.1. MLPA results obtained for the SMN region.
Figure 3.2. N/N, M1/M1 and M2/M2 AmplideX results.
Figure 3.3. AmplideX results in subjects with varying gene copy numbers.
Figure 3.4. Amplification plot of qPCR.
Figure 3.5. Copy number ratio of SMN1 and SMN2 in 20 subjects.
Figure 3.6. Comparison of SMN1 copy number results across three methods.
Figure 3.7. Comparison of SMN2 copy number results across three methods.
Figure 3.8. SMN2 copy number compared to genotype.
Figure 3.9. Comparison of DNA copy number between MLPA, AmplideX and qPCR.
Figure 3.10. Comparison of MLPA, AmplideX and qPCR results in a single subject.
Figure 3.11. DNA copy number versus RNA expression in an M1/M1 subject: SMA1988.
Figure 3.12. An M1/M1 subject with zero copies of SMN1 on DNA copy number detection and zero copies of FL-SMN1 transcript.
Figure 3.13. DNA copy number versus RNA expression in an M1/M1 subject: SMAR7.
Figure 3.14. DNA copy number versus RNA expression in an M2/M2 subject: SMAR2.
Figure 3.15. SMN1 DNA copy number versus RNA expression in N/M1 subjects
Figure 3.16. SMN2 DNA copy number versus RNA expression in N/M1 subjects
Figure 3.17. DNA copy number versus gene expression results of N/N subjects.
Figure 3.18. SMN2 DNA copy number versus RNA expression results of N/N subjects.
Figure 3.19. DNA copy number versus gene expression results of N/N subjects with matching copies of SMN1 and SMN2.
Figure 3.20. DNA copy number versus gene expression results of N/N subjects with variable copies of SMN1 and SMN2.
Figure 3.21. DNA copy number versus RNA expression results of N/N subjects with three or more copies of SMN1.
Figure 3.22. DNA copy number versus RNA expression results of U/U subjects.
Figure 4.1. U/U Subject with SMN copy number discrepancy.
Figure 4.2. Diagnostic algorithm of SMA (Mercuri et al., 2018).
Figure 4.3. Comparative analysis of genomic SMN1 and SMN2 copy numbers versus RNA transcription.

LIST OF TABLES

Table 1.1. Clinical sub-types of spinal muscular atrophy (SMA). Adapted from Butchbach, 2016
Table 2.1. Groups of subjects and controls tested, together with numbers of individuals in each group.
Table 2.2. The relationship between DQ values and SMN1 copy number on MLPA analysis (“MLPA General Protocol MDP-v007.pdf,” n.d.)
Table 2.3. Expected peak sizes on Applied Biosystems® Genetic Analysers (3500 series) AmplideX PCR/CE SMN1/SMN2 kit protocol guide.
Table 2.4. Default SMN1 and SMN2 DNA copy number bins vs normalised ratio.
Table 2.5. Default SMN1 Hybrid and SMN2 Hybrid DNA copy number bins vs normalised ratio
Table 2.6. Primer and probe sequences of the SMN1 and SMN2 qPCR method
Table 2.7. Description of RT-qPCR transcripts
Table 2.8. Primer and probe sequences of the SMN expression RT-qPCR relative quantification method designed by Vorster et al (2020)
Table 3.1. Ethnicity and genotype of subjects tested for DNA copy number.
Table 3.2. Comparison of SMN1, exon 7 and exon 8 copy number in M1/M1 subjects
Table 3.3. Percentage of hybrid SMN2-to-SMN1 copies detected using AmplideX.
Table 3.4. Comparison of SMN1 copy number in different subject groups using different methods.
Table 3.5. Summary of DNA copy number methods
Table 3.6. Costing results for MLPA, AmplideX and qPCR
Table 4.1. Comparison of SMN1 copy numbers on MLPA/AmplideX/qPCR
Table 4.2. Frequency of N/N SMN1 Copy Number in Different Geographical Regions
Table 4.3. Comparison of SMN2 copy numbers between MLPA/AmplideX/qPCR
Table 4.4. Frequency of N/N SMN2 Copy Number in Different Geographical Regions

LIST OF ABBREVIATIONS AND SYMBOLS

A         Adenine
ABI       Applied Biosystems
bp        Base pairs
C         Cytosine
cm        Centimetre
CNS       Central nervous system
CNV       Copy number variation
Ct        Cycle threshold
Ctrl      Control
ddH2O     Deionised, distilled water
ddNTPs    Dideoxyribonucleotide triphosphates
DNA       Deoxyribonucleic acid
dNTPs     Deoxyribonucleotide triphosphates
DQ        Dosage quotient
EC        Endogenous control
EDTA      Ethylene-diamine-tetra-acetate
EtBr      Ethidium bromide
F         Forward primer
FL-SMN    Full length survival motor neuron
g         Grams
G         Guanine
kb        Kilobases
kDa       Kilodalton
L         Litre
Mg2+      Magnesium ion
MgCl2     Magnesium chloride
min       Minutes
ml        Millilitre
MLPA      Multiplex ligation-dependent probe amplification
mm        Millimetre
mM        Millimolar
mRNA      Messenger RNA
NaCl      Sodium chloride
ng        Nanograms
NGS       Next generation sequencing
NHLS      National Health Laboratory Service
nm        Nanometre
OMIM      Online Mendelian Inheritance in Man
PCR       Polymerase chain reaction
PLS3      Plastin 3
pmol      Picomoles
qPCR      Quantitative/real-time polymerase chain reaction
RT-qPCR   Reverse transcription quantitative PCR
RFLP      Restriction fragment length polymorphism
RNA       Ribonucleic acid
rpm       Revolutions per minute
RQ        Relative quantification
SDS       Sodium dodecyl sulphate
SMA       Spinal muscular atrophy
SMN       Survival of Motor Neuron protein
SMN1      Survival Motor Neuron 1 gene
SMN2      Survival Motor Neuron 2 gene
s         Seconds
T         Thymine
TBE       Tris Borate EDTA
TE        Tris-EDTA
Tris      Tris (hydroxymethyl) methylamine
U         Units
µl        Microlitre
V         Volts
UV        Ultraviolet
WES       Whole exome sequencing
WGS       Whole genome sequencing

1. INTRODUCTION

Spinal muscular atrophy is a common neuromuscular disorder occurring as frequently as albinism in the black South African (SA) population.
Spinal muscular atrophy (SMA) is a neurodegenerative disease characterised by motor neuron loss in the spinal cord resulting in muscle weakness and atrophy. Five clinical sub-types of SMA have been identified according to the disease manifestations and age of onset (Butchbach, 2016). Mutations in the survival motor neuron 1 gene (SMN1) are the cause of SMA (Lefebvre et al., 1995). A homozygous deletion of SMN1, exon 7, accounts for ~95% of SMA cases worldwide, but is found in only 51% of black SA patients (Labrum et al., 2007). Mutations within SMN2, a gene copy almost identical in sequence to SMN1, are not thought to be associated with SMA directly, but modify disease severity (Butchbach, 2016). The high homology of the SMN1 and SMN2 genes complicates analysis.

Previous studies performed at the Division of Human Genetics, National Health Laboratory Service (NHLS) and the University of the Witwatersrand (henceforth referred to as “the Division”), have shown that subjects of African ancestry may have a hypervariable SMN region. Altogether 50.8% of these subjects had multiple copies of SMN1 compared to 3.5% across various Caucasian populations (Labrum et al., 2007; Stevens et al., 1999). This has made diagnostic and carrier testing in African populations challenging, as large copy number variations (CNVs) of the SMN region could interfere with diagnostic tests and may play a role in the SMA disease mechanism. It is possible that some of the multiple SMN1 gene copies may not be completely functional (Vorster et al., 2020).

1.1 Clinical epidemiology

Spinal muscular atrophy is an autosomal recessive disorder, caused by mutation of the SMN1 gene in most cases (Lefebvre et al., 1997). A homozygous deletion of exon 7 in SMN1 causes approximately 95% of SMA cases (Melki et al., 1994). The incidence of SMA worldwide is 1 in 3900–16,000 live births (Belter et al., 2018; Verhaart et al., 2017).
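For an autosomal recessive disorder, incidence and carrier frequency are linked through allele frequency. The following is a minimal sketch of that relationship, assuming Hardy-Weinberg equilibrium (random mating, a single class of fully recessive disease allele, no de novo mutations). The function name is hypothetical and this is not the method used in the studies cited here. Published carrier frequencies are usually measured directly by dosage testing, so they need not agree with values derived this way.

```python
import math


def carrier_frequency_from_incidence(births_per_case: float) -> float:
    """Expected carrier proportion (2pq) for a recessive disorder.

    births_per_case is the denominator of the incidence, e.g. 3900 for
    "1 in 3900 live births". Assumes Hardy-Weinberg equilibrium.
    """
    if births_per_case <= 1:
        raise ValueError("births_per_case must be greater than 1")
    q = math.sqrt(1.0 / births_per_case)  # disease allele frequency, since incidence = q^2
    p = 1.0 - q                           # frequency of the normal allele
    return 2.0 * p * q                    # heterozygote (carrier) frequency


if __name__ == "__main__":
    # Bounds of the worldwide incidence range quoted in the text.
    for births_per_case in (3900, 16000):
        carriers = carrier_frequency_from_incidence(births_per_case)
        print(f"incidence 1 in {births_per_case}: "
              f"expected carriers about 1 in {1 / carriers:.0f}")
```

Under these assumptions the quoted worldwide incidence range corresponds to an expected carrier frequency of roughly 1 in 32 to 1 in 64.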
The carrier frequency of SMA is 1 in 25–50 in most populations (Ben-Shachar et al., 2011). Labrum et al (2007) estimated the incidence of SMA in SA to be 1 in 3574 in the black population and 1 in 1945 in the white population. The carrier frequency in SA was estimated to be 1 in 50 in the black SA population and 1 in 23 in the white SA population (Labrum et al., 2007). The incidence of SMA in SA appears to be higher than in other countries, which could be due to the unique ethnic background of SA populations. The incidence of SMA is slightly higher than that of albinism (1 in 3900) in the SA black population (Kromberg and Jenkins, 1982) and almost as high as that of cystic fibrosis (1 in 2500) in the SA white population (Goldman et al., 2001; Kromberg and Jenkins, 1982).

1.1. Clinical symptoms of SMA

SMA is an early-onset neurodegenerative disease characterised by α-motor neuron loss in the anterior horn of the spinal cord causing muscle weakness and atrophy. The weakness is more proximal than distal in most SMA subtypes (Farrar and Kiernan, 2015). Individuals suspected of having spinal muscular atrophy present with a history of motor difficulties, especially a loss of motor skills, proximal muscle weakness, hypotonia, areflexia/hyporeflexia, tongue fasciculations and/or evidence of motor neuron disease on physical examination (Prior and Finanger, 1993).

SMA is a heterogeneous disease which ranges from severe to mild adult-onset phenotypes. It is classified into five clinical sub-types ranging in severity and age of onset; however, the clinical sub-types are more continuous than distinct (Farrar and Kiernan, 2015). Table 1.1 summarises these subtypes of SMA. Type 0 SMA has a prenatal onset with a survival of less than six months and is the most severe form. These patients present with very severe hypotonia and early respiratory failure and they are unable to reach any developmental milestones (Butchbach, 2016).
Patients with type I SMA, also known as Werdnig-Hoffman disease, have an age of onset before six months, are unable to sit or walk and live for approximately two years. The patients usually present with proximal muscle weakness, hypotonia, and mild contractures; affected infants have problems sucking or swallowing leading to growth failure and recurrent aspiration (Prior and Finanger, 1993). Life expectancy has improved due to respiratory and nutritive care (Corsello et al., 2021). Patients with type II SMA or Dubowitz disease, are usually diagnosed by 6-18 months and survive into adulthood; however, they have trouble walking and sitting. Type II SMA tends to manifest as progressive proximal leg weakness that is greater than 3 weakness in the arms (Kolb and Kissel, 2015). Patients may experience finger trembling, general flaccidity, and scoliosis. A study done by Zerres et al in Germany and Poland showed that the SMA II patients were alive at age 25 years (Zerres et al., 1997). Patients with type III SMA also known as Kugelberg-Welander disease have an age of onset of 18 months or older. These patients walk with difficulty but usually have a normal life expectancy (Kolb and Kissel, 2015). Patients with SMA type IV have an adult age of onset, have the mildest form of the disease and usually have a normal life expectancy (Kolb and Kissel, 2015). Type IV patients are sometimes misdiagnosed with amyotrophic lateral sclerosis (ALS) due to the late onset of disease and similar clinical features (Kolb and Kissel, 2015). Table 1.1. Clinical sub-types of spinal muscular atrophy (SMA). Adapted from Butchbach, 2016 Type Age of onset Requires Respiratory support at birth Able to sit Able to stand Able to walk Life expectancy Predicted SMN2 copy number 0 Prenatal Yes No No No <6 months 1 I <6 months No No No No <2 years 2 II 6–18months No Yes No No 10–40years 3 III >18months No Yes Yes Assisted Adult 3–4 IV >5 years No Yes Yes Yes Adult >4 1.2. 
The SMN genes The SMN1 and SMN2 genes lie within a 500 kilobase (kb) inverted duplication of chromosome 5q13. There are five protein-coding genes (SERF1, SMN1, SMN2, NAIP and GTF2H2) within the chromosome 5q13 region as illustrated in Figure 1.1. The SMN1 gene was first identified and characterised by Lefebvre et al in 1995. Through genetic mapping Lefebvre et al (1995) demonstrated that patients presenting with SMA had large scale deletions on chromosome 5q13. The researchers further showed that deletion of the SMN1 gene was responsible for SMA. The SMN1 gene was found in the telomeric region of the chromosome 5q13 repeat. SMN1 and SMN2 share 99% nucleotide identity and the critical difference between the two genes is a C>T transition in exon 7 (SMN2 c.850C>T) that affects the splicing of the genes (Monani et al., 1999a). 4 Figure 1.1. Genetic map of the spinal muscular atrophy locus (Lunn and Wang, 2008). SMN genes are shown in blue with surrounding genes in black. The genes are in an inverted duplication of chromosome 5q13. The red and blue arrows indicate the direction of transcription of the genes. 1.2.1. SMN gene copies The telomeric copy, SMN1, produces the full length SMN protein (FL-SMN) and is the main gene involved in SMA pathogenicity (Zhang et al., 2003). The SMN genes are approximately 34kb in length, comprised of 10 exons and nine introns (Lefebvre et al., 1997; Wirth, 2000). More recent research has demonstrated that exon 2 is comprised of two separate exons. To avoid confusion it has now been renamed 2a and 2b (Bürglen et al., 1996). Exon 6b was recently discovered by Seo et al (2016). SMN1 produces approximately 80% and SMN2 about 20% of FL-SMN protein. The centromeric copy; SMN2, is only distinguishable from SMN1 by five nucleotides with a critical difference in exon 7 (c.840C>T) that disrupts an exonic splice enhancer (Monani et al., 1999b, 1999a). Figure 1.2 illustrates the difference between the SMN1 and SMN2 genes. Figure 1.2. The SMN genes. 
SMN1 and SMN2 are 99% similar, with the critical difference in exon 7, c.840C>T (indicated by the arrow), resulting in differential splicing (Wirth, 2000).

The SMN1 gene can be distinguished from the SMN2 gene by one nucleotide difference in intron 6, one in exon 7, two in intron 7 and one in exon 8. These differences are used to distinguish between SMN1 and SMN2 in the design of diagnostic methods (Wirth, 2000).

1.2.2. Gene conversions and the evolution of the SMN region

The SMN region is a complex region prone to gene rearrangement; the arrangement of genes in this region is therefore poorly understood. Gene conversion may arise during meiosis, when non-Mendelian segregation of alleles results in two copies of the same DNA sequence being transferred to one gamete and no copy to the other gamete. Different numbers of copies of the DNA sequence therefore occur, resulting in copy number variations (CNVs). CNVs vary between individuals and may be disease causing, benign or may modify disease.

The frequency of SMN1 copy number variants differs between ethnic groups. Hendrickson et al (2009) showed that approximately 90% of Caucasians have two copies of SMN1 compared with approximately 50% of African Americans. Approximately 46% of African Americans had three SMN1 copies compared to 6% of Caucasians. A similar study by Sangaré et al (2014) on sub-Saharan African populations found high copy numbers of SMN1 and low copy numbers of SMN2. The high copy numbers may be due to a conversion of SMN2 to SMN1 (Hendrickson et al., 2009).

The high homology between SMN1 and SMN2 and their adjacent genes suggests that a gene duplication event occurred. The evolutionary history of the SMN genes was studied by Rochette et al (2001) to determine when the duplication of the SMN region occurred. They studied the variation of the region in different human controls compared to non-human primates and mice.
They showed that mice and other primates have only one copy of the SMN gene and that the duplication of the SMN region is a recent event that emerged after separation of human and other primate lineages. They further illustrated that other primates had SMN1 gene sequences like those occurring in human SMN1. However, the SMN2 gene was not found in the primates they studied and SMN2 was found to be unique to humans. They concluded that the SMN duplication occurred at least five million years ago prior to human-chimpanzee divergence but the SMN2 gene emerged after the separation of the two lineages. They further suggest that gene conversion events between SMN1 and SMN2 are an ongoing process (Rochette et al., 2001).

Figure 1.3 illustrates the different conversions of SMN1 to SMN2 and how different alleles affect the SMA phenotype. There is evidence of gene conversion of SMN1 to SMN2, as well as of SMN2 to SMN1. Gene conversion of SMN1 to SMN2 has been associated with less severe SMA phenotypes (Burghes, 1997). Individuals with more than three copies of SMN1 were more likely to have fewer copies of SMN2, further supporting the hypothesis of gene conversion of SMN2 to SMN1 (Ogino et al., 2003).

Figure 1.3. Model of alleles present in the normal population and in patients affected with SMA. The severity of SMA is predicted to be associated with the SMN1, exon 7 deletion, and the conversion of SMN1 to SMN2. Alleles with gene conversions are predicted to cause milder SMA. The C allele denotes the normal (wildtype) and the T denotes the mutant allele. Each box represents the number of copies.

Six possible points of gene conversion between SMN1 and SMN2 have been proposed; these were tested by Wang et al (2010) and are illustrated in Figure 1.4. They designed an alternative method to multiplex ligation-dependent probe amplification (MLPA) that was able to detect gene conversions between SMN1 and SMN2, specifically in exons 7 and 8.
Identification of gene conversions was helpful for SMA genotyping and diagnosis as gene conversions have been shown to cause a milder phenotype (Ogino et al., 2004).

Figure 1.4. Six possible ways of gene conversion between SMN1 and SMN2 in exons 7 and 8. The SMN1 gene, shown in dark blue, can convert to SMN2 (white box) and vice versa. A-C indicate gene conversions of SMN1 to SMN2 and D-F indicate gene conversions of SMN2 to SMN1. Gene conversions of SMN genes are complex and only six possible ways are known (Wang et al., 2010).

1.3. SMN transcripts

Transcription in eukaryotic organisms, including humans, is the process of copying genetic information from DNA into a complementary RNA strand. The RNA polymerase enzyme, together with transcription factors, binds to the promoter region of the DNA strand to create a complementary precursor messenger RNA (pre-mRNA) strand. The pre-mRNA is then spliced to form an mRNA. Splicing is the removal of introns from the pre-mRNA and is regulated by splicing factors, enhancers, and silencers. A variety of transcripts can be generated from a single pre-mRNA due to alternative splicing (Singh et al., 2012).

The SMN genes are transcribed into various mRNA transcripts due to variants in the SMN gene or regulatory elements. Only 20% of full-length (FL) transcripts are transcribed from SMN2; the majority of SMN2 transcripts are truncated, as depicted in Figure 1.5 (He et al., 2013; Zhang et al., 2003). The C to T change in exon 7 of SMN2 reduces the recognition of exon 7 during splicing, resulting in the exclusion of exon 7 from SMN2 transcripts, as illustrated in Figure 1.5. The altered transcripts lacking exon 7 (SMN∆7) are highly unstable and degrade rapidly (Moulard et al., 1998).

Figure 1.5. Difference between SMN1 and SMN2 transcription (Butchbach, 2016). SMN1 is transcribed into mRNA including exon 7 and is translated into a functional SMN protein.
Most of the SMN2 transcripts result in truncated mRNA strands lacking exon 7 (SMN∆7); these are translated into an unstable SMN protein which is easily degraded.

There are four possible transcripts that can be generated from the SMN genes: FL-SMN, SMN∆7, axonal SMN (a-SMN) and SMN6B. The FL-SMN transcript (produced by both SMN1 and SMN2) is the main transcript required by most cells. The SMN∆7 transcript does not contain exon 7 and is produced from alternative splicing of the SMN2 gene or by a mutant SMN1 gene in patients affected with SMA. The a-SMN and SMN6B are alternative transcripts produced from the SMN1 and SMN2 genes, respectively (Seo et al., 2016).

Approximately 80% of FL-SMN transcripts are derived from SMN1; the remaining 20% or so come from the fraction of SMN2 transcripts that include exon 7. The FL-SMN transcript is the most critical transcript as it produces the functional SMN protein. The SMN∆7 transcript lacking exon 7 is the predominant transcript derived from SMN2 in all tissues except testis. SMN∆7 is translated into an unstable and easily degraded SMN protein.

The a-SMN plays a role in mammalian brain development; the a-SMN transcript is preferentially transcribed from the SMN1 gene (Setola et al., 2007). Setola et al (2007) identified a-SMN in the spinal cord, specifically in motor neuron axons, of rats, mice, and humans. The protein was observed to be expressed early in embryonic development and was downregulated in adult cells. The a-SMN protein promotes axon growth, stimulates cell mobility, and regulates expression of chemokines (CCL2 and CCL7) and insulin-like growth factor-1 (IGF1) (Locatelli et al., 2012). The a-SMN transcript was not included in this study as it is present in early embryonic development and its association with SMA is not well understood.

The recently identified SMN6B transcript was detected using a mouse model which harbours the human SMN2 gene and a hybrid SMN1/SMN2 at the same locus (Seo et al., 2016).
Seo et al (2016) sequenced the full transcript and identified a novel SMN transcript with a longer exon 6; the exon contained a portion of intron 6 and was therefore termed exon 6B, with the transcript named SMN6B. They went on to investigate expression of SMN6B in human tissues and identified SMN6B by reverse transcription qPCR (RT-qPCR) in all human tissues. This transcript was expressed highest in brain and lowest in skeletal muscle, but its role is poorly understood and therefore it was not included as part of the present study (Seo et al., 2016).

Other transcripts of SMN have been identified, indicating the diverse splicing of SMN pre-mRNA and possibly other mechanisms of disease pathogenesis. Singh et al (2012) used a multi-exon-skipping detection assay (MESDA) to identify isoforms of SMN. They identified new splice sites and demonstrated that different transcripts were formed from regulation in the promoter region and splice sites.

1.4. SMN Protein

The SMN protein has numerous roles in all cell types and is required for survival of all mammals. The protein is formed by translation of SMN mRNA; the different transcripts of SMN produce different protein isoforms, with FL-SMN being the most stable and therefore the most significant. The survival motor neuron protein is a molecular chaperone and is essential for the assembly of ribonucleoprotein (RNP) complexes. RNPs are formed when mRNA and non-coding RNAs (ncRNA) associate with RNA binding proteins to form complexes that are involved in RNA processing and post-transcriptional regulation (Cooper et al., 2009).

The SMN protein is involved in numerous biological functions such as gene regulation and expression. The protein accumulates in nuclear bodies named gems and Cajal bodies (Pellizzoni, 2007). The SMN protein then forms an SMN complex by interacting with Gemin 2-8 proteins and an unrip protein (Pellizzoni, 2007).
The SMN complex functions in the assembly of small nuclear ribonucleoproteins (snRNPs), which are essential in expression of all protein-coding genes (Pillai et al., 2003). Another function of the complex is in signal recognition particle (SRP) biogenesis, which targets polypeptides to the endoplasmic reticulum (Piazzon et al., 2013). The SMN protein is involved in telomerase biogenesis, 3′ end processing of histone mRNAs, pre-mRNA splicing, interaction with transcription factors and enhancers, RNA trafficking, translational regulation of protein arginine methyltransferase 4 (PRMT4), selenoprotein synthesis and stress granule (SG) formation (Seo et al., 2016; Singh and Singh, 2018). The numerous biological functions of the complex show the importance of the SMN protein; a mutation in the SMN gene, or in any other protein involved in its regulation, may therefore be involved in SMA pathogenesis.

Exon 7 of the SMN1 gene, which is critical in SMA disease, will be referred to as SMN1 going forward. Similarly, exon 7 of the SMN2 gene will be referred to as SMN2 for simplicity. When another exon of the SMN genes is meant, that exon will be clearly stated.

1.5. SMA disease modifiers

Disease modifiers are genetic or environmental factors that cause either a milder or a more severe form of a disease (Genin et al., 2008). There are several modifiers of SMA, including SMN2 copy number, regulatory proteins and DNA methylation, all of which have the potential to be used for treatment of SMA. Possible modifiers of SMA disease are indicated in Figure 1.6; not all modifiers were included in this study, and further research may indicate other modifiers of SMA.

Figure 1.6. Modifiers of SMA disease

1.5.1. SMN2 copy number as a modifier of SMA disease.

The number of SMN2 copies is the main modifier of SMA disease; more copies result in a milder form of SMA.
Asymptomatic individuals with a homozygous deletion of SMN1 have been observed with increased copies of SMN2 (Jedrzejowska et al., 2008). Patients with more copies of the SMN2 gene are more likely to have milder symptoms and a later onset of disease (Maretina et al., 2018). However, there have been patients with more than two copies of SMN2 with severe SMA, and patients with only two copies with a milder phenotype; this highlights the possibility of other disease modifiers (Maretina et al., 2018). The number of SMN2 gene copies in the genome varies from 0 to 8, with an inverse relationship between SMN2 copy number and severity of SMA disease (Butchbach, 2016). The number of SMN2 gene copies is significantly increased in black and mixed ancestry populations (12.4% and 18.8% respectively) (Vorster et al., 2020).

1.5.2. DNA methylation

Epigenetics refers to heritable changes in chromosomes that modify gene expression without altering the DNA sequence (Berger et al., 2009). Epigenetic mechanisms include DNA methylation, imprinting and X chromosome inactivation (Berger et al., 2009). DNA methylation is the addition of a methyl group, preferentially to cytosine residues within CpG dinucleotides (cytosine followed by guanine), which are concentrated in CpG islands (see Figure 1.7). Methylation of gene promoters leads to transcriptional inactivation, and aberrant methylation patterns are associated with various diseases including neurodevelopmental and neurodegenerative disorders (Maretina et al., 2018).

Hauke et al (2009) showed that SMN2 gene expression levels are reduced by DNA methylation. There are four CpG islands surrounding the translational site of SMN2, and the methylation patterns in patients affected with SMA were studied. Significant methylation was demonstrated in SMA type II compared to SMA type III patients (Hauke et al., 2009).
Further whole genome methylation analysis of SMA patients revealed different methylation patterns between SMA types involving other regulatory genes, namely SLC23A2, NCOR2 and DYNC1H1 (Maretina et al., 2019, 2018). DNA methylation was shown to be a modifier of SMA disease and may be important in designing new therapies.

Figure 1.7. Epigenetic changes can modify expression. A methyl group is added to CpG islands and gene expression is repressed (Yousefi et al., 2013).

1.5.3. Proteins influencing SMA expression.

Proteins that modify actin binding and cytoskeleton regulation have been studied to determine their role in SMA disease. The Plastin 3 (PLS3) expression level was increased in unaffected individuals compared to affected patients (Maretina et al., 2018). Other regulatory proteins have been implicated in SMA pathogenesis; these include phosphatase and tensin homolog (PTEN), zinc finger protein (ZPR1) and prolactin (Maretina et al., 2018).

The SMN protein is degraded by the ubiquitin/proteasome pathway. Treatment of SMA mouse models with the proteasome inhibitor bortezomib has shown reduced degradation of SMN protein and improved motor function (Kwon et al., 2011).

1.6. Therapy for SMA - Nusinersen (Spinraza) treatment

Once a diagnosis of SMA is confirmed, the family of the patient is referred for genetic counselling. Appropriate specialised clinicians can refer patients to clinical trials or available treatments. Most treatments of SMA are supportive but do not cure SMA. Nusinersen was the first drug therapy approved for the treatment of SMA, in December 2016, by the US Food and Drug Administration (FDA), for paediatric and adult patients with SMA. Nusinersen is an antisense oligonucleotide that modifies SMN2 pre-mRNA splicing to include exon 7, therefore increasing FL-SMN production (Hua et al., 2010). A clinical study of infants diagnosed with SMA was carried out by Finkel et al (2017).
The patients treated with Nusinersen displayed motor milestone responses and survived longer than patients without Nusinersen treatment, and their milestones appeared to improve with time (Finkel et al., 2017). Furthermore, patients with later-onset SMA treated with Nusinersen also displayed improved milestones (Finkel et al., 2017; Zuluaga-Sanchez et al., 2019). Treatment outcomes varied between countries, which was likely due to differences in supportive care for affected patients (Zuluaga-Sanchez et al., 2019). No adverse effects were observed in most clinical trials (Finkel et al., 2017).

Nusinersen is administered via lumbar puncture starting with an initial dose, followed by four doses every four months thereafter. This treatment costs R7,562,402.70 for the initial dose, followed by R3,782,006.72 annually (Zuluaga-Sanchez et al., 2019). Treatment for SMA is therefore extremely expensive and is not affordable for the majority of South Africans. South African patients with a confirmed diagnosis of SMA may be included in a clinical trial with a recommendation from a clinician. All patients included in the clinical trials had a confirmed diagnosis of SMA and at least one copy of the SMN2 gene. At present, a molecular diagnosis of SMA is not achieved in approximately 49% of black South African patients presenting with clinical symptoms of SMA. These patients would not be eligible for Nusinersen treatment without a confirmed diagnosis of SMA due to a deletion or mutations in the SMN1 gene.

1.7. SMA research in Africa

Research in Africa on SMA has been sparse and most reported studies have been done in South Africa. The lack of research may be due to the high likelihood of early death in infancy of patients with SMA, since patients may not have received care, particularly in poorly resourced countries. Initial studies focused on clinical features and muscle biopsies of patients with features suggestive of SMA (Kiepiela et al., 1988; Moosa and Dawood, 2008).
Kiepiela et al (1988) found no significant difference in T cells and B cells from patients clinically diagnosed with SMA, and their study indicated that the SMN protein does not play a role in immune regulation. In 1990, research by Moosa and Dawood on 45 patients showed similar clinical characteristics in African patients compared to those of European ancestry, with an additional feature of facial weakness in African children (Moosa and Dawood, 1990). They also noted a lack of positive family history of SMA in African families, possibly because of a reluctance of affected families to provide a family history, due to cultural reasons or rarity of the disease.

Stevens et al (1999) aimed to identify the molecular basis of SMA in black patients. Their research was based in Johannesburg and included black patients ascertained through the Division. The SMN genes and the NAIP gene were analysed in samples from patients confirmed to have SMA on muscle biopsy. They found a homozygous deletion of SMN1 in 65.5% of their patients, which is far lower than found in other studies. They were the first group to suggest the possibility of a different mechanism of SMA disease in black patients.

Wilmshurst et al (2002) did further research in Cape Town, South Africa, with strict inclusion criteria for the patients with SMA. They found a homozygous deletion of SMN1 in 100% of their black patients. Wilmshurst et al (2002) excluded patients with facial weakness and had a smaller sample size of 12, compared to 29 samples in Johannesburg (Stevens et al., 1999). The difference between the two studies could be due to the sample size, the inclusion criteria, or the different geographical location.

Another study was carried out by Labrum et al (2007) to further understand the molecular basis of SMA in black SA patients. They found the carrier frequency to be higher than previously thought, at 1 in 50 and 1 in 23 in the black and white populations, respectively.
Only 51% of black patients clinically affected with SMA had a homozygous deletion of SMN1 compared with 95% of the patients in the white population. The diagnostic yield of Labrum et al (2007) was lower than that of Stevens et al (1999), most likely due to sample size, which was 116 in the former study and 29 in the latter. Both studies further support the hypothesis that there may be a different molecular basis for black SA patients with an SMA phenotype.

A study performed in Mali to determine the carrier frequency of SMA in Africans was done on 628 Malians, 120 Nigerians, and 120 Kenyans (Sangaré et al., 2014). The results showed that the carrier frequency of SMA in Malians was lower, at 1 in 209, compared to 1 in 30-50 in Europeans (Labrum et al., 2007; Ogino et al., 2004; Sangaré et al., 2014). The carrier frequency was calculated by testing healthy individuals to determine if they had one copy of SMN1. Sangaré et al (2014) did not account for silent carriers, so the true carrier frequency may have been underestimated. They further observed the presence of multiple copies of SMN1, compared to European and Asian populations. Africans were more likely to have three or more copies of SMN1 (Sangaré et al., 2014). They also investigated SMN2 copies and found them to be deleted more frequently in sub-Saharan Africans compared to other populations (Sangaré et al., 2014). Multiple studies have shown that people of African ancestry are more likely to have three or more copies of SMN1 (Hendrickson et al., 2009).

Furthermore, Sangaré et al (2014) aimed to determine the cause of the SMN frequency difference in sub-Saharan Africans, and they found hybrid copies of SMN1/2 in 14% of Malians with three or more copies of SMN1. They observed the same pattern in other African populations they studied. They hypothesised that gene conversion may be a cause of multiple copies of SMN1.
However, gene conversion does not fully explain the multiple copies of SMN1 (Sangaré et al., 2014).

1.8. Diagnostic testing at the Division of Human Genetics, Johannesburg

The Division of Human Genetics performs diagnostic, carrier, and prenatal testing for SMA. Diagnostic testing is performed by detecting the absence or presence of a homozygous deletion of SMN1. The absence/presence is detected by amplification of the SMN1 and SMN2 genes, followed by digestion with a restriction enzyme that recognises the critical difference between the two genes. The absence/presence of the genes is then visualised by agarose gel electrophoresis. This method does not quantify the genes and only detects exon 7 of both genes. It cannot determine carrier status, the extent of the SMN1 or SMN2 deletion, or gene conversions.

A recent study performed by Vorster et al (2020) aimed to further understand the molecular basis of SMA in black patients by studying copy number variations in black SA patients. Patients were tested by Multiplex Ligation-dependent Probe Amplification (MLPA, MRC Holland, Amsterdam, Netherlands), which determines the copy numbers of the SMN1 and SMN2 genes, as well as the adjacent genes. The MLPA method was also assessed as a second-tier test to the absence/presence method. An advantage of the method was that probes for adjacent genes in the SMN region were included in the kit; the extent of the deletion could therefore be determined. Gene conversions could be predicted by analysing the extent of the deletion. Only 50.8% of black patients clinically suggestive of having SMA had a homozygous deletion of SMN1, and the remaining 49% thus remained without a diagnosis. The results presented by Vorster et al (2020) were consistent with those of previous studies by Labrum et al (2007) and Stevens et al (1999). The multiple copies observed in black populations make it difficult to assess carrier frequencies as some individuals may be silent carriers.
It was concluded that MLPA is not an appropriate second-tier diagnostic test in the black SA population as no additional diagnoses could be unequivocally confirmed (Vorster et al., 2020). It was hypothesised that some of these multiple gene copies may not be completely functional.

However, MLPA may be useful when testing families with informative pedigrees. Pedigrees from families, together with copy number, may be used to ascertain carrier status (see Figure 1.8). A silent carrier is an individual who has two or more copies of the SMN1 gene on one chromosome and a deletion on the other chromosome. Silent carriers can pass on the chromosome with the deletion and therefore have children who are affected with SMA. Figure 1.8 shows how a silent carrier may be identified when copy number detection of SMN1 is carried out on families with an affected individual or child.

Figure 1.8. Pedigree of a family affected with SMA. The mother has two copies of SMN1; however, both copies are found on one chromosome. The mother is therefore a silent carrier of SMA. The father has one copy of SMN1 and is a confirmed carrier of SMA. Their child has zero copies of SMN1 and two copies of SMN2.

The above studies conducted in Africa indicate the need to increase understanding of SMA on the continent and in South Africa specifically. Approximately 49% of black patients with clinical features suggestive of SMA test negative for the homozygous deletion of SMN1. These cases could be due to mutations in the SMN1 gene not detected by routine diagnostic methods, or perhaps to other genes affecting expression of SMN1. Furthermore, these patients could have a different neuromuscular disease. Nevertheless, further research needs to be performed to elucidate the disease in the many black patients with clinical features of SMA.

1.9.
Aim

The aim of the study was to investigate the utility of different methods used to detect SMN gene copy number and measure RNA expression in black South African patients with spinal muscular atrophy, and in those with clinical features suggestive of SMA.

1.10. Objectives

• Objective 1: To optimise and validate the AmplideX® PCR/CE SMN1/2 Kit (Asuragen, Austin, Texas, USA) as an alternative method to determine DNA copy number of SMN genes in patients and controls.
• Objective 2: To optimise and validate qPCR as a method to determine DNA copy number of SMN1 and SMN2 in patients and controls.
• Objective 3: To compare the qPCR and AmplideX® PCR/CE SMN1/2 Kit (Asuragen, Austin, Texas, USA) methods to the already validated/verified MLPA kit (MRC Holland, Amsterdam, Netherlands), to determine the most appropriate approach for DNA copy number detection in black SA SMA patients.
• Objective 4: To optimise and validate reverse transcription of RNA and compare the SMN expression levels with DNA copy number, to identify potential pathogenic copy number variations, and to compare the level of transcripts in patients and controls.

2. SUBJECTS AND METHODS

This chapter will outline the process used to select subjects and control samples and describe the materials and methods used to achieve the aim of the study. The SMA database previously created from MLPA (MRC Holland, Amsterdam, Netherlands) analysis in the Division was used to select samples with various SMN1 and SMN2 copy numbers for validation of qPCR and the AmplideX® PCR/CE SMN1/2 Kit (Asuragen, Austin, Texas, USA), a new kit designed by Asuragen (introduced in 2019). Fifty samples were used from the previous study by Vorster et al (2020) and 42 new samples were collected for this study. Samples of patients with clinical features suggestive of SMA were used for validation of RT-qPCR and comparison of DNA copy number and RNA expression levels of the SMN1 and SMN2 genes.
2.1 Subjects and controls

• SMN1 exon 7 homozygous deletion controls (M1/M1)
Patients with a confirmed diagnosis of SMA and a homozygous deletion of SMN1 (genotype M1/M1: Mutation 1: deletion of SMN1/Mutation 1: deletion of SMN1); results were obtained from previous diagnostic testing. These samples were used as positive controls for validation of qPCR, AmplideX® PCR/CE SMN1/2 Kit (Asuragen, Austin, Texas, USA) and RT-qPCR.

• SMN2 exon 7 homozygous deletion controls (M2/M2)
Samples with a homozygous deletion of SMN2 (genotype M2/M2: Mutation 2: deletion of SMN2/Mutation 2: deletion of SMN2); results were obtained from previous diagnostic testing. These samples were used as positive SMN2 controls for validation of qPCR, AmplideX® PCR/CE SMN1/2 Kit (Asuragen, Austin, Texas, USA) and RT-qPCR.

• SMA carriers (N/M1)
Samples with a heterozygous deletion of SMN1 on MLPA (genotype N/M1: Negative/Mutation 1: deletion of SMN1) were used to determine if RT-qPCR could identify carriers and, if so, what range of SMN transcript expression was represented in likely carriers. Table 2.1 illustrates how samples were grouped according to DNA analysis and which controls were used for RT-qPCR.

• Negative controls (N/N)
Fresh blood samples were collected from 10 unrelated and unaffected random black SA subjects with no family history of SMA. The CNV status of these subjects was determined by MLPA analysis, and they were included as negative controls in the RT-qPCR method. These samples were also used to validate the qPCR method and AmplideX® PCR/CE SMN1/2 Kit (Asuragen, Austin, Texas, USA).

• Patient cohort - Non-deletion (U/U)
This group forms the focus of this study. Patients of African ancestry who had clinical features suggestive of SMA but who tested negative for an SMN1 homozygous deletion on DNA studies were identified in collaboration with Prof John Rodda, Head of the Paediatric Neurology Division, Chris Hani Baragwanath Academic Hospital, Johannesburg.
The subjects were assumed to have two unidentified mutations (U/U: Unidentified mutation/Unidentified mutation).

Table 2.1. Groups of subjects and controls tested, together with numbers of individuals in each group.

| Genotype | Definition | MLPA, AmplideX, qPCR | RT-qPCR | Grand Total |
|---|---|---|---|---|
| M1/M1 | Homozygous deletion of SMN1, exon 7: Affected with SMA | 14 | 4 | 14 |
| M2/M2 | Homozygous deletion of SMN2, exon 7 | 14 | 1 | 14 |
| N/N | Two or more copies of SMN1, exon 7 | 19 | 10 | 19 |
| N/M1 | One copy of SMN1, exon 7: True Carrier of SMA | 15 | 2 | 15 |
| U/U | Unidentified mutations, clinical features suggestive of SMA | 30 | 2 | 30 |
| Total tested | | 92 | 19 | 92 |

This table represents the five different subject groups and the number of subjects tested by each method. *Note: All samples were tested for DNA copy number but not all were tested for RNA expression, since RNA was not available from all of these subjects. The total number of samples tested was 92. Each method will be explained fully under the methods section.

Previously extracted DNA was stored in the Division and used as part of this study. New blood samples were collected in EDTA and Tempus™ Blood RNA Tubes (Applied Biosystems, Thermo Fisher Scientific, Waltham, Massachusetts, USA) from additional black subjects and their parents when available, for DNA and RNA extraction respectively. The strategy for this study is demonstrated by means of a flow diagram in Figure 2.1.

Figure 2.1. Strategy for testing of different methods

2.2 Methods

2.2.1 Genomic DNA Copy Number Detection

The first objective was to determine the utility of alternative methods to detect DNA copy numbers of SMN1 and SMN2. These alternative methods, AmplideX® PCR/CE SMN1/2 Kit (Asuragen, Austin, Texas, USA) and qPCR, were first optimised and validated according to NHLS standards. These methods were assessed for cost efficiency, labour intensity, and ability to determine copy number changes accurately.
These methods were further assessed to determine if they could provide further insight and diagnosis in patients with suspected but unidentified mutations. To assess the utility of the qPCR and AmplideX® PCR/CE SMN1/2 kit, both methods were compared with the routine diagnostic test (the homozygous SMN1 deletion method) and with MLPA, which is used mostly for carrier testing. Figure 2.2 illustrates the procedure followed to determine the diagnostic utility of alternative methods for DNA copy number detection in black SA patients.

Figure 2.2. Flow diagram showing DNA copy number detection. The SMN1 and SMN2 copy numbers were detected by MLPA, qPCR and the AmplideX kit to determine the utility of the methods.

2.2.1.1 DNA Extraction

All the copy number investigations were carried out on DNA extracted from blood samples collected in EDTA tubes. Samples collected before December 2018 were extracted using the salting out DNA extraction method (Miller et al., 1988). Samples collected from January 2019 were extracted using the FlexiGene DNA Kit (QIAGEN, Venlo, Netherlands) and the 400µl and 3ml whole blood extraction protocols (Qiagen, 2014). All samples were extracted according to diagnostic protocols of the Division of Human Genetics (see Appendix 2 for the DNA extraction worksheet).

2.2.1.2 DNA yield, purity, integrity determination and routine diagnostic testing

DNA yield was measured on the NanoDrop® ND-1000 UV-Vis Spectrophotometer (Thermo Fisher Scientific, Waltham, Massachusetts, USA) with absorbance at 260nm. DNA purity was determined from the A260/A280 ratio (expected 1.7 to 1.9) and A260/A230 ratio (expected 1.8 to 2.0). Samples with impurities were dialysed using Millipore® Filter Membranes (Merck, Darmstadt, Germany). DNA integrity was checked by electrophoresis through a 0.8% agarose gel, and by visualisation under UV light.

2.2.1.3 Homozygous SMN1/SMN2 deletion method using RFLP.
The method utilises a restriction fragment length polymorphism (RFLP) and is available as part of the routine diagnostic testing procedure in the Division. It was used to determine the absence or presence of the SMN1 or SMN2 genes. The method utilises the critical one base pair difference in exon 7 of SMN1 and SMN2, which can be identified by digesting the PCR product with the HinfI restriction endonuclease. Currently the Division performs RFLP analysis on all samples referred for diagnostic testing, and MLPA is used only for carrier testing or where accurate determination of copy number is relevant. The limitations of the method are that it cannot detect carriers, cannot determine the copy number of SMN1 or SMN2, and cannot determine the extent of the SMN deletion. The results for the subjects tested were interpreted in comparison with the known control samples, as illustrated in Figure 2.3. All the samples that were referred for routine diagnostic testing were first assessed by RFLP. Samples without the homozygous SMN1 deletion, but where the patient had clinical features suggestive of SMA, were then labelled as U/U.

Figure 2.3. Homozygous deletion SMN1/SMN2 method. A, the yellow highlighted region indicates the primer pair mapping to the genomic reference sequence of the SMN1 and SMN2 genes; the forward primer has an A introduced instead of T to create a restriction enzyme site. B, the HinfI recognition site is illustrated. C, a 101bp PCR product is amplified and digested with HinfI, giving up to three bands. N/N or U/U represents subjects who have at least one copy of SMN1 and SMN2, with three bands of 101bp, 78bp and 23bp. M1/M1 represents samples with a homozygous SMN1 deletion (101bp band only) and M2/M2 represents samples with a homozygous SMN2 deletion (78bp and 23bp bands only).

2.2.1.4 Multiplex ligation-dependent probe amplification (MLPA)

The MLPA method is a semi-quantitative PCR-based method capable of detecting copy number changes for up to 60 probes in a single reaction.
MLPA consists of four basic steps: denaturation, hybridisation, ligation, and amplification (see Figure 2.4 for a summary of the method and appendix 5 for the step-by-step procedure).

Figure 2.4. Steps in the MLPA test process. Each MLPA kit has up to 60 probes. The forward probe oligonucleotide comprises a universal forward primer sequence, which is recognised by a fluorescently labelled PCR primer, and a sequence specific to the target DNA. The reverse probe oligonucleotide includes a region specific to the target DNA, a stuffer sequence, and a universal reverse primer sequence. Target DNA is denatured and hybridised to disease-specific probes, and the probes are then ligated and amplified by PCR. The PCR products are separated by size using capillary electrophoresis on the 3130xl or the 3500xl Genetic Analyzer (Applied Biosystems, Thermo Fisher Scientific, Waltham, Massachusetts, USA). MLPA analysis was carried out using GeneMapper™ Software (Applied Biosystems, Thermo Fisher Scientific, Waltham, Massachusetts, USA) and Coffalyser.Net™ (MRC Holland, Amsterdam, Netherlands) (BQUB13-Ecarmona, 2013).

The P021 probe mix

The SALSA MLPA probe-mix P021 (MRC Holland, Amsterdam, Netherlands) was used to determine copy number changes of the SMN1 and SMN2 genes. References to the MLPA method in this study refer to this kit. The P021 probe mix has 37 MLPA probes with amplification products between 140 and 463 nucleotides (nt). Furthermore, the kit contains 9 control probes for DNA quality, denaturation and ligation, as well as X and Y controls to confirm sex. The kit also contains probes for SMN1, exons 7 and 8, SMN2, exons 7 and 8, the neighbouring genes GTF2H2, RAD17, NAIP and SERF1B, and reference regions. MLPA was able to detect homozygous and heterozygous SMN1 deletions and to determine up to six copies of SMN1 in black SA subjects (Vorster et al., 2020). DNA samples were diluted to 40ng and MLPA was carried out according to the one tube protocol described by MRC Holland.
P021 MLPA data analysis

DNA fragments were separated by capillary electrophoresis, followed by initial analysis using GeneMapper™ Software (ABI, Foster City, California, United States). A GeneScan™ 500 LIZ™ dye size standard (Applied Biosystems, Thermo Fisher Scientific, Waltham, Massachusetts, USA) was used to label peaks and determine run quality. All DNA fragments were assessed to confirm proper labelling/binning of sample peaks. Amplified products were further analysed using Coffalyser.Net™ (MRC Holland, Amsterdam, Netherlands), with quality checks carried out according to the MRC one tube protocol recommendations. The negative and positive controls were analysed to ensure reproducibility. The SMN1 and SMN2 copy numbers were recorded in an Excel© (Microsoft, Redmond, Washington, United States) database, and genotypes were assigned for further comparison with the AmplideX kit and the DNA qPCR method. Figure 2.5 illustrates a negative subject with no deletions and a subject with a homozygous deletion of SMN1, exons 7 and 8.

Figure 2.5. Results of MLPA analysis using the SALSA MLPA probe-mix P021. A, sample with normal copy number of SMN1 and SMN2 as well as of adjacent genes. B, sample with a homozygous deletion of SMN1, exons 7 and 8 (indicated by the black arrow) and three copies of SMN2, exons 7 and 8, suggesting a gene conversion from SMN1 to SMN2. Some variability in other probes is also observed, but these probes are not located in the critical SMN1, exon 7 region.

Accurate interpretation of MLPA results depends on the selection of appropriate controls. Three negative controls (with 2 copies of SMN1 and 2 copies of SMN2) were included in each run. A heterozygous SMN1 deletion (copy number = 1) and a homozygous SMN1 deletion (copy number = 0) sample were also included as positive controls.
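How the control samples enter the copy number calculation can be sketched as a two-step normalisation: each probe signal is first divided by the reference probe signal of the same sample, and the result is then divided by the corresponding value in the run's negative controls. This is a simplified illustration under stated assumptions (mean-based normalisation, hypothetical probe names and peak heights); it is not the Coffalyser.Net algorithm, which is expected to be more elaborate.

```python
from statistics import mean


def intra_normalise(peaks, reference_probes):
    """Divide each target probe peak by the mean reference probe peak
    of the same sample (within-sample normalisation)."""
    ref = mean(peaks[p] for p in reference_probes)
    return {p: h / ref for p, h in peaks.items() if p not in reference_probes}


def dosage_quotients(sample, negative_controls, reference_probes):
    """Compare a sample's normalised probe ratios with the mean ratio
    seen in the run's negative controls (between-sample normalisation)."""
    s = intra_normalise(sample, reference_probes)
    controls = [intra_normalise(c, reference_probes) for c in negative_controls]
    return {p: s[p] / mean(c[p] for c in controls) for p in s}


# Hypothetical peak heights, not study data.
refs = ["ref_A", "ref_B"]
controls = [
    {"SMN1_ex7": 1000, "ref_A": 1000, "ref_B": 1000},
    {"SMN1_ex7": 1040, "ref_A": 1000, "ref_B": 1080},
    {"SMN1_ex7": 960, "ref_A": 1000, "ref_B": 920},
]
carrier = {"SMN1_ex7": 500, "ref_A": 980, "ref_B": 1020}
dq = dosage_quotients(carrier, controls, refs)
print(dq)
```

Because the negative controls carry two copies of SMN1, a quotient near 0.5 in this sketch corresponds to one copy and a quotient of 0 to no copies.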
The copy number ratio is calculated as a dosage quotient (DQ), which is the final probe height ratio as compared to the internal reference probes of that sample, known as intra-normalisation, and to external reference probes (negative controls) for the whole run, known as inter-normalisation. The expected DQ ranges and related copy numbers are indicated in Table 2.2.

Table 2.2. The relationship between DQ values and SMN1 copy number on MLPA analysis (“MLPA General Protocol MDP-v007.pdf,” n.d.)

Copy number status | Copy Number | DQ Peak Height
Homozygous deletion | 0 copies | DQ = 0
Heterozygous deletion | 1 copy | 0.40 0.700

2.2.1.6 Real-time PCR (qPCR)

Quantitative polymerase chain reaction (qPCR), or real-time PCR, is a method to quantify the amount of PCR product as it is being amplified (Schefe et al., 2006). There are two qPCR detection methods: the TaqMan™ assay (Applied Biosystems, Thermo Fisher Scientific, Waltham, Massachusetts, USA), which is sequence specific, and the SYBR green method, which is based on incorporation of a generic, non-sequence-specific double-stranded DNA-binding dye. The four phases of qPCR are the linear ground, early exponential, linear exponential (log) and plateau phases. The fluorescent signal is calculated during the last phase (Schefe et al., 2006). The method is used for multiple applications such as genotyping, copy number detection and gene expression analysis. In the present study the method was used with TaqMan™ probes for copy number detection of SMN1 and SMN2. An illustration of the TaqMan™ qPCR steps is shown in Figure 2.8, starting with denaturation and annealing of primers and probe, followed by extension of primers, cleavage of the fluorophore (dye) and repetition of cycles. The method was selected for evaluation as an alternative method for diagnostic, carrier, and prenatal testing of SMA.
Ten samples were selected for optimisation and validation of qPCR: three negative controls (N/N), three positive controls (M1/M1), two samples heterozygous for SMN1 (N/M1) and two samples with a homozygous deletion of SMN2 (M2/M2). Once qPCR was validated, other samples were tested to determine the utility of this method.

Figure 2.8. Quantitative PCR by use of TaqMan™ probes (Anhuf et al., 2003). Each qPCR assay contains a forward and a reverse primer for the target DNA sequence. A TaqMan™ probe has a fluorophore (FAM, VIC, NED etc.) on the 5’ end which emits light upon excitation. The quencher is located on the 3’ end of the TaqMan™ probe and absorbs the fluorescence of the fluorophore. The DNA is denatured into single strands, followed by the annealing of primers and TaqMan™ probes to target sequences. This is followed by extension of the primers by DNA polymerase. Once extension reaches the bound TaqMan™ probe, the fluorophore is cleaved and, as the fluorophore and quencher are no longer in close proximity, fluorescence is emitted and detected by the qPCR machine. The target sequence is amplified, and the fluorescence emitted from the fluorophore is quantified to determine copy number.

SMN1 gene copy qPCR

All DNA samples were diluted to 50 ng/µl. Three negative (N/N) controls were included with each run on the QuantStudio 3 (Applied Biosystems, Thermo Fisher Scientific, Waltham, Massachusetts, USA). Primers and TaqMan™ probes were used for SMN1, SMN2 and the CFTR gene, which was used as an internal control, as shown in Table 2.6. The primers and probes were previously designed as part of a Master’s project (Vorster et al., 2020). The primers for SMN1 and SMN2 were the same; however, the probes for the two genes were different, as shown in Table 2.6 and Figure 2.9. The primers and probe for the CFTR internal control are shown in Table 2.6 and Figure 2.10. See appendix seven for the step-by-step procedure.
Table 2.6. Primer and probe sequences of the SMN1 and SMN2 qPCR method

Gene | Primer | Primer Sequence (5’-3’) | Probe | Probe Sequence (5’-3’)
SMN1 | SMN-F | CTTGTGAAACAAAATGCTTTTTAACATCCAT | SMN1-Ex7 | FAM-R-TTTTGTCTGAAACCC
SMN1 | SMN-R | GAATGTGAGCACCTTCCTTCTTTTT | |
SMN2 | Was amplified by SMN-F/R primers | | SMN2-Ex7 | VIC-ATTTTGTCTAAAACCC
CFTR | CFTR-F | CAACCTGCCTTCTCTGGGAAT | CFTR | NED-CTGCTGCCTGAACAT
CFTR | CFTR-R | CAAGCCTGGCAATAAACAATGA | |

*The one nucleotide difference (c.840C>T) between SMN1 and SMN2 is highlighted in blue.

Figure 2.9. Annotation of SMN primers (SMN1: NG_008691.1 and SMN2: NG_008728.1). A single primer pair, SMN-F and SMN-R, illustrated in purple, was used to amplify both SMN1 and SMN2, exon 7 template DNA. Each SMN copy had its own specific probe: blue (FAM) for SMN1 and yellow (VIC) for SMN2. Primer and probe sequences are detailed in Table 2.6.

Figure 2.10. Annotation of CFTR primers (NG_016465.4). A single primer pair, CFTR-F and CFTR-R, is shown in pink and the CFTR probe in orange. Primer and probe sequences are detailed in Table 2.6.

SMN1 and SMN2 qPCR data analysis

The QuantStudio design and analysis software v1.4.1 (Applied Biosystems, Thermo Fisher Scientific, Waltham, Massachusetts, USA) was used to create the experiment and import the run file onto the QuantStudio 3 (Applied Biosystems, Thermo Fisher Scientific, Waltham, Massachusetts, USA). The software, freely available for download from the Thermo Fisher website, was designed specifically for the QuantStudio qPCR instruments to set up runs, sample worksheets and PCR conditions, and to analyse experiment results. Each experiment was set up in triplicate. Amplification data were imported into the QuantStudio design and analysis software v1.4.1 (Applied Biosystems, Thermo Fisher Scientific, Waltham, Massachusetts, USA), and standard curve analysis was used for copy number detection. The amplification plot was analysed to ensure adequate amplification, followed by assessment of quality control checks as recommended by the manufacturer.
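Copy number estimation from triplicate qPCR reactions can be illustrated with a comparative Ct calculation, in which the target (SMN1 or SMN2) is normalised to the CFTR internal control and then compared with a negative (N/N) control assumed to carry two copies. Ct here is the cycle at which the fluorescent signal crosses the threshold. The sketch below is not the QuantStudio software's implementation: it assumes perfect doubling per cycle for both assays, and the function names and Ct values are hypothetical. Replicate spread is summarised as a coefficient of variation (SD divided by mean).

```python
from statistics import mean, stdev


def delta_ct(target_cts, reference_cts):
    """Mean target Ct minus mean internal-control Ct for one sample."""
    return mean(target_cts) - mean(reference_cts)


def copy_number(sample_dct, calibrator_dct, calibrator_copies=2):
    """Comparative Ct estimate, assuming perfect doubling each cycle
    and a calibrator (N/N control) with a known copy number."""
    return calibrator_copies * 2 ** -(sample_dct - calibrator_dct)


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean of the replicates."""
    return stdev(values) / mean(values)


# Hypothetical triplicate Ct values, not study data.
calibrator = delta_ct([25.0, 25.1, 24.9], [25.0, 25.0, 25.0])  # N/N control
carrier = delta_ct([26.0, 26.1, 25.9], [25.0, 25.0, 25.0])     # one cycle later
estimate = copy_number(carrier, calibrator)
print(round(estimate, 2))
```

A target that crosses the threshold one cycle later than in the two-copy control, after normalisation to CFTR, is estimated at about one copy, which is the pattern expected for a carrier.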
The cycle threshold (Ct) is defined as the number of cycles needed for the fluorescent signal to exceed the background level. The Ct was calculated after PCR amplification; outliers were omitted from analysis and results were reviewed. The copy numbers were determined using relative quantification calculations for relative standard curve and comparative Ct experiments, as indicated in Figure 2.11 and Figure 2.12. The precision of the SMN copy number measurement was assessed using the coefficient of variation (CV). The CV for each sample was calculated by dividing the standard deviation (SD) of the triplicate measurements for a given subject by the average SMN copy number for that subject (Gómez-Curet et al., 2007; Passon et al., 2009).

Figure 2.11. An example of a qPCR amplification plot visualised on the QuantStudio 3. Each colour represents a different concentration of DNA used.

Figure 2.12. Example of a gene copy number plot visualised on the QuantStudio 3. The plot was used to evaluate the level of different samples relative to the reference sample. Four samples were analysed, and the fold change, expressed as relative quantification (RQ) to the internal control, is indicated above each sample.

The qPCR method was more informative than RFLP, as it determines the DNA copy number of SMN1. It therefore detects heterozygous carriers of SMA, and it is less expensive than MLPA. The qPCR method detected the homozygous SMN1 deletion, heterozygous deletions of SMN1 (carriers) and multiple copies of SMN1 and SMN2 (Anhuf et al., 2003).

2.2.2 SMN1/SMN2 RNA expression analysis

Gene expression studies were undertaken to provide an alternative molecular method for testing patients who present with clinical features of SMA but have tested negative for a homozygous SMN1 deletion using DNA analysis.
Reverse transcription quantitative PCR (RT-qPCR) was used to determine mRNA levels of the full length SMN1 (FL-SMN1) and full length SMN2 (FL-SMN2) transcripts, as well as the SMN mRNA lacking exon 7 of both SMN1 and SMN2, which was labelled as SMN∆7 (see Table 2.7). The design of primers and probes is outlined below. Blood was collected in specialised Tempus™ Blood RNA Tubes (Applied Biosystems, Thermo Fisher Scientific, Waltham, Massachusetts, USA). RNA extraction was performed using the Tempus™ Spin RNA Isolation Kit (Applied Biosystems, Thermo Fisher Scientific, Waltham, Massachusetts, USA). See appendix three for the RNA extraction procedure and worksheet. Reverse transcription of RNA into cDNA was performed using the ImProm-II™ Reverse Transcription System (Promega, Madison, Wisconsin, USA). The cDNA was amplified using TaqMan™ probes, and RNA expression levels were determined with comparative Ct experiments, with an endogenous control, glyceraldehyde 3-phosphate dehydrogenase (GAPDH), included in each experiment.

Table 2.7. Description of RT-qPCR transcripts

Transcript abbreviation | Explanation
FL-SMN1 | Full length SMN1
FL-SMN2 | Full length SMN2
SMN∆7 | SMN transcript lacking exon 7
GAPDH | Endogenous control

2.2.2.1 RNA extraction

Blood was collected in Tempus™ Blood RNA Tubes (Applied Biosystems, Thermo Fisher Scientific, Waltham, Massachusetts, USA), so that RNA was stabilised and RNA expression patterns were preserved. Tempus RNA tubes were stored at -70°C. RNA extraction was performed using a Tempus™ Spin RNA Isolation Kit (Applied Biosystems, Thermo Fisher Scientific, Waltham, Massachusetts, USA) according to the manufacturer’s guidelines. A summary of the RNA extraction performed is shown in Figure 2.13 (Duale et al., 2012). RNA yield was measured on the NanoDrop® ND-1000 UV-Vis Spectrophotometer (Thermo Fisher Scientific, Waltham, Massachusetts, USA) from the absorbance at 260 nm. RNA integrity was assessed by 0.8% agarose gel electrophoresis and visualised under UV light.
See appendix three for the RNA extraction procedure.

Figure 2.13. Summary of RNA extraction protocol using Tempus™ Blood RNA Tube (Applied Biosystems, Thermo Fisher Scientific, Waltham, Massachusetts, USA).

2.2.2.2 Reverse transcription of RNA to cDNA

Reverse transcription from freshly collected RNA was performed using the ImProm-II™ Reverse Transcription System (Promega, Madison, Wisconsin, USA), Figure 2.14 (Duale et al., 2012). An endogenous control, GAPDH, was used to normalise SMN expression levels using relative quantification. See appendix four for the step-by-step reverse transcription procedure. A worksheet based on appendix nine was completed for each test.

Figure 2.14. Summary of reverse transcription using the ImProm-II™ Reverse Transcription System (Promega, Madison, Wisconsin, USA). Each reaction set included a positive control, a negative control without reverse transcriptase enzyme and a negative control without template RNA.

2.2.2.3 RT-qPCR experiment design

Table 2.8 and Figure 2.15 illustrate the transcript primers used and their probes. The experiment was carried out in quadruplicate with three controls; each control had two DNA copies of SMN1 and two DNA copies of SMN2. The GAPDH endogenous control was included in each experiment to measure the relative gene expression. Analysis of results was carried out using the QuantStudio design and analysis software (Applied Biosystems, Thermo Fisher Scientific, Waltham, Massachusetts, USA).

Table 2.8.
Primer and probe sequences of the SMN expression RT-qPCR relative quantification method designed by Vorster et al. (2020)

Transcript | Primers | Primer Sequence (5’-3’) | Probe | Probe Sequence (5’-3’)
FL-SMN1 | FL-SMN-F | TACATGAGTGGCTATCATACTGGCTA | FL-SMN1 | NED-TATGGGTTTCAGACAAA
FL-SMN1 | FL-SMN-R | AATGTGAGCACCTTCCTTCTTTTT | |
FL-SMN2 | Same as FL-SMN1 | | FL-SMN2 | VIC-ATATGGGTTTTAGACAAAA
SMN∆7 | SMN∆7-F | CTGATGCTTTGGGAAGTATGTTAATT | Ex7del-SMN | NED-CATGGTACATGAGTGGCTA
SMN∆7 | SMN∆7-R | CCAGCATTTCTCCCATATAATAGCCAGTA | |
GAPDH | GAPDH-F | GGGTGTGAACCATGAGAAGTATGA | GAPDH | FAM-CAAGATCATCAGCAATGC
GAPDH | GAPDH-R | CTAAGCAGTTGGTGGTGCAGG | |

*The one nucleotide difference (c.840C>T) between FL-SMN1 and FL-SMN2 is highlighted in blue.

Figure 2.15. Primer and probe sequences of FL-SMN1 (NM_000344.4) and FL-SMN2 (NM_017411.4). The FL-SMN forward and reverse primers are indicated in yellow. The same primers were used to amplify both SMN1 and SMN2 full length transcripts. The probes used to distinguish FL-SMN1 and FL-SMN2 transcripts had a critical nucleotide difference, indicated in green.

2.2.3 Comparative analysis of DNA and transcript levels of SMN1 and SMN2 in subjects and controls

Expression levels of the SMN transcripts were compared with the DNA copy number to identify potential pathogenic copy number variations in patients with clinical features suggestive of SMA who have tested negative for a homozygous deletion of SMN1. Figure 2.16 illustrates an approach to comparing SMN1 and SMN2 gene copy number with RNA expression levels in this study. Please note that this Figure may not necessarily be an accurate representation of the situation in black South African subjects, but it presents a guideline to assist with comparative analysis of DNA copy number and RNA expression levels.

Figure 2.16. Predicted comparative analysis of DNA SMN1 and SMN2 copy number versus RNA transcription.
A indicates the genotypes that may be found from DNA analysis: N indicates a normal SMN1 copy, M1 is a deletion of SMN1, exon 7 and U indicates an unidentified mutation. B indicates the genotype of subjects and controls. C indicates the potential SMN1 and SMN2 RNA expression predicted to occur according to each genotype. The RNA expression level of SMN1 in subjects with M1/U and U/U genotypes is unknown. Three possible transcripts were measured by RT-qPCR: FL-SMN1, FL-SMN2 or truncated SMNΔ7. The percentages of each transcript are estimated according to general SMN1 and SMN2 copy number; these percentages may differ in the SA population, which has hypervariable SMN1 and SMN2 CNVs.

2.2.4 Ethics

An ethics application was approved unconditionally by the University of the Witwatersrand Committee for Research on Human Subjects (Medical) (Ethics clearance number: M180498); see appendix 1 for the certificate.

2.2.5 Summary of methods

Three methods for quantification of SMN1 and SMN2 DNA copy numbers were evaluated for their clinical utility in diagnostic and carrier testing of SMA. The RFLP method detects the presence or absence of both SMN1 and SMN2 DNA copies. Subjects previously tested by RFLP who had a homozygous deletion of either SMN1 or SMN2 were used as M1/M1 and M2/M2 controls, respectively. All subjects’ SMN copy numbers were quantified by MLPA, AmplideX and qPCR. The AmplideX kit is a new commercial kit for testing SMA, and it was validated as part of this research study. A further method, qPCR, was used to detect SMN1 and SMN2 copies and was compared with the already utilised P021 probe mix for MLPA and the new AmplideX kit. RNA expression analysis was performed on 19 subjects who were also tested by MLPA, AmplideX and qPCR. SMN transcripts were quantified to determine the relationship between DNA copy number and RNA expression level, as well as to possibly diagnose subjects with SMA who do not have a typical homozygous deletion of SMN1.
These suggestive SMA subjects were hypothesised to have no SMN1 transcripts on RNA expression analysis.

3 RESULTS

This chapter outlines results obtained from DNA copy number analysis (section 3.1) as well as RNA expression studies (section 3.2). Three methods designed to detect DNA copy number were compared: MLPA analysis, an in-house designed qPCR-based method and a commercial kit, the AmplideX® PCR/CE SMN1/2 kit (Asuragen, Austin, United States). The qPCR method and the AmplideX® PCR/CE SMN1/2 kit had not previously been tested in the Division and were optimised and validated for copy number quantification of SMN1 and SMN2. Both methods were compared with the P021 SALSA MLPA kit (MRC Holland, Amsterdam, Netherlands), which is the recommended method for diagnostic testing of SMA and is the current method used for diagnostic testing in the Division when copy number determination is required. The purpose of the comparison was firstly to select the method with the best utility in SA populations. Further, the methods were evaluated to assess whether they would aid in understanding the complex mechanism of SMA disease, specifically in the black SA population, who appear to have a complex SMN1/SMN2 architecture which complicates diagnostic testing. The AmplideX kit (Asuragen, Austin, United States) and the P021 SALSA MLPA kit (MRC Holland, Amsterdam, Netherlands) will be referred to as AmplideX and MLPA going forward for simplicity.

This chapter will further present data from the validation and optimisation of an RT-qPCR method designed to determine the gene expression of three different SMN transcripts. Since the analysis of DNA copy number methods has previously been shown to be complicated by the complex nature of the SMN region, especially in African populations, a gene expression method was investigated as an alternative or second line method.
Transcript levels will be compared with DNA results to determine the relationship between DNA copy number (DNA) and gene expression (RNA).

3.1 DNA copy number detection: MLPA, AmplideX and qPCR

A total of 92 subjects were tested using the MLPA, qPCR and AmplideX methods. There were 71 samples from black subjects (71/92, 77.2%) and 21 samples from white subjects (21/92, 22.8%), as shown in Table 3.1. The subjects were categorised into five different genotype groups: M1/M1, M2/M2, N/N, N/M1 and U/U. Subjects were labelled according to Table 2.1 in the “Subjects and Methods” section. Because of the concern that the diversity of copy number in black SA populations could complicate optimisation and analysis, previously tested white subjects were also used to validate and optimise the new methods, qPCR and AmplideX.

Table 3.1. Ethnicity and genotype of subjects tested for DNA copy number.

Ethnicity | Genotype* | Number of subjects tested
Black SA | Total black SA subjects | 71
Black SA | M1/M1 | 12
Black SA | M2/M2 | 10
Black SA | N/M1 | 9
Black SA | N/N | 10
Black SA | U/U | 30
White SA | Total white SA subjects | 21
White SA | M1/M1 | 1
White SA | M2/M2 | 4
White SA | N/M1 | 7
White SA | N/N | 9
Total | | 92

* Genotypes defined in Sections 2.1.1 to 2.1.5

3.1.1 DNA copy number detection by MLPA

Fifty subjects of various genotypes were previously tested by MLPA as part of a Master’s project (Vorster et al., 2020), and a further 42 subjects were included in this project as part of validation and optimisation of the new methods. Figure 3.1 illustrates examples of typical results obtained using MLPA analysis. The P021-A1 kit has 37 probes, 11 of which are specific to the SMN region as defined in section 2.2.1.4. MLPA analysis focussed on SMN1, exons 7 and 8 as well as SMN2, exons 7 and 8. A deletion or duplication of additional probes in the region could provide details about the possible extent of deletions or duplications as well as the presence of gene conversions. For the purposes of the study only exon 7 of SMN1 and SMN2 was analysed and quantified, and these will therefore be referred to as SMN1 or SMN2.
M1/M1 subjects - MLPA DNA copy number detection

A total of 13 M1/M1 subjects were included in the study (12 black, 1 white). These subjects are affected with SMA and were used as positive controls. All 13 had a homozygous deletion of SMN1, exon 7 (Table 3.2). However, only 4/13 (31%) had a homozygous deletion of both SMN1, exon 7 and exon 8. Deletion of both exons in the SMN1 gene illustrates the extent of the deletion as well as the complexity of the SMN region. A subject with a homozygous deletion (zero copies of SMN1) is shown in Figure 3.1 D.

Table 3.2. Comparison of SMN1, exon 7 and exon 8 copy number in M1/M1 subjects

Disease Code | Genotype | MLPA SMN1, exon 7 | MLPA SMN1, exon 8
SMA677* | M1/M1 | 0 | 0
SMA762 | M1/M1 | 0 | 2
SMA737* | M1/M1 | 0 | 0
SMA995 | M1/M1 | 0 | 2
SMA1515 | M1/M1 | 0 | 1
SMA1838* | M1/M1 | 0 | 0
SMA1949 | M1/M1 | 0 | ≥3
SMA1958 | M1/M1 | 0 | 2
SMA1978 | M1/M1 | 0 | 2
SMA1988 | M1/M1 | 0 | 2
SMAR7 | M1/M1 | 0 | ≥3
SMAR15 | M1/M1 | 0 | 1
SMAR18* | M1/M1 | 0 | 0
Total | 13 | 13 | 13

Percentage of deletions extending to exon 8 (zero copies of exon 8): 4/13 (31%)
Percentage with one or more copies of exon 8: 9/13 (69%)
*Indicates the four subjects with zero copies of SMN1, exons 7 and 8.

M2/M2 subjects - MLPA DNA copy number detection

Fourteen subjects (10 black, 4 white) with a homozygous deletion of SMN2 were tested; these subjects were not affected with SMA. Subjects with the M2/M2 genotype often had more than two copies of the SMN1 gene, possibly due to gene conversion from SMN2 to SMN1.

N/M1 subjects - MLPA DNA copy number

Sixteen subjects (9 black, 7 white) with an N/M1 genotype were included in this study. All subjects were previously confirmed to be carriers of one copy of the SMN1, exon 7 deletion (M1 mutation, 1:0 genotype). Figure 3.1 C illustrates an N/M1 subject who had one copy of SMN1, exon 7 and is therefore a true carrier of SMA. In total 12/16 (75%) of the N/M1 subjects tested had one copy of SMN1 and 4/16 (25%) had two copies of SMN1.
All four of the subjects with two copies of SMN1 (2:0 genotype) were black and were previously confirmed as silent carriers by use of a family pedigree (see section 1.8 and Figure 1.10 for a definition and illustration of a silent carrier, respectively).

N/N subjects - MLPA DNA c