A retrospective analysis of laboratory data to identify clinical practice trends in prostate cancer screening and diagnosis by level of care in the Gauteng Province between 2006 and 2016

Submitted by Naseem Cassim

A thesis submitted to the Department of Molecular Medicine and Haematology, Faculty of Health Sciences, University of the Witwatersrand, in fulfilment of the requirements for the degree of Doctor of Philosophy

September, 2021

Declaration

I, Naseem Cassim, declare that this research report is my own unaided work. The report is being submitted in fulfilment of the requirements of the degree of Doctor of Philosophy in the University of the Witwatersrand, Johannesburg. It has not been submitted previously for any degree or examination at any other university.

______________________________
Signed

16 September 2021
Date

DEDICATION

I dedicate this thesis to my late parents Fatima Jassat and Abdool Hay Cassim, my dear wife Zarina Khan and our sons Muhammed Rashaad and Rameez Cassim.

AUTHOR'S CONTRIBUTION TO THE WORK

I was involved in all the stages of this study, from identifying the problem statement, developing a hypothesis, study design, ethics application and protocol development to executing the data analysis. I requested the data from the corporate data warehouse (CDW) after obtaining approval from the Academic Affairs, Research and Quality Assurance (AARQA) department of the National Health Laboratory Service (NHLS). I performed all the data preparation and conducted the data analysis, including statistical analysis. I also carried out the data checks and coded data to facilitate the analysis.

PUBLICATIONS ARISING FROM THIS THESIS

The following publications are submitted

Cassim N, Mapundu M, Olago V, Celik T, George JA and Glencross DK. (2021) Using text mining techniques to extract prostate cancer predictive information (Gleason score) from semi-structured narrative laboratory reports in the Gauteng province, South Africa.
BMC Medical Informatics and Decision Making

Contributions to the paper by each author: NC provided leadership and technical assistance to validate the study findings and prepared the initial draft. MM and VO developed the methodology and conducted the research. All authors contributed to reviewing the initial draft. TC, JAG & DKG contributed to the drafting and revising of the work critically for important intellectual content as well as overall supervision. All authors read and approved the final manuscript.

The following publications are submitted and currently under review

Cassim N, Rebbeck TR, Glencross DK and George JA. (2021) Retrospective analysis of prostate-specific antigen testing trends and primary health care public-sector guidelines in the Gauteng Province, South Africa, between 2006 and 2016. BMJ Open

Contributions to the paper by each author: NC made substantial contributions to the conception or design of the work, acquisition of laboratory data, data analysis, drafting the work and revising it critically for important intellectual content. TRR made substantial contributions to the conceptualisation of the study design, interpretation of data and revising the work critically for important intellectual content. DKG & JAG contributed to the drafting and revising of the work critically for important intellectual content as well as overall supervision. All authors read and approved the final manuscript.

The following publications have been published

Cassim N, Ahmad A, Wadee R, Rebbeck TR, Glencross DK and George JA (2020) Prostate Cancer age-standardised incidence increase between 2006 and 2016 in the Gauteng Province, South Africa: a laboratory data-based analysis. S Afr Med J.
111 (1), 26-32 https://doi.org/10.7196/SAMJ.2020.v111i4850 (Appendix D)

Contributions to the paper by each author: NC made substantial contributions to the conception or design of the work, acquisition of laboratory data, data analysis, preparing the first draft and final draft submitted. AA & RW assisted with the SNOMED lookup table development, data review and reviewing the work. TRR contributed to the conception or design of the work, data review, reviewing the work and editorial input. DKG & JAG contributed to the drafting and revising of the work critically for important intellectual content as well as overall supervision. All authors read and approved the final manuscript.

Cassim N, Ahmad A, Wadee R, George JA and Glencross DK. (2020) Using Systematized Nomenclature of Medicine (SNOMED) clinical terms (CT) codes to assign histological findings for prostate biopsies in the Gauteng Province, South Africa: Lessons learnt. Afr J Lab Med. 2020; 9 (1):1-9 https://doi.org/10.4102/ajlm.v9i1.909 (Appendix A)

Contributions to the paper by each author: NC developed the methodology, conducted the research and prepared the initial draft. All authors contributed to reviewing the initial draft. RW and AA assisted with the SNOMED lookup table development. JAG & DKG contributed to the drafting and revising of the work critically for important intellectual content as well as overall supervision.

Peer-reviewed conference proceeding

Authors: Cassim N, Mapundu M, Olago V, George JA and Glencross DK. (2019)
Title: Using big data techniques to improve prostate cancer reporting in the Gauteng Province, South Africa.
Date: 23-30 August 2019 (17th World Congress of Medical and Health Informatics).
Place: Lyon, France
Citation: Cassim, N., Mapundu, M., Olago, V., George, J. A., & Glencross, D. K. (2019). Using Big Data Techniques to Improve Prostate Cancer Reporting in the Gauteng Province, South Africa. Studies in health technology and informatics, 264, 1437–1438.
https://doi.org/10.3233/SHTI190472

Conference proceedings

Authors: Cassim N, Ahmad A, Wadee R, Rebbeck TR, Glencross DK and George JA. (2019)
Title: Prostate Cancer age-standardised incidence increase between 2006 and 2016 in the Gauteng Province, South Africa: a laboratory data-based analyses.
Date: 18-21 July 2018 (Path Red conference).
Place: Bredell, South Africa

Authors: Cassim N, Mapundu M, Olago V, Wadee R, George JA and Glencross DK. (2019)
Title: Big data approaches to extract clinical information from prostate biopsy results in the Gauteng Province.
Date: 18-21 July 2018 (Path Red conference).
Place: Bredell, South Africa

Authors: Mapundu M, Olago V, Cassim N, Wadee R, George JA and Glencross DK. (2019)
Title: Using supervised machine learning algorithms to assign prostate biopsy histological findings.
Date: 22 August 2018 (School of Public Health Research Day).
Place: Johannesburg, South Africa

Grants

National Health Laboratory Service (NHLS) development grant – R79 000

Awards

School of Public Health Research Day best oral presentation for the paper “Using supervised machine learning algorithms to assign prostate biopsy histological findings”.

Abstract

Background: Prostate cancer (PCa) is the leading male neoplasm in South Africa, with an age-standardised incidence rate (ASIR) of 68.0 per 100,000 population reported in 2018. PCa is also the most diagnosed neoplasm among men globally. Local studies have reported that Black African men present with higher grade and stage disease and higher serum prostate-specific antigen (PSA), and less often receive potentially curative treatment than men of other race groups. Updated local guidelines recommend informed patient-based screening for males with a life expectancy ≥10 years, commencing at 40 years for Black Africans. PCa risk categories are defined using the total PSA, Gleason score (GS) and clinical stage. The GS is the strongest predictive factor for treatment.
The updated grade group (GG) was developed based on the GS as follows: (i) GS ≤6: GG1, (ii) GS 3 + 4 = 7: GG2, (iii) GS 4 + 3 = 7: GG3, (iv) GS = 8: GG4 and (v) GS ≥9: GG5. One of the challenges is the tedious manual extraction of an adenocarcinoma histological finding and the GS, as this information is embedded within the semi-structured narrative prostate biopsy report. There is a paucity of local data, with most studies reporting data in a urological setting with small sample sizes, based at one or more academic hospitals. Data mining has the potential to turn narrative reports into information by applying various computational techniques.

Aims: The aim of this thesis was to describe PSA testing and histological diagnosis of PCa using laboratory data for men attending public-health facilities in the Gauteng Province. Further aims were to (i) use text mining to extract and describe the GS, specifically for Black Africans, (ii) assess trends in PSA testing for primary health care facilities and (iii) use automated methods to extract PCa information to reduce the burden of manual coding.

Methods: A retrospective descriptive study design was used to analyse prostate biopsy and PSA laboratory data between 2006 and 2016. Men aged ≥30 years were included. The Systematized Nomenclature of Medicine (SNOMED) clinical terms (CT) morphology (M) and topography (T) codes were used to develop lookup tables to assign the following histological findings: (i) diagnosis (benign/malignant), (ii) sub-diagnosis (adenocarcinoma) and (iii) sub-result (inflammation type). Two experts manually coded the diagnosis after reading the narrative prostate biopsy report to assess the positive predictive value (PPV) of the pathologist-assigned SNOMED CT codes. For 1000 randomly selected prostate biopsies with PCa, predictive analytics and text mining were used to automate the extraction of the GS, reporting precision, recall and the F score.
For the prostate biopsy data, data was reported for only prostatic biopsies. M or T codes were done manually. The GS was manually coded for an adenocarcinoma sub-diagnosis. PCa was defined as an adenocarcinoma histological finding with the GS reported. We reported associations of PCa with a GG ≥4 for race group. Both the biopsy and PSA data were de-duplicated using the CDW unique patient identifier to (i) report the ASIR and (ii) develop a presentation cohort (first-ever PSA). The PSA data was analysed to report patient numbers by calendar year, age category and race group as well as descriptive statistics. We used logistic regression to assess any association of race group and age with a PSA ≥4µg/L.

Results: SNOMED CT lookup tables were able to report the histological findings for 88% of biopsies. The manual coding revealed a PPV of 0.96 for the pathologist-assigned SNOMED CT codes. The predictive analytics and text mining accurately extracted the GS for all 1000 biopsies (F score of 1). There were 22 937 prostatic biopsies referred to the NHLS between 2006 and 2016. Of these, a PCa finding was reported for 39% of Black Africans. A high-risk GS was reported for 46% of Black Africans. Multiple logistic regression revealed that Black Africans were more likely to have PCa with a GG ≥4, with an odds ratio of 1.45. The ASIR increased from 44.9 per 100 000 population in 2006 to 57.3 per 100 000 population by 2016. There were 239 506 (86.2%) patients with a first-ever PSA, compared to 277 983 tests. Between 2006 and 2016, the number of men tested increased from 1 782 to 67 025, with 186 984/239 506 (78.1%) of tests from clinics. The majority of testing was for men in the 50 - 59 age category and Black Africans. The logistic regression reported that the odds of having a PSA ≥4µg/L were significantly lower for Indian/Asians, Coloureds and Whites than for Black Africans.

Conclusions: The feasibility of using SNOMED CT codes to automate PCa data was confirmed.
In addition, the reliable extraction of the GS to assess late presentation and prognosis was confirmed. These approaches can be applied to national PCa data. Our findings reveal that Black African men are significantly more likely to present with a PCa with a GS ≥8 compared with other racial groups. Our data suggest that predominantly healthy patients were tested, which in turn is indicative of some population-based screening. Local public-sector guidelines need to be aligned to the 2017 urological evidence-based recommendations. There is a need for additional research to understand why Black African men present with higher grade disease.

Acknowledgements

I wish to express my thanks and gratitude to the following people for their assistance, guidance and support:
• My supervisors Professors George and Glencross for their contributions to the study design, manuscript review and comments on the various chapters.
• Professor Mohammed Haffejee and Martin Hale for providing insight into the local context for prostate cancer work.
• Dr Innocent Maposa for help with statistical analysis.
• Dr Anushka Ajith and Marietha Nel for proof reading.
• The National Cancer Registry for assisting with the hot deck race imputation for the study.
• The corporate data warehouse for implementing the probabilistic matching algorithm.
• I am indebted to the staff of the National Health Laboratory Service (NHLS) that generated the prostate specific antigen (PSA) and prostate biopsy data.

CONTENTS

Dedication
Author's contribution to the work
Publications arising from this thesis
Contents
List of Tables
List of Figures
List of Abbreviations
Preface
Chapter 1 – Literature review
  1.1 Introduction
  1.2 Purpose of the literature review
  1.3 Summary and research gaps
  1.4 Aims
  1.5 Scope of the literature review
  1.6 Non-Communicable Diseases (NCD)
  1.7 Cancer
  1.8 Prostate cancer
  1.9 Prostate cancer epidemiology
  1.10 How prostate cancer biopsies are processed and reported
  1.11 Urological guidelines for prostate cancer screening and diagnosis in South Africa
  1.12 Local public sector and urological prostate cancer screening and diagnosis guidelines in South Africa between 2006 and 2016
  1.13 Other prostate cancer guidelines
  1.14 Controversy about prostate-specific antigen testing
  1.15 Global prostate cancer studies
  1.16 African prostate cancer studies
  1.17 Local prostate cancer studies
  1.18 National prostate cancer registry data
  1.19 Challenges with the manual extraction of prostate biopsy data and the way forward
Chapter 2 - Using Systematized Nomenclature of Medicine (SNOMED) clinical terms (CT) codes to assign histological findings for prostate biopsies in the Gauteng Province, South Africa: Lessons learnt
  2.1 List of definitions for terms used in this chapter
  2.2 Introduction
  2.3 Aim and Objective
  2.4 Methods
    2.4.1 Study design
    2.4.2 Inclusion/Exclusion Criteria
    2.4.3 Sample Population
    2.4.4 Ethical considerations
    2.4.5 Data extraction and preparation
    2.4.6 Combining SNOMED descriptions for the lookup table
    2.4.7 Coding the SNOMED M and T lookup tables
    2.4.8 Combining the lookup tables with the prostate biopsy data
    2.4.9 SNOMED M and T code descriptive analysis
    2.4.10 Descriptive analysis of diagnosis and sub-diagnosis
    2.4.11 Evaluating the accuracy of the SNOMED CT codes
  2.5 Results
    2.5.1 SNOMED M and T code descriptive analysis
    2.5.2 Descriptive analysis of diagnosis and sub-diagnosis
    2.5.3 Evaluating the accuracy of the SNOMED CT codes
  2.6 Discussion
  2.7 Conclusion
Chapter 3 - Using text mining techniques to extract prostate cancer prognostic information (Gleason score) from semi-structured narrative laboratory reports in the Gauteng Province, South Africa
  3.1 Introduction
  3.2 Materials and methods
    3.2.1 Study design
    3.2.2 Ethical considerations
    3.2.3 Text mining algorithm development
    3.2.4 Data acquisition
    3.2.5 Pre-processing
    3.2.6 Feature extraction
    3.2.7 Feature value representation
    3.2.9 Information extraction
    3.2.10 Classification
    3.2.11 Discovered knowledge
    3.2.12 Text Mining Algorithm Evaluation
    3.2.13 Statistical analysis
  3.3 Results
    3.3.1 Text Mining Algorithm performance
    3.3.2 Text Mining precision and recall
    3.3.3 Gleason score formats reported
    3.3.4 Gleason score frequency analysis
    3.3.4 Gleason risk category analysis
    3.3.5 Validation analysis
  3.4 Discussion
  3.5 Conclusion
Chapter 4 - Retrospective analysis of prostate-specific antigen testing trends and primary health care public-sector guidelines in the Gauteng Province, South Africa, between 2006 and 2016
  4.1 Introduction
  4.2 Methods
    4.2.1 Study design
    4.2.3 Ethical considerations
    4.2.4 Data extract
    4.2.4 Data preparation
    4.2.5 Statistical Methods
  4.3 Results
    4.3.1 Number of patients receiving first-ever PSA test
    4.3.2 Number of patients receiving a first-ever PSA test by age category and race group
    4.3.3 Descriptive statistics by age and race group
    4.3.4 Association between an elevated PSA with age and race group
  4.4 Discussion
Chapter 5 - Prostate Cancer age-standardised incidence increase between 2006 and 2016 in the Gauteng Province, South Africa: a laboratory data-based analysis
  5.1 Introduction
  5.2 Methods
    5.2.1 Study design
    5.2.2 Inclusion criteria
    5.2.3 Biopsy data preparation
  5.3 Results
    5.3.1 Prostate biopsy outcomes by age and race group
    5.3.2 Analysis of low, intermediate and high-risk Gleason scores
  5.5 Discussion
Chapter 6 – Discussion
  6.1 Summary of findings
    6.1.1 SNOMED CT Lookup Tables
    6.1.2 Using text mining and predictive analytics to extract the GS
    6.1.3 Primary health care first-ever PSA trends
    6.1.4 Prostate biopsy
    6.1.5 Prostate cancer age-standardised incidence
    6.1.6 Implications of our findings
  6.2 Conceptual framework and practical aspects for local and national implementation
  6.3 Areas of future research
  6.4 Limitations and challenges
  6.5 Recommendations
  6.6 Conclusions
7. References
Appendix A – PUBLICATION
Appendix B – PUBLICATION
Appendix C – PUBLICATION
Appendix D – PUBLICATION
Appendix E – Ethics and other approval letters
Appendix F - Declaration and co-author signature
Appendix G - Similarity report
Appendix H - Detailed information on the HIV M&E dashboard
Appendix I – Other HIV M&E Dashboards
Appendix J – Additional visualisations for the text mining algorithm

LIST OF TABLES

No  Table Description
1.1 Grade Group (GG) reporting indicating the corresponding Gleason score
1.2 Recommended terminology to be used to report prostate biopsy histological findings
1.3 The tumour, node, metastasis (TNM) classification system used for cancers
1.4 Risk stratification for planning treatment options
1.5 Risk of prostate cancer (PCa) in relation to the total prostate specific antigen (PSA) result and Gleason score (GS)
1.6 Local and other urological prostate cancer (PCa) guidelines
1.7 Prostate cancer recommendations from other countries
1.8 Studies providing conflicting prostate screening data
1.9 The GLOBOCAN prostate cancer age-standardised incidence rate per 100 000 population in 2012 and 2018. The number of new cases is also reported
1.10 Number of prostate cancer cases and the age-standardised incidence rate per 100 000 population reported for 46 African countries in the 2018 GLOBOCAN study. Data are sorted by age-standardised incidence rate in descending order. The matching highest age-standardised incidence rate per 100 000 population reported by Adeloye et al. is given in the last column
1.11 Pooled age-standardised incidence rate by age categories for African countries
1.12 Findings from African studies
1.13 Percentage of Black Africans for each study compared to the Census 2011 data for each respective Province
1.14 Findings from local prostate cancer (PCa) studies
1.15 Table describing how the semi-structured narrative prostate biopsy data meet the six criteria for big data
2.1 Example of four prostate biopsies where the SNOMED code descriptions were assigned a diagnosis and sub-diagnosis, including the allocation of ICD-O-3 codes. The biopsy result text is also provided.
Episode numbers are anonymised
2.2 Contingency table to assess the percentage of SNOMED M and T codes populated using the mapping table for prostate biopsies between 2006 and 2016 in the Gauteng Province, South Africa
2.3 Top ten most commonly requested SNOMED M code combinations from the prostate biopsy data between 2006 and 2016 in the Gauteng Province, South Africa
2.4 Descriptive analysis of diagnosis and sub-diagnosis where both a SNOMED M and T code of prostatic origin are populated, between 2006 and 2016 in the Gauteng Province, South Africa. Where appropriate, the ICD-O-3 codes are provided in brackets
2.5 Contingency table used to determine the accuracy of the pathologist-assigned SNOMED CT codes for 1000 random prostate biopsies with a ‘Benign/negative for malignancy’ or ‘Neoplasm, malignant’ diagnosis. These biopsies were randomly selected from prostate biopsies that were performed between 2006 and 2016 in the Gauteng Province, South Africa.
3.1 Example of the semi-structured narrative prostate biopsy report. The narrative biopsy report included the headings clinical history, macroscopy and pathological diagnosis
3.2 Description of the various Python libraries used for the text mining algorithm to extract the Gleason score from narrative prostate biopsy reports.
3.3 N-grams feature extraction output for a sample of biopsies
3.4 Performance of the text mining algorithm to automate the extraction of the Gleason score from narrative prostate biopsy reports. A contingency table was used to compare the manually coded and algorithm predicted values. We reported the precision, recall and F-score for the first and updated text mining algorithm output as well as for the validation dataset.
3.5 Different Gleason score formats reported. The clean extracted score and the original value reported in the prostate biopsy report are indicated.
3.6 Biopsy volumes for the five most commonly reported Gleason scores. The table reports the Gleason score and frequency for the top five reported scores, with the remaining scores grouped and reported as “Others”.
3.7 Comparison of low, intermediate and high-risk Gleason scores for the predicted and manually coded values. The macro-average F-score is reported.
3.8 Biopsy volumes for the five most commonly reported Gleason scores for the validation dataset. The table reports the Gleason score and frequency for the top five reported scores, with the remaining scores grouped and reported as “Others”.
4.1 Descriptive statistics for a first-ever total prostate-specific antigen (PSA) test reported for age category and race group between 2006 and 2016 in the Gauteng Province, South Africa. The overall median and interquartile range (IQR) as well as the proportion of men with a first-ever total PSA test ≥4, <10 (low-risk), 10 - 19.9 (intermediate-risk) and ≥20µg/L (high-risk) are reported
4.2 Logistic regression to assess the association between samples with a first-ever total prostate-specific antigen (PSA) ≥4µg/L and race group as well as age category between 2006 and 2016 in the Gauteng Province, South Africa. We controlled for race group and age category.
5.1 Prostate biopsy descriptive statistics by race group. Prostate biopsy descriptive data for each race group reported as a table for the Gauteng Province between 2006 and 2016. Mean age (range), biopsy numbers and percentages reported for age category, race group, biopsy results and prostate cancer results. One-way analysis of variance (ANOVA) reported for age as a continuous variable and Chi-square test used to identify whether there was a statistical difference for age category and biopsy findings for the four race groups
5.2 Multiple logistic regression to assess the association between prostate cancer outcomes and Gleason score and race group and age category.
For the dependent variable, a grade group 105 xvii No Table Description Page (GG) of 4 or 5 was coded as 1 (GS ≥8) and GG 1-3 as 0 for a prostate cancer histological finding with a Gleason score (GS) reported 6.1 Consolidated finding 113 6.2 Table describing some of the studies that could be undertaken once the Corporate Data Warehouse target area for Non-Communicable Diseases has been developed. Details are provided for the disease, laboratory test/s, study objectives and outcomes assessed 125 xviii LIST OF FIGURES No Figure Description Page 1.1 Anatomy of the prostate gland 5 1.2 Risk factors for prostate cancer (PCa) 7 1.3 Typical anatomical pathology workflow 8 1.4 Prostate cancer age-standardised incidence rate per 100 000 population per country in 2018 for the most commonly diagnosed neoplasm reference number. Data for South Africa is reported as a red bar 20 1.5 Bar chart reporting National Cancer Registry national age-standardised incidence rate (ASIR) estimates for South Africa between 1996 and 2014 31 1.6 The six criteria for big data 33 1.7 Illustrating the use of big data and the application of artificial intelligence 35 2.1 High-level overview of the steps taken to code each chained SNOMED code description combination. There were two separate SNOMED data extracts from the laboratory information system; morphology (M) and topography (T). The colour coding indicates the various processes: (i) green: data extracts, (ii) yellow: SNOMED code manipulation in preparation for lookup table development and (iii) orange: lookup tables with coded variables. The data extract was prepared using the two lookup tables to generate the following new coded 43 2.2 Six step procedure used to transform the chained comma separated SNOMED M codes into individual columns (leaving the original value intact) to add the laboratory information system code table descriptions (in preparation for lookup table development). The same procedure was conducted for T codes. 
The manipulation was achieved using standard Microsoft Excel functions (screenshots included next to each step). The steps are as follows: (1) copy unique chained codes to a new worksheet and then copy and paste to a new column for processing (leaving the original values intact), (2) use the Microsoft Excel text to column function to separate the chained codes and name new columns, e.g. M Code 1-n, (3) insert a new column next to each code column and label as a description column, e.g. M Code Descr 1- n, (4) Add the alphabetically sorted laboratory information system SNOMED code description in a new worksheet, (5) Use the Microsoft Excel VLOOKUP function to add the code description (range lookup set at 1 for an exact match), e.g. M-00100 code description is ’Normal tissue (finding)’ and (6) Microsoft Excel CONCATENATE function was used to combine code descriptions in a new column 44 2.3 Relational database diagram describes how the various tables were joined (left outer). For each table, the primary and foreign keys are provided. The lookup tables contain only the unique SNOMED code combinations. The lines indicate a relationship join between tables. Once the table joins are implemented, variables from any table can be reported. Structured query language (SQL) could be used to combine the required data for analysis 46 3.1 Diagram describing the logical processes used to analyse the raw narrative prostate biopsy report to generate the discovered knowledge. The steps were as follows: (i) data acquisition (ii) pre-processing and (iii) feature extraction, (iv) feature value representation, (v) feature selection, (vi) information extraction (vii) classification and (viii) discovered knowledge. 65 xix No Figure Description Page 3.2 Diagram depicting the visualisation of the corpus before (A) and after cleaning (B). 
The larger the text the more important and frequent the term is in the narrative biopsy reports 67 3.3 Horizontal bar graph depicting the top twenty occurring unigrams (A), bigrams (B), trigrams (C) and quadgrams (D). The number of occurrences is displayed on the x-axis. 70 4.1 Flow chart depicting all the data steps to generate the data used to create the first-ever PSA cohort for primary health services between 2006 and 2016 in the Gauteng Province, South Africa 86 4.2 Percentage year on year change reported as a bar chart for patients with a first-ever total PSA test at PHC services in the Gauteng Province, South Africa. The annual test volumes were reported as a line chart. As lower test volumes were reported in 2012 and 2013 (indicated by the dotted line), the 2014 and 2015 percentage year on year change calculations were based on the 2011 numbers. 87 4.3 Annual PSA test volumes by unit type for primary health care (PHC) services. The annual PSA volumes are reported for clinics (blue bars) and community health centres (CHC) (dark grey bars) between 2006 and 2016 in the Gauteng Province, South Africa 88 4.4 Line chart reporting the number of patients with a first-ever total prostate-specific antigen (PSA) test by year and age category (A) and race group (B) for primary health care (PHC) facilities between 2006 and 2016 in the Gauteng Province, South Africa 90 4.5 Bar chart reporting the number of patients with a first-ever total prostate-specific antigen (PSA) test by race group and age category for PHC facilities between 2006 and 2016 in the Gauteng Province, South Africa. 91 5.1 Age distribution for prostate cancer findings. Biopsy numbers reported by age category for both prostate cancer and non-prostate cancer histological findings in the Gauteng Province between 2006 and 2016 as a population pyramid. 104 5.2 Line chart reporting prostate biopsy numbers for the Gauteng Province between 2006 and 2016. 
Stacked bar charts indicating annual prostate biopsy numbers. The calendar year reported on the X-axis and the number of prostate biopsies on the Y-axis 106 5.3 Percentage of prostate biopsies with a low (GS ≤6), intermediate (GS =7: 3 + 4 =7 and 4 + 3 =7), and high (GS ≥8) risk Gleason score by race group. Bar chart used to report on the percentage of prostate biopsies with prostate cancer finding by GS risk categories. The grade group (GG) is indicated in brackets, with 4 and 5 combined to report a GS ≥8. The percentages reported as data labels per race group 107 5.4 Percentage of prostate biopsies with a low (GS ≤6), intermediate (GS =7: 3 + 4 =7 and 4 + 3 =7), and high (GS ≥8) risk Gleason score by calendar year. Bar chart used to report on the percentage of prostate biopsies with prostate cancer finding by GS risk categories. The grade group (GG) is indicated in brackets, with 4 and 5 combined to report a GS ≥8. The percentages reported as data labels per calendar year 108 5.5 Age-standardised prostate cancer incidence rates between 2006 and 2016 reported as orange dots with the average annual percentage change (AAPC) reported as a blue line. The year of diagnosis reported on the X-axis and the age-adjusted rate on the Y-axis 109 6.1 Prostate cancer conceptual framework 120 6.2 Flow chart depicting the relationship between guidelines, PSA and prostate biopsy testing. The information that PSA and prostate biopsy data informs is also indicated. 122 xx No Figure Description Page 6.3 Database diagram showing how the SNOMED CT lookup tables and text mining application programming interface (API) could automate the generation of a PCa diagnosis. The private sector prostate cancer data extract could also be accommodated through the CDW process. The Expert Committee would code all new SNOMED CT M and T code combinations to update the CDW lookup tables. 
123 6.4 Prostate cancer conceptual framework 127 6.5 HIV Monitoring and Evaluation Facility Indicators and Trends (All Ages) dashboard screenshot (taken from the National Institute for Communicable Diseases (NICD) monitoring and evaluation online HIV programme dashboards). The dashboard displayed is titled ‘HIV M&E Facility Indicators and Trends (All Ages)’. 130 xxi LIST OF ABBREVIATIONS Abbreviation Description AAPC Annual Average Percentage Change AARQA Academic Affairs, Research and Quality Assurance AFR World Health Organisation African region AI Artificial Intelligence AMR Antimicrobial Resistance ANOVA Analysis of Variance AP Anatomical Pathologist API Application Programming Interface ART Antiretroviral Therapy ASIR Age-standardised incidence rate AUA American Urological Association BPH Benign Prostatic Hyperplasia CDW Corporate Data Warehouse CHB Chris Hani Baragwanath Hospital CHC Community Health Centre CI Confidence Interval CMJAH Charlotte Maxeke Johannesburg Academic Hospital CSV Comma separated values CT Clinical terms DGM Dr George Mukhari DHIS District Health Information System DRE Digital Rectal Examination EAU European Association of Urology EML Essential medicines list ERSPC European Random Study of Screening for Prostate Cancer ESMO European Society for Medical Oncology ETL Extract, Transform and Load FN False Negative FP False Positive FPSA Free PSA FT Free to total PSA ratio GG Gleason grade group GS Gleason score xxii Abbreviation Description GWAS Genome wide association studies HbA1c Glycated Haemoglobin HCW Health care worker HIE Health Information Exchange IARC International Agency for Research on Cancer ICD International Classification of Diseases ICD-0-3 International Classification of Diseases for Oncology – 3rd Edition ICD-10 International Classification of Diseases, Tenth Revision ICPCG International Consortium for Prostate Cancer Genetics IDE Integrated Development Environment IQR Interquartile range ISUP International Society of 
Urological Pathology LIS Laboratory Information System LMIC Low-middle income country LTFU Loss-To-Follow Up LUTS Lower urinary tract symptoms M Morphology M&E Monitoring and Evaluation ML Machine learning NCCN National Comprehensive Cancer Network NCCP National Cancer Control Programme NCD Non-Communicable Diseases NCR National Cancer Registry NDoH National Department of Health NHLS National Health Laboratory Service NICD National Institute for Communicable Diseases NLP Natural Language Processing NLTK Natural Language Toolkit ODS Operational Data Store OR Odds ratio PBCR Population-based cancer registry PCa Prostate Cancer PCA3 Prostate cancer antigen 3 PCF Prostate Cancer Foundation of South Africa PHC Primary Health Care xxiii Abbreviation Description PMTCT Prevention of Mother-to-Child Transmission PSA Prostate specific antigen PSAV PSA velocity RCT Randomised controlled trail SAPCS South African Prostate Cancer study SBA Steve Biko Academic Hospital SCM Supply chain management SEER Surveillance, Epidemiology, and End Results SNOMED Systematized Nomenclature of Medicine SNP Single Nucleotide Polymorphism SQL Structured Query Language SSA Sub-Saharan Africa STATS SA Statistics South Africa STG Standard treatment guidelines T Topography TB Tuberculosis TN True Negative TNM The tumour, node and metastasis classification system TP True Positive TRUS Transrectal ultrasound USA United States of America USPSTF The United States Preventative Services Task Force UTI Urinary Tract Infection WHO World Health Organisation WSP World Standard Population xxiv PREFACE This PhD thesis is presented as a number of chapters as follows: • Chapter 1 is the literature review. • Chapters 2 to 5 contain the results of the studies undertaken for this PhD. Each chapter contains the results and discussion for each research article (published articles are provided as appendices). • Chapter 6 is the conclusions for the study as well recommendations for further work. 
CHAPTER 1 – LITERATURE REVIEW

1.1 INTRODUCTION

This chapter discusses the literature reviewed for this study and outlines the key concepts related to this work.

1.2 Purpose of the literature review

The purpose of the literature review is to identify and critically evaluate the available information, in the form of journal articles, policies and guidelines.

1.3 Summary and research gaps

There is limited local PCa data, with most studies predominantly evaluating patients presenting at urology clinics at a regional or tertiary hospital [1-6]. Data from a urology setting, for patients referred from lower levels of care with a suspicion of PCa, is not representative of the broader male population. These studies also used small sample sizes, which limits generalisability [1-6]. They reported late presentation with advanced PCa (PSA ≥20µg/L and a GS ≥8) disproportionately affecting Black African men, resulting in a poorer prognosis [1-3].

The NCR reports annual PCa incidence data at the national level using data from both public and private sector laboratories [7]. The latest NCR data, for the 2016 calendar year, was reported in 2020, with 8 332 incident cases of PCa and an ASIR of 47.48 [8]. This data is unfortunately not available at the provincial level. All local findings were generated by manually reading the semi-structured prostate biopsy reports to code the histological findings. This is a time-consuming process with the potential for transcription errors as well as delays in PCa surveillance data. One of the challenges with NCR data is that the absence of GS findings makes it difficult to assess late presentation. The absence of the GS is significant given the local data indicating that late presentation disproportionately affects Black African men [1-3].
There is a need for comprehensive PCa data that includes the PSA results, prostate biopsy histological findings and GS for an entire province, which would both improve generalisability and provide detailed data at a level below national reporting [7]. Furthermore, there is a need to move away from manual coding to speed up the reporting of local PCa data. This has never been attempted locally. Automating the reporting of histological findings as well as the GS makes it possible to report data on a regular basis to assess late presentation for the entire population. This will provide important insights that have not been reported locally for a large sample size.

1.4 Aims

The aim of this study was to conduct a retrospective study to describe prostate cancer screening and histological diagnosis using routinely collected prostate biopsy and PSA laboratory data. A further aim was to describe presentation specifically for Black Africans, to describe first-ever PSA testing trends for PHC services and to develop automated methods to reduce the burden of manual coding.

The objectives of the study were to use laboratory data to:
• Develop lookup tables to automate the reporting of prostate biopsy data.
• Develop a text mining algorithm to extract the GS.
• Describe the number of prostate biopsies performed annually.
• Describe the number of prostate biopsies by race group and age category.
• Describe the number of prostate biopsies with a PCa histological finding (adenocarcinoma and GS reported).
• De-duplicate the prostate biopsy data to identify incident patients that were diagnosed with PCa to report ASIR.
• Describe the number of patients presenting with advanced disease (GS ≥8).
• Describe first-ever PSA testing trends for PHC services in the Gauteng Province.

1.5 Scope of the literature review

The literature review is presented according to the following headings:
• Non-communicable diseases (NCD)
• Cancer
• Prostate cancer
• Prostate cancer guidelines
• How prostate cancer biopsies are processed and reported
• Global prostate cancer data
• African prostate cancer data
• Local prostate cancer data
• Local cancer registry data
• Summary and research gaps

1.6 Non-Communicable Diseases (NCD)

Non-communicable diseases (NCD) are due to a combination of genetic, physiological, environmental and behavioural factors [9]. Globally, the most prevalent NCD include cardiovascular disease, cancer, chronic respiratory disease and diabetes [10]. In South Africa, cardiovascular disease, diabetes mellitus, respiratory diseases and cancers contributed to 12% of the overall disease burden [11]. Unfortunately, the human, social and economic consequences of NCD are felt mainly in poor and vulnerable populations [12]. In South Africa, the burden of NCD disproportionately affects urban poor people [11].

Globally, modifiable and metabolic risk factors are the most common causes of NCD [13]. The modifiable risk factors include physical inactivity, tobacco use, unhealthy diet and the harmful use of alcohol [13]. The metabolic risk factors include raised blood pressure, obesity, hyperglycaemia and hyperlipidaemia [13]. Both the modifiable and non-modifiable risk factors differ for each NCD, e.g. cancer. For modifiable risk factors, the cancers of interest are lung, liver, oesophageal, skin, cervical, stomach, colorectal, etc. [14]. Insufficient physical activity is the leading modifiable risk factor for NCD related deaths, causing approximately 3.2 million global NCD deaths per annum [12]. This has been linked to colon and breast cancer [14]. Furthermore, tobacco usage increases the risk of cardiovascular disease, cancer, chronic respiratory disease and diabetes [12]. Smoking has been linked to a number of cancers, including those of the lung, oral cavity, pharynx, oesophagus, stomach, colorectum, liver, pancreas, etc. [14].
Excessive dietary sodium (salt) consumption is associated with an increased risk of hypertension and cardiovascular disease [12]. The harmful use of alcohol also increases the risk of developing NCD [12]. Alcohol intake has been associated with lip, oral cavity, pharynx, oesophageal, colorectal, larynx and breast cancer [14].

In 2008, there were 56 million deaths globally, of which 38 million (68%) were due to NCD. Most of these deaths were in low- and middle-income countries (LMIC) [15]. In South Africa, the probability of dying from the four main NCD (cardiovascular disease, cancer, respiratory disease and diabetes) is ≥25% [16, 17]. By 2010, NCD accounted for 39% of all deaths in South Africa [17].

1.7 Cancer

Cancer is a generic term that describes a group of diseases that are characterised by the growth of abnormal cells beyond their usual boundaries [18]. The abnormal cancerous cells can invade adjoining organs [18]. Other terms for cancer include malignant tumours and neoplasms [18]. Cancer is the leading cause of global mortality for countries of all income levels [19]. The number of cancer cases is projected to increase rapidly, fuelled by population growth, increasing life expectancy, and the adoption of lifestyle behaviours that increase cancer risk [19, 20]. Cancer can affect most parts of the body and has many anatomic and molecular subtypes that each require specific management strategies [21]. Globally, the most common male cancers are lung, prostate, colorectal, stomach and liver [21].

While NCD were responsible for the majority of global deaths in 2018, cancer is expected to rank as the leading cause of death and an important barrier to increasing life expectancy across the globe in the 21st century [22]. In 2018, it was reported that cancer is expected to be the first or second leading cause of death before the age of 70 years in 91 of 172 countries (52.9%) [22].
In developed regions such as the Americas and Australia, cancer ranks as the number one cause of premature death, compared to third or fourth in South Africa [22]. The lower ranking in South Africa is most likely due to the contribution of infectious diseases such as HIV and TB as well as the population distribution [23].

The GLOBOCAN 2018 incidence estimates were produced by the International Agency for Research on Cancer (IARC) [22]. This study estimated that there were 18.1 million new cancer cases and 9.6 million cancer deaths globally in 2018 [22]. The most common male neoplasm was prostate cancer (PCa) for the North and South American, Australian, European and sub-Saharan Africa (SSA) regions [22]. For two SSA countries, Kaposi sarcoma was the most reported male cancer [22]. The northern African countries (not SSA) reported multiple cancers including lung, liver and non-Hodgkin lymphoma [22]. Lung cancer was the most common male neoplasm for the Asian continent [22, 24].

Globally, lung cancer is the leading cause of cancer mortality for 93/186 countries [22]. PCa was the leading cause of cancer mortality in 46/186 countries, followed by liver cancer (20/186) [22]. The PCa mortality rates do not follow the patterns reported for incidence, with elevated mortality rates reported in sub-Saharan Africa and the Caribbean [22]. The highest PCa mortality rates were reported in Benin, South Africa, Zambia, Barbados, Jamaica, and Haiti [22]. In South Africa, lung cancer is the leading cause of cancer-related mortality [22].

1.8 Prostate cancer

The prostate gland is similar in size and shape to a walnut and forms part of the male reproductive system [25]. Anatomically, the prostate gland is situated low in the male pelvis, below the bladder and in front of the rectum [25]. The prostate gland is divided into several anatomical zones or regions: (i) peripheral, (ii) transition, (iii) central and (iv) anterior fibromuscular stroma [25].
The peripheral zone is the largest area of the prostate and can be easily examined with a digital rectal examination (DRE) [26]. The transition zone is situated in the area where the urethra passes through the prostate [25]. The central zone is situated behind the transition zone, and the ejaculatory ducts pass through it [25]. The anterior fibromuscular stroma is the thickened area of tissue that surrounds the apex of the prostate [25]. The peripheral, central and transition zones comprise approximately 70%, 25% and 5% of the prostatic glandular tissue, respectively (Figure 1.1) [27]. The majority of PCa are located in the peripheral zone (70-80%), while between 20% and 30% arise in the transition zone [26, 27].

Figure 1.1: Anatomy of the prostate gland (source: hand drawing)

The function of the prostate is to secrete one of the components of semen, which is a milky fluid that carries sperm from the testicles [18]. The prostate gland tends to grow larger as men age [18]. When the growth of the prostate gland compresses the urethra, it may cause lower urinary tract symptoms (LUTS) [28]. LUTS include voiding, hesitancy, poor and/or intermittent stream, straining, prolonged micturition, feeling of incomplete bladder emptying, dribbling, urge incontinence, and nocturia [29].

Growing older is associated with an increased risk of prostate disease [15]. The most common prostate diseases include: (i) prostatitis, (ii) benign prostatic hyperplasia (BPH), and (iii) PCa [30]. Two studies have reported that prostatitis can significantly increase the odds for PCa [31, 32]. Similarly, a systematic review reported a significant association between BPH and PCa [33]. Prostatitis is an inflammation of the prostate gland that may be caused by acute or chronic infections [34]. BPH is a benign (non-malignant) enlargement of the prostate gland and refers to the stromal and glandular epithelial hyperplasia (abnormal cell growth) [25].
PCa occurs when the cells in the prostate start to grow uncontrollably [35]. The majority of PCa are histologically confirmed as adenocarcinoma [35]. Other types of PCa include sarcomas and carcinomas, which are rare [30]. Several risk factors have been identified for PCa including age (>50 years), race group (African men), family history and diet [30].

Due to increasing life expectancy, a growing world population and improvements in public health, PCa is a growing concern for global epidemiology [36, 37]. The global burden of PCa is high, with one million cases histologically diagnosed and 300 000 deaths annually [36]. PCa is the second most frequently diagnosed cancer among men globally, but in South Africa it is the leading neoplasm in men [38].

1.9 Prostate cancer epidemiology

The most consistently reported risk factors for PCa are age, race group and family history [15]. PCa occurs predominantly in men over the age of 40 years [39]. Data from the histological examination of prostate biopsies and autopsy studies worldwide have indicated that the PCa age-standardised incidence rate (ASIR) per 100 000 population increases with age [39]. Age is, therefore, the most important risk factor for PCa [15].

Several studies have indicated that PCa incidence is higher among men of Black African descent, who also have a higher stage of disease at presentation and a poorer prognosis [36, 39-42]. There is significant epidemiological evidence for an increased PCa risk for African Americans compared to Asian Americans, suggestive of both genetic and racial differences [39]. The risk of developing PCa doubles in men with a father or brother affected by PCa (family history) [41]. These findings suggest that racial and genetic factors are risk factors for PCa. A number of genetic studies have reported several PCa risk loci identified through linkage studies [39].
The strongest linkage was noted on chromosome 1, with candidate genes including HPC1 on chromosome 1q23-35, PCAP on chromosome 1q42-43, and CAPB on chromosome 1p36 [39]. An additional 12 PCa risk regions on different chromosomes including 1q23, 5q11, 5q35, 6p21, 8q12, 11q13, and 20p11-q11 were identified by the International Consortium for Prostate Cancer Genetics (ICPCG) [39, 43].

Occupational exposure to herbicides and pesticides is also a risk factor for PCa [39]. Obesity is associated with high-grade PCa, possibly as a result of lower androgen levels [44]. These men also have a higher risk of dying with PCa [44]. Furthermore, diet may play a role in PCa risk, as countries with a higher dietary fat intake also present with higher PCa mortality rates [44]. Saturated fat intake may increase the risk of PCa-related death, while vegetable fat may have a protective effect after a diagnosis of non-metastatic PCa [44].

Overall, no specific environmental or occupational exposure has been definitively shown to increase PCa risk [45]. However, a relationship exists between Agent Orange, a herbicide used by the Americans in Vietnam, and PCa risk as well as aggressive disease [45]. There is also epidemiological evidence suggestive of an inverse relationship between sun exposure and PCa risk [45]. While farming has been associated with increased PCa risk, the specific underlying mechanisms have not been established [45]. The PCa risk factors are described in Figure 1.2.

Figure 1.2: Risk factors for prostate cancer (PCa) (adapted from reference [26])

1.10 How prostate cancer biopsies are processed and reported

Within the Gauteng Province, prostate biopsies are processed in the anatomical pathology (AP) laboratories based at the Charlotte Maxeke Johannesburg Academic (CMJAH), Chris Hani Baragwanath Academic (CHB), Dr George Mukhari Academic (DGM) and Steve Biko Academic (SBA) hospitals.
In these laboratories, the prostate biopsies go through a number of steps to reach the endpoint of a histological diagnosis. The first step is the macroscopic description (also called gross examination) of the biopsy, where the number of cores submitted is recorded, with the length of each core reported [26]. The length of the biopsy tissue significantly correlates with the PCa detection rate [26]. The next step involves cutting the cores into blocks. Each block is placed in a cassette marked with the laboratory number. The cassettes are transferred to an embedding instrument to form wax tissue blocks. Once cooled, the wax tissue blocks are cut using a microtome and placed on glass slides. Next, the slides are stained and screened by the pathologist, who will prepare the biopsy report that includes: (i) macroscopic description of the prostate cores, (ii) microscopic description and (iii) the conclusion/discussion section that summarises the histological findings. Here is an example of microscopic findings: ‘Section confirm fibromuscular chips of tissue containing prostate glands. There is no significant epithelial hyperplasia. There is no prostate intraepithelial neoplasia or invasive malignancy in the sections examined’. For a PCa finding, the conclusion includes the tumour type, e.g., adenocarcinoma, and the Gleason score (GS), e.g., 3 + 4 = 7 [26]. The typical AP workflow is described in Figure 1.3 below.

Figure 1.3: Typical anatomical pathology workflow (modified from the National Health Laboratory Service reflex testing document)

The Systematized Nomenclature of Medicine (SNOMED) Clinical Terms (CT) is a comprehensive clinical terminology system that makes it possible to both input and retrieve coded clinical information [46].
The SNOMED CT codes form an ontology supporting multiple types of relationships between clinical concepts [46]. This enables SNOMED CT to be used to efficiently classify a diagnosis [46]. For prostate biopsies, the anatomical pathologist will add the SNOMED Topography (T) and Morphology (M) codes based on the histological findings. The T code identifies anatomic terms for each organ, e.g., T-28000 for a lung biopsy. The M code describes the microscopic changes identified by the AP, e.g., M-4000 for inflammation. The SNOMED CT T and M codes for each biopsy are entered in the laboratory information system (LIS).

The prostate biopsy is used both to diagnose PCa and to guide management, with the AP an integral part of the decision-making process [47]. The management of patients with PCa is directly linked to the biopsy reported GS, the percentage and number of cores involved, as well as ancillary information such as seminal vesicle involvement and peri-neural invasion [47]. The GS is based on five different patterns with increasing abnormalities of glandular structure [48]. To generate the GS, the most extensive (primary) pattern and the second most common (secondary) pattern are each reported out of 5, e.g. 3/5 and 4/5 [26]. This equates to a GS of 3 + 4 = 7.

The International Society of Urological Pathology (ISUP) developed grade groups (GG) in 2014 to provide a more accurate stratification of neoplasms than the GS [49, 50]. The GG simplifies the classification by reducing the number of grading categories to five [49]. This grading was adopted in the World Health Organisation (WHO) 2016 edition of ‘Pathology and Genetics: Tumours of the Urinary System and Male Genital Organs’ [51]. The GGs are described with the corresponding GS in Table 1.1 below [26].
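The correspondence between Gleason patterns and grade groups in Table 1.1 can be illustrated with a minimal Python sketch. The function name `grade_group` is hypothetical and is not part of the thesis. The sketch assumes integer patterns from 1 to 5 and rejects a score of 7 built from any combination other than 3 + 4 or 4 + 3, because Table 1.1 does not list those combinations.

```python
def grade_group(primary: int, secondary: int) -> int:
    """Map primary and secondary Gleason patterns to a grade group (Table 1.1).

    Illustrative sketch only; the function name and the input checks are
    assumptions and not the method used in the thesis.
    """
    if not (1 <= primary <= 5 and 1 <= secondary <= 5):
        raise ValueError("Gleason patterns must be between 1 and 5")

    score = primary + secondary  # the Gleason score (GS)

    if score <= 6:
        return 1  # GS 2 to 6
    if score == 7:
        if primary == 3:
            return 2  # 3 + 4 = 7
        if primary == 4:
            return 3  # 4 + 3 = 7
        # 2 + 5 and 5 + 2 are not listed in Table 1.1
        raise ValueError("GS 7 combination not covered by Table 1.1")
    if score == 8:
        return 4  # 4 + 4, 3 + 5 or 5 + 3
    return 5  # GS 9 to 10


if __name__ == "__main__":
    for patterns in [(3, 3), (3, 4), (4, 3), (4, 4), (5, 4)]:
        print(patterns, "-> GG", grade_group(*patterns))
```

Keeping the primary pattern as a separate argument matters because a GS of 7 alone cannot distinguish GG 2 from GG 3.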
Table 1.1: Grade Group (GG) reporting indicating the corresponding Gleason score

Grade Group    Gleason Score
1              2 to 6
2              3 + 4 = 7
3              4 + 3 = 7
4              4 + 4 = 8 or 3 + 5 = 8 or 5 + 3 = 8
5              9 to 10

Modified from reference [26]

The European Association of Urology (EAU) guidelines recommend using the following terminology for reporting prostate biopsies [26]:

Table 1.2: Recommended terminology to be used to report prostate biopsy histological findings

Histological findings
Benign/negative for malignancy
Active inflammation
Granulomatous inflammation
High-grade prostatic intraepithelial neoplasia (PIN)
High-grade PIN with atypical glands, suspicious for adenocarcinoma (PINATYP)
Focus of atypical glands/lesion suspicious for adenocarcinoma/atypical small acinar proliferation, suspicious for cancer
Adenocarcinoma

Modified from reference [26]

Furthermore, the following information should be reported for a PCa histological finding [26]:
• Type of carcinoma, e.g., adenocarcinoma
• Primary and secondary GS
• Extent of carcinoma (in mm or percentage of cores submitted)
• Seminal vesicle, lymphovascular or peri-neural invasion
• Gleason grade group (GG)

The prostate biopsy report contains essential information on the prognostic characteristics relevant for clinical decision making [26].

1.11 Urological guidelines for prostate cancer screening and diagnosis in South Africa

In 2013, the first local urological diagnostic and treatment PCa guidelines were introduced in South Africa [52]. These guidelines indicate that PCa diagnosis is based on taking a focused urological history and clinical examination [52]. A DRE is recommended for all patients presenting with a clinical suspicion of PCa (symptoms or an elevated PSA) [52].
The indications for a prostate biopsy include an abnormal DRE and/or a total prostate specific antigen (PSA) above the age-related reference range, defined as follows: (i) 40 to 50 years: 0 - 2.5 ng/ml, (ii) 50 to 60 years: 0 - 3.5 ng/ml and (iii) >60 years: 0 - 4.0 ng/ml [52] (ng/ml is numerically equivalent to µg/L). At first presentation, for men with a normal DRE and a PSA below 10 µg/L, a repeat PSA is advised in six weeks [52]. PSA testing is recommended for males with a life expectancy of more than 10 years as follows: (i) Black African men from the age of 40 years with a positive family history (PCa and/or breast cancer in a first degree relative), (ii) from the age of 45 years for all males and (iii) patients with a history of lower urinary tract symptoms (LUTS) and/or clinical suspicion of PCa regardless of age group [52].

In addition to a DRE and PSA, additional diagnostic tests are advised [52]. The free PSA (FPSA) is advised for men with a negative first biopsy and an age-related elevated PSA <10 µg/L [52]. The free to total PSA ratio (FT) can improve decision making for a repeat biopsy, i.e., for an FT >20% a repeat biopsy is not recommended [52]. The PSA velocity (PSAV) is a further diagnostic measure, reported as the absolute annual increase in serum PSA (µg/L/year) [52]. A PSAV greater than 0.75 µg/L/year is also an indication for a prostate biopsy [52]. Other diagnostic tests include the PCA3 urine test [52]. Although the PCA3 score increases with PCa volume, conflicting data indicate that it may not be able to independently predict the GS [26]. The primary indication for the PCA3 test is to determine whether a repeat biopsy is needed after an initial negative biopsy [26]. However, the clinical effectiveness of the PCA3 test for this purpose is uncertain [26]. Therefore, local guidelines do not recommend PCA3 to be used in place of PSA testing [52].
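The biopsy indications described above combine a few simple thresholds, which can be sketched in code. This is a minimal illustration, not a clinical tool: the function names are hypothetical, the PSAV is simplified to the difference between two results, and ages 50 and 60 are assumed to fall in the 50 to 60 year band because the quoted ranges overlap at their boundaries. All PSA values are in µg/L.

```python
from typing import Optional


def psa_upper_limit(age: int) -> Optional[float]:
    """Upper limit (µg/L) of the age-related total PSA reference range.

    Assumption: ages 50 and 60 are placed in the 50 to 60 year band.
    Returns None below 40 years, where no range is quoted.
    """
    if age < 40:
        return None
    if age < 50:
        return 2.5
    if age <= 60:
        return 3.5
    return 4.0


def psa_velocity(earlier_psa: float, later_psa: float, years_between: float) -> float:
    """Absolute annual increase in serum PSA (µg/L/year) between two results."""
    if years_between <= 0:
        raise ValueError("years_between must be positive")
    return (later_psa - earlier_psa) / years_between


def free_to_total_percent(free_psa: float, total_psa: float) -> float:
    """Free to total PSA ratio (FT), expressed as a percentage."""
    if total_psa <= 0:
        raise ValueError("total_psa must be positive")
    return 100.0 * free_psa / total_psa


def biopsy_indicated(age: int, total_psa: float, abnormal_dre: bool,
                     psav: Optional[float] = None) -> bool:
    """True if any of the quoted biopsy indications is met."""
    limit = psa_upper_limit(age)
    above_range = limit is not None and total_psa > limit
    rapid_rise = psav is not None and psav > 0.75  # PSAV threshold in µg/L/year
    return abnormal_dre or above_range or rapid_rise


if __name__ == "__main__":
    print(biopsy_indicated(age=55, total_psa=4.0, abnormal_dre=False))
    print(free_to_total_percent(free_psa=1.0, total_psa=4.0))
```

Under these assumptions, a 55-year-old man with a normal DRE and a PSA of 3.0 µg/L does not meet a biopsy indication on PSA alone, but would if his PSA had risen by more than 0.75 µg/L over the preceding year.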
The tumour, node and metastasis (TNM) classification system is used to record the local growth and the regional and distant spread of cancer [53, 54]. This classification system is used to stage PCa, grouping patients with a similar outcome [26]. For a primary tumour, the following TNM clinical stages are defined in a urological setting (Table 1.3) [53].

Table 1.3: The tumour, node, metastasis (TNM) classification system used for cancers

Tx    Primary tumour cannot be assessed
T0    No evidence of primary tumour
T1    Clinically inapparent tumour not palpable or visible by imaging
T1a   Tumour incidental histological finding in 5% or less of tissue resected
T1b   Tumour incidental histological finding in more than 5% of tissue resected
T1c   Tumour identified by needle biopsy (e.g. because of elevated prostate-specific antigen (PSA) level)
T2    Tumour confined within the prostate
T2a   Tumour involves one half of one lobe or less
T2b   Tumour involves more than half of one lobe, but not both lobes
T2c   Tumour involves both lobes
T3    Tumour extends through the prostatic capsule
T3a   Extracapsular extension (unilateral or bilateral) including microscopic bladder neck involvement
T3b   Tumour invades seminal vesicle(s)
T4    Tumour is fixed or invades adjacent structures other than seminal vesicles: external sphincter, rectum, levator muscles, and/or pelvic wall

Modified from reference [26]

In addition to the TNM clinical staging, the D’Amico risk classification is used in a urological setting [55]. This is one of the most widely used approaches for assessing PCa risk, using the PSA, GS and TNM clinical stage to assign patients as low, intermediate or high risk [55]. Patient risk stratification is important for planning treatment options, as described in Table 1.4 below [26].
Table 1.4: Risk stratification for planning treatment options

Category             Low-Risk     Intermediate-Risk         High-Risk
TNM Clinical Stage   T1 to T2a    T2b to T2c                T3a or T3b
Gleason score        2 to 6       3 + 4 = 7 or 4 + 3 = 7    8 to 10
PSA                  <10 µg/L     10 - 20 µg/L              >20 µg/L

PSA: prostate specific antigen
Modified from reference [26]

In 2016, the EAU PCa guidelines recommended early PSA testing as follows: (i) all men 50 years and older, (ii) >45 years with a family history of PCa, (iii) >45 years for African Americans, (iv) PSA >1 µg/L at 40 years and (v) PSA >2 µg/L at 60 years [26]. These guidelines were updated in 2021 and early PSA testing is recommended for well-informed men at elevated risk of having PCa as follows: (i) from 50 years of age, (ii) from 45 years of age for those with a family history of PCa, (iii) from 45 years of age for men of African descent and (iv) from 40 years for men carrying BRCA2 mutations [56]. These guidelines indicate the risk of PCa in relation to PSA value, as described in Table 1.5 below [26]. The risk of PCa is 26.9% for a PSA between 3.1 and 4.0 µg/L [26].

Table 1.5: Risk of prostate cancer (PCa) in relation to the total prostate specific antigen (PSA) result and Gleason score (GS)

PSA level (µg/L)    Risk of PCa (%)    Risk of Gleason score >7 (%)
0.0 - 0.5           6.6                0.8
0.6 - 1.0           10.1               1.0
1.1 - 2.0           17.0               2.0
2.1 - 3.0           23.9               4.6
3.1 - 4.0           26.9               6.7

PSA: prostate specific antigen
PCa: Prostate cancer
Modified from reference [26]

1.12 Local public sector and urological prostate cancer screening and diagnosis guidelines in South Africa between 2006 and 2016

Local public sector PCa recommendations were first introduced in the primary health care (PHC) standard treatment guidelines (STG) and essential medicines list (EML) for South Africa in 2014 [57]. In March 1996, local public-sector STG were first published by the National Department of Health (NDoH) for use by health facilities [58].
They are used across South Africa to diagnose and treat patients presenting at public health facilities. The STG, with the associated EML, are defined as systematically developed statements designed to assist health care workers (HCW) to make decisions for the provision of appropriate health care for specific clinical circumstances [59]. There is some evidence that when pharmaceutical supply is based on an approved EML, there is less opportunity for ineffective, unsafe, or wasteful prescribing [59]. The use of STG can benefit HCW, supply chain management (SCM) and patients by: (i) standardising guidelines for HCW, (ii) encouraging high quality care by directing HCW to use the most appropriate medicines for specific conditions, (iii) enabling HCW to concentrate on making the correct diagnosis using the provided treatment options (algorithms), (iv) utilising only formulary (essential) medicines, (v) ensuring consistent treatment at all levels of care within the health care system and (vi) improving the availability of medicines due to consistent use and ordering [59, 60].

The 2014 public-sector PHC guidelines provided PCa recommendations for the very first time. These guidelines are used by all community health centres (CHC) and clinics across South Africa [61]. This is the fifth edition of these guidelines, with earlier editions produced in 1996, 1998, 2003 and 2008 [61, 62]. Local public-sector guidelines for hospitals were introduced in 2012 and 2015 [63, 64]. These guidelines include a urology section, with no specific PCa recommendations provided [63, 64]. The urology sections in these guidelines provide recommendations for haematuria, urinary tract infection (UTI), BPH, overactive bladder and erectile dysfunction [63, 64]. Local PCa diagnostic and treatment urological guidelines were released in 2013 by the Prostate Cancer Foundation (PCF) of South Africa [52].
Furthermore, these guidelines were reviewed by the South African Urological Association (SAUA), South African Society of Medical Oncology (SASMO) and the South African Society of Clinical and Radiation Oncologists (SASCRO) [52]. The table below summarises PCa recommendations by local public-sector and urological guidelines between 2006 and 2016.

Table 1.6: Local and other urological prostate cancer (PCa) guidelines

Year (Reference): 2008 [62]
Guidelines: Public sector
Title and author: Standard Treatment Guidelines and Essential Medicine List for South Africa: Primary Health Care Level; National Department of Health
Recommendations: Urology section includes no guidelines for PCa

Year (Reference): 2012 [63]
Guidelines: Public sector
Title and author: Standard Treatment Guidelines and Essential Medicine List for South Africa: Hospital Level: Adults; National Department of Health
Recommendations: No PCa specific guidelines in the Urology section. Information provided for benign prostatic hyperplasia (BPH) amongst others in the Urology sections.

Year (Reference): 2013 [52]
Guidelines: Urological
Title and author: Prostate Cancer Diagnostic and Treatment Guidelines; Segone et al
Recommendations: PSA testing is recommended in males with a life expectancy of more than 10 years as follows: (i) from the age of 40 in black African patients with a positive family history (PCa/Breast cancer in a first degree relative) and (ii) from the age of 45 years in all other males. Indications for prostate biopsy include an abnormal DRE and/or PSA above the age-related norm as follows: (i) 40 - 50 years: ≤2.5, (ii) 50 - 60 years: ≤3.5 and (iii) >60 years: ≤4. If DRE is normal and PSA ≤10 at presentation, repeat PSA in 6 weeks advised.

Year (Reference): 2014 [57]
Guidelines: Public sector
Title and author: Standard Treatment Guidelines and Essential Medicine List for South Africa: Primary Health Care Level; National Department of Health
Recommendations: Usually occurs in men >50 years of age and is most often asymptomatic. Systemic symptoms, i.e. weight loss, bone pain, etc. occur in 20% of patients. Referral to higher level of care for patients with suspected cancer

Year (Reference): 2015 [64]
Guidelines: Public sector
Title and author: Standard Treatment Guidelines and Essential Medicine List for South Africa: Hospital Level: Adults; National Department of Health
Recommendations: No PCa specific guidelines in the Urology section. Information provided for benign prostatic hyperplasia (BPH) amongst others in the Urology sections.

1.13 Other prostate cancer guidelines

Various other PCa guidelines have been released in other countries. The evidence-based guidelines from the EAU, American Urological Association (AUA), National Comprehensive Cancer Network (NCCN) and European Society for Medical Oncology (ESMO) are described in Table 1.7 below [26, 65-67].

Table 1.7: Prostate cancer recommendations from other countries

Year (Reference): 2013 [66]
Guidelines Title and author: Early detection of Prostate Cancer; American Urological Association
Recommendations: Early detection using the PSA followed by prostate biopsy for diagnostic confirmation. In addition to PSA, DRE, PSA derivatives and isoforms such as free PSA, -2proPSA, prostate health index, hK2, PSA velocity or PSA doubling time and PCA3 are recommended. Shared decision making between clinicians and men is a strategy for making health care decisions when there is more than one medically reasonable option. Test offered as follows, using a PSA threshold of 4.0 µg/L:
• No testing for men <40 years
• Offer testing to men aged to 59 years at average risk.
• For men aged 55 to 69 years, PSA should be offered by weighing the benefits of preventing prostate cancer
• To reduce the harms of screening, a routine screening interval of two years is preferred over annual screening
• No testing for men aged 70+ years, based on a 10 to 15 year life expectancy.

Year (Reference): 2016 [26]
Guidelines Title and author: Guidelines on Prostate Cancer; European Association of Urology
Recommendations: Offer early PSA testing in men at elevated risk of having PCa as follows: (i) >50 years of age, (ii) >45 years of age and a family history of PCa, (iii) African-Americans >45 years of age, (iv) men with a PSA level of >1 ng/mL at 40 years of age and (v) men with a PSA level of >2 ng/mL at 60 years of age. PCa is usually suspected on the basis of DRE and/or PSA levels.

Year (Reference): 2019 [67]
Guidelines Title and author: Prostate Cancer; National Comprehensive Cancer Network
Recommendations:
• Perform DRE to confirm clinical stage
• Perform PSA, calculate PSA density and doubling time
• Estimate life expectancy
• Obtain family history
• Perform prostate biopsy

Year (Reference): 2020 [65]
Guidelines Title and author: Prostate cancer: ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up; European Society for Medical Oncology
Recommendations: Offer testing as follows: (i) men >50 years, (ii) men >45 years with a family history of prostate cancer, (iii) African-Americans >45 years and (iv) BRCA1/2 carriers >40 years. Testing in asymptomatic men should not be performed for men with a life expectancy <10 years

1.14 Controversy about prostate-specific antigen testing

There is controversy about the role of PSA testing for PCa diagnosis [26]. Some studies suggest that testing may result in over-diagnosis and over-treatment [68, 69]. Hayes et al reported that the available evidence favours a discussion of the pros and cons of PSA testing with average-risk men aged 55 to 69 years [69]. Furthermore, PSA testing should only be offered to men expressing a definite preference for it [69]. Currently, our local urological guidelines in South Africa still advocate for PSA testing from 40 years for Black Africans and 45 years for all other men [52]. The draft 2017 local urological guidelines recommend informed patient-based screening for males with a life expectancy of more than 10 years [47]. Since 2012, there have been several conflicting PCa guidelines [70].
The United States Preventive Services Task Force (USPSTF) guidelines recommended against screening men ≥75 years in 2008 [70]. In 2012, the updated USPSTF guidelines extended this recommendation against PSA testing to men of all ages [70]. In contrast, the American Urological Association (AUA) guidelines introduced in 2013 recommended against screening for men <40 years [70]. The AUA guidelines also did not recommend routine screening for men aged 40 - 54, but recommended shared decision making about screening for men aged 55 to 69 years [70, 71].

The development of conflicting guidelines is a result of three large prospective randomised controlled trials (RCTs) that published data on screening between 2009 and 2010 [72-74]. Andriole et al compared annual PCa screening versus standard of care as the control at ten American study centres [72]. Following 7 to 10 years of follow-up, the PCa mortality rate was very low and did not differ significantly between the two study groups [72]. Schröder et al analysed data from cancer registries in seven European countries [74]. This study randomly assigned men to a group that was offered PSA screening once every four years or to a control group with very limited screening [74]. This study reported that PCa mortality decreased by 20% in the intervention arm, associated with a high risk of over-diagnosis [74]. Hugosson et al randomised men in a 1:1 ratio to either a screening group invited for PSA testing every two years or to a control group that was not screened [73]. This study showed that PCa mortality was reduced by almost half over 14 years in the intervention arm, with a risk of over-diagnosis [73].

A Cochrane review reported that PCa screening is not likely to result in a significant reduction in cancer-specific and overall mortality [75]. One of the challenges with PCa screening is over-diagnosis and over-treatment [75]. Therefore, mass screening of PCa is not indicated using a public health approach [26].
In contrast, early diagnosis is advised for well-informed men with 10 to 15 years of life expectancy, based on DRE and PSA testing [26]. Patient screening requires informed consent after a comprehensive discussion of the pros and cons of the complete procedure, considering the patient’s risk factors, age and life expectancy [26].

Table 1.8: Studies providing conflicting prostate screening data

Year (Reference): 2013 [75]
Source (Type): Ilic et al (Cochrane peer review)
PCa screening recommendation: All RCTs of screening versus no screening for prostate cancer were eligible for inclusion in this review (n=5). There were 341 342 participants. Patients received a PSA with or without a DRE. Patients were aged 45 to 80 years, with a duration of follow-up from 7 to 20 years. Men were randomised to the screening and control groups. There was no statistically significant reduction in PCa mortality between the control and screening groups.

Year (Reference): 2009 [72]
Source (Type): Andriole et al (Prostate, lung, colorectal and ovarian (PLCO) cancer screening trial on prostate cancer mortality)
PCa screening recommendation: From 1993 to 2001, men aged 55-74 were enrolled at 10 study centres in the United States and assigned to either the annual screening or usual standard of care as the control arm (n=76 693).
Cases: Offered annual PSA testing for 6 years and DRE for 4 years (n=38 343)
Controls: Offered the standard of care (n=38 350)
At 7 years, screening was associated with a relative increase of 22% in the rate of PCa diagnosis, with no reduction in PCa mortality.

Year (Reference): 2009 [74]
Source (Type): Schröder et al (Randomised prostate cancer screening study to assess mortality)
PCa screening recommendation: The European randomized study of screening for prostate cancer (ERSPC) is a randomised multi-centre trial reporting incidence and mortality for 9, 11 and 13 years of follow-up in the intervention and control arm. Men aged 50-74 were recruited from population registers.
Cases: Offered PSA testing
Controls: Not offered PSA testing
A substantial mortality reduction due to PSA testing.

Year (Reference): 2010 [73]
Source (Type): Hugosson et al (Prostate cancer screening study – Göteborg study)
PCa screening recommendation: Men born between 1930 and 1944 were randomly sampled (n=20 000) from the population register and randomised in a 1:1 ratio to the screening or control group. For both arms, incidence was checked by obtaining cancer registry data. The main outcome was the absolute and relative risk reduction in PCa mortality.
Cases: Invited every second year for PSA testing until they reached the upper age limit (Median 69, range 67 – 71 years).
Controls: Not offered PSA testing
The benefits of PCa screening compare favourably to other studies, with mortality reduced by half over 14 years. The risk of over-diagnosis is substantial.

1.15 Global prostate cancer studies

The 2012 GLOBOCAN data produced by the International Agency for Research on Cancer (IARC) reported global cancer incidence and mortality estimates from 184 countries [76]. This study reported that PCa was the second most common neoplasm in men, with an estimated 1.1 million cases diagnosed in 2012 [76]. The global age-standardised incidence rate (ASIR) was 31.1 per 100 000 population in 2012 [76]. PCa incidence varied 25-fold globally across the 184 countries, with higher incidence reported in developed areas such as Australia/New Zealand (111.6), Northern America (97.2), Western (94.9) and Northern Europe (85) per 100 000 population [76]. High incidence rates per 100 000 population were also reported in less developed areas such as the Caribbean (79.8), Southern Africa (61.7) and South America (60.1) [76]. Incidence remains low in Asian populations [76]. Ferlay et al reported the 2012 GLOBOCAN cancer incidence data [38]. The estimated number of PCa cases was 758.7 thousand and 353 thousand for more and less developed regions respectively [38]. It was postulated that the higher incidence rates in developed regions were due to widespread PSA testing and subsequent diagnosis [38]. In many developing countries, PSA testing is not routinely offered [38].
Torre et al reported that developed countries that implemented PSA testing in the 1980s have noted both a rapid increase in new case detection and an increasing incidence [19]. After a while, the incidence decreased rapidly as the pool of prevalent cases to be diagnosed reduced [19]. These data indicate that PCa was the predominant male cancer across most of SSA, North America, South America and Australia [19]. This highlights the burden of PCa as the leading male neoplasm in countries like South Africa. Bray et al reported the 2018 GLOBOCAN incidence and mortality estimates [22]. PCa was the most commonly diagnosed male neoplasm in 105 countries, including South Africa [22]. This study reported that PCa ASIR ranged from 1.0 per 100 000 population in Bhutan to 189.1 for Guadeloupe (France) (Figure 1.4) [22].

Figure 1.4: Prostate cancer age-standardised incidence rate per 100 000 population per country in 2018 for the most commonly diagnosed neoplasm. Data for South Africa are reported as a red bar (modified from reference (10))

These data indicate that PCa ASIR ranged from 5.0 in South Central Asia to 86.4 per 100 000 population in Australia/New Zealand [22]. For Southern Africa, an ASIR of 64.1 per 100 000 population was reported, with a relatively high mortality rate of 26.8 [22]. For South Africa, there were 3 639 new cases reported in 2012, which equates to an ASIR of 67.9 per 100 000 population (Table 1.9) [77]. For the same period, 26.4 PCa deaths were reported per 100 000 persons per year [77]. In 2018, a PCa ASIR of 68.0 was reported for South Africa, with 12 452 new cases [78]. The number of deaths per 100 000 persons per year increased to 27.9 [78].

Table 1.9: The GLOBOCAN prostate cancer age-standardised incidence rate per 100 000 population in 2012 and 2018. The number of new cases is also reported

South Africa     Number of new cases    ASIR per 100 000 population    Number of deaths per 100 000 persons per year
GLOBOCAN 2012    3 639                  67.9                           26.4
GLOBOCAN 2018    12 452                 68.0                           27.9

ASIR: Age standardised incidence rate
Adapted from references [77] and [22].

1.16 African prostate cancer studies

Pilleron et al analysed cancer incidence data for calendar years 2008 to 2012 in four sub-Saharan populations in Kenya, South Africa, Uganda and Zimbabwe [79]. This study reported incidence data from high-quality population-based cancer registries (PBCR) stored in the IARC database [79]. The ASIR per 100 000 population was reported for two age categories, i.e., 0 - 59 and ≥60 years [79]. This study reported data for 8 944 cancer cases for adults aged 60 years and older [79]. The percentage of cases by country ranged from 23% in Uganda to 52% for South Africa [79]. Among males aged 60 years and older, PCa was the leading neoplasm in all regions except in South Africa, where oesophageal cancer exceeded all other cancer sites [79]. This study further reported that prostate and oesophageal cancers comprised between 40 and 60% of all male neoplasms for men 60 years and older [79]. The finding that oesophageal cancer was the most common male neoplasm in a South African population registry in the Eastern Cape conflicts with both National Cancer Registry (NCR) and GLOBOCAN data [22, 38, 76, 77, 79-81]. This registry was specifically established to monitor oesophageal cancer in a known high-risk area, which may explain why these findings differ [79].

There are 55 countries in Africa, as defined by the African Union (AU) [82]. The 2018 GLOBOCAN data provide ASIR estimates for 46/55 African countries (83.6%) [22]. The number of new PCa cases ranges from 22 in Djibouti to 13 078 for Nigeria [22].
New PCa cases in Nigeria, South Africa (n=12 452) and the Democratic Republic of Congo (n=5 718) represent 42.2% of all cases for this study [22]. The PCa ASIR ranged from 4.4 per 100 000 population in Niger to 68.0 for South Africa [22]. An ASIR of 55.7, 45.6, 43.7, 43.5 and 41.6 per 100 000 population was reported for Benin, Zambia, Ivory Coast, Zimbabwe and Cameroon respectively.

Adeloye et al conducted a systematic literature review of PCa studies conducted in Africa between 1980 and 2015 [83]. This study reported PCa incidence data for 40 studies spread across 16 countries [83]. The pooled PCa incidence increased with age, rising from 12.9 for the 40 - 49 age category to 39.0 per 100 000 population for men 70 years and older [83]. Three studies reported PCa incidence data for South Africa ranging from 4.4 to 30.8 per 100 000 population, all from the Eastern Cape Province [83]. This study reported the highest PCa incidence per 100 000 population in Nigeria (182.5), followed by Cameroon (93.8), Zambia (37.7), Uganda (35.5), Ivory Coast (31.4) and South Africa (30.8) [83]. The lowest incidence per 100 000 population was reported for Egypt (0.41), Rwanda (1.02) and Gambia (3.46) [83]. The table below reports the GLOBOCAN 2018 data for African countries as well as the highest ASIR as reported by Adeloye et al [22, 83].

Table 1.10: Number of prostate cancer cases and the age-standardised incidence rate per 100 000 population reported for 46 African countries in the 2018 GLOBOCAN study. Data are sorted by age-standardised incidence rate in descending order. The matching highest age-standardised incidence rate per 100 000 population reported by Adeloye et al is shown in the last column.

Country                           Number of cases&   % of cases&   ASIR&   Highest ASIR$
South Africa                      12 452             16.8%         68.0    30.8
Benin                             1 315              1.8%          55.7
Zambia                            1 230              1.7%          45.6    37.7
Côte d'Ivoire                     2 485              3.4%          43.7    31.4
Zimbabwe                          1 299              1.8%          43.5    29.2
Cameroon                          2 213              3.0%          41.6    93.8
Angola                            2 016              2.7%          41.0
Congo, Republic of                505                0.7%          40.7
Liberia                           376                0.5%          39.1
Namibia                           206                0.3%          37.3
Equatorial Guinea                 97                 0.1%          35.9
Burundi                           754                1.0%          35.5
Guinea                            906                1.2%          35.3    8.1
Congo, Democratic Republic of     5 718              7.7%          35.1
Uganda                            2 086              2.8%          34.5    35.5
Eswatini                          82                 0.1%          34.2    21.5
Nigeria                           13 078             17.7%         32.8    182.5
Ghana                             2 132              2.9%          32.3
Senegal                           959                1.3%          32.2
Gabon                             192                0.3%          31.0
Kenya                             2 864              3.9%          30.9
Central African Republic          330                0.4%          30.2
Rwanda                            707                1.0%          29.1    1.02
Sierra Leone                      390                0.5%          29.0
Mozambique                        1 651              2.2%          27.1
Burkina Faso                      689                0.9%          26.5
Lesotho                           138                0.2%          25.0
South Sudan                       716                1.0%          24.3
Morocco                           3 990              5.4%          22.7
Chad                              580                0.8%          22.0
Mauritania                        202                0.3%          21.9
Togo                              311                0.4%          20.4
Mali                              539                0.7%          17.7    4.7
Libya                             317                0.4%          15.6    11.4
Malawi                            525                0.7%          15.3    5.5
Botswana                          72                 0.1%          13.7
Somalia                           382                0.5%          13.1
Algeria                           2 578              3.5%          13.0
Tunisia                           819                1.1%          12.3    11.9
Egypt                             3 109              4.2%          9.5     0.41
Sudan                             938                1.3%          9.2
The Republic of the Gambia        33                 0.0%          8.4
Eritrea                           94                 0.1%          7.7
Djibouti                          22                 0.0%          7.6
Ethiopia                          1 701              2.3%          6.5
Niger                             172                0.2%          4.4

ASIR: Age-standardised incidence rate per 100 000 population
& GLOBOCAN 2018
$ Adeloye et al
Modified from reference [22, 83]

Adeloye et al also reported the pooled ASIR estimates by age category (Table 1.11) [83]. The overall ASIR in Africa was 22.0 per 100 000 population [83]. ASIR increased with age from 12.9 for the 40 - 49 age category to 39.0 for men 70 years and older [83]. For the 60 - 69 and 50 - 59 age categories, an ASIR of 25.0 and 16.3 per 100 000 population was reported respectively.
Table 1.11: Pooled age-standardised incidence rate by age categories for African countries

Age Category    PCa ASIR per 100 000 population
40 - 49         12.9
50 - 59         16.3
60 - 69         25.0
70+             39.0
Total           22.0

ASIR: Age-standardised incidence rate
Adapted from reference [83]

The findings from African studies are summarised in the table below:

Table 1.12: Findings from African studies

Author (Year) (Reference): Adeloye et al (2016) [83]
Study design: Systematic literature review from 1980 to 2015 for studies that estimated incidence rate of PCa in any African location. This study included population, hospital, and/or registry based PCa studies. PCa was identified by three mechanisms: (i) histologically confirmed, (ii) reporting diagnosis confirmed by medical practitioner and (iii) reported ICD-O-3 codes.
Population: Data extracted included the study location, study period, mean age/age range, population size, and number of incident cases. Data were then sorted into five main African regions (central, east, north, south, and west) and countries represented within each region.
Results: The overall pooled PCa incidence for Africa was 21.95 per 100 000 population (CI: 19.93–23.97). Pooled incidence increased as follows:
1980-1989: 15.7 per 100 000
1990-1999: 21.9 per 100 000
2000-2009: 26.1 per 100 000

Author (Year) (Reference): Pilleron et al (2018) [79]
Study design: Cancer incidence data for the years 2008 – 2012 were obtained from high-quality population-based cancer registry (PBCR) data stored in the International Agency for Research on Cancer (IARC) database. Incidence data were available for 61 cancer sites as well as for all sites combined by gender and 16 age groups. The ASIR per 100 000 population was reported for two age categories, i.e. 0 - 59 and ≥60 years
Population: Four sub-Saharan populations in Kenya, South Africa, Uganda and Zimbabwe. N=8 944 cancer cases (male and female). 52% of cases from South Africa (Eastern Cape population-based registry)
Results: Among males ≥60 years, PCa was the leading male cancer (except in South Africa - oesophageal cancer). The largest increase in the estimated average annual percentage change (EAPC) between 2008 and 2012 for males was noted for PCa.

Author (Year) (Reference): Bray et al (2018) [22]
Study design: The 2018 GLOBOCAN data provide prostate cancer age-standardised incidence rate estimates for 185 countries and 36 cancers by age and sex. These data are based on the best available sources of cancer incidence and mortality data in each country.
Population: 185 countries across the world
Results: PCa ASIR range:
Top six countries with the highest ASIR:
South Africa: 68.0
Benin: 55.7
Zambia: 45.6
Côte d'Ivoire: 43.7
Zimbabwe: 43.5
Cameroon: 41.6

1.17 Local prostate cancer studies

Several local studies have offered insi