An investigation of the presence of cervical schistosomiasis in patients with cervical cancer and pre-cancer lesions: a text mining approach
Temdemnou, Carole Metekoua
Background: Cervical cancer is the second most common cancer in developing countries. A few studies have associated its incidence with cervical schistosomiasis, which is also endemic in Sub-Saharan Africa (SSA), while others have rejected this association. This contradiction could be a result of small sample sizes, which introduce bias into study findings. Therefore, there is a need to appraise cost-effective methods such as text-mining. Text-mining of histology reports in laboratory databases may provide a sample size sufficient to adequately test the relationship between cervical schistosomiasis and cervical cancer and precancerous lesions, and so provide more robust findings. Aim: This study aimed to use text-mining techniques on pathology reports available at the National Health Laboratory Service (NHLS) to investigate the presence of cervical schistosomiasis in patients with precancerous lesions and cervical cancer. Methods: Detailed cervical histopathology reports from Inkosi Albert Luthuli Hospital from 2011 to 2017 were obtained from the NHLS. Only 50% of records had a diagnosis in terms of cervical precancerous lesions, cancer and other cervical pathologies. The reports were pre-processed, and One-Versus-One (OVO) and One-Versus-All (OVA) Support Vector Machine (SVM) and Random Forest (RF) classifiers were trained and optimised to assign a diagnosis to records without one. The performance of all classifiers was assessed using recall, precision and F-measure. The best-performing classifier was used to predict the pathology group of all records in the dataset. Furthermore, word/phrase matching was used to classify records with cervical precancerous lesions into their various grades and cancer records into the various types of cervical cancer. Word/phrase matching was also used to extract the Human Papillomavirus (HPV) and schistosomiasis status from all selected records. Frequencies and proportions were used to describe reporting patterns of schistosomiasis and HPV.
Subsequently, the prevalence of schistosomiasis was calculated using a dataset restricted to records with a known schistosomiasis status. The independent t-test was used to assess whether there was any relationship between schistosomiasis and age at diagnosis of cervical precancerous lesions and cancer. A sensitivity analysis was also done under the assumption that patients without schistosomiasis-related terms in their histology report were schistosomiasis negative. Results: OVO SVM accurately classified 94.2% of records into cervical precancerous lesions, cancer and other pathologies. Schistosomiasis and HPV were reported in 5.58% (n=1,315) and 58.02% (n=13,668) of records, respectively. The prevalence of schistosomiasis was 42.80% (n=464) in cervical precancerous lesions and 23.3% (n=54) in cervical cancer in samples where schistosomiasis status was reported. This prevalence changed to 2.59% in cervical precancerous lesions and 0.96% in cervical cancer under the sensitivity analysis. The mean age at pre-cancer diagnosis in schistosomiasis positive patients was 38 ± 9.46 years while that of schistosomiasis negative patients was 40 ± 9.86 years. The mean age at cancer diagnosis in schistosomiasis positive patients was 45 ± 13.70 years while that of schistosomiasis negative patients was 50 ± 13.78 years. The difference in mean age between schistosomiasis positive and schistosomiasis negative patients was statistically significant in both cervical precancerous lesions and invasive cancer. These findings were similar in the sensitivity analysis. Conclusion: This study shows that text-mining techniques effectively classified records into various pathology groups. Through text-mining, this study identified a larger number of cervical precancerous lesion and cancer cases for exploring the relationship between cervical schistosomiasis and cervical precancerous lesions/cancer than other studies have.
Women with schistosomiasis may develop cervical precancerous lesions and cancer 2 and 5 years earlier, respectively, than patients without schistosomiasis. However, findings were limited by the low schistosomiasis reporting rates in this population. There is a need for pathologists to report the schistosomiasis status in cervical precancerous lesions and cancer, as this type of study relies heavily on routinely collected data. Moreover, further studies are required to explore the effect of HIV and schistosomiasis treatment on the control of cervical precancerous lesions and cervical cancer in SSA.
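The gap between the two prevalence estimates reported in the Results comes from the choice of denominator. The following is a minimal Python sketch of that idea, not the study's implementation: the term lists, the crude negation rule, the function names (`schisto_status`, `prevalence`) and the four example reports are all hypothetical and are there only to show how word/phrase matching can yield a three-way status (positive, negative, not reported) and how the restricted and sensitivity-analysis denominators differ.

```python
import re

# Hypothetical term lists: the abstract does not give the study's actual
# words/phrases, so these patterns are illustrative assumptions only.
SCHISTO_TERMS = re.compile(r"schistosom|bilharzia", re.IGNORECASE)
# Crude negation rule: a negating word followed, within the same sentence,
# by a schistosomiasis term. Negations placed after the term are not caught.
NEGATION = re.compile(
    r"\b(?:no|not|without|negative for|absence of)\b[^.]*?(?:schistosom|bilharzia)",
    re.IGNORECASE,
)


def schisto_status(report):
    """Return 'positive', 'negative', or None if schistosomiasis is not mentioned."""
    if not SCHISTO_TERMS.search(report):
        return None  # status not reported
    if NEGATION.search(report):
        return "negative"
    return "positive"


def prevalence(reports, assume_unreported_negative=False):
    """Proportion of schistosomiasis-positive reports.

    By default the denominator is restricted to reports with a known status.
    With assume_unreported_negative=True, every report counts in the
    denominator, mirroring the sensitivity-analysis assumption.
    """
    statuses = [schisto_status(r) for r in reports]
    positives = statuses.count("positive")
    if assume_unreported_negative:
        denominator = len(statuses)
    else:
        denominator = sum(s is not None for s in statuses)
    return positives / denominator if denominator else float("nan")


# Invented example reports, not data from the study.
reports = [
    "CIN 3. Schistosoma ova present in the stroma.",
    "CIN 2 with koilocytosis. No evidence of schistosomiasis.",
    "CIN 1. HPV changes noted.",
    "Squamous cell carcinoma.",
]

restricted = prevalence(reports)
sensitivity = prevalence(reports, assume_unreported_negative=True)
print(f"restricted: {restricted:.2%}, sensitivity: {sensitivity:.2%}")
```

Because most reports do not mention schistosomiasis at all, treating unreported status as negative enlarges the denominator substantially, which is consistent with the much lower prevalence seen under the sensitivity analysis.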
A research report submitted in partial fulfilment of the requirements for the degree of Master of Science (MSc) in Epidemiology - Public Health Informatics to the Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, 2020