DIGITAL PATHOLOGY & ARTIFICIAL INTELLIGENCE: FEASIBILITY FOR CLINICAL PRACTICE

Liron Pantanowitz

Original published work submitted to the Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Doctor of Philosophy (PhD) in Anatomical Pathology.

Ann Arbor, Michigan, USA, 2022

Declaration

I, Liron Pantanowitz, student number 8701541J, declare that this Thesis is my own work and that I contributed adequately towards the research findings published in the articles included in my Thesis. This Thesis is being submitted for the Degree of Doctor of Philosophy (PhD) in Anatomical Pathology at the University of the Witwatersrand, Johannesburg. It has not been submitted before for any degree or examination at any other University.

Signature of Student: Liron Pantanowitz. Date: 31 August 2022
Name of Primary Supervisor: Scott Hazelhurst. Signature of Supervisor: Scott Hazelhurst. Date: 31 August 2022
Name of Co-Supervisor: Pamela Michelow. Signature of Supervisor: Pamela Michelow. Date: 31 August 2022

Agreement by co-authors: Given that this Thesis includes published articles with co-authors, the subsequent tables for each article include signatures of all co-authors documenting their agreement for Liron Pantanowitz to use these published articles for his PhD degree. For all of the included published articles Liron Pantanowitz was the first and corresponding author, and was primarily responsible for the study design, scientific work, manuscript preparation and successful submission for publication.

Article 1. Title: Artificial intelligence-based screening for mycobacteria in whole-slide images of tissue samples. American Journal of Clinical Pathology, 2021, volume 156(1), pages 117-128. Note: 11 signatures of all co-authors are included in the above table.

Article 2. Title: Accuracy and efficiency of an artificial intelligence tool when counting breast mitoses. Diagnostic Pathology, 2020, volume 15(1), article 80. Note: 11 signatures of all co-authors are included in the above table.

Article 3. Title: An artificial intelligence algorithm for prostate cancer diagnosis in whole slide images of core needle biopsies: a blinded clinical validation and deployment study. The Lancet Digital Health, 2020, volume 2, pages e407-e416. Note: 13 signatures of all co-authors are included in the above table.

Article 4. Title: A digital pathology solution to resolve the tissue floater conundrum. Archives of Pathology & Laboratory Medicine, 2021, volume 145(3), pages 359-364. Note: 8 signatures of all co-authors are included in the above table.

Presentations arising from this study

1. Validating AI apps for pathology practice, Keynote address. 5th Digital Pathology & AI Congress. June 2019, New York City, NY, USA.
2. Embedding AI into digital pathology workflows. Digital Pathology Association (DPA) companion meeting, United States and Canadian Academy of Pathology (USCAP) annual meeting. March 2020, Los Angeles, CA, USA.
3. Clinical validation and deployment of an AI-based algorithm for prostate cancer diagnosis in whole slide images of core needle biopsies. Wits Faculty of Health Sciences (FHS) research day & postgraduate expo. October 2020, virtual meeting. Note: Awarded best student oral presentation in the clinical sciences & therapeutics for health track.
4. Artificial Intelligence and Anatomical Pathology: Feasibility for clinical practice. Canadian Association of Pathologists. October 2020, webinar.
5. AI – Is it ready for the clinic? Swiss Society of Pathology 86th annual scientific meeting. November 2020, virtual plenary talk.
6. Embedding AI into clinical practice: A pathologist's perspective. European Society of Digital and Integrative Pathology (ESDIP). February 2021, virtual presentation.
7. Artificial intelligence and the practice of Anatomical Pathology. Papanicolaou Society of Cytopathology (PSC) companion meeting, March 2021, USCAP annual virtual meeting.
8. Validating AI systems for clinical use in pathology practice. AI-Med Clinician Series virtual conference. June 29, 2021.
9. Validation of Artificial Intelligence-based tools in Digital and Computational Pathology. Digital Pathology Association (DPA) webinar, September 30, 2021.
10. Validating artificial intelligence for pathology practice. TelemedEdu, Troy, MI, USA. October 8, 2021.

Publications arising from this study

Article 1
Pantanowitz L, Wu U, Seigh L, LoPresti E, Yeh FC, Salgia P, Michelow P, Hazelhurst S, Chen WY, Hartman D, Yeh CY. Artificial intelligence-based screening for mycobacteria in whole-slide images of tissue samples. American Journal of Clinical Pathology, 2021; 156(1):117-128. PMID: 33527136.

Article 2
Pantanowitz L, Hartman D, Qi Y, Cho EY, Suh B, Paeng K, Dhir R, Michelow P, Hazelhurst S, Song SY, Cho SY. Accuracy and efficiency of an artificial intelligence tool when counting breast mitoses. Diagnostic Pathology, 2020; 15(1):80. PMID: 32622359.

Article 3
Pantanowitz L, Quiroga-Garza GM, Bien L, Heled R, Laifenfeld D, Linhart C, Sandbank J, Albrecht Shach A, Shalev V, Vecsler M, Michelow P, Hazelhurst S, Dhir R. Clinical validation and deployment of an AI-based algorithm for prostate cancer diagnosis in whole slide images of core needle biopsies. Lancet Digital Health, 2020; 2(8):e407-e416. PMID: 33328045.

Article 4
Pantanowitz L, Michelow P, Hazelhurst S, Kalra S, Choi C, Shah S, Babaie M, Tizhoosh HR. A digital pathology solution to resolve the tissue floater conundrum. Archives of Pathology & Laboratory Medicine, 2021; 145(3):359-364. PMID: 32886759.

Abstract

There is increasing interest in applying artificial intelligence (AI) tools to Anatomical Pathology. AI algorithms can detect rare events, automatically quantify features, and diagnose diseases by analyzing digital images. However, very few laboratories today routinely use such AI tools. Therefore, the aim of this study was to determine the feasibility of developing and validating AI technology for routine use in Anatomical Pathology. Four experiments were conducted that utilized whole-slide image datasets to train and test deep learning models. The first experiment involved a deep learning algorithm to screen digitized acid fast-stained slides for mycobacteria. With AI assistance, pathologists were more accurate, quicker, and found it easier to identify acid-fast bacilli than with manual modalities. The second experiment critiqued an AI tool that quantified mitotic figures in images of invasive breast carcinoma. For end-users of varying experience, both accuracy and the time spent counting mitoses improved with AI support. The third experiment concerned a blinded validation study and clinical deployment of an AI-based algorithm to aid reviewing prostate core needle biopsies.
This algorithm was highly accurate at detecting prostate adenocarcinoma, distinguishing low- from high-grade Gleason scores, and identifying Gleason pattern 5 or perineural invasion. The fourth experiment demonstrated the success of using an AI-based image search tool to rapidly resolve the tissue floater conundrum encountered in pathology practice. All four AI-based tools successfully aided pathologists with routine tasks typically encountered in pathology practice and overall proved to be more accurate, efficient, standardized and easier to use than outdated and onerous manual methods.

Acknowledgements

Article 1: I thank Colleen Vrbin from Analytical Insights for her help with statistical analysis, and Chitra Sharma and Cynthia Carbine from the University of Pittsburgh Medical Center in the USA for their administrative help. I thank the company aetherAI for their help in funding this study.

Article 2: I thank all of the participants in this study. I also thank Colleen Vrbin from Analytical Insights, LLC for her help with statistical analysis. I thank the company Lunit for their help in funding this study.

Article 3: I thank Ibex Medical Analytics for their help in funding this project.

Article 4: The results in this study are partly based on digital images offered for free by The Cancer Genome Atlas (TCGA) Research Network (https://www.cancer.gov/tcga). I thank the company Huron Digital Pathology for their financial support of this project.

Table of Contents

Chapter 1: Introduction (pages 1-34)
Chapter 2: Published article 1 (pages 35-51)
Chapter 3: Published article 2 (pages 52-64)
Chapter 4: Published article 3 (pages 65-77)
Chapter 5: Published article 4 (pages 78-86)
Appendix 1: Ethics clearance certificate (pages 87-88)

List of Abbreviations

AI: Artificial intelligence
AFB: Acid-fast bacilli
AFS: Acid-fast stain
ASAP: Atypical small acinar proliferation
AUC: Area under ROC curve
C#: Computer programming language
CBIR: Content-based image retrieval
CI: Confidence interval
CNB: Core needle biopsy
CNN: Convolutional neural network
CPU: Central processing unit
FN: False negative
FP: False positive
GB: Gigabyte
GPU: Graphics processing unit
H&E: Hematoxylin and eosin stain
HER2: Receptor tyrosine-protein kinase erbB-2
HPF: High-power field
IBM: International Business Machines Corporation
IOU: Intersection over union
KB: Kilobyte
Ki67: Proliferation index antigen
mAP: Mean AP (area under precision recall curve)
Mdn: Median
MTb: Mycobacterium tuberculosis
NMS: Non-maximum suppression
NPV: Negative predictive value
OPT: Observer performance test
PCR: Polymerase chain reaction
PGY: Postgraduate year
PHH3: Phosphorylated histone H3
PNI: Perineural invasion
PPV: Positive predictive value
PR: Precision recall
RAM: Random-access memory
RCNN: Regions with convolutional neural networks
SGD: Stochastic gradient descent
ROC: Receiver operating characteristic
ROI: Regions of interest
SMC: Samsung Medical Center
TB: Tuberculosis
TCGA: The Cancer Genome Atlas
TP: True positive
TUPAC16: Tumor Proliferation Assessment Challenge in 2016
UPMC: University of Pittsburgh Medical Center
USA: United States of America
WSI: Whole slide image
ZN: Ziehl-Neelsen

List of Figures

Chapter 2 (article #1)
Figure 1: Schematic showing acid-fast bacilli detection algorithm training workflow. (page 38)
Figure 2: Screenshot from the web portal showing regions of interest (patches) identified by the algorithm in the gallery on the left and the corresponding whole-slide image (WSI) on the right. (page 39)
Figure 3: Artificial intelligence–assisted detection of acid-fast bacilli (AFBs) is shown in a whole-slide image (WSI). (page 41)
Figure 4: Examples of artificial intelligence–detected acid-fast bacilli (AFB) structures. (page 42)
Figure 5: Area under the curve (AUC) for acid-fast bacilli algorithm detection in image patches. (page 42)
Figure 6: Proportion of slides positive for acid-fast bacilli (AFBs) screened by different modalities. (page 43)
Figure 7: Review time by screening modality. (page 44)
Figure 8: Pathologist-perceived difficulty of identifying acid-fast bacilli by screening modality. (page 45)

Chapter 3 (article #2)
Figure 1: Flow chart of the methodology and datasets employed in developing and validating an AI-based tool to quantify mitoses in breast carcinoma. (page 55)
Figure 2: Web-based tool showing a HPF of breast carcinoma. (page 56)
Figure 3: Algorithm performance for mitotic figure detection in the analytical validation dataset. (page 57)
Figure 4: Accuracy and precision with and without AI support per user experience level. (page 58)
Figure 5: Median number of seconds spent with and without AI support per user experience level. (page 59)

Chapter 4 (article #3)
Figure 1: Overview of the algorithm and clinical deployment of the Galen Prostate second read system. (page 68)
Figure 2: Examples of diagnoses after review. (page 70)
Figure 3: Missed cancer case originally diagnosed as benign. (page 71)

Chapter 5 (article #4)
Figure 1: Fabricated slide containing a section of renal cell carcinoma and 2 adjacent separate colon cancer and bladder cancer tissue floaters. (page 80)
Figure 2: Schematic illustration of the general idea of using barcodes for image representation: whole slide image indexed by converting separate patches into barcodes. (page 81)
Figure 3: Schematic diagram showing how the origin of a suspected floater gets detected. (page 82)
Figure 4: Indexing of a sample whole slide image (scan with bladder tumor) (B), yielding 33 patches to build a mosaic. (page 83)

List of Tables

Chapter 2 (article #1)
Table 1: Breakdown of Whole-Slide Images (WSIs) Used in Acid-Fast Bacilli Algorithm Development (page 37)
Table 2: Algorithm Performance at Image Patch and WSI Levels (page 42)
Table 3: Comparison of Algorithm Performance Using Individual or Combined Convolutional Neural Network Models (page 42)
Table 4: Sensitivity, Specificity, PPV, NPV, and Accuracy by Reviewer and Method (page 43)
Table 5: Comparison of Time Category for Pathologists by Review Method (n = 138) (page 44)
Table 6: Summary of Published Studies Using Image Analysis to Identify Mycobacteria (page 46)

Chapter 3 (article #2)
Table 1: Profile of invasive ductal carcinoma cases enrolled in the study (page 55)
Table 2: Accuracy by experience level (page 58)
Table 3: True positive (TP), false positive (FP), and false negative (FN) values for mitotic cell detection (page 59)
Table 4: Median time to count mitoses by study participant experience level (page 60)

Chapter 4 (article #3)
Table 1: Algorithm performance (page 68)
Table 2: Pathologists' misdiagnoses identified by the algorithm (page 69)
Table 3: Performance of algorithms in detection and grading of prostate cancer (page 71)

Chapter 5 (article #4)
Table 1: Top 20 Primary Sites With the Highest Number of Whole Slide Imaging (WSI) in the Dataset (page 80)
Table 2: Initial Matched Results for Each Tissue Floater (UPMC 300 Whole Slide Imaging Dataset) (page 81)
Table 3: Image Search Results That Match Tissue Floaters (UPMC + NCI 2325 Whole Slide Imaging Dataset) (page 82)
Table 4: Top 5 Retrieved Patches for Manual Search (page 83)

CHAPTER 1: INTRODUCTION

1.1 DIGITAL PATHOLOGY

The application of Artificial Intelligence (AI) in Anatomical Pathology requires digital images and a suitable Information Technology (IT) infrastructure. For this reason, a brief overview of Digital Pathology is necessary.
Digital imaging has benefited the field of Pathology for clinical practice, research and education [1]. Digital data is often easier to share, archive, integrate and analyze. Digital images also enable pathologists to employ next generation tools such as image analysis [2]. Today, many pathology laboratories are using whole slide imaging (WSI) [3]. WSI is the process whereby a pathology glass slide with tissue from a patient is digitized (scanned) using a whole slide scanner, and the acquired whole-slide image (digital slide) can then be viewed on a computer monitor [4]. Numerous commercial vendors offer hardware and/or software solutions for WSI. There are multiple clinical (e.g. telepathology) and non-clinical (e.g. research, education) applications. Furthermore, in several countries there is regulatory approval of this technology for clinical diagnostic use [5]. Nevertheless, whilst there is increasing interest among pathologists in using WSI, the global adoption of Digital Pathology has been slow due to technological, financial, regulatory and cultural barriers [6]. Recent efforts to couple WSI with advanced computation and deep learning methods have incited much interest in applying AI to pathology, which in turn is driving the field of Digital Pathology forward.

1.2 ARTIFICIAL INTELLIGENCE IN PATHOLOGY

AI is a branch of computer science that is currently very topical. Recently, there has been much hype regarding the use of AI tools applied to healthcare [7]. Over the last two decades digital imaging technology and its application in pathology have increased greatly, which has ushered in a new field called Computational Pathology [8]. WSI has enabled the generation of large digital datasets in the Pathology community that, in turn, facilitate the development and validation of AI algorithms. AI has been applied to pathology for both clinical purposes and discovery. AI-based tools have demonstrated outstanding performance, but this has been mostly with narrow (weak) AI systems focused on limited tasks (e.g. mitotic figure detection). A plethora of studies about AI using deep learning in pathology have been published showing novel applications [9-14]. These algorithms have been used to detect rare events, automatically quantify features in digital images, diagnose diseases (e.g. cancer) from WSI, and even make prognostic predictions by analyzing pixels. Clearly, there is great potential for leveraging digital pathology to support AI [15]. AI in pathology offers better diagnostic accuracy and more precise measurements compared to those rendered by humans, promotes standardization, and also permits automation. AI can accordingly benefit pathologists practicing in most settings, including low- and middle-income countries. For example, there is a dearth of anatomical pathologists in South Africa [16]. If pathologists in South Africa could exploit AI to handle more routine and mundane tasks, this could free them up to perform more important work and allow them to rather focus their attention on more difficult cases. It has been pointed out in the literature that most studies evaluating AI applications in healthcare to date have not been rigorously validated for reproducibility, generalizability and safety in the clinical setting [15-17]. Despite advances with AI technology applied to pathology, there are currently very few laboratories routinely using these tools.
Studies demonstrating the feasibility of using AI to assist pathologists in daily practice are thus limited. Addressing this gap would make a substantial contribution to advancing the field and provide much needed translational evidence that could help develop recommendations and guidelines for the safe and effective use of AI in routine diagnostic Anatomical Pathology workflow. The experiments discussed herein demonstrate the feasibility of employing AI-based tools to successfully solve various clinical challenges in Anatomical Pathology.

1.3 STUDY GOALS

The aim of this study was to determine the feasibility of implementing AI technology for routine use in Anatomical Pathology. Considerable literature has accumulated showing the potential for AI to assist pathologists. However, it is unclear to what extent AI can support pathologists with everyday practical tasks. AI-based tools for four different clinical scenarios in Anatomical Pathology were consequently deployed in a modern clinical laboratory setting, with adequate computational capacity and pathology informatics expertise, to demonstrate that AI can assist pathologists in performing specified tasks. The four challenges in pathology, each often time-consuming to undertake manually, that were intended to be solved by AI-based methods were:

- Experiment #1: Screening digital slides for mycobacteria
- Experiment #2: Grading breast carcinoma using mitotic figure counts
- Experiment #3: Screening prostate core needle biopsies (CNBs) for cancer
- Experiment #4: Resolving the original source of contaminating tissue floaters sometimes encountered on pathology slides

In the first experiment, the objective was to develop and validate a novel AI tool to screen digitized acid-fast stained (AFS) slides in order to detect mycobacterial infection. In the remaining experiments, the objectives were to clinically validate previously developed algorithms on external datasets to either quantify mitotic figures in breast carcinoma, diagnose and grade prostate adenocarcinoma, or utilize a search tool to query for matching regions of interest (ROI) in whole-slide images.

1.4 MATERIALS AND METHODS SYNOPSIS

The four experiments in this study were primarily conducted at the University of Pittsburgh Medical Center (UPMC) in Pittsburgh, USA, where Liron Pantanowitz was employed. Ethics clearance was accordingly obtained from both the Human Research Ethics Committee (Medical) of the University of the Witwatersrand (Appendix 1) and the Institutional Review Board (IRB) of the University of Pittsburgh, USA. All experiments utilized whole-slide images acquired by scanning archival glass slides, typically at 40x magnification (0.25 µm/pixel resolution) with a single Z-focus plane. A variety of commercial whole slide scanners were used, including Aperio (Leica Biosystems, USA), Philips (Philips, Netherlands), Nanozoomer (Hamamatsu, Japan) and 3D Histech (3DHISTECH, Hungary) instruments. For experiment 4, an additional digital dataset was downloaded from The Cancer Genome Atlas (TCGA) public archive. Digital datasets were de-identified and, when needed, coupled with available clinical metadata such as patient demographics, pathology diagnoses, and ancillary laboratory test results.
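By way of illustration, the following minimal sketch shows how a scanned whole-slide image of this kind can be opened and divided into the non-overlapping patches used for model training, as described in the next paragraph. It is a simplified example rather than the exact pipeline used in these experiments; the open-source OpenSlide library and the slide file name are assumptions for illustration only.

```python
# Simplified illustration (not the actual experiment code): tiling a 40x whole-slide
# image into non-overlapping patches. OpenSlide and the file name are assumptions.
import openslide

PATCH = 256  # patch edge length in pixels; the experiments used various sizes (e.g. 64 x 64)

slide = openslide.OpenSlide("example_slide.svs")                 # hypothetical de-identified slide
width, height = slide.dimensions                                 # often on the order of 100,000 x 100,000 pixels
mpp_x = float(slide.properties.get("openslide.mpp-x", "0.25"))   # microns per pixel (~0.25 at 40x)

coordinates = []
for y in range(0, height - PATCH + 1, PATCH):                    # step size = patch size, i.e. no overlap
    for x in range(0, width - PATCH + 1, PATCH):
        coordinates.append((x, y))                               # keep coordinates so findings map back to the WSI

# In practice patches are read lazily and written to disk rather than held in memory.
x0, y0 = coordinates[0]
patch = slide.read_region((x0, y0), 0, (PATCH, PATCH)).convert("RGB")
print(f"{len(coordinates)} candidate patches of {PATCH}x{PATCH} px at {mpp_x} um/pixel")
```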
Development of deep learning models followed the process delineated by Harrison and colleagues, which included data acquisition, expert annotation (labeling), data partitioning into training and test data, algorithm training, blinded validation on at least one independent dataset, and verification prior to deployment [13]. Whole-slide images (typically 100,000 x 100,000 pixels) were broken into smaller patches or tiles (e.g. 64 x 64 pixels) with no overlap and subsequently fed into one or multiple multilayered convolutional neural networks (CNN). For experiment 4, an ensemble approach utilized a cohort of different algorithms that exploited both supervised and unsupervised computational methods for image processing. For certain experiments, data augmentation techniques (e.g. random rotation, jittering to add small amounts of noise) were applied. Different hardware (e.g. CPUs and GPUs), software solutions (e.g. Ubuntu) and programming languages (e.g. Python, C/C++, JavaScript) were utilized for each experiment. Algorithm output at the image patch level, and occasionally at the WSI level, was conveyed as explainable annotations, heatmaps overlaid on hematoxylin and eosin (H&E) digital images, or as image patches ranked and displayed in gallery format. The results of the different AI tools were compared to manual microscopic interpretations rendered by humans. The ground truth for resolving discrepancies was established by expert pathologist reading, consensus review and/or the results of ancillary tests (e.g. immunohistochemical staining, microbiology culture results, molecular findings). Metrics used to evaluate AI model performance included accuracy, sensitivity/specificity, positive/negative predictive value, F-score, area under the receiver operating characteristic curve (AUC), precision recall curve, and/or the recorded time to perform a specific task.

1.5 SYNTHESIS OF EXPERIMENTS

1.5.1 EXPERIMENT #1: Acid-Fast Bacilli Screening AI Tool

A detailed report of this experiment is provided in Chapter 2 [18].

1.5.1.1 Background

Mycobacteria cause significant disease worldwide. In South Africa, for example, hundreds of thousands of people fall ill with tuberculosis annually and thousands die from this infectious disease [19]. When anatomical pathologists encounter pathologic changes in tissue specimens (e.g. granulomatous inflammation) that may be caused by mycobacterial infection, they are obliged to try to identify these microorganisms concealed within these samples. This typically requires the use of ancillary studies. A simple and relatively cheap method is to stain tissue sections with an AFS (e.g. Ziehl-Neelsen or Kinyoun stain) to highlight acid-fast bacilli (AFB) such as mycobacteria. Since AFB are small (e.g. only 2-4 µm in length) and may be sparse, it is cumbersome for a busy pathologist to screen stained slides. Therefore, the possibility of computer-assisted screening for this mundane task is attractive. To date, publications about image algorithms to detect AFB have been restricted mostly to sputum samples and have exploited static snapshots [18, 20]. To the best of my knowledge, very few studies have reported developing a deep learning model to analyze entire digital slides in order to identify mycobacteria in human tissue [21, 22]. These prior studies used small datasets for algorithm training.
The objective of this first experiment was thus to develop and validate a comparable deep learning algorithm, but with a much larger dataset, to screen digitized AFS slides for mycobacteria within tissue sections, and to subsequently deploy this AI-based tool for use in clinical practice.

1.5.1.2 Materials and Methods

A deep learning algorithm was developed using a sizeable digital dataset (n = 441) comprised of scanned positive and negative AFS slides from UPMC (USA) and Wan Fang Hospital (Taiwan). Numerous ROIs containing either AFB (n = 6,817), mimics (n = 7,426), or negative background areas (n = 3,601) were annotated in these whole-slide images by several pathologists. Digital slides were subsequently cropped into thousands of image patches to train two CNN models. The algorithm's output was presented in the form of image thumbnails ranked according to the likelihood that they contained at least one possible AFB. The thumbnails were displayed online in gallery format, together with their corresponding digital slide, for a pathologist end-user to review. After the algorithm was successfully validated, the accuracy and speed (measured in seconds) of two pathologists at finding AFB in an independent dataset (n = 138) were compared using manual light microscopy, whole-slide image evaluation without AI support, and web-based AI-assisted analysis.

1.5.1.3 Results

The algorithm demonstrated excellent performance (AUC = 0.960 at the image patch level; AUC = 0.900 at the digital slide level) when compared to the ground truth (i.e. the original "signed-out" diagnosis rendered by a pathologist). The AI-based method was found to be more sensitive and accurate than humans alone. Pathologists were accordingly also able to identify significantly more AFB with AI assistance than with manual microscopy or WSI examination without the aid of AI. Moreover, screening for mycobacteria proved to be significantly quicker and easier with AI assistance.

1.5.1.4 Conclusion

This experiment successfully demonstrated the feasibility of using an AI-based tool to search for rare events, such as mycobacteria in whole-slide images of AFS slides. Compared to alternate manual methods of screening, this onerous task was more sensitive, more accurate, and quicker to perform with AI assistance.

1.5.2 EXPERIMENT #2: Breast Mitosis Quantification AI Tool

A comprehensive report of this experiment is provided in Chapter 3 [23].

1.5.2.1 Background

The proliferation activity in breast carcinoma, which relies on the mitotic count, is an important prognostic marker. Breast cancer grading accordingly requires that a pathologist count mitotic figures in H&E stained histology sections of a patient's breast cancer sample. However, manually counting mitotic figures is subjective, time-consuming, and suffers from low reproducibility. Several published articles have shown that digital image analysis can solve this challenge by automating mitosis quantification [24-25]. However, early efforts leveraging computer-assisted mitotic counting were limited to using only pre-defined ROIs in images. More recent work has resorted to developing automated methods using deep learning techniques to predict tumor proliferation from whole-slide images [26-27]. To date, there have been no studies showing whether using an AI tool to detect and quantify mitoses in breast carcinoma actually improves end-user accuracy and efficiency when scoring mitotic figures in practice. The aim of this study was accordingly to critique such an AI tool for use in clinical practice.
The hypothesis was that reviewer accuracy and efficiency would improve with AI support.

1.5.2.2 Materials and Methods

All components of this study were conducted at two different medical centers, UPMC in the USA and Samsung Medical Center (SMC) in Seoul, South Korea. This helped increase the dataset size and heterogeneity for algorithm training, as well as limit bias in the reader study used to assess the AI system's practicability. Representative H&E glass slides from breast cancer cases (n = 320) were scanned at 40x magnification, at UPMC using an Aperio AT2 scanner (Leica Biosystems, USA) and at SMC using a 3D Histech P250 scanner (3DHISTECH, Hungary). To simplify working with large image files, the whole-slide images were broken up into representative digital patches called high-power fields (HPFs). To further train the algorithm developed by the company Lunit, 10 expert pathologists were recruited to label mitotic figures. To assess the efficacy of this algorithm, a reader study involving 24 end-users of varying proficiency was undertaken. After providing informed consent, the readers were asked to count mitotic figures in 140 HPFs with and without the aid of AI. Their accuracy and time (measured in seconds) to perform this task were recorded using a web-based tool.

1.5.2.3 Results

The accuracy, precision and sensitivity with which pathology end-users counted mitotic figures in images of invasive breast carcinoma improved when they were provided with AI support to perform this task. With AI support, most readers (87.5%) were also able to detect more mitoses and had fewer falsely flagged mitotic figures. Of note, with AI assistance there was better standardization among readers, manifested by higher inter-pathologist agreement. Moreover, AI assistance reduced the overall time (27.8% savings) that end-users spent on this task.

1.5.2.4 Conclusion

This study validated that not only can an AI algorithm successfully perform a narrow task such as counting mitotic figures in digital images of invasive breast carcinoma, but that this AI-based tool can also augment pathology end-users when performing this mundane task. End-users with AI assistance were more accurate, more efficient and less varied at quantifying mitotic figures.

1.5.3 EXPERIMENT #3: Prostate Cancer Diagnosis AI Tool

A complete report of this experiment is provided in Chapter 4 [28].

1.5.3.1 Background

In Anatomical Pathology practice, screening prostate CNBs for carcinoma is challenging. In many practices these patient samples are typically received in high volume, screening is time-consuming, foci of cancer may be very small, there is notable inter-observer variability especially with grading, and there are several mimics that can lead to a misdiagnosis. To help address this challenge, several investigators have developed deep learning algorithms capable of analyzing digital pathology images of prostate histopathology [29-32]. Most such algorithms published in the literature have been limited to performing narrow tasks (e.g. only cancer detection or grading). However, pathologists are required to perform multiple tasks when evaluating a prostate CNB, such as identifying, grading and quantifying prostate adenocarcinoma. Pathologists also have to report additional findings important for patient management, such as the presence of any Gleason pattern 5, which is a strong predictor of poor outcome, and perineural invasion (PNI), which is a predictor of extra-prostatic tumor extension.
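To make this multi-task requirement concrete, the sketch below shows one way the combined findings for a single core needle biopsy could be represented as a structured result. The field names are hypothetical and are not the actual output schema of the algorithm evaluated in this experiment.

```python
# Hypothetical structured result for one prostate core needle biopsy, illustrating the
# multiple findings (detection, grading, quantification, Gleason pattern 5, PNI) that a
# comprehensive algorithm must report. Field names are assumptions, not the real schema.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProstateCNBResult:
    cancer_detected: bool
    cancer_probability: float          # model confidence between 0 and 1
    gleason_score: Optional[str]       # e.g. "3+4=7"; None if benign
    tumor_percentage: Optional[float]  # percentage of the core involved by tumor
    gleason_pattern_5_present: bool    # strong predictor of poor outcome
    perineural_invasion_present: bool  # predictor of extra-prostatic tumor extension

example = ProstateCNBResult(
    cancer_detected=True,
    cancer_probability=0.97,
    gleason_score="3+4=7",
    tumor_percentage=20.0,
    gleason_pattern_5_present=False,
    perineural_invasion_present=True,
)
print(example)
```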
To date, very few AI tools capable of comprehensively analyzing prostate CNBs have been deployed and validated for use in routine clinical practice. The aim of this study was to address this gap.

1.5.3.2 Materials and Methods

A deep learning algorithm was developed to diagnose prostate adenocarcinoma, grade and quantify detected cancer, as well as find PNI in H&E stained slides of prostate CNBs that were scanned using a Philips scanner (Philips, Netherlands). Labeled whole-slide images used for this purpose included a training dataset of 549 slides, further broken into 1,357,480 image patches, and a separately held-out test dataset of 2,501 digital slides. Following calibration, the algorithm was subsequently validated on an external dataset of 100 consecutive archival cases comprising 1,627 slides scanned using an Aperio AT2 scanner (Leica Biosystems, USA). Discrepancies between the algorithm and the original pathologist's diagnoses were managed by blinded review and the consensus of three expert pathologists, as well as immunohistochemistry (a PIN4 cocktail stain consisting of p63 + CK-903 + P504S) to confirm the diagnosis. This AI-based system was implemented in the clinical pathology laboratory at Maccabi Healthcare Services in Israel, and deployed to retrospectively review all prostate CNBs as a quality control application (i.e. a second read system).

1.5.3.3 Results

The algorithm performed exceptionally well on both internal and external datasets. In the external validation set, the AUC was 0.991 for identifying prostate adenocarcinoma, 0.941 for distinguishing between Gleason low- and high-grade cancers, 0.971 for detecting Gleason pattern 5, and 0.957 for identifying PNI. As a second read system in the Maccabi pathology laboratory, this AI-based tool provided numerous helpful alerts for potential disagreements and correctly identified one case of missed adenocarcinoma among 941 reviewed cases.

1.5.3.4 Conclusion

This study reports, to the best of my knowledge, the first successful development, external clinical validation, and deployment in clinical practice of an AI-based algorithm to accurately detect, grade and quantify prostate cancer, and to assess for PNI, in whole-slide images of prostate CNBs.

1.5.4 EXPERIMENT #4: Pathology Image Search AI Tool

A detailed report of this experiment is provided in Chapter 5 [33].

1.5.4.1 Background

In routine clinical practice pathologists sometimes encounter extraneous pieces of tissue on slides that may be due to specimen cross contamination. These are often called "tissue floaters". Figuring out whether such a fragment belongs to the patient's case being reviewed or arose from another case is cumbersome and time-consuming, even if expensive molecular techniques are used [34-35]. The aim of this study was to develop an AI-based image search tool to resolve the tissue floater conundrum.

1.5.4.2 Materials and Methods

A glass slide with tissue floaters containing different cancers was artificially created by adding two separate H&E stained tissue fragments to the edge of an existing tissue section. This constructed slide was scanned using an Aperio AT2 scanner (Leica Biosystems, USA), along with the two original cancer slides used to create these tissue floaters. These digital slides were then randomly embedded into a large whole-slide image dataset (n = 2,325) encompassing many different H&E stained pathology entities.
UPMC provided 300 of these images, whereas most (n = 2,025) were obtained from the public dataset offered by The Cancer Genome Atlas (TCGA). Whole-slide images were tagged with their pathology diagnosis and the affected anatomic site. Image patches derived from these whole-slide images were converted into linear barcodes and indexed for easy retrieval. A deep learning-based image search tool using an ensemble approach of different algorithms was then utilized to mine for matching image features via these barcodes (a simplified illustration of this idea is sketched below). The retrieved images, each ranked based on their likelihood of matching the queried image, were sorted and then displayed in gallery format. The accuracy with which this model matched each tissue floater to the original tumor images was recorded.

1.5.4.3 Results

The image search tool performed as intended and required only milliseconds to complete a query. When the digital database was repeatedly queried using this tool, the likelihood of retrieving the correct tumor match to the tissue floater was very high. Further, the accuracy of retrieving the correct matching image increased when successively greater regions of the tissue floater were chosen.

1.5.4.4 Conclusion

The AI-based model developed was successful for content-based image retrieval (CBIR). This image search tool offers pathology laboratories, especially those that have transitioned to going fully digital with WSI, a novel method to rapidly resolve the tissue floater conundrum.

1.6 DISCUSSION

AI encompasses a variety of techniques devised to mimic human intelligence. The application of AI in pathology is relatively new [13, 36-38]. Nevertheless, this innovative field is rapidly evolving due to advances in digital pathology (e.g. WSI), the availability of large curated digital datasets, better computation (e.g. graphics processing units, cloud computing), and progress in machine learning (e.g. deep learning networks that contain multiple hidden layers of nodes). Deep neural networks can now be trained to identify precise objects or larger ROIs, or even to analyze entire whole-slide images. Of the diverse artificial neural networks available, CNNs have been most commonly applied to developing pathology image classification systems [13, 39]. Several global machine learning competitions (e.g. the CAMELYON and TUPAC challenges) that exploited CNNs helped promote the role of AI in pathology [40-41]. Whilst most early work in this field was largely confined to research laboratories, several startup AI companies (e.g. aetherAI, Lunit, Ibex, PathAI, Paige) have recently emerged, offering the pathology community promising commercial AI-based tools intended for clinical and/or research use. According to a survey published in 2019 involving 487 pathologists practicing in 54 countries, the vast majority (80%) of respondents expected AI tools to soon be incorporated into pathology laboratories [42]. There are many benefits to adopting AI in pathology. These include eliminating tedious tasks, improved accuracy, standardization, predicting outcomes, better efficiency, reduced workload, and automation [43]. Hence, there is much enthusiasm about implementing AI in routine clinical practice. The study herein reports four separate experiments that successively demonstrate the capabilities of AI in Anatomical Pathology. Each experiment culminated in a peer-reviewed publication that made important contributions to the field.
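To make the image search approach used in experiment 4 more concrete, the following simplified sketch illustrates the general barcode idea referenced above: binarizing a patch-level feature vector into a barcode and ranking archived patches by Hamming distance. It is an illustration under stated assumptions (placeholder random features and a simple sign-of-differences binarization) and is not the published Yottixel or experiment 4 implementation.

```python
# Simplified, illustrative barcode indexing and retrieval (not the published implementation).
# Feature vectors here are random placeholders standing in for deep features of image patches.
import numpy as np

def to_barcode(features: np.ndarray) -> np.ndarray:
    """Binarize a feature vector using the sign of successive differences (a 'min-max' style barcode)."""
    return (np.diff(features) >= 0).astype(np.uint8)

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Number of positions at which two barcodes differ."""
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(0)
archive_features = rng.random((2325, 1024))              # hypothetical: one feature vector per indexed patch
archive_barcodes = np.stack([to_barcode(f) for f in archive_features])

# Query with a "tissue floater" patch: rank archived patches by Hamming distance to its barcode.
query_barcode = to_barcode(rng.random(1024))
distances = np.array([hamming(query_barcode, b) for b in archive_barcodes])
top_matches = np.argsort(distances)[:5]                  # indices of the five closest archived patches
print("Top matches (archive indices):", top_matches.tolist())
```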
Article #1, regarding the AI tool used to screen for AFB in digital slides and published in the American Journal of Clinical Pathology, was accessed 2,534 times online, had 784 PDF downloads, and was shared 18 times on social media only four months after being published. Article #2, concerning the mitotic figure quantification AI tool for breast carcinoma and published in Diagnostic Pathology (impact factor 2.192), was accessed 2,582 times online in the 14 months since its publication and cited 8 times according to Web of Science. According to Google Scholar, article #3 about the AI tool used to analyze prostate cancer has been cited 47 times in the one year since it was published in The Lancet Digital Health (impact factor 24.519). There was also an editorial published alongside this paper in which the authors stated "The resulting study is distinguished by its real-world evaluation, showing how computer-aided diagnosis (CAD) tools might influence pathology practice in the near future" [44]. Article #4, published in the journal Archives of Pathology & Laboratory Medicine (impact factor 4.094), was accessed 2,543 times online, had 785 PDF downloads, and was tweeted on Twitter by 30 individuals in the 14 months since being published.

Experiments 1, 2 and 3 replicated specialized tasks pathologists manually perform with their microscope, whereas experiment 4 achieved a task pathologists are currently typically unable to accomplish alone using a traditional light microscope. Experiments 1 and 2 demonstrated the ability of an AI-based tool to perform a narrow (specific) task (i.e. screen for rare mycobacteria or detect and quantify mitotic figures in images). Indeed, computers are better suited than humans to screen large image files for rare events, such as a single mycobacterium that occupies only a minute portion (e.g. 10 x 10 pixels) of an entire whole-slide image (e.g. 100,000 x 100,000 pixels) [18]. Experiment 3 successfully demonstrated the ability of an AI-based tool to simultaneously perform multiple tasks (i.e. identify, grade and characterize additional features about prostate cancer). Experiment 4 established the ability of an AI-based search tool to rapidly query a large image dataset for matching pathology images.

While many AI tools in pathology have been theoretically shown to be valuable, most of them have not yet been deployed widely in clinical practice. They have accordingly not been validated in real-world clinical settings for reproducibility and generalizability. Therefore, best practices for implementation (e.g. validation studies, underlying IT requirements, pathologist oversight needed) and the impact of AI tools on end-users remain to be defined. Each of the aforementioned published experiments offers some evidence to address this gap, which could be used in the future development of standard clinical guidelines. Experiment 1, for example, provided information about the optimal washout period to be used during implementation when comparing AI versus manual modalities. While most published data suggest that pathologists working with AI tools typically outperform humans or machines separately [13], their optimal setup (e.g. workflow, end-user interaction) in practice still needs to be ascertained. For example, experiment 2 aptly highlighted that not all end-users may interact with AI tools as anticipated, which could be addressed with more training and ongoing performance assessment.
Experiment 3, on the other hand, highlighted the importance of calibrating algorithms locally pre-deployment.

All of the experiments in this study followed a similar process of developing AI algorithms that included training followed by validation using an independent internal and/or external dataset. For experiments 2 and 3, roughly half of the data collected was used for training and the rest was held out for testing. To avoid bias, it was important to ensure that both of these datasets had a similar distribution of cases and features. The strategy employed for training in experiments 1, 2 and 3 was supervised learning, where delineated data inputs were used to predict matching outputs. This involved pathology experts assigning labels (e.g. pathology diagnoses) to images and/or manually annotating specific features (e.g. mycobacteria, mitotic figures, prostate glands, etc.) within images. While manual training was effective, it was labor-intensive, time-consuming and sometimes subject to observer variability. Recently, some investigators have advocated using weak supervision, in which small datasets that have detailed annotations are combined with larger unlabeled datasets, which they believe produces a better performing model [45]. In experiment 4, an ensemble approach was in fact used that exploited the strengths of both supervised (i.e. trained deep networks) and unsupervised computational methods for image processing. Unsupervised learning allows more salient features and patterns to be identified in pathology images, but requires very large datasets for training purposes. Compared to the first three experiments, the AI solution used in experiment 4 was unique because it also involved CBIR. CBIR systems rely on a search engine to retrieve similar images, and in pathology this has been recognized as an alternate framework to classify images and maximize diagnostic accuracy [46-47]. Google has developed a similar search tool for use in digital pathology called "similar image search for histopathology" (SMILY) [48].

The selection of pathology data used to generate AI algorithms is important, because the quality of the input usually determines the quality of the output. This concept is well conveyed by the computer science acronym GIGO (i.e. garbage in, garbage out). There are several pre-imaging variables (e.g. tissue folds, pale staining, blurred areas, and air bubbles on slides) that can negatively affect the extraction of deep features in images and, in turn, cause an AI algorithm to perform poorly. For this reason, in experiment 4 whole-slide images of low quality (e.g. slides scanned at low resolution or with large out-of-focus regions) were rejected. It is equally important that data used for training be heterogeneous and comprise a suitable distribution of cases that reflects the real-world data likely to be encountered in clinical practice. This is because heterogeneous datasets help build more robust algorithms that can be better generalized to work on digital slides from most laboratories. A prior survey involving more than 500 studies about AI in medical imaging found that 94% only employed data from one site [49]. In an attempt to avoid this pitfall, all of the experiments in this study incorporated data from multiple sources (e.g. UPMC, Wan Fang Hospital, SMC, Maccabi, TCGA). Furthermore, for experiment 2 the datasets included breast carcinoma cases with a broad range of tumor grades and stages. Another drawback inherent in developing AI algorithms is data bias.
This was encountered in experiment 1, because most image patches did not contain mycobacteria, which resulted in imbalanced datasets (i.e. there were disproportionately many more negative than positive patches). To avoid bias toward a negative prediction, the ratio of negative to positive patches was thus intentionally curbed through random sampling during training.

Developing AI algorithms in Anatomical Pathology shares challenges common to many other deep learning projects. These include the lack of readily available datasets, the lengthy time needed for humans to label and annotate data, as well as significant investment in computer equipment. However, developing AI algorithms for Anatomical Pathology also poses unique obstacles. Whole-slide images, for example, are very large files (i.e. several gigabytes in size) and hence often unwieldy to work with. Breaking these digital slides into smaller image patches helped overcome this challenge and also provided a mechanism to augment data for training purposes.

For pathologist buy-in to use AI tools it will be important to avoid the "black box" concept. Hence, it is important to deliver results generated by algorithms in an explainable format. For this reason, data output in the experiments conducted was either converted into presentable image galleries or into easy-to-view heatmaps overlaid on H&E images. This appeared to be more palatable for pathologists and easier to use in clinical practice.

The performance of AI algorithms is commonly evaluated using the AUC [13]. An AUC value that approaches 1 signifies high discrimination performance. For experiment 1 the AUC was 0.960 at the image patch level and 0.900 at the digital slide level [18]. For experiment 3 the algorithm developed achieved an AUC of 0.997 for prostate cancer detection in the internal test set and 0.991 in the external validation set [28]. This minor difference in experiment 3 may be attributed to variability in tissue staining, the prepared slides used, and the scanned images for the two datasets. Indeed, it is well known that feature stability between different sites may be compromised by laboratory-specific pre-imaging variables (e.g. specimen fixation, tissue thickness, stain reagents) and the proprietary scanner used to digitize slides [13, 50]. Nevertheless, the AUC for experiment 3 is comparable to other publications on this topic, which show that most modern AI models can accurately detect prostate adenocarcinoma with AUC values of up to 0.991 to 0.997 [51]. It is also important to recognize that most AI algorithms have been developed using pathology cases typically encountered in practice, because it is often difficult to collect sufficient numbers of rare cases to train algorithms. Of course, this may negatively impact their performance in real-world practice [52].

Additional testimony to the success of the work reported herein is the ongoing usage of some of the AI tools developed. The AI system used to screen whole-slide images for AFB is still being used daily at UPMC in the USA. The prostate cancer diagnosis AI tool is similarly still being used daily at Maccabi Hospital in Israel, where the workflow was subsequently switched from a second read system (i.e. retrospective case review) to a first read system (i.e. prospective review before a pathologist reviews the case).
Finally, the barcode indexing technology used to develop the pathology image search AI tool in experiment 4 has since been adopted to build Yottixel, another image search engine for large archives of histopathology whole-slide images, intended to advance further research in this field [53].

Building AI systems that perform well in clinical practice is challenging and often subject to technical limitations [17, 54]. There were indeed several limitations encountered in the various experiments of this study. Whilst the algorithms developed were successful, using larger datasets may have resulted in better performance. The publicly available TCGA dataset, which contains thousands of labeled whole-slide images with helpful metadata, helped to address this challenge in experiment 4. The whole-slide images used to train algorithms in this study were scanned only at 40x magnification and using only one focal (z) plane, because the commercial brightfield whole-slide scanners used in this study could not scan slides at greater magnification. However, utilizing higher resolution images for training may have generated better algorithm performance. This is especially true for the AI tool used to screen for AFB, because it has been shown that scanning slides at 100x with oil immersion is more desirable than 40x for resolving microorganisms in digital images [55]. Some investigators have also recently shown that employing multi-magnification-based machine learning can yield more accurate models [56]. Another limitation, in experiment 2, was not standardizing the computer monitors used for annotation and the end-user reader study. Although the impact this may have had is unclear, it has been shown that different types of displays (e.g. medical grade versus consumer off-the-shelf screens) may influence pathologists' diagnostic performance [57].

The four experiments in this study demonstrate the enormous potential for AI to successfully augment the practice of Anatomical Pathology. However, there is currently limited experience with AI clinical use around the world, and a lack of practical AI validation guidelines for pathologists to follow. Hopefully the published results of my PhD work will provide helpful translational evidence for the future development of such clinical guidelines. Apart from simply producing "weak" AI-based tools that merely replicate the work pathologists perform, it is anticipated that soon there will be "strong" AI systems capable of undertaking challenging tasks pathologists cannot accomplish with their microscopes. However, before these state-of-the-art tools can be expected to be widely adopted, many hurdles (e.g. regulatory approval, financial reimbursement, pathologist buy-in) will need to be overcome. Surmounting these barriers may take longer in countries such as South Africa where financial resources are limited, which is unfortunate because that is the precise clinical setting where pathologists are scarce, workloads are high, and AI support is likely to be especially beneficial.

1.7 REFERENCES

1. Pantanowitz L. Whole slide imaging. In: Pantanowitz L, Parwani AV, eds. Digital Pathology. Chicago: ASCP Press; 2017, pages 59-76.
2. Lara H, Li Z, Abels E, Aeffner F, Bui MM, ElGabry EA, Kozlowski C, et al. Quantitative image analysis for tissue biomarker use: A white paper from the digital pathology association. Appl Immunohistochem Mol Morphol. 2021; 29(7):479-493.
3. Evans AJ, Salama ME, Henricks WH, Pantanowitz L.
Implementation of Whole Slide Imaging for Clinical Purposes: Issues to Consider From the Perspective of Early Adopters. Arch Pathol Lab Med. 2017; 141(7):944-959.
4. Hanna MG, Pantanowitz L. Digital Pathology. In: Narayan R, ed. Encyclopedia of Biomedical Engineering. Philadelphia: Elsevier; 2019, Volume 2, pages 524-532.
5. Evans AJ, Bauer TW, Bui MM, Cornish TC, Duncan H, Glassy EF, et al. US Food and Drug Administration Approval of Whole Slide Imaging for Primary Diagnosis: A Key Milestone Is Reached and New Questions Are Raised. Arch Pathol Lab Med. 2018; 142(11):1383-1387.
6. Fraggetta F, Pantanowitz L. Going fully digital: utopia or reality? Pathologica. 2018; 110(1):1-2.
7. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019; 25(1):44-56.
8. Louis DN, Gerber GK, Baron JM, Bry L, Dighe AS, Getz G, et al. Computational pathology: an emerging definition. Arch Pathol Lab Med. 2014; 138(9):1133-1138.
9. Chang HY, Jung CK, Woo JI, Lee S, Cho J, Kim SW, Kwak TY. Artificial Intelligence in Pathology. J Pathol Transl Med. 2019; 53:1-12.
10. Niazi MKK, Parwani AV, Gurcan MN. Digital pathology and artificial intelligence. Lancet Oncol. 2019; 20(5):e253-e261.
11. Naugler C, Church DL. Automation and artificial intelligence in the clinical laboratory. Crit Rev Clin Lab Sci. 2019; 56(2):98-110.
12. Yao K, Singh A, Sridhar K, Blau JL, Ohgami RS. Artificial intelligence in pathology: a simple and practical guide. Adv Anat Pathol. 2020; 27(6):385-393.
13. Harrison JH, Gilbertson JR, Hanna MG, Olson NH, Seheult JN, Sorace JM, et al. Introduction to artificial intelligence and machine learning in pathology. Arch Pathol Lab Med. 2021; 145(10):1228-1254.
14. van der Laak J, Litjens G, Ciompi F. Deep learning in histopathology: the path to the clinic. Nat Med. 2021; 27(5):775-784.
15. Maddox TM, Rumsfeld JS, Payne PRO. Questions for Artificial Intelligence in Health Care. JAMA. 2019; 321(1):31-32.
16. Bi WL, Hosny A, Schabath MB, Giger ML, Birkbak NJ, Mehrtash A, et al. Artificial intelligence in cancer imaging: clinical challenges and applications. CA Cancer J Clin. 2019; 69(2):127-157.
17. Cheng J, Balis U, McClintock D, Abel J, Pantanowitz L. Challenges in the development, deployment and regulation of artificial intelligence (AI) in anatomical pathology. Am J Pathol. 2021; 191(10):1684-1692.
18. Pantanowitz L, Wu U, Seigh L, LoPresti E, Yeh FC, Salgia P, et al. Artificial intelligence-based screening for mycobacteria in whole-slide images of tissue samples. Am J Clin Pathol. 2021; 156(1):117-128.
19. World Health Organization. Global Tuberculosis Report 2021. Geneva: World Health Organization; 2021. https://www.who.int/teams/global-tuberculosis-programme/tb-reports/global-tuberculosis-report-2021 [Last accessed October 23, 2021].
20. Tadrous PJ. Computer-assisted screening of Ziehl-Neelsen-stained tissue for mycobacteria. Algorithm design and preliminary studies on 2,000 images. Am J Clin Pathol. 2010; 133(6):849-858.
21. Xiong Y, Ba X, Hou A, Zhang K, Chen L, Li T.
Automatic detection of mycobacterium tuberculosis using artificial intelligence. J Thorac Dis. 2018; 10(3):1936-1940.
22. Wang HS, Liang WY. Automatic mycobacterium tuberculosis detection using simple image processing with artificial intelligence. J Pathol Inform. 2019; 10:S40.
23. Pantanowitz L, Hartman D, Qi Y, Cho EY, Suh B, Paeng K, et al. Accuracy and efficiency of an artificial intelligence tool when counting breast mitoses. Diagn Pathol. 2020; 15(1):80.
24. Malon C, Brachtel E, Cosatto E, Graf HP, Kurata A, Kuroda M, et al. Mitotic figure recognition: agreement among pathologists and computerized detector. Anal Cell Pathol (Amst). 2012; 35(2):97-100.
25. Veta M, van Diest PJ, Willems SM, Wang H, Madabhushi A, Cruz-Roa A, et al. Assessment of algorithms for mitosis detection in breast cancer histopathology images. Med Image Anal. 2015; 20(1):237-248.
26. Veta M, Heng YJ, Stathonikos N, Bejnordi BE, Beca F, Wollmann T, et al. Predicting breast tumor proliferation from whole-slide images: The TUPAC16 challenge. Med Image Anal. 2019; 54:111-121.
27. Nateghi R, Danyali H, Helfroush MS. A deep learning approach for mitosis detection: Application in tumor proliferation prediction from whole slide images. Artif Intell Med. 2021; 114:102048.
28. Pantanowitz L, Quiroga-Garza GM, Bien L, Heled R, Laifenfeld D, Linhart C, et al. Clinical validation and deployment of an AI-based algorithm for prostate cancer diagnosis in whole slide images of core needle biopsies. Lancet Digit Health. 2020; 2(8):e407-e416.
29. Gertych A, Ing N, Ma Z, Fuchs TJ, Salman S, Mohanty S, Bhele S, et al. Machine learning approaches to analyze histological images of tissues from radical prostatectomies. Comput Med Imaging Graph. 2015; 46(Pt 2):197-208.
30. Bulten W, Balkenhol M, Belinga JJA, Brilhante A, Cakir A, Egevad L, et al. Artificial intelligence assistance significantly improves Gleason grading of prostate biopsies by pathologists. Mod Pathol. 2021; 34(3):660-671.
31. Van Booven DJ, Kuchakulla M, Pai R, Frech FS, Ramasahayam R, Reddy P, et al. A Systematic Review of Artificial Intelligence in Prostate Cancer. Res Rep Urol. 2021; 13:31-39.
32. Perincheri S, Levi AW, Celli R, Gershkovich P, Rimm D, Morrow JS, et al.
An independent assessment of an artificial intelligence system for prostate cancer detection shows strong diagnostic accuracy. Mod Pathol. 2021; 34(8):1588-1595.
33. Pantanowitz L, Michelow P, Hazelhurst S, Kalra S, Choi C, Shah S, et al. A digital pathology solution to resolve the tissue floater conundrum. Arch Pathol Lab Med. 2021; 145(3):359-364.
34. Hunt JL, Swalsky P, Sasatomi E, Niehouse L, Bakker A, Finkelstein SD. A microdissection and molecular genotyping assay to confirm the identity of tissue floaters in paraffin-embedded tissue blocks. Arch Pathol Lab Med. 2003; 127:213-217.
35. Venditti M, Hay RW, Kulaga A, Demetrick DJ. Diagnosis of ectopic tissue versus contamination by genetic fingerprinting in a routine surgical pathology specimen. Hum Pathol. 2007; 38(2):378-382.
36. Pantanowitz L, Carter A, Kurc T, Sharma A, Sussman A, Saltz J. Twenty years of digital pathology: an overview of the road travelled, what is on the horizon, and the emergence of vendor neutral archives. J Pathol Inform. 2018; 9:46.
37. Chang HY, Jung CK, Woo JI, Lee S, Cho J, Kim SW, Kwak TY. Artificial Intelligence in Pathology. J Pathol Transl Med. 2019; 53:1-12.
38. Steiner DF, MacDonald R, Liu Y, Truszkowski P, Hipp JD, Gammage C, et al. Impact of Deep Learning Assistance on the Histopathologic Review of Lymph Nodes for Metastatic Breast Cancer. Am J Surg Pathol. 2018; 42(12):1636-1646.
39. Bejnordi BE, Mullooly M, Pfeiffer RM, Fan S, Vacek PM, Weaver DL, et al. Using deep convolutional neural networks to identify and classify tumor-associated stroma in diagnostic breast biopsies. Mod Pathol. 2018; 31(10):1502-1512.
40. Hartman DJ, van der Laak JAWM, Gurcan MN, Pantanowitz L. Value of public challenges for the development of pathology deep learning algorithms. J Pathol Inform. 2020; 11:7.
41. Golden JA. Deep learning algorithms for detection of lymph node metastases from breast cancer: helping artificial intelligence be seen. JAMA. 2017; 318:2184-2186.
42. Sarwar S, Dent A, Faust K, Richer M, Djuric U, Van Ommeren R, et al. Physician perspectives on integration of artificial intelligence into diagnostic pathology. NPJ Digit Med. 2019; 2:28.
43. Jiang Y, Yang M, Wang S, Li X, Sun Y. Emerging role of deep learning-based artificial intelligence in tumor pathology. Cancer Commun (Lond). 2020; 40(4):154-166.
44. Janowczyk A, Leo P, Rubin MA. Clinical deployment of AI for prostate cancer diagnosis. Lancet Digit Health. 2020; 2(8):e383-e384.
45. Campanella G, Hanna MG, Geneslaw L, Miraflor A, Werneck Krauss Silva V, Busam KJ, et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat Med. 2019; 25(8):1301-1309.
46. Zheng Y, Jiang Z, Zhang H, Xie F, Ma Y, Shi H, et al. Histopathological whole slide image analysis using context-based CBIR. IEEE Trans Med Imaging. 2018; 37(7):1641-1652.
47. Tizhoosh H, Diamandis P, Campbell CJV, Safarpoor A, Kalra S, Maleki D, et al. Searching Images for Consensus: Can AI Remove Observer Variability in Pathology? Am J Pathol. 2021; 191(10):1702-1708.
48. Hegde N, Hipp JD, Liu Y, Emmert-Buck M, Reif E, Smilkov D, et al. Similar image search for histopathology: SMILY. NPJ Digit Med. 2019; 2:56.
49. Kim DW, Jang HY, Kim KW, Shin Y, Park SH. Design characteristics of studies reporting the performance of artificial intelligence algorithms for diagnostic analysis of medical images: results from recently published papers. Korean J Radiol. 2019; 20(3):405-410.
50. Leo P, Lee G, Shih NNC, Elliott R, Feldman MD, Madabhushi A. Evaluating stability of histomorphometric features across scanner and staining variations: prostate cancer diagnosis from whole slide images. J Med Imaging. 2016; 3(4):047502.
51. Egevad L, Delahunt B, Samaratunga H, Tsuzuki T, Yamamoto Y, Yaxley J, et al. The emerging role of artificial intelligence in the reporting of prostate pathology. Pathology. 2021; 53(5):565-567.
52. Pantanowitz L, Mehra R, Kunju LP. AI reality check when evaluating difficult to grade prostate cancers. Virchows Arch. 2021; 478(4):617-618.
53. Kalra S, Tizhoosh H, Choi C, Shah S, Diamandis P, Campbell CJV, Pantanowitz L. Yottixel - an image search engine for large archives of histopathology whole slide images. Med Image Anal. 2020; 65:101757.
54. Tizhoosh HR, Pantanowitz L. Artificial intelligence and digital pathology: challenges and opportunities. J Pathol Inform. 2018; 9:38.
55. Rhoads DD, Habib-Bein NF, Hariri RS, Hartman DJ, Monaco SE, Lesniak A, et al. Comparison of the Diagnostic Utility of Digital Pathology Systems for Telemicrobiology. J Pathol Inform. 2016; 7:10.
56. D'Alfonso TM, Ho DJ, Hanna MG, Grabenstetter A, Yarlagadda DVK, Geneslaw L, et al. Multi-magnification-based machine learning as an ancillary tool for the pathologic assessment of shaved margins for breast carcinoma lumpectomy specimens. Mod Pathol. 2021; 34(8):1487-1494.
57. Abel JT, Ouillette P, Williams CL, Blau J, Cheng J, Yao K, et al. Display Characteristics and Their Impact on Digital Pathology: A Current Review of Pathologists' Future "Microscope". J Pathol Inform. 2020; 11:23.

CHAPTER 2: PUBLISHED ARTICLE 1

Pantanowitz L, Wu U, Seigh L, LoPresti E, Yeh FC, Salgia P, Michelow P, Hazelhurst S, Chen WY, Hartman D, Yeh CY. Artificial intelligence-based screening for mycobacteria in whole-slide images of tissue samples. American Journal of Clinical Pathology, 2021; 156(1):117-128. https://pubmed.ncbi.nlm.nih.gov/33527136/

Abstract
Objectives: This study aimed to develop and validate a deep learning algorithm to screen digitized acid fast-stained (AFS) slides for mycobacteria within tissue sections.
Methods: A total of 441 whole-slide images (WSIs) of AFS tissue material were used to develop a deep learning algorithm. Regions of interest with possible acid-fast bacilli (AFBs) were displayed in a web-based gallery format alongside corresponding WSIs for pathologist review. Artificial intelligence (AI)-assisted analysis of another 138 AFS slides was compared to manual light microscopy and WSI evaluation without AI support.
Results: Algorithm performance showed an area under the curve of 0.960 at the image patch level. More AI-assisted reviews identified AFBs than manual microscopy or WSI examination (P < .001). Sensitivity, negative predictive value, and accuracy were highest for AI-assisted reviews. AI-assisted reviews also had the highest rate of matching the original sign-out diagnosis, were less time-consuming, and were much easier for pathologists to perform (P < .001).
Conclusions: This study reports the successful development and clinical validation of an AI-based digital pathology system to screen for AFBs in anatomic pathology material. AI assistance proved to be more sensitive and accurate, took pathologists less time to screen cases, and was easier to use than either manual microscopy or viewing WSIs.

AJCP / ORIGINAL ARTICLE
Am J Clin Pathol 2021;XX:1-12. DOI: 10.1093/ajcp/aqaa215
© American Society for Clinical Pathology, 2021. All rights reserved.

Artificial Intelligence-Based Screening for Mycobacteria in Whole-Slide Images of Tissue Samples

Liron Pantanowitz, MD,1,2 Uno Wu, MEng,3,4 Lindsey Seigh,1 Edmund LoPresti,5 Fang-Cheng Yeh, PhD,6 Payal Salgia, MBBS, DPB, DNB,1 Pamela Michelow, MBBCh,2 Scott Hazelhurst, PhD,7 Wei-Yu Chen, MD, PhD,8,9 Douglas Hartman, MD,1 and Chao-Yuan Yeh, MD4

From the 1Department of Pathology and 5Information Services Division, University of Pittsburgh Medical Center, Pittsburgh, PA, USA; 2Department of Anatomical Pathology, University of the Witwatersrand and National Health Laboratory Services, Johannesburg, South Africa; 3Department of Electrical Engineering, Molecular Biomedical Informatics Lab, National Cheng Kung University, Tainan City, Taiwan; 4aetherAI, Taipei, Taiwan; 6Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, PA, USA; 7School of Electrical & Information Engineering and Sydney Brenner Institute for Molecular Bioscience, University of the Witwatersrand, Johannesburg, South Africa; and 8Department of Pathology, Wan Fang Hospital, and 9Department of Pathology, School of Medicine, Taipei Medical University, Taipei, Taiwan.

Key Words: Acid-fast bacilli; Artificial intelligence; Deep learning; Digital pathology; Informatics; Screening; Mycobacteria; Whole-slide imaging

Mycobacteria are a major cause of infectious disease morbidity and mortality worldwide.
This includes tuberculosis (TB) caused by the bacillus Mycobacterium tuberculosis (MTb). MTb is one of the leading global causes of death, especially in susceptible populations (eg, malnutrition, acquired immune deficiency syndrome) and people living in resource-poor countries. According to the World Health Organization, in 2018 there were around 10 million people ill with TB and 1.2 million TB deaths among human immunodeficiency virus-negative individuals.1 Most cases were from Southeast Asia, Africa, and the western Pacific. If diagnosed in a timely manner, infected patients may be treated, and their risk of transmitting this communicable disease to others is reduced.

Mycobacteria are small bacilli (length 2-4 μm; width 0.2-0.5 μm) that are consequently hard to detect microscopically without the aid of a special stain. Laboratory tests for TB vary and include microscopic examination of sputum smears, blood tests, culture, and molecular testing. Due to the high content of mycolic acids in their cell walls, mycobacteria can be identified with an acid-fast stain (AFS). For bright-field microscopy, a Ziehl-Neelsen (ZN) stain is used, or modifications such as the Kinyoun and Fite stains. With these stains, rod-shaped bacilli stain red (acid-fast bacilli [AFBs]) and background material stains blue/green with the counterstain (eg, methylene blue). For fluorescent microscopy, auramine and rhodamine fluorescent stains are used. In situations where TB infection is unsuspected, fresh tissue may not be available for testing. This is often the case with fixed tissue and cytology samples. However, mycobacteria in sections from these fixed specimens are not readily visible with an H&E or a Gram stain. Therefore, to render a microscopic diagnosis, an AFS is often ordered when infection is being considered (eg, granulomas or necrosis are identified) in lung specimens or extrapulmonary sites. This is a simple and relatively cheap method to look for AFBs. While such manual microscopy to screen for AFBs has been reported to have good predictive value,2,3 currently this mundane task needs to be performed by trained individuals and is time-consuming (eg, 15-20 minutes/slide is often recommended), laborious, inconsistent, and subject to human error. The sensitivity of manual detection also depends on the number of AFBs present, which can be sparse in early infections.

Key Points
- A deep learning algorithm was developed to screen digitized slides for mycobacteria.
- Detected mycobacteria were displayed in gallery format for review.
- Artificial intelligence assistance was more sensitive, accurate, quicker, and easier to use than manual microscopy or viewing digital slides.

To try to improve efficiency, raise accuracy, and alleviate busy workloads, attempts have been made to automate microscopic AFB identification by employing digital imaging and computer vision techniques. Jha et al4 showed that such automated digital microscopy of sputum smears for detecting MTb could significantly lower the cost of diagnosis, which is important in countries such as South Africa, where resources are low.
Panicker et al5 evaluated several automatic methods based on image-processing techniques published between 1998 and 2014. Their review demonstrated that the accuracy of these image algorithms, designed to automatically detect AFBs and restricted mostly to sputum samples, markedly improved over the years. For sputum microscopy, several commercial systems have already been marketed.5 However, very few studies to date have similarly looked at applying image analysis for AFB detection in histopathology material using either static snapshots6,7 or whole-slide images (WSIs).8

The aim of this study was to develop and validate a deep learning algorithm for assisting pathologists with screening digitized acid fast-stained slides for the detection of mycobacterial infection.

Materials and Methods
Institutional review board approval was obtained for this study (University of Pittsburgh, PRO19060216; University of the Witwatersrand, clearance certificate M191003).

Acid-Fast Staining
Slides for this study were contributed by two separate institutions: (1) University of Pittsburgh Medical Center (UPMC), Pittsburgh, PA, and (2) Wan Fang Hospital, Taipei Medical University, Taipei, Taiwan. Sections (0.4 μm thick) were routinely prepared from formalin-fixed, paraffin-embedded pathology material, including surgical pathology samples or cytology cell blocks. At UPMC, acid-fast stains were performed on slides using an Artisan autostainer instrument that employs a commercial AFB stain kit (AR162; Agilent Dako). This automated staining process includes onboard drying and deparaffinization. The AFB stain kit used is a modification of the original ZN method. The AFS process involves the application of carbol fuchsin to stain AFBs red, followed by decolorization of all other tissue elements. Thereafter, a methylene blue counterstain is applied. At Wan Fang Hospital, AFS was performed manually using the Kinyoun (cold) method.

Clinical Data Sets
A total of 441 WSIs were used for algorithm development (Table 1). At UPMC, 297 randomly retrieved archival cases (some with multiple AFS slides) from 2016 to 2019, which were signed out by 26 different pathologists, were included. For each of these cases, the following metadata were recorded: patient sex (55.2% men, 44.8% women), patient age (mean, 55.5 years; range, 19-90 years), sample type (98.7% surgical pathology cases, 1.3% cytology cases), specimen anatomic location (57.2% lung, 42.8% extrapulmonary), original pathology diagnosis, presence of necrosis (present in 19.9% of cases), and the AFS interpretation reported by the original sign-out pathologist. As illustrated in Table 1, the slides included a mixture of positive (AFBs present) and negative (AFBs absent) cases.

Table 1. Breakdown of Whole-Slide Images (WSIs) Used in Acid-Fast Bacilli Algorithm Development

                Algorithm Development     Analytical Validation     Testing                Total
Characteristic  Positive    Negative      Positive    Negative      Positive   Negative    Positive   Negative
WSIs            47          371           9           3             6          5           62         379
Patches         4,629       1,049,766     449         40,508        600        21,644      5,678      1,111,918
Table 1 also shows how the 441 WSIs were split into three subsets for algorithm development (the data set used to train our deep neural networks), analytical validation (the validation data set used to check model performance), and testing (the test data set used to calculate and report performance). Wan Fang Hospital in Taiwan contributed an additional 15 positive cases for training purposes, derived from lung, pleural, and lymph node tissue. If at least one slide in a case was positive for AFBs, then the overall case was labeled positive. If available, corresponding microbiology results were documented, including culture results (1 positive case, 11 negative cases, not tested in the remaining cases) and AFB polymerase chain reaction (PCR) findings (5 positive cases, 7 negative cases, not ordered in the remaining cases).

A separate, randomly selected set of 78 archival cases (138 slides) was used for clinical validation purposes. These cases from UPMC, accessioned between June 2015 and July 2019, were different from the previously used batch of slides and were signed out by 18 different anatomic pathologists. The cases used for the test set did not have slides used in the training set. The 78 cases included 55.1% men and 44.9% women, patients ranging in age from 21 to 85 years (mean, 58.5 years), 87.2% surgical cases and 12.8% cytology cases, and 56.4% lung and 43.6% extrapulmonary cases, with necrosis present in 41.0% of cases. Microbiology culture results for mycobacteria included three positive cases, six negative cases, and no test for the remaining cases. In many of these cases, because the diagnosis of tuberculosis was not clinically suspected, cultures were not requested upfront. PCR findings for mycobacteria included 1 positive case and 18 negative cases; the remaining cases did not have molecular testing ordered.

At UPMC, all slides were scanned at ×40 (0.25 µm/pixel resolution) with a single Z-focus plane using an Aperio AT2 scanner (Leica Biosystems). The typical resolution for a whole-slide image was 100,000 × 100,000 pixels. At Wan Fang Hospital, slides were also scanned with a single focus plane at ×40, but using a Hamamatsu NanoZoomer XR (0.23 µm/pixel resolution).

Algorithm Development
We adopted a patch-based approach to the AFB detection problem, in which a WSI was divided into nonoverlapping patches of 64 × 64 pixels and each patch was classified by the algorithm as positive or negative for AFBs. To achieve a better balance between sensitivity and specificity, the algorithm consisted of two deep convolutional neural network (CNN) models (Figure 1), in which the first model had higher sensitivity and the second model had higher specificity. The reason for using two models was to filter out false-positive predictions; although changing the decision threshold might have addressed this issue, we found that it was not easy to settle on a single threshold. The two models have the same architecture, GhostNet,9 but were trained differently. For training data, positive and negative patches were generated by expert annotation on WSIs via a web-based application (aetherSlide). A single AFB occupies a minute area (roughly 10 × 10 pixels) of an entire WSI (100,000 × 100,000 pixels). For positive annotations (AFBs identified; n = 6,817), a small square region of the image containing organism(s) was cropped. AFBs were permitted to be present anywhere within the region of a positive patch.
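For illustration, the patch-based, two-model cascade described above can be sketched in Python. This is a minimal sketch, not the published implementation: it assumes OpenSlide for slide access and two trained PyTorch classifiers, and the names (model_sensitive, model_specific, screen_wsi) are hypothetical. A production system would batch patches and use the GPU rather than score one patch at a time.

```python
# Minimal sketch of the two-stage patch classification cascade (illustrative only).
import csv
import numpy as np
import openslide
import torch

PATCH_SIZE = 64  # nonoverlapping 64 x 64 pixel patches at maximum magnification


def to_tensor(patch_rgb: np.ndarray) -> torch.Tensor:
    """Scale 8-bit pixels to [-1, 1] (divide by 127.5, subtract 1) and add a batch axis."""
    x = patch_rgb.astype(np.float32) / 127.5 - 1.0
    return torch.from_numpy(x).permute(2, 0, 1).unsqueeze(0)


@torch.no_grad()
def screen_wsi(wsi_path, model_sensitive, model_specific, csv_path, threshold=0.5):
    """Tile a WSI into 64 x 64 patches, cascade two classifiers, and save (x, y, score) rows."""
    model_sensitive.eval()
    model_specific.eval()
    slide = openslide.OpenSlide(wsi_path)
    width, height = slide.dimensions
    rows = []
    for y in range(0, height - PATCH_SIZE + 1, PATCH_SIZE):
        for x in range(0, width - PATCH_SIZE + 1, PATCH_SIZE):
            patch = slide.read_region((x, y), 0, (PATCH_SIZE, PATCH_SIZE)).convert("RGB")
            tensor = to_tensor(np.asarray(patch))
            # Stage 1: high-sensitivity model screens every patch.
            p1 = torch.softmax(model_sensitive(tensor), dim=1)[0, 1].item()
            if p1 < threshold:
                continue
            # Stage 2: high-specificity model re-scores only stage-1 positives.
            p2 = torch.softmax(model_specific(tensor), dim=1)[0, 1].item()
            if p2 >= threshold:
                rows.append((x, y, p2))
    slide.close()
    # Persist coordinates and confidence scores of predicted positive patches.
    with open(csv_path, "w", newline="") as f:
        csv.writer(f).writerows([("x", "y", "score"), *rows])
    return rows
```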
Negative patches consisted of larger polygonal regions, and over one million patches without AFBs were sampled. Negative annotations included three different labels: true negative (n = 2,773), hard negative areas with AFB mimics, including artifacts and stain precipitate (n = 7,426), and background regions (n = 828). When training the neural network model, true negative and hard negative annotations were merged. There were disproportionately (>200 times) more negative patches than positive ones. To prevent the neural network models from an inherent bias toward a negative prediction, we limited the ratio of negative to positive patches to 7:3 through random sampling during training. We found that this sampling strategy yielded better algorithm performance than a focal loss strategy.

The first model was pretrained on the CIFAR-10 data set (50,000 images with 32 × 32 pixels)10 and further trained using the entire training set. For each image, we applied a random shift, random rotation (0, 90, 180, or 270 degrees), scaling, brightness changes, and horizontal flipping. Pixel values were divided by 127.5, and 1 was subtracted. The L2 regularization parameter was 0.0005, and the batch size was 128. We used a stochastic gradient descent (SGD) optimizer with an initial learning rate of 0.1 and momentum of 0.9. The learning rate was divided by 10 at epochs 100, 200, and 300. In total, we trained for 400 epochs. The second model was initialized with the weights of the trained first model and fine-tuned using only patches predicted as positive by the first model. The method used to fine-tune the first and second models was otherwise the same. For the second model, we fine-tuned all layers without L2 regularization. In this model, the batch size was 32. We again used the SGD optimizer, with an initial learning rate of 0.001 and momentum of 0.9. The learning rate was multiplied by 0.98 after every 500 steps. In total, we trained for 50,000 steps to avoid overfitting. When calculating validation performance, we used all positive patches and 2,000 random negative patches from the validation set to compute validation accuracy, and we saved the model with the highest validation accuracy. Color normalization was not employed.

Figure 1. Schematic showing the acid-fast bacilli detection algorithm training workflow.

The hardware environment for algorithm development included two central processing units (Intel Xeon CPU E5-2697 v3 at 2.60 GHz, with 14 cores per processor), a graphics processing unit (Nvidia GeForce RTX 2080 Ti), and 768 GB of random access memory (DDR4 32 GB × 24). The software environment included the Ubuntu 18.04 operating system and the Python 3.6 programming language. Depending on the file size, the algorithm takes 5 to 30 minutes to process an entire WSI. This complete process includes reading the WSI, splitting it into patches, and running the machine learning algorithm. Although the program processes one WSI at a time, all of the patches are processed in parallel. While one CPU with eight cores and 32 GB of RAM is likely sufficient to process a WSI, the Nvidia GPU helps speed up the algorithm.
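The training schedule for the first-stage model, as described above, can also be sketched in code. This is a hedged illustration only: it assumes PyTorch with the GhostNet implementation from the timm package (the published work specifies the architecture but not the framework), and the dummy tensor data set stands in for the annotated patch set with its 7:3 negative-to-positive sampling.

```python
# Sketch of the first-stage model training schedule (illustrative, not the published code).
import timm
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset
from torchvision import transforms

# Augmentations named in the text; in a real Dataset these would be applied to each patch crop.
augment = transforms.Compose([
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1), scale=(0.9, 1.1)),  # random shift / scaling
    transforms.RandomChoice([transforms.RandomRotation((a, a)) for a in (0, 90, 180, 270)]),
    transforms.ColorJitter(brightness=0.2),       # brightness change
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Lambda(lambda t: t * 2.0 - 1.0),   # equivalent to dividing 8-bit pixels by 127.5 and subtracting 1
])

# Placeholder data: random tensors standing in for annotated 64 x 64 patches (7:3 sampling assumed upstream).
images = torch.rand(256, 3, 64, 64) * 2 - 1
labels = torch.randint(0, 2, (256,))
train_patches = TensorDataset(images, labels)

model = timm.create_model("ghostnet_100", num_classes=2)  # GhostNet architecture; weights hypothetical
loader = DataLoader(train_patches, batch_size=128, shuffle=True)
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)  # L2 = 0.0005
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100, 200, 300], gamma=0.1)
criterion = nn.CrossEntropyLoss()

for epoch in range(400):
    for batch_images, batch_labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(batch_images), batch_labels)
        loss.backward()
        optimizer.step()
    scheduler.step()

# The second model would be initialized from these weights and fine-tuned on stage-1
# positives only (batch size 32, lr 0.001, lr multiplied by 0.98 every 500 steps, 50,000 steps).
```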
Web Portal
For managing WSIs in routine clinical practice, a slide processing assistant was developed in C# that identified the digitized slide submitted for processing, made the WSI available to the CNN for analysis, and stored the algorithm's results in a structured query language database. The algorithm's output was transferred to a comma-separated values file, including (1) the X and Y coordinates and (2) the likelihood (confidence) score of predicted positive patches. To make the algorithm output explainable and easy for pathology end users, patches (ie, regions of interest) were displayed via a web portal in gallery format (Figure 2). The web portal was written in C# and JavaScript. The web-based gallery includes image patches shown as thumbnails, ranked from highest to lowest based on their probability of containing AFBs. These thumbnail images are dynamically generated using OpenSlide11 by automatically taking 80 × 80 (96 pixels/inch) snapshots (file size 2 KB) at maximum magnification, centered on the X and Y location within the WSI. A user can scroll down the webpage to examine all the regions of interest for an analyzed digital slide. Each thumbnail also shows the probability (eg, a score of 0.5 indicates that the image patch has a 50% probability of containing an AFB). For each case, the website also displays the corresponding WSI. Clicking on a thumbnail in the gallery displays that exact region of interest in the context of the WSI image on the right, allowing a pathologist to visualize the AFBs within the context of the entire slide (eg, within a Langhans giant cell or located extracellularly in a necrotic background). Users are free to navigate through the entire WSI by panning and zooming. They can also define boundaries within the WSI to inspect, filtering out regions outside the boundary (ie, only patches within the boundary are displayed in the gallery).

Figure 2. Screenshot from the web portal showing regions of interest (patches) identified by the algorithm in the gallery on the left and the corresponding whole-slide image (WSI) on the right. In this example, regions 17 and 18 (out of 50 total) identified by the algorithm each had a confidence score of 0.88. Clicking on thumbnail 17 displayed acid-fast bacilli (green circle) within the WSI at the exact location where they were detected.

Clinical Validation
The algorithm and the workflow incorporating the web portal to display artificial intelligence (AI)-based results were validated at one site (UPMC) in a blinded study by comparing manual bright-field light microscopy and WSI evaluation of AFS slides to AI-assisted screening, performed independently by two pathologists (pathologists A and B). There was a minimum 7-day washout period between review methods. This washout was selected for logistic reasons; while 2 weeks is often used for digital pathology validation studies, this is debated, with arguments for selecting both longer and shorter washout periods. AI-based reviews also occurred last, for logistic purposes. For manual microscopy, traditional light microscopes (Olympus BX45 or BX46) were used.
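The thumbnail logic of the gallery described in the Web Portal section can be illustrated as follows. The actual portal was written in C# and JavaScript; this Python sketch, using OpenSlide, only shows how 80 × 80 pixel crops centered on each detected (x, y) location could be generated and ranked by confidence. The function name and file layout are hypothetical.

```python
# Illustrative sketch of gallery thumbnail generation (the real portal is a C#/JavaScript application).
import openslide

THUMB = 80  # thumbnail edge length in pixels, taken at maximum magnification (level 0)


def make_gallery_thumbnails(wsi_path, detections, out_dir):
    """detections: iterable of (x, y, score) rows, eg, read from the algorithm's CSV output."""
    slide = openslide.OpenSlide(wsi_path)
    ranked = sorted(detections, key=lambda d: d[2], reverse=True)  # highest probability first
    for rank, (x, y, score) in enumerate(ranked, start=1):
        # Centre the 80 x 80 crop on the detected patch location.
        left, top = int(x) - THUMB // 2, int(y) - THUMB // 2
        thumb = slide.read_region((left, top), 0, (THUMB, THUMB)).convert("RGB")
        thumb.save(f"{out_dir}/roi_{rank:04d}_{score:.2f}.jpg", quality=80)
    slide.close()


# Example usage (paths and detections are placeholders):
# make_gallery_thumbnails("case123.svs", [(10240, 51200, 0.98), (8000, 9000, 0.88)], "gallery")
```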
The WSIs were examined using 24-inch monitors (Dell or HP) in landscape orientation, each with 1,920 × 1,080 resolution and a standard dynamic range color space. When reviewing cases with AI support via the web portal, a minimum of 300 thumbnails per case was evaluated, even though some cases had many more regions of interest displayed in the gallery. The reviewing pathologists recorded the range of time their assessments took (0-5 minutes, 6-10 minutes, 11+ minutes), the AFB interpretation (negative, indeterminate, positive), the quantity of AFBs identified (2 or fewer, 3-10, 11+ AFBs), and the task difficulty (easy, medium, hard) for each slide. For reviews with the aid of the algorithm, the web portal recorded the precise review time in milliseconds, the number of positive regions the reviewing pathologist detected, and the total number of regions recommended by the algorithm for review. All interpretations using all three of these modalities were compared with the original sign-out diagnoses and, in a subset of these cases, also with available microbiology (culture/PCR) results. Discrepant cases were not adjudicated.

Statistical Analysis
A receiver operating characteristic (ROC) curve was plotted and the area under the ROC curve (AUC) was calculated to determine the accuracy of the algorithm once the training phase was completed, using Python (scikit-learn12 and matplotlib13). For the clinical validation study, pathologists' interpretations were compared with each other, as well as with the original sign-out interpretation. All 138 slides were compared for both pathologists for each review method (microscope, WSI, algorithm assisted), including AFB interpretation, time to review, and difficulty. Comparisons with the original sign-out and original microbiologic findings were based on the 78 cases. If any slide in a multislide case was positive, the case was considered positive. Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy were calculated for each review method, using the signed-out assessment as the gold standard. Interpretations were also compared with available microbiology results. PCR and other microbiologic results were either not performed or not available for most cases used in the validation phase of the study. Therefore, microbiologic results were not separately used as a gold standard for sensitivity, specificity, NPV, PPV, and accuracy calculations. However, positive results would have affected the AFB assessment at sign-out and are therefore indirectly related to the gold-standard comparison used in this study (the sign-out interpretation).

The Pearson χ2 test was used to determine whether the proportion of cases positive for AFBs differed among the review methods (microscope, WSI, AI) compared with the original interpretation. The Pearson χ2 test was also used to determine whether the time category, the categorized quantity of AFBs, or the difficulty of assessment differed among the reviewers. The normality of recorded distributions for continuous variables (AI-recorded review time, number of positive regions, percentage of total regions that were positive) was examined using the Shapiro-Wilk normality test. As the data were not normally distributed, nonparametric statistical tests were used. The Mann-Whitney test was used to compare pathologists' review time (minutes) and, when using AI-assisted review, the positive regions identified.
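The statistical comparisons outlined above can be sketched with scikit-learn and SciPy. The arrays and contingency counts below are placeholders, not the study data; the intent is only to show which standard tests correspond to each comparison.

```python
# Sketch of the statistical tests described above, on placeholder data (illustrative only).
import numpy as np
from scipy import stats
from sklearn.metrics import roc_curve, auc

rng = np.random.default_rng(0)

# Placeholder data standing in for the study measurements.
patch_labels = rng.integers(0, 2, size=200)                              # 0 = negative, 1 = positive patch
patch_scores = np.clip(patch_labels * 0.7 + rng.normal(0.2, 0.2, 200), 0, 1)
times_a = rng.exponential(3.0, size=138)                                 # AI-recorded review times, pathologist A
times_b = rng.exponential(4.0, size=138)                                 # AI-recorded review times, pathologist B
counts = np.array([[30, 108],                                            # method 1: AFB-positive, AFB-negative slides
                   [15, 123]])                                           # method 2: AFB-positive, AFB-negative slides

# ROC curve and AUC at the patch level.
fpr, tpr, _ = roc_curve(patch_labels, patch_scores)
print("patch-level AUC:", auc(fpr, tpr))

# Pearson chi-square: does the proportion of AFB-positive reviews differ between methods?
chi2, p_prop, _, _ = stats.chi2_contingency(counts, correction=False)

# Shapiro-Wilk normality check, then Mann-Whitney U test for the two pathologists' review times.
if stats.shapiro(times_a).pvalue < 0.05 or stats.shapiro(times_b).pvalue < 0.05:
    u_stat, p_time = stats.mannwhitneyu(times_a, times_b, alternative="two-sided")
```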
Statistical significance was assumed at P < .05. Analyses were performed using IBM SPSS Statistics 22 (SPSS).

Results

Algorithm Performance
The final AI-based algorithm was able to reliably detect AFBs in WSIs (Figure 3 and Figure 4), even when these microorganisms were not always in clear focus owing to the limited image acquisition resolution of the commercial WSI scanners used. Table 2 summarizes the performance of the algorithm at the image patch and WSI levels. The final AUC was 0.960 at the image patch level (Figure 5) and 0.900 at the WSI level. To calculate the WSI-level AUC, a predicted score was computed for each image by averaging all predicted patch scores higher than 0.5. If a WSI had no patch with a score over 0.5, the WSI-level predicted score was 0. Table 3 shows the algorithm performance at the patch level when using only the first model, only the second model, or both models; using both models provided the highest AUC. In rare cases with abundant AFBs present, the algorithm returned many positive regions. Because too many image patches delayed the web-based application (they took long to upload into the gallery), a forced cutoff of 1,000 images per slide was implemented.

Table 2. Algorithm Performance at Image Patch and WSI Levels

Image         Sensitivity   Specificity   AUC
Patch level   0.600         0.999         0.960
WSI level     0.833         0.800         0.900

AUC, area under the curve; WSI, whole-slide image.

Figure 5. Area under the curve (AUC) for acid-fast bacilli algorithm detection in image patches. AUC = 0.960.

Table 3. Comparison of Algorithm Performance Using Individual or Combined Convolutional Neural Network Models

Applying CNN Models   Specificity   Sensitivity   AUC
First model only      0.96          0.66          0.95
Second model only     1.00          0.61          0.92
Both models           1.00          0.60          0.96

AUC, area under the curve; CNN, convolutional neural network.

Figure 3. Artificial intelligence-assisted detection of acid-fast bacilli (AFBs) is shown in a whole-slide image (WSI). A, Portion of a WSI is shown with expert manual annotation (red squares) of AFBs. B, Predicted heatmap (orange) from the algorithm is shown overlaid on the exact same region of the WSI.

Clinical Validation Findings
A statistically significantly higher proportion of AI-assisted slide reviews identified the presence of AFBs (20.3%) than reviews performed manually with a microscope (11.6%) or by the WSI (7.6%) review method (χ2 = 7.787, P = .005; χ2 = 18.488, P < .001, respectively). Figure 6 shows the overall proportion of slides identified as positive, as well as the estimated quantity of AFBs. AFBs were least likely to be detected when pathologists screened slides using WSI. Pathologist B indicated a significantly higher proportion of positive WSI slides (10.9%) than pathologist A (4.3%) (χ2 = 4.175, P = .041). However, there was not a significant difference in the proportion of positive cases identified by either reviewer using the microscope and algorithm-assisted methods. The number of thumbnails identified as positive was significantly higher for pathologist B (median, 0.0%; mean rank, 145.2) than pathologist A (median, 0.0%; mean rank, 131.8) (U = 8,601.5, P = .040). Algorithm-assisted reviews had a higher rate of matching the original sign-out assessment (84.6%) than did reviews using the microscope (82.7%) and WSI (77.6%), although the differences were not statistically significant (χ2 = 0.211, P = .646; χ2 = 2.529, P = .112, respectively).

Figure 4. Examples of artificial intelligence-detected acid-fast bacilli (AFB) structures. Upper left: sparse extracellular (arrow) AFBs (algorithm probability 0.98). Upper right: multiple AFBs within the green circle (algorithm probability 1.00).
Lower left: scant intracellular AFBs in a cytology specimen (algorithm probability 0.94). Lower right: AFB mimic (arrow) located within normal tissue (algorithm probability 0.54).

Table 4 shows the sensitivity, specificity, PPV, NPV, and accuracy calculations by review method and reviewer per case. Because the microscope and WSI methods had no false positives, the specificity and PPV for both of these review methods were 100%. The sensitivity, NPV, and accuracy were highest for algorithm-assisted reviews.

Only limited cases had follow-up microbiology results available for comparison. For the three cases with subsequent positive microbiology cultures, only two were interpreted as positive with all review modalities (one slide was missed by one pathologist using manual microscopy). The case with a positive PCR result had four slides that were included. Pathologist A provided a negative AFB assessment for all four slides with each review method. Pathologist B provided a positive result for one of the four slides on microscope and WSI review, but a negative result for all four slides on algorithm-assisted review.

There was a statistically significant relationship between review time and pathologist for each review method (Figure 7). A statistically significantly higher proportion of algorithm-assisted reviews took 0 to 5 minutes (91.7%) than did microscope (66.3%) or WSI (47.1%) reviews (χ2 = 53.480, P < .001; χ2 = 129.022, P < .001, respectively). Table 5 shows the amount of time taken for pathologist review by method. A significantly higher proportion of pathologist A's microscopic reviews were completed in 0 to 5 minutes (89.1%) compared with pathologist