The dynamics of pathology dataset creation using urine cytology as an example

McAlpine, Ewen; Michelow, Pamela; Celik, Turgay

dc.citation.doi	10.1159/000519273
dc.citation.epage	9
dc.citation.spage	1
dc.contributor.author	McAlpine, Ewen
dc.contributor.author	Michelow, Pamela
dc.contributor.author	Celik, Turgay
dc.date.accessioned	2023-04-13T08:04:14Z
dc.date.available	2023-04-13T08:04:14Z
dc.description.abstract	Introduction: Dataset creation is one of the first tasks required for training AI algorithms but is underestimated in pathology. High-quality data are essential for training algorithms and data should be labelled accurately and include sufficient morphological diversity. The dynamics and challenges of labelling a urine cytology dataset using The Paris System (TPS) criteria are presented. Methods: 2,454 images were labelled by pathologist consensus via video conferencing over a 14-day period. During the labelling sessions, the dynamics of the labelling process were recorded. Quality assurance images were randomly selected from images labelled in previous sessions within this study and randomly distributed throughout new labelling sessions. To assess the effect of time on the labelling process, the labelled set of images was split into 2 groups according to the median relative label time, and the time taken to label images and intersession agreement were assessed. Results: Labelling sessions ranged from 24 m 11 s to 41 m 06 s in length, with a median of 33 m 47 s. The majority of the 2,454 images were labelled as benign urothelial cells, with atypical and malignant urothelial cells more sparsely represented. The time taken to label individual images ranged from 1 s to 42 s with a median of 2.9 s. Labelling times differed significantly among categories, with the median label time for the atypical urothelial category being 7.2 s, followed by the malignant urothelial category at 3.8 s and the benign urothelial category at 2.9 s. The overall intersession agreement for quality assurance images was substantial. The level of agreement differed among classes of urothelial cells: benign and malignant urothelial cell classes showed almost perfect agreement and the atypical urothelial cell class showed moderate agreement. Image labelling seemed to speed up as sessions progressed, and there was no evidence that intersession agreement worsened with session time.
Discussion/conclusion: Important aspects of pathology dataset creation are presented, illustrating the significant resources required for labelling a large dataset. We present evidence that the time taken to categorise urine cytology images varies by diagnosis/class. The known challenges relating to the reproducibility of the AUC (atypical) category in TPS when compared to the NHGUC (benign) or HGUC (malignant) categories are also confirmed.
dc.identifier.citation	McAlpine ED, Michelow PM, Celik T. The Dynamics of Pathology Dataset Creation Using Urine Cytology as an Example. Acta Cytol. 2022;66(1):46-54. doi:10.1159/000519273
dc.identifier.issn	0001-5547
dc.identifier.uri	https://hdl.handle.net/10539/34989
dc.journal.title	Acta Cytologica
dc.journal.volume	66
dc.subject	Digital pathology
dc.subject	Machine learning
dc.subject	The Paris System
dc.subject	Urine cytology
dc.title	The dynamics of pathology dataset creation using urine cytology as an example
dc.type	Article

Files

Original bundle

Name: Journal.pdf
Size: 952.3 KB
Format: Adobe Portable Document Format
Description: Journal

Collections

School of Anatomical Sciences (Journal Articles)
Academic Wits Research Outputs (All submissions)