A machine learning approach for assessing sedimentological data's potential for in-situ gold grade prediction in the Witwatersrand basin, South Africa

Mineral resource estimation is one of the most crucial factors in the mining industry because it provides information about the quantity and quality of a mineral deposit. Mining operations have traditionally taken a purely geostatistical approach to resource estimation; however, the global technology boom has significantly improved estimation techniques. Machine Learning (ML) has only recently proven relevant in geoscience. Several studies have suggested that it is a revolutionary resource estimation tool with the potential to improve overall prediction accuracies. ML enables the incorporation of geological properties (e.g., lithology, grain size); consequently, resource estimation models produced with geological context are expected to be of higher accuracy. This research aimed to investigate the extent to which incorporating sedimentological properties enables in-situ gold grade prediction in the Witwatersrand Basin. Sibanye-Stillwater provided a gold assay database comprising sample information collected from their Kloof mining operation (South Africa). This dataset was refined and used to produce ML models; geostatistical (kriging) models were later produced for comparison. The experimental component of this dissertation was programmed in Jupyter Notebook using the Python programming language. The machine learning algorithms (MLAs) chosen for this study were Random Forest (RF), AdaBoost, K-Nearest Neighbours (KNN) and Elastic Net (EN). The original dataset comprised 55 304 datapoints, with sample properties including spatial coordinates, four sedimentological properties (Channel Width, Internal Waste, Conglomerate Percentage and Basal Contact), and the gold grade property. A sample area was selected for this study; consequently, the final dataset used for predictive modelling comprised 10 682 datapoints (19.31% of the original).
The predictive modelling results showed that it is possible to produce high-accuracy mineral resource models when predicting gold grades using MLAs; however, the quality of the data plays a significant role in the extent to which sedimentological properties can be useful in this regard. An extensive quality assurance and quality control (QAQC) workflow had to be implemented to attain good predictions from what was originally a very poor model. Investigations into the spatial distribution of the well-predicted and poorly predicted datapoints revealed a spatial pattern in the ‘bad’ dataset, while the ‘good’ dataset had none, suggesting that the poorly predicted samples may have been extracted off-reef. Analysing the differences between the ‘good’ and ‘bad’ datasets thus revealed a prominent data quality issue that may have resulted from errors during sampling. Overall, this study has produced a resource estimation methodology that can be replicated and used on any gold deposit in the Witwatersrand Basin where similar types of data exist. The same methodology may also be applicable to similar deposits in other locations.
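As a rough illustration of the kind of comparison described in the abstract, the sketch below fits a K-Nearest Neighbours regressor twice, once on spatial coordinates alone and once with sedimentological columns added, and scores both on held-out samples. It is a minimal sketch and not the dissertation's code: the data are synthetic, only two of the four sedimentological properties are mimicked, and the helper names (`knn_predict`, `make_synthetic_samples`, `holdout_mae`), the choice of mean absolute error, and the 75/25 split are all assumptions made for illustration.

```python
import math
import random


def knn_predict(train_X, train_y, query, k=5):
    """Predict a value as the mean target of the k nearest training rows."""
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: math.dist(train_X[i], query),
    )[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def make_synthetic_samples(n, seed=0):
    """Hypothetical stand-in for assay data; every value here is invented."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x_coord, y_coord = rng.uniform(0, 1), rng.uniform(0, 1)
        channel_width = rng.uniform(0, 1)
        conglomerate_pct = rng.uniform(0, 1)
        # Assumed relationship, chosen only so the example has a signal.
        grade = 5 * conglomerate_pct + 2 * channel_width + rng.gauss(0, 0.1)
        X.append([x_coord, y_coord, channel_width, conglomerate_pct])
        y.append(grade)
    return X, y


def holdout_mae(X, y, columns, k=5, test_fraction=0.25):
    """Fit on the leading rows and score on the held-out remainder."""
    split = int(len(X) * (1 - test_fraction))
    train_X = [[row[c] for c in columns] for row in X[:split]]
    test_X = [[row[c] for c in columns] for row in X[split:]]
    preds = [knn_predict(train_X, y[:split], q, k) for q in test_X]
    return mean_absolute_error(y[split:], preds)


X, y = make_synthetic_samples(400)
mae_coords_only = holdout_mae(X, y, columns=[0, 1])
mae_with_sedimentology = holdout_mae(X, y, columns=[0, 1, 2, 3])
print(f"MAE, coordinates only: {mae_coords_only:.3f}")
print(f"MAE, coordinates + sedimentology: {mae_with_sedimentology:.3f}")
```

Because the synthetic grade is built from the sedimentological columns by construction, the two scores only demonstrate the mechanics of the comparison and say nothing about the study's actual results. On real assay data, features would typically need scaling before a distance-based method such as KNN is applied.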
A research report submitted in fulfilment of the requirements for the degree of Master of Science to the Faculty of Science in Geology, School of Geosciences, University of the Witwatersrand, Johannesburg, 2023