Electronic Theses and Dissertations (Masters)
Permanent URI for this collection: https://hdl.handle.net/10539/38012
Machine Learning Algorithms-Based Classification of Lithology using Geophysical Logs: ICDP DSeis Project Boreholes, South Africa (University of the Witwatersrand, Johannesburg, 2024-09)
Atita, Obehi Chapet; Durrheim, Raymond; Saffou, Eric

One of the most significant geoscience tasks is the accurate classification of lithologies for metal and mineral resource exploration, the characterization of oil and gas reservoirs, and the planning and management of mining operations. With the availability of abundant, large and multidimensional datasets, data-driven machine learning methods have been widely adopted to help solve geoscientific problems such as the efficient evaluation and interpretation of large datasets. Machine learning methods are adopted to improve the accuracy of lithological identification and to extract the information needed for accurate and objective decision-making in activities such as exploration, drilling, mine planning and production. In practice, this helps to reduce working time and operating costs. We aim to evaluate the feasibility of applying machine learning algorithms to geophysical log data to automatically classify lithologies by stratigraphic unit at the formation level, in order to distinguish and correlate the quartzites between boreholes and to map key radioactive zones within the mining horizon. This study implemented four machine learning algorithms: gradient boosting decision trees, random forest, support vector machine, and K-means clustering. The analyzed features and the labels are, respectively, the multivariate downhole geophysical logs and the lithology logs from the two ICDP DSeis project boreholes drilled in the Klerksdorp gold field.
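K-means is the only unsupervised method among the four: it groups log samples by similarity without using the lithology labels, which is what makes it usable for looking for zones such as the radioactive intervals mentioned above. The following is a minimal plain-Python sketch of Lloyd's algorithm, offered as an illustration under stated assumptions rather than the study's implementation. The farthest-point seeding, the function names `farthest_point_init` and `kmeans`, the sample values and the choice k = 3 are all hypothetical.

```python
import math


def farthest_point_init(points, k):
    """Hypothetical deterministic seeding: start at the first sample, then
    repeatedly add the sample farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member samples.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up, already-scaled two-feature samples (illustration only, not DSeis data).
samples = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
    (5.0, 5.0), (5.1, 4.9), (4.8, 5.2),
    (9.0, 1.0), (9.2, 1.1), (8.9, 0.8),
]
labels, centroids = kmeans(samples, k=3)
print(labels)
```

Because K-means relies on Euclidean distance, logs recorded in different units (for example P-wave velocity alongside other logs) would normally be standardized before clustering; the toy samples above are assumed to be already scaled.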
To reduce misclassification error and avoid overfitting or underfitting, the optimal combinations and values of each supervised model's hyperparameters were obtained using grid search with 10-fold cross-validation. The input dataset was randomly split into training and testing subsets comprising 80% and 20% of the original dataset, respectively. The models were trained and cross-validated on the training subset, and their performance was assessed on the testing subset. The classification performance of each model was evaluated using F1 scores and visualized using confusion matrices. The best supervised classification model for our study area was selected based on the testing-subset F1 scores and the computational cost of training. The testing-subset results show that the random forest and support vector machine classifiers performed much better than the gradient boosting decision trees classifier, with F1 scores above 0.80 in boreholes A and B. The random forest classifier had the shortest computational training time, about 14 hours in borehole A and 6 hours in borehole B. The feature importance results show that P-wave velocity (Vp) is the most important predictive feature for lithology classification in both boreholes. We find that the quartzite classes at different stratigraphic positions in each borehole are similar and that they correlate between the DSeis boreholes. K-means clustering revealed three clusters in the study area and effectively mapped the radioactive zones. This study illustrates that geophysical log data and machine learning algorithms can improve data analysis in the geosciences through accurate, reproducible and automated prediction of lithologies, correlation, and mapping of radioactive zones in gold mines.
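The evaluation protocol described above (an 80/20 random split, grid search with 10-fold cross-validation on the training subset, then F1 scores and a confusion matrix on the testing subset) can be sketched in plain Python. This is a minimal, hedged illustration, not the study's code. The classifier is a hypothetical k-nearest-neighbours stand-in rather than the random forest, support vector machine or gradient boosting models used in the study. Macro-averaged F1 is an assumption, since the abstract does not state the averaging. The data and all function names (`grid_search_cv`, `knn_predict` and so on) are hypothetical.

```python
import math
import random
from collections import Counter


def train_test_split(n, test_frac=0.2, seed=0):
    """Shuffle sample indices and hold out test_frac of them (80/20 by default)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]  # (training indices, testing indices)


def kfold(indices, n_folds=10):
    """Yield (fit, validation) index lists for n_folds interleaved folds."""
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        fit = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield fit, folds[i]


def knn_predict(X, y, fit, query, n_neighbors):
    """Hypothetical stand-in classifier: majority vote of the nearest fit samples."""
    preds = []
    for q in query:
        nearest = sorted(fit, key=lambda t: math.dist(X[t], X[q]))[:n_neighbors]
        preds.append(Counter(y[t] for t in nearest).most_common(1)[0][0])
    return preds


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    pos = {lab: i for i, lab in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[pos[t]][pos[p]] += 1
    return m


def macro_f1(y_true, y_pred, labels):
    """Assumed macro average: mean per-class F1 = 2TP / (2TP + FP + FN)."""
    m = confusion_matrix(y_true, y_pred, labels)
    scores = []
    for i in range(len(labels)):
        tp = m[i][i]
        fp = sum(row[i] for row in m) - tp
        fn = sum(m[i]) - tp
        if 2 * tp + fp + fn:  # skip classes absent from both y_true and y_pred
            scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores) if scores else 0.0


def grid_search_cv(X, y, train, grid, labels, n_folds=10):
    """Return the grid value with the best mean cross-validated macro-F1."""
    def cv_score(n_neighbors):
        scores = []
        for fit, val in kfold(train, n_folds):
            pred = knn_predict(X, y, fit, val, n_neighbors)
            scores.append(macro_f1([y[v] for v in val], pred, labels))
        return sum(scores) / len(scores)
    return max(grid, key=cv_score)  # ties go to the first grid value


# Made-up 1-D toy data with three well-separated classes (illustration only).
labels = ["A", "B", "C"]
X = [(10.0 * c + 0.1 * i,) for c in range(3) for i in range(10)]
y = [labels[c] for c in range(3) for _ in range(10)]

train_idx, test_idx = train_test_split(len(X))  # 24 training, 6 testing samples
best_k = grid_search_cv(X, y, train_idx, grid=[1, 3, 5], labels=labels)
pred = knn_predict(X, y, train_idx, test_idx, best_k)  # final test on held-out data
print(confusion_matrix([y[t] for t in test_idx], pred, labels))
print("test macro-F1:", macro_f1([y[t] for t in test_idx], pred, labels))
```

The key design point is that the testing subset is touched only once, after the hyperparameter has been chosen from cross-validation on the training subset. In practice, scikit-learn provides `train_test_split` and `GridSearchCV` in `sklearn.model_selection`, and `f1_score` and `confusion_matrix` in `sklearn.metrics`; the hand-written version only makes each step visible.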
The outputs of this study can serve as quality control measures for similar future studies in both academia and industry. We found that the availability of large datasets is the major factor behind the high accuracy of machine learning algorithms in classification problems.