3. Electronic Theses and Dissertations (ETDs) - All submissions

Permanent URI for this community: https://wiredspace.wits.ac.za/handle/10539/45

Search Results

Now showing 1 - 4 of 4

Convex optimization for rank-sparsity decomposition, with application to the planted quasi-clique problem
(2020) Abdulsalaam, Sakirudeen A
In this thesis we consider the rank-sparsity decomposition problem and its application to planted quasi-clique recovery. Given a matrix that is a superposition of a low-rank matrix and a sparse matrix, the rank-sparsity decomposition problem asks: "when is it possible to decompose the matrix into its low-rank and sparse components?" The common convex formulation for this problem minimizes a weighted combination of the nuclear norm and the l1-norm. To prove optimality of solutions under this formulation, it is customary to derive a bound on a dual matrix that certifies the optimality of the solution. Among the methodological contributions of this thesis are the sharp theoretical bounds obtained for the dual matrix. We have improved the results on low-rank matrix decomposition by deriving the bound on our dual matrix using the matrix l∞,2 norm. Moreover, we established conditions under which recovery is achievable by deriving a dual matrix that certifies the optimality of our solution. We adapt the convex formulation for the rank-sparsity decomposition to the planted quasi-clique problem. This problem is a generalization of the planted clique problem, which is known to be NP-hard. It has applications in areas such as community detection, data mining, bioinformatics, and criminal network analysis. We have considered the mathematical modelling, theoretical framework, and computational aspects of the problem. We showed that the planted quasi-clique can be recovered using convex programming, by adapting techniques from low-rank matrix decomposition to the planted quasi-clique problem. Our numerical results show that when the input graph contains the desired single large dense subgraph and a moderate number of diversionary vertices and edges, the relaxation is exact. We have shown, numerically, the superiority of our formulation over the only existing Mixed Integer Programming (MIP) formulations. Further, we present a simplified proof that quasi-cliques also possess the so-called quasi-hereditary property. This property can be exploited to develop an enumerative algorithm for the problem.
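For orientation, the convex formulation mentioned in the abstract above is usually written in the following form. This is a sketch of the standard program from the low-rank plus sparse decomposition literature, not necessarily the exact program used in the thesis; the symbols M (the observed matrix), L (the low-rank part), S (the sparse part) and the weight λ > 0 are notation assumed here.

```latex
\min_{L,\,S} \; \|L\|_{*} + \lambda \, \|S\|_{1}
\quad \text{subject to} \quad L + S = M
```

Here \|L\|_{*} is the nuclear norm (the sum of the singular values of L) and \|S\|_{1} is the sum of the absolute values of the entries of S.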
A study to use a data mining approach to classify customer price sensitivity using a retail banking foreign exchange historical dataset
(2018) Maplanka, Ntombizodwa
Data analysis combined with machine learning has become an essential part of the modern scientific methodology, offering automated procedures for predicting a phenomenon based on past observations, identifying underlying patterns in data, and providing insights about the problem. This thesis seeks to demonstrate the use of data mining techniques to classify price-sensitive customers using a historical dataset from a retail banking forex department. Two datasets were merged (customer data and deals data), and statistical models were fitted and compared, namely decision trees, random forests and neural networks. All models produced excellent results when fitted on the datasets; the random forests performed slightly better, with marginal improvements over decision trees and neural networks. These models gave an area under the receiver operating characteristic curve of at least 0.90 and a proportion correctly classified of at least 0.95 for the datasets. Apart from making the most accurate predictions of the response variable, random forests and decision trees were used to identify the predictor variables that are most important for making these predictions. The study shows that in retail banking, under the given setting, the foreign exchange division can price clients appropriately and increase its competitive edge by using data mining techniques to predict customers' price sensitivity to foreign exchange rates. The next step for the bank is to use these methods to retain customers, increase revenue, and make improvements in pricing where warranted.
Knowledge extraction in population health datasets: an exploratory data mining approach
(2018) Khangamwa, Gift
There is a growing trend in the utilization of machine learning and data mining techniques for knowledge extraction in health datasets. In this study, we used machine learning methods for data exploration and model building, and we built classifier models for anemia. Anemia is recognized as a crucial public health challenge that leads to poor health for mothers and infants, and one of its main causes is malaria. We used a dataset from Malawi, where the prevalence of these two health challenges, malaria and anemia, remains high. We employed machine learning algorithms for the task of knowledge extraction on these demographic and health datasets for Malawi for the survey years 2004 and 2010. We followed the cross-industry standard process for data mining methodology to guide our study. The dataset was obtained, cleaned and prepared for experimentation. Unsupervised machine learning methods were used to understand the nature of the dataset and the natural groupings in it, while supervised machine learning methods were used to build predictive models for anemia. Specifically, we used principal component analysis (PCA) and clustering algorithms in our unsupervised machine learning experiments. Support vector machines (SVMs) and decision trees (DTs) were used in the supervised machine learning experiments. Unsupervised ML methods revealed that there was no significant separation of clusters according to either the malaria or the anemia attributes. However, attributes such as age, economic status, health practices and the number of children a woman has were clustered in significantly different ways, i.e., young and old women went to different clusters. Moreover, PCA results confirmed these findings. Supervised methods, on the other hand, revealed that anemia classifiers could be developed using SVMs and DTs for the dataset. The best-performing model, an SVM with C = 100 and γ = 10^-18, attained an accuracy of 86%, ROC area score of 86%, mean absolute error of 0.27, and kappa of 0.78. The best DT model had an accuracy of 73%, ROC area score of 74%, mean absolute error of 0.36 and kappa statistic of 0.449. In conclusion, we successfully built a good anemia classifier using an SVM and also showed the relationship between important attributes in the classification of anemia.
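The kappa statistic reported in the abstract above measures agreement between predicted and true labels beyond what chance alone would give. A minimal sketch of how it can be computed from a confusion matrix follows; the function name `cohen_kappa` and the example counts are hypothetical and are not taken from the thesis.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: true, columns: predicted)."""
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    k = len(confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / total
    # Chance agreement: product of matching row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical counts for a two-class problem (illustration only).
example = [[40, 10],
           [5, 45]]
print(round(cohen_kappa(example), 3))
```

A kappa of 1 indicates perfect agreement and a kappa of 0 indicates agreement no better than chance, which is why it is often reported alongside raw accuracy.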
Contextualized risk mitigation based on geological proxies in alluvial diamond mining using geostatistical techniques
(2016) Jacob, Jana
Quantifying risk in the absence of hard data presents a significant challenge. Onshore mining of the diamondiferous linear beach deposit along the southwestern coast of Namibia has been ongoing for more than 80 years. A historical delineated campaign from the 1930s to 1960s used coast-perpendicular trenches spaced 500 m apart, comprising a total of 26 000 individual samples, to identify 6 onshore raised beaches. These linear beaches extend offshore and are successfully mined in water depths deeper than 30 m. There is, however, a roughly 4 km wide submerged coast-parallel strip adjacent to the mostly mined-out onshore beaches for which no real hard data is available at present. The submerged beaches within this strip hold great potential for being highly diamondiferous, but hard data to quantify or validate this potential is not yet available. The question is how to obtain sufficient hard data within the techno-economic constraints to enable a resource with an acceptable level of confidence to be developed. The work presented in this thesis illustrates how virtual orebodies (VOBs) are created based on geological proxies in order to have a basis on which to assess and rank different sampling and drilling strategies. Overview of 4 papers: Paper I demonstrates the challenge of obtaining a realistic variogram that can be used in variogram-based geostatistical simulations. Simulated annealing is used to unfold the coastline and improve the detectable variography for a number of the beaches. Paper II shows how expert opinion interpretation is used to supplement the sparse data used to create an indicator simulation for studying the presence and absence of diamondiferous gravel. When only the sparse data is used, the resultant simulation is unsuitable as a VOB upon which drilling strategies can be assessed. Paper III outlines how expert opinion hand sketches are used to create a VOB. The composite probability map based on geological proxies is adjusted using a grade profile based on adjacent onshore data before it is seeded with stones and used as a VOB for strategy testing. Paper IV illustrates how the Nachman model, based on a Negative Binomial Distribution (NBD), is used to predict a minimum background grade by considering only the zero proportions (Zp) of the grade data. Conclusions and future work: When creating spatial simulations that can serve as VOBs, it is very difficult to quantify uncertainty when no hard data is available. In the absence of hard data, geological proxies and expert opinion are the only inputs that can be used to create VOBs. These VOBs are subsequently used as a base for evaluating and ranking different sampling and drilling strategies under techno-economic constraints. VOBs must be updated and reviewed as hard data becomes available, after which sampling strategies should be reassessed. During early-stage exploration projects, the Zp of sample results can be used to predict a minimum background grade and rank different targets for further sampling and valuation. The research highlights the possibility that multi-point statistics (MPS) can be used. Higher-order MPS should be further investigated as an additional method for creating VOBs upon which sampling strategies can be assessed.
