3. Electronic Theses and Dissertations (ETDs) - All submissions
Permanent URI for this community: https://wiredspace.wits.ac.za/handle/10539/45
Search Results (4 items)
Item: Metaheuristic approaches for scheduling of multipurpose batch plants (2018) by Woolway, Mathew John

The field of batch chemical processing has seen a significant rise in research over the last five decades, as changes in the economic climate have led to increased demand for the manufacture of high-value, small-volume products. Due to their dependency on time, batch processes are considerably more complex than their continuous-process counterparts. The predominant approach in the batch process literature makes use of mathematical programming, whereby binary variables indicate the assignment of certain tasks to capable units. This mathematical programming strategy, coupled with the aforementioned time complexity, can lead to computational intractability due to the extended enumeration of binary variables. This thesis considers the reduction of the computational time required in the solution of multipurpose batch plant scheduling. Because the computational times required to solve mathematical programming models in multipurpose batch plant scheduling can be infeasible, close-to-optimal rather than globally optimal solutions are often accepted. If close-to-optimal solutions are acceptable, then it is reasonable to explore non-deterministic metaheuristic strategies to reduce the required computational time. Applying these strategies requires generalised frameworks consistent with metaheuristic approaches, and presently no decoupled generalised framework suitable for various metaheuristic implementations exists in the literature. As a result, this thesis presents two novel mathematical frameworks for the representation of batch scheduling: one for discrete-time approaches and another for continuous-time approaches. In each framework, two well-known literature examples are considered.
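To give a concrete flavour of the non-deterministic metaheuristic strategies discussed here, the following sketch applies simulated annealing to a toy unit-task assignment problem. All data, parameter values and the makespan objective are invented for illustration; this is not the thesis's actual framework or problem formulation.

```python
import math
import random

def makespan(assignment, durations, unit_speed):
    """Completion time of the busiest unit under a task-to-unit assignment."""
    load = [0.0] * len(unit_speed)
    for task, unit in enumerate(assignment):
        load[unit] += durations[task] / unit_speed[unit]
    return max(load)

def anneal(durations, unit_speed, steps=5000, t0=10.0, cooling=0.999, seed=0):
    """Simulated annealing over task-to-unit assignments, minimising makespan."""
    rng = random.Random(seed)
    n_units = len(unit_speed)
    current = [rng.randrange(n_units) for _ in durations]
    best = current[:]
    temp = t0
    for _ in range(steps):
        # Neighbour move: reassign one randomly chosen task to a random unit.
        cand = current[:]
        cand[rng.randrange(len(durations))] = rng.randrange(n_units)
        delta = (makespan(cand, durations, unit_speed)
                 - makespan(current, durations, unit_speed))
        # Always accept improvements; accept worsenings with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = cand
        if makespan(current, durations, unit_speed) < makespan(best, durations, unit_speed):
            best = current[:]
        temp *= cooling
    return best

# Hypothetical instance: 8 tasks, 3 processing units with different speeds.
durations = [4.0, 2.0, 7.0, 3.0, 5.0, 1.0, 6.0, 2.0]
unit_speed = [1.0, 1.5, 0.8]
schedule = anneal(durations, unit_speed)
```

The appeal of such a search is that each move costs only an objective evaluation, with no binary-variable enumeration; the trade-off, as the thesis notes, is that the result is close-to-optimal rather than provably optimal.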
In addition, three metaheuristic techniques are applied to these literature examples, namely genetic algorithms (GA), simulated annealing (SA) and migrating bird optimisation (MBO). The resultant framework allows 12 variants of the literature examples to be investigated and compared to the currently accepted mixed integer linear programming (MILP) approach. In these experiments, the metaheuristics implemented under the newly introduced frameworks reduced computational time by up to 99.96% in the discrete-time approach and 99.68% in the continuous-time approach. The genetic algorithm proved to be the best performer of the metaheuristic suite, often obtaining the global optimum on short time horizons and close-to-optimal solutions on medium-to-long time horizons. Furthermore, parallel implementations were explored and showed that additional time reduction is possible, with certain workloads terminating in two orders of magnitude less computational time than serial implementations. The results show that the application of metaheuristics to the scheduling of multipurpose batch plants is indeed appropriate and obtains close-to-optimal solutions relative to their MILP counterparts at considerably reduced computational times.

Item: Knowledge extraction in population health datasets: an exploratory data mining approach (2018) by Khangamwa, Gift

There is a growing trend in the utilization of machine learning and data mining techniques for knowledge extraction in health datasets. In this study, we used machine learning methods for data exploration and model building, and we built classifier models for anemia. Anemia is recognized as a crucial public health challenge that leads to poor health for mothers and infants, and one of its main causes is malaria. We used a dataset from Malawi, where the prevalence of these two health challenges, malaria and anemia, remains high.
We employed machine learning algorithms for the task of knowledge extraction on these demographic and health datasets for Malawi, for the survey years 2004 and 2010, and followed the cross-industry standard process for data mining (CRISP-DM) methodology to guide our study. The dataset was obtained, cleaned and prepared for experimentation. Unsupervised machine learning methods were used to understand the nature of the dataset and the natural groupings in it, while supervised machine learning methods were used to build predictive models for anemia. Specifically, we used principal component analysis (PCA) and clustering algorithms in our unsupervised experiments, and support vector machines (SVM) and decision trees (DTs) in our supervised experiments. The unsupervised methods revealed no significant separation into clusters according to either the malaria or the anemia attributes. However, attributes such as age, economic status, health practices and the number of children a woman has were clustered in significantly different ways, i.e., young and old women went to different clusters. The PCA results confirmed these findings. The supervised methods, on the other hand, showed that anemia classifiers could be developed using SVM and DTs for this dataset. The best-performing model, an SVM with C = 100 and γ = 10⁻¹⁸, attained an accuracy of 86%, a ROC area score of 86%, a mean absolute error of 0.27 and a kappa statistic of 0.78. The best DT model attained an accuracy of 73%, a ROC area score of 74%, a mean absolute error of 0.36 and a kappa statistic of 0.449.
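The kappa statistics quoted above measure classifier agreement beyond what chance alone would produce. As a quick illustration, Cohen's kappa can be computed from a confusion matrix as follows; the anaemic/non-anaemic counts below are hypothetical, not the study's data.

```python
def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: actual, cols: predicted)."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of cases on the diagonal.
    observed = sum(confusion[i][i] for i in range(n)) / total
    # Expected agreement under chance, from the row and column marginals.
    expected = sum(
        (sum(confusion[i]) / total) * (sum(row[i] for row in confusion) / total)
        for i in range(n)
    )
    return (observed - expected) / (1 - expected)

# Hypothetical anaemic / non-anaemic confusion matrix (100 cases).
cm = [[40, 10],
      [ 5, 45]]
kappa = cohens_kappa(cm)  # 0.85 observed vs 0.5 expected agreement gives 0.7
```

A kappa of 0.78, as reported for the SVM model, therefore indicates substantial agreement beyond chance, whereas the DT model's 0.449 is only moderate.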
In conclusion, we successfully built a good anemia classifier using SVM and also showed the relationship between important attributes in the classification of anemia.

Item: Social media analytics and the role of twitter in the 2014 South Africa general election: a case study (2018) by Singh, Asheen

Social network sites such as Twitter have created vibrant and diverse communities in which users express their opinions and views on a variety of topics, such as politics. Extensive research has been conducted in countries such as Ireland, Germany and the United States, in which text mining techniques have been used to obtain information from politically oriented tweets. The purpose of this research was to determine whether text mining techniques can be used to uncover meaningful information from a corpus of political tweets collected during the 2014 South African General Election. The Twitter Application Programming Interface was used to collect tweets related to the three major political parties in South Africa, namely the African National Congress (ANC), the Democratic Alliance (DA) and the Economic Freedom Fighters (EFF). The text mining techniques used in this research are sentiment analysis, clustering, association rule mining and word cloud analysis. In addition, a correlation analysis was performed to determine whether a relationship exists between the total number of tweets mentioning a political party and the total number of votes obtained by that party. The VADER (Valence Aware Dictionary for sEntiment Reasoning) sentiment classifier was used to determine the public's sentiment towards the three main political parties; it revealed an overwhelmingly neutral sentiment towards the ANC, DA and EFF, and its results were significantly better than any of the baselines in this research. The K-Means clustering algorithm was used to successfully cluster the corpus of political tweets into political-party clusters.
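The clustering step can be sketched with a minimal K-Means (Lloyd's algorithm) implementation on two-dimensional tweet features. The feature values below (e.g. normalised party-term frequencies) and the implementation details are invented stand-ins, not the study's actual feature extraction or clustering code.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                              + (p[1] - centroids[c][1]) ** 2,
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for i, p in enumerate(points) if labels[i] == c]
            if members:
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels

# Hypothetical 2-D features for six tweets, forming two well-separated groups.
tweets = [(0.1, 0.9), (0.2, 0.8), (0.15, 0.85),
          (0.9, 0.1), (0.8, 0.2), (0.85, 0.15)]
centroids, labels = kmeans(tweets, k=2)
```

With well-separated groups such as these, the two alternating steps converge quickly; the study's finding that DA tweets scattered across clusters suggests its feature space was far less cleanly separated.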
Clusters containing tweets relating to the ANC and the EFF were formed; however, tweets relating to the DA were scattered across multiple clusters. A fairly strong relationship was discovered between the number of positive tweets mentioning the ANC and the number of votes the ANC received in the election; due to a lack of data, no conclusions could be drawn for the DA or the EFF. The apriori algorithm uncovered numerous association rules, some of which were found to be interesting. The results also demonstrated the usefulness of word cloud analysis in providing easy-to-understand information from the tweet corpus used in this study. This research has highlighted the many ways in which text mining techniques can be used to obtain meaningful information from a corpus of political tweets, and this case study can be seen as a contribution to a research effort that seeks to unlock the information contained in textual data from social network sites.

Item: Chunked extendible arrays and its integration with the global array toolkit for parallel image processing (2016) by Nimako, Gideon

Several meetings of the Extremely Large Databases community for large-scale scientific applications have advocated the use of multidimensional arrays as the appropriate model for representing scientific databases. Scientific databases gradually grow to massive sizes of the order of terabytes and petabytes. As such, storing such databases requires efficient dynamic storage schemes in which the array is allowed to arbitrarily extend the bounds of its dimensions. Conventional multidimensional array representations in today's programming environments do not extend or shrink their bounds without relocating elements of the dataset; in general, extendibility of the bounds is limited to only one dimension.
This thesis presents a technique for storing dense multidimensional arrays by chunks such that the array can be extended along any dimension without compromising the access time of an element. This is done with a computed access mapping function that maps the k-dimensional index onto a linear index of the storage locations. This concept forms the basis for the implementation of an array file of any number of dimensions, in which the bounds of the array dimensions can be extended arbitrarily. Such a feature currently exists in the Hierarchical Data Format version 5 (HDF5); however, extending the bound of a dimension in an HDF5 array file can be unusually expensive in time. In our storage scheme for dense array files, such extensions can be performed while elements of the array are still accessed orders of magnitude faster than in HDF5 or conventional array files. We also present the Parallel Chunked Extendible Dense Array (PEXTA), a new parallel I/O model for the Global Array Toolkit. PEXTA not only provides the necessary Application Programming Interface (API) for explicit data transfer between the memory-resident global array and its secondary-storage counterpart, but also allows the persistent array to be extended along any dimension without compromising the access time of an element or sub-array. Such APIs provide a platform for high-speed, parallel hyperspectral image processing without performance degradation, even when the imagery files undergo extensions.
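The core idea of chunk-based addressing can be illustrated with a much-simplified sketch: a computed function maps a k-dimensional element index to a chunk coordinate plus a row-major offset within that chunk, so growing the array along any dimension only allocates new chunks and never relocates stored elements. The fixed chunk shape, the dict-backed chunk directory and all names below are illustrative assumptions, not the thesis's actual mapping function or file layout.

```python
from functools import reduce

CHUNK_SHAPE = (4, 4, 4)  # hypothetical fixed chunk size per dimension

def locate(index, chunk_shape=CHUNK_SHAPE):
    """Map a k-dimensional element index to (chunk coordinate, offset in chunk)."""
    chunk = tuple(i // c for i, c in zip(index, chunk_shape))
    within = tuple(i % c for i, c in zip(index, chunk_shape))
    # Row-major offset of the element inside its chunk.
    offset = 0
    for w, c in zip(within, chunk_shape):
        offset = offset * c + w
    return chunk, offset

class ChunkedArray:
    """Minimal in-memory stand-in: a chunk directory keyed by chunk coordinate.

    Because addressing is per chunk, writing beyond the current bounds of any
    dimension simply allocates new chunks; existing chunks never move.
    """
    def __init__(self, chunk_shape=CHUNK_SHAPE):
        self.chunk_shape = chunk_shape
        self.chunks = {}

    def __setitem__(self, index, value):
        chunk, offset = locate(index, self.chunk_shape)
        size = reduce(lambda a, b: a * b, self.chunk_shape)
        buf = self.chunks.setdefault(chunk, [None] * size)
        buf[offset] = value

    def __getitem__(self, index):
        chunk, offset = locate(index, self.chunk_shape)
        return self.chunks[chunk][offset]

store = ChunkedArray()
store[(0, 0, 0)] = 1.0
store[(9, 2, 7)] = 42.0  # "extends" dims 0 and 2 by allocating a new chunk
```

In a file-backed scheme like the one the thesis describes, the chunk directory would resolve to positions in the array file rather than to in-memory buffers, but the extension-without-relocation property comes from the same per-chunk addressing.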