ETD Collection
Permanent URI for this collectionhttps://wiredspace.wits.ac.za/handle/10539/104
Please note: Digitised content is made available at the best possible quality range, taking into consideration file size and the condition of the original item. These restrictions may sometimes affect the quality of the final published item. For queries regarding content of ETD collection please contact IR specialists by email : IR specialists or Tel : 011 717 4652 / 1954
Follow the link below for important information about Electronic Theses and Dissertations (ETD)
Library Guide about ETD
Browse
2 results
Search Results
Item Computational intelligence techniques for missing data imputation(2008-08-14T12:08:53Z) Nelwamondo, Fulufhelo VincentDespite considerable advances in missing data imputation techniques over the last three decades, the problem of missing data remains largely unsolved. Many techniques have emerged in the literature as candidate solutions, including the Expectation Maximisation (EM), and the combination of autoassociative neural networks and genetic algorithms (NN-GA). The merits of both these techniques have been discussed at length in the literature, but have never been compared to each other. This thesis contributes to knowledge by firstly, conducting a comparative study of these two techniques.. The significance of the difference in performance of the methods is presented. Secondly, predictive analysis methods suitable for the missing data problem are presented. The predictive analysis in this problem is aimed at determining if data in question are predictable and hence, to help in choosing the estimation techniques accordingly. Thirdly, a novel treatment of missing data for online condition monitoring problems is presented. An ensemble of three autoencoders together with hybrid Genetic Algorithms (GA) and fast simulated annealing was used to approximate missing data. Several significant insights were deduced from the simulation results. It was deduced that for the problem of missing data using computational intelligence approaches, the choice of optimisation methods plays a significant role in prediction. Although, it was observed that hybrid GA and Fast Simulated Annealing (FSA) can converge to the same search space and to almost the same values they differ significantly in duration. This unique contribution has demonstrated that a particular interest has to be paid to the choice of optimisation techniques and their decision boundaries. iii Another unique contribution of this work was not only to demonstrate that a dynamic programming is applicable in the problem of missing data, but to also show that it is efficient in addressing the problem of missing data. An NN-GA model was built to impute missing data, using the principle of dynamic programing. This approach makes it possible to modularise the problem of missing data, for maximum efficiency. With the advancements in parallel computing, various modules of the problem could be solved by different processors, working together in parallel. Furthermore, a method for imputing missing data in non-stationary time series data that learns incrementally even when there is a concept drift is proposed. This method works by measuring the heteroskedasticity to detect concept drift and explores an online learning technique. New direction for research, where missing data can be estimated for nonstationary applications are opened by the introduction of this novel method. Thus, this thesis has uniquely opened the doors of research to this area. Many other methods need to be developed so that they can be compared to the unique existing approach proposed in this thesis. Another novel technique for dealing with missing data for on-line condition monitoring problem was also presented and studied. The problem of classifying in the presence of missing data was addressed, where no attempts are made to recover the missing values. The problem domain was then extended to regression. The proposed technique performs better than the NN-GA approach, both in accuracy and time efficiency during testing. The advantage of the proposed technique is that it eliminates the need for finding the best estimate of the data, and hence, saves time. Lastly, instead of using complicated techniques to estimate missing values, an imputation approach based on rough sets is explored. Empirical results obtained using both real and synthetic data are given and they provide a valuable and promising insight to the problem of missing data. The work, has significantly confirmed that rough sets can be reliable for missing data estimation in larger and real databases.Item Hydrological data interpolation using entropy(2006-11-17T07:40:07Z) Ilunga, MasengoThe problem of missing data, insufficient length of hydrological data series and poor quality is common in developing countries. This problem is much more prevalent in developing countries than it is in developed countries. This situation can severely affect the outcome of the water systems managers’ decisions (e.g. reliability of the design, establishment of operating policies for water supply, etc). Thus, numerous data interpolation (infilling) techniques have evolved in hydrology to deal with the missing data. The current study presents merely a methodology by combining different approaches and coping with missing (limited) hydrological data using the theories of entropy, artificial neural networks (ANN) and expectation-maximization (EM) techniques. This methodology is simply formulated into a model named ENANNEX model. This study does not use any physical characteristics of the catchment areas but deals only with the limited information (e.g. streamflow or rainfall) at the target gauge and its similar nearby base gauge(s). The entropy concept was confirmed to be a versatile tool. This concept was firstly used for quantifying information content of hydrological variables (e.g. rainfall or streamflow). The same concept (through directional information transfer index, i.e. DIT) was used in the selection of base/subject gauge. Finally, the DIT notion was also extended to the evaluation of the hydrological data infilling technique performance (i.e. ANN and EM techniques). The methodology was applied to annual total rainfall; annual mean flow series, annual maximum flows and 6-month flow series (means) of selected catchments in the drainage region D “Orange” of South Africa. These data regimes can be regarded as useful for design-oriented studies, flood studies, water balance studies, etc. The results from the case studies showed that DIT is as good index for data infilling technique selection as other criteria, e.g. statistical and graphical. However, the DIT has the feature of being non-dimensionally informational index. The data interpolation iii techniques viz. ANNs and EM (existing methods applied and not yet applied in hydrology) and their new features have been also presented. This study showed that the standard techniques (e.g. Backpropagation-BP and EM) as well as their respective variants could be selected in the missing hydrological data estimation process. However, the capability for the different data interpolation techniques of maintaining the statistical characteristics (e.g. mean, variance) of the target gauge was not neglected. From this study, the relationship between the accuracy of the estimated series (by applying a data infilling technique) and the gap duration was then investigated through the DIT notion. It was shown that a decay (power or exponential) function could better describe that relationship. In other words, the amount of uncertainty removed from the target station in a station-pair, via a given technique, could be known for a given gap duration. It was noticed that the performance of the different techniques depends on the gap duration at the target gauge, the station-pair involved in the missing data estimation and the type of the data regime. This study showed also that it was possible, through entropy approach, to assess (preliminarily) model performance for simulating runoff data at a site where absolutely no record exist: a case study was conducted at Bedford site (in South Africa). Two simulation models, viz. RAFLER and WRSM2000 models, were then assessed in this respect. Both models were found suitable for simulating flows at Bedford.