Hydrological data interpolation using entropy

Ilunga, Masengo
Missing data, short hydrological records and poor data quality are common problems, and they are much more prevalent in developing countries than in developed ones. They can severely affect the decisions of water systems managers (e.g. the reliability of a design or the establishment of operating policies for water supply). Numerous data interpolation (infilling) techniques have therefore evolved in hydrology to deal with missing data. This study presents a methodology that combines the theories of entropy, artificial neural networks (ANN) and expectation-maximization (EM) to cope with missing (limited) hydrological data. The methodology is formulated into a model named ENANNEX. The study does not use any physical characteristics of the catchment areas; it deals only with the limited information (e.g. streamflow or rainfall) at the target gauge and its similar nearby base gauge(s). The entropy concept proved to be a versatile tool. It was first used to quantify the information content of hydrological variables (e.g. rainfall or streamflow). The same concept, through the directional information transfer (DIT) index, was then used to select the base/subject gauge. Finally, the DIT notion was extended to evaluating the performance of the data infilling techniques (i.e. ANN and EM). The methodology was applied to annual total rainfall, annual mean flow, annual maximum flow and 6-month mean flow series of selected catchments in drainage region D (“Orange”) of South Africa. These data regimes can be regarded as useful for design-oriented studies, flood studies, water balance studies, etc. The case studies showed that DIT is as good an index for selecting a data infilling technique as other criteria, e.g. statistical and graphical ones.
However, the DIT has the feature of being a non-dimensional informational index. The data interpolation techniques, viz. ANNs and EM (both existing methods already applied in hydrology and methods not yet applied there), and their new features have also been presented. This study showed that the standard techniques (e.g. backpropagation, BP, and EM) as well as their respective variants could be selected for estimating missing hydrological data. The capability of the different data interpolation techniques to maintain the statistical characteristics (e.g. mean, variance) of the target gauge was also taken into account. The relationship between the accuracy of the estimated series (obtained by applying a data infilling technique) and the gap duration was then investigated through the DIT notion. It was shown that a decay (power or exponential) function could better describe that relationship. In other words, the amount of uncertainty removed from the target station in a station-pair, via a given technique, can be known for a given gap duration. The performance of the different techniques was found to depend on the gap duration at the target gauge, the station-pair involved in the missing data estimation and the type of data regime. This study also showed that it was possible, through the entropy approach, to make a preliminary assessment of model performance for simulating runoff data at a site where absolutely no record exists; a case study was conducted at the Bedford site (in South Africa). Two simulation models, viz. RAFLER and WRSM2000, were assessed in this respect. Both models were found suitable for simulating flows at Bedford.
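As a rough illustration of the quantities named in the abstract, the sketch below estimates a DIT value for a station-pair and fits an exponential decay of DIT against gap duration. It is not the ENANNEX implementation. It assumes the usual definition of DIT as transinformation divided by the marginal entropy of the target series, and it assumes equal-width histogram binning; the function names, the `bins` parameter and the parametrisation `a * exp(-b * gap)` are hypothetical choices made for this example.

```python
import numpy as np


def shannon_entropy(counts):
    """Shannon entropy (in nats) of a table of class counts."""
    p = np.asarray(counts, dtype=float).ravel()
    p = p[p > 0] / p.sum()  # drop empty classes, normalise to probabilities
    return float(-np.sum(p * np.log(p)))


def dit(target, base, bins=5):
    """DIT = T(target, base) / H(target), estimated from a 2-D histogram.

    T is the transinformation H(target) + H(base) - H(target, base).
    Equal-width binning into `bins` classes is an assumption of this sketch.
    """
    joint, _, _ = np.histogram2d(target, base, bins=bins)
    h_target = shannon_entropy(joint.sum(axis=1))  # marginal of target
    h_base = shannon_entropy(joint.sum(axis=0))    # marginal of base
    h_joint = shannon_entropy(joint)
    if h_target == 0.0:
        return 0.0  # a constant target series carries no uncertainty
    return (h_target + h_base - h_joint) / h_target


def fit_exponential_decay(gap, dit_values):
    """Fit DIT = a * exp(-b * gap) by least squares on log(DIT).

    All DIT values must be strictly positive for the log transform.
    """
    slope, intercept = np.polyfit(
        np.asarray(gap, dtype=float), np.log(dit_values), 1
    )
    return float(np.exp(intercept)), float(-slope)
```

In this sketch a DIT of 1 means the base series removes all the uncertainty at the target gauge, and a DIT near 0 means it removes almost none. A power-law decay could be fitted the same way by regressing `log(DIT)` on `log(gap)` instead of on `gap`.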
Faculty of Engineering and Built Environment, School of Civil and Environmental Engineering 0105772w imasengo@yahoo.com
Missing data, Interpolation, Entropy, Artificial neural networks, Expectation maximization