ETD Collection
Permanent URI for this collection: https://wiredspace.wits.ac.za/handle/10539/104
Please note: Digitised content is made available at the best possible quality range, taking into consideration file size and the condition of the original item. These restrictions may sometimes affect the quality of the final published item. For queries regarding the content of the ETD collection, please contact the IR specialists by email or Tel: 011 717 4652 / 1954
Follow the link below for important information about Electronic Theses and Dissertations (ETD)
Library Guide about ETD
3 results
Search Results
Item Using customised image processing for noise reduction to extract data from early 20th century African newspapers (2017) Usher, Sarah
The images from the African articles dataset presented challenges to the Optical Character Recognition (OCR) tool. Despite successful binarisation in the Image Processing step of the pipeline, noise remained in the foreground of the images. This noise caused the OCR tool to misinterpret the text from the images and thus needed removal from the foreground. The technique involved the application of the Maximally Stable Extremal Region (MSER) algorithm, borrowed from Scene-Text Detection, and supervised machine learning classifiers. The algorithm creates regions from the foreground elements. Regions are classifiable into noise and characters based on the characteristics of their shapes. Classifiers were trained to recognise noise and characters. The technique is useful for a researcher wanting to process and analyse the large dataset. They could semi-automate the foreground noise-removal process using this technique. This would allow for better quality OCR output, for use in the Text Analysis step of the pipeline. Better OCR quality means fewer compromises would be required at the Text Analysis step. These concessions can lead to false results when searching noisy text. Fewer compromises means simpler, less error-prone analysis and more trustworthy results. The technique was tested against specifically selected images from the dataset which exhibited noise. It involved a number of steps. Training regions were selected and manually classified. After training and running many classifiers, the highest performing classifier was selected. The classifier categorised regions from all images. New images were created by removing noise regions from the original images. To discover whether an improvement in the OCR output was achieved, a text comparison was conducted. OCR text was generated from both the original and processed images. 
The two outputs of each image were compared for similarity against the test text. The test text was a manually created version of the expected OCR output per image. The similarity test for both original and processed images produced a score. A change in the similarity score indicated whether the technique had successfully removed noise or not. The test results showed that blotches in the foreground could be removed, and OCR output improved. Bleed-through and page-fold noise could not be removed. For images affected by noise blotches, this technique can be applied and hence fewer concessions will be needed when processing the text generated from those images.
Item Modelling temperature in South Africa using extreme value theory (2018) Nemukula, Murendeni M.
This dissertation focuses on demonstrating the use of extreme value theory in modelling temperature in South Africa. The purpose of modelling temperature is to investigate the frequency of occurrences of extremely low and extremely high temperatures and how they influence the demand for electricity over time. The data comprise a time series of average hourly temperatures collected by the South African Weather Service over the period 2000 to 2010 and supplied by Eskom. The generalized extreme value distribution (GEVD) for r largest order statistics is fitted to the average maximum daily temperature (non-winter season) using the maximum likelihood estimation method and used to estimate extreme high temperatures, which result in high demand for electricity due to the use of cooling systems. The estimation of the shape parameter reveals evidence that the Weibull family of distributions is an appropriate fit to the data. A frequency analysis of extreme temperatures is carried out and the results show that most of the extreme temperatures are experienced during the months January, February, November and December of each year. 
The generalized Pareto distribution (GPD) is first used for modelling the average minimum daily temperatures for the period January 2000 to August 2010. A penalized regression cubic smoothing spline is used as a time-varying threshold. We then extract excesses above the cubic regression smoothing spline and fit a non-parametric mixture model to get a sufficiently high threshold. The data exhibit evidence of short-range dependence and high seasonality, so we decluster the excesses above the threshold and fit the GPD to the cluster maxima. The estimate of the shape parameter shows that the Weibull family of distributions is appropriate in modelling the upper tail of the distribution. The stationary GPD and the piecewise linear regression models are used in modelling the influence of temperature above the reference point of 22 °C on the demand for electricity. The stationary and non-stationary point process models are fitted and used in determining the frequency of occurrence of extremely high temperatures. The orthogonal and the reparameterization approaches of determining the frequency and intensity of extremes have been used to establish that extremely hot days occur in frequencies of 21 and 16 days per annum, respectively. Because temperature is established as a major driver of electricity demand, this dissertation is relevant to the system operators, planners and decision makers in Eskom and most of the utility and engineering companies. Our results are further useful to Eskom since it is during the non-winter period that they plan for maintenance of their power plants. Modelling temperature is important for the South African economy since the electricity sector is considered one of the most weather-sensitive sectors of the economy. 
In addition, the modelling approaches that are presented in this dissertation are relevant for modelling heat waves, which have several impacts on energy, the economy and the health of our citizens.
Item An experimental system for computer aided bird call recognition (2014-02-07) Colombick, Illan Samson