Evaluating the reliability of quantification results

dc.contributor.author: Feni, Democracy Yongama
dc.date.accessioned: 2022-07-22T10:33:17Z
dc.date.available: 2022-07-22T10:33:17Z
dc.date.issued: 2021
dc.description: A thesis submitted to the Faculty of Science, University of the Witwatersrand, in fulfillment of the requirements for the degree of Master of Science, 2021
dc.description.abstract: Categorical data analysis is often carried out using different techniques, one of which is quantification. Quantification refers to a collection of methods that assign numerical values to the categories of categorical variables. Once these values are assigned, categorical variables can be treated as if they were continuous; that is, methods designed for continuous data can be applied to them directly. If the values are assigned consistently across samples, the quantification is said to be stable. If the assigned values are not stable, then inference, and specifically any subsequent classification, will also be unstable. Resampling methods such as bootstrapping have been used in the past to assess the stability of quantification results, but typically for contingency tables, whereas multiple-choice datasets have received less attention. Furthermore, the stability of quantification results has mostly been evaluated in the context of visualization. This research investigated the stability of quantification results under different sampling strategies, including subsampling and resampling, and assessed stability in terms of classification as well as visualization. We used Dual Scaling (DS) to accomplish quantification. To investigate stability, we applied different subsampling strategies, specifically uniform and biased subsampling, and used bootstrapping to further investigate the variation of results. Finally, we used Support Vector Machines (SVMs) and Linear Discriminant Analysis (LDA) to perform classification on the quantified data, in order to evaluate the effect of changing the data on the quantification results. The results showed that DS is stable under uniform subsampling and resampling.
Under uniform subsampling, quantification results were similar to those of the parent set for larger sample sizes, whereas sample size affected the stability of DS results under biased subsampling: the results deviated substantially from the parent data when the data was extremely biased. Contrary to the expectation that larger sample sizes yield better results, when subsampling a variable with two categories of unequal size, our study showed that the category with fewer rows gave better results. This was tested on the "Gender", "Income" and "Employment" variables, and the same conclusions were reached in each case. The same experiment applied to the "Gender" variable of the 2017 General Household Survey (GHS) data again yielded the same conclusions. Biased subsampling with evenly distributed categories yielded better results than subsamples with a skewed distribution of categories. Although these results may suggest instability of DS under biased subsampling, the clustering and classification results showed that the quantifications were in fact stable: the classification results did not vary to any great extent, and the clustering results retained the same groupings under different subsampling strategies. DS can therefore be used for classification and related techniques. Resampling methods, on the other hand, strengthen the results of either the parent sample or the subsamples; that is, resampling validates the underlying results of a study through the repetition of experiments. Finally, although resampling methods are useful for assessing the stability of quantification results, they should never be used as a substitute for gathering more data. This was shown in the experiment that investigated the effect of resampling up to a sample size greater than that of the subsample.
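The abstract's stability procedure — quantify with Dual Scaling, resample, and compare the resampled quantifications with those of the parent data — can be sketched as follows. This is a minimal illustration, not the author's implementation: it assumes a synthetic contingency table, computes DS scores via the SVD of the standardized residual matrix (the standard correspondence-analysis formulation of dual scaling), and uses the absolute correlation between parent and bootstrap column scores as a simple stability measure.

```python
import numpy as np

def dual_scaling_scores(table):
    """Dual-scaling row/column scores for a contingency table,
    computed via the SVD of the standardized residual matrix."""
    P = table / table.sum()
    r = P.sum(axis=1)                      # row masses
    c = P.sum(axis=0)                      # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    # First dimension of S is already non-trivial (the trivial
    # solution was removed by subtracting the independence model).
    return U[:, 0] / np.sqrt(r), Vt[0, :] / np.sqrt(c)

rng = np.random.default_rng(0)
parent = rng.integers(5, 50, size=(4, 3)).astype(float)  # synthetic parent table
row_ref, col_ref = dual_scaling_scores(parent)

# Bootstrap: redraw cell counts from the parent proportions and
# compare each replicate's column scores with the parent's.
n = int(parent.sum())
p = (parent / parent.sum()).ravel()
corrs = []
for _ in range(200):
    counts = rng.multinomial(n, p).reshape(parent.shape).astype(float)
    _, col_b = dual_scaling_scores(counts)
    # Singular-vector signs are arbitrary; compare magnitudes only.
    corrs.append(abs(np.corrcoef(col_ref, col_b)[0, 1]))

print(f"mean |correlation| with parent scores: {np.mean(corrs):.3f}")
```

A mean absolute correlation close to 1 indicates that the quantifications are stable under resampling; the thesis additionally checks stability downstream by classifying the quantified data with SVMs and LDA.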
dc.description.librarian: CK2022
dc.faculty: Faculty of Science
dc.identifier.uri: https://hdl.handle.net/10539/33051
dc.language.iso: en
dc.school: School of Computer Science and Applied Mathematics
dc.title: Evaluating the reliability of quantification results
dc.type: Thesis

Files

Original bundle

Name: MSc_1455373_Feni_signed.pdf
Size: 3.89 MB
Format: Adobe Portable Document Format
License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission
