Evaluating the reliability of quantification results

dc.contributor.author: Feni, Democracy Yongama
dc.date.accessioned: 2022-07-22T10:33:17Z
dc.date.available: 2022-07-22T10:33:17Z
dc.date.issued: 2021
dc.description: A thesis submitted to the Faculty of Science, University of the Witwatersrand, in fulfillment of the requirements for the degree of Master of Science, 2021
dc.description.abstract: Categorical data analysis is often carried out using different techniques, one of which is quantification. Quantification refers to a collection of methods that assign numerical values to the categories of categorical variables. Once these values are assigned, categorical variables can be treated as if they were continuous; that is, methods designed for continuous data can be applied to them directly. If the values are assigned consistently across samples, the quantification is said to be stable. If the assigned values are not stable, then inference, and specifically any subsequent classification, will also be unstable. Resampling methods such as bootstrapping have been used in the past to assess the stability of quantification results, but typically for contingency tables, whereas multiple-choice datasets have received less attention. Furthermore, the stability of quantification results has mostly been evaluated in the context of visualization. This research investigated the stability of quantification results under different sampling strategies, including subsampling and resampling, and assessed stability in terms of classification as well as visualization. We used Dual Scaling (DS) to accomplish quantification. To investigate stability, we applied different subsampling strategies, specifically uniform and biased subsampling, and used bootstrapping to further investigate the variation of results. Finally, we used Support Vector Machines (SVMs) and Linear Discriminant Analysis (LDA) to perform classification on the quantified data, in order to evaluate the effect of changing the data on the quantification results. The results showed that DS is stable under uniform subsampling and resampling.
Under uniform subsampling, quantification results were similar to those of the parent set for larger sample sizes, whereas sample size affected the stability of DS results under biased subsampling: the results deviated substantially from the parent data when the data was extremely biased. Contrary to the expectation that larger sample sizes yield better results, when subsampling a variable with two categories of unequal size, our study showed that the category with fewer rows gave better results. This was tested on the "Gender", "Income" and "Employment" variables, and the same conclusions were reached in each case. The same experiment applied to the "Gender" variable of the 2017 General Household Survey (GHS) data again yielded the same conclusions. Biased subsampling with evenly distributed categories yielded better results than subsamples with a skewed distribution of categories. Although these results may suggest instability of DS under biased subsampling, the clustering and classification results showed that the quantifications were in fact stable: the classification results did not vary to any great extent, and the clustering results retained the same groupings under different subsampling strategies. DS can therefore be used for classification and related techniques. Resampling methods, on the other hand, strengthen the results of either the parent sample or the subsamples; that is, resampling validates the underlying results of a study through the repetition of experiments. Finally, although resampling methods are useful for assessing the stability of quantification results, they should never be used as a substitute for gathering more data. This was shown in the experiment that investigated the effect of resampling up to a sample size greater than that of the subsample.
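The abstract's stability procedure — quantify with Dual Scaling, resample, and compare the resampled quantifications with those of the parent data — can be sketched as follows. This is a minimal illustration, not the author's implementation: it assumes a synthetic contingency table, computes DS scores via the SVD of the standardized residual matrix (the standard correspondence-analysis formulation of dual scaling), and uses the absolute correlation between parent and bootstrap column scores as a simple stability measure.

```python
import numpy as np

def dual_scaling_scores(table):
    """Dual-scaling row/column scores for a contingency table,
    computed via the SVD of the standardized residual matrix."""
    P = table / table.sum()
    r = P.sum(axis=1)                      # row masses
    c = P.sum(axis=0)                      # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    # First dimension of S is already non-trivial (the trivial
    # solution was removed by subtracting the independence model).
    return U[:, 0] / np.sqrt(r), Vt[0, :] / np.sqrt(c)

rng = np.random.default_rng(0)
parent = rng.integers(5, 50, size=(4, 3)).astype(float)  # synthetic parent table
row_ref, col_ref = dual_scaling_scores(parent)

# Bootstrap: redraw cell counts from the parent proportions and
# compare each replicate's column scores with the parent's.
n = int(parent.sum())
p = (parent / parent.sum()).ravel()
corrs = []
for _ in range(200):
    counts = rng.multinomial(n, p).reshape(parent.shape).astype(float)
    _, col_b = dual_scaling_scores(counts)
    # Singular-vector signs are arbitrary; compare magnitudes only.
    corrs.append(abs(np.corrcoef(col_ref, col_b)[0, 1]))

print(f"mean |correlation| with parent scores: {np.mean(corrs):.3f}")
```

A mean absolute correlation close to 1 indicates that the quantifications are stable under resampling; the thesis additionally checks stability downstream by classifying the quantified data with SVMs and LDA.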
dc.description.librarian: CK2022
dc.faculty: Faculty of Science
dc.identifier.uri: https://hdl.handle.net/10539/33051
dc.language.iso: en
dc.school: School of Computer Science and Applied Mathematics
dc.title: Evaluating the reliability of quantification results
dc.type: Thesis

Files

Original bundle

Name: MSc_1455373_Feni_signed.pdf
Size: 3.89 MB
Format: Adobe Portable Document Format
License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission
