Evaluating the reliability of quantification results

Date

2021

Authors

Feni, Democracy Yongama

Abstract

Categorical data analysis is often carried out using different techniques, one of which is quantification. Quantification refers to a collection of methods that assign values to the categories of the categorical variables. Assignment of these values means that categorical variables can now be treated as if they were continuous variables, that is, methods that are used for continuous data can be applied directly to this data. If these values are consistent in the way they are assigned to these categories, they are said to be stable. On the other hand, if these assigned values are not stable then inference, and specifically any subsequent classification will also be unstable. Resampling methods such as bootstrapping have been used to assess the stability of quantification results in the past. These methods have typically been used for contingency tables whereas multiple-choice datasets have received less attention. Furthermore, the stability of quantification results has been evaluated in the context of visualization. This research investigated the stability of quantification results under different sampling strategies including subsampling and resampling. In addition to evaluating the stability in the context of visualizations, in this study stability was also assessed in terms of classification. In this research, we used Dual Scaling (DS) to accomplish quantification. To investigate stability, we used different subsampling strategies, specifically, uniform and biased subsampling. In addition to subsampling, bootstrapping was used to further investigate the variation of results. Finally, we used Support Vector Machines (SVMs) and Linear Discriminant Analysis (LDA) to perform classification on the quantified data to evaluate the effect of changing the data on quantification results. The results showed that DS is stable under uniform subsampling and resampling. 
Quantification results were similar to those of the parent set for larger sample sizes under uniform subsampling, whereas sample size affected the stability of DS results under biased subsampling; that is, the results deviated substantially from the parent data when the data were extremely biased. Contrary to the expectation that larger sample sizes yield better results, when subsampling a variable with two categories that have unequal numbers of rows, our study showed that the category with fewer rows gave better results. This was tested on the “Gender”, “Income” and “Employment” variables, and the same conclusions were reached. The same experiment was applied to the “Gender” variable of the 2017 General Household Survey (GHS) data, and the results yielded the same conclusions. Biased subsampling on evenly distributed categories yielded better results than subsamples with a skewed distribution of categories. Although these results may indicate instability in the DS method under biased subsampling, the clustering and classification results showed that the quantification results were indeed stable: the classification results did not vary to any great extent, and the clustering results retained the same groupings under different subsampling strategies. Therefore, DS can be used for classification and related techniques since it is stable. Resampling methods, on the other hand, strengthen the results of either the parent sample or the subsamples; that is, resampling validates the underlying results of a study through the repetition of experiments. Finally, although resampling methods are useful for assessing the stability of quantification results, they should never be used as a substitute for gathering more data. This was shown in the experiment that investigated the effect of resampling up to a sample size greater than that of the subsample.
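
The bootstrap stability check described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes that dual scaling of multiple-choice data can be computed as a correspondence analysis of the indicator matrix, and it uses synthetic data. The function names (`indicator_matrix`, `quantify`, `bootstrap_weights`), the three invented variables, and all parameter values are hypothetical.

```python
import numpy as np

def indicator_matrix(data, n_levels):
    """One-hot encode integer-coded categorical columns (rows = respondents)."""
    blocks = [np.eye(k)[data[:, j]] for j, k in enumerate(n_levels)]
    return np.hstack(blocks)

def quantify(Z):
    """First-dimension category weights from a correspondence analysis of Z.

    Assumption: this stands in for dual scaling of multiple-choice data.
    Categories absent from Z (zero column mass) receive NaN.
    """
    P = Z / Z.sum()
    r = P.sum(axis=1)                    # row masses (always positive here)
    c = P.sum(axis=0)                    # column masses
    keep = c > 0
    expected = np.outer(r, c[keep])
    S = (P[:, keep] - expected) / np.sqrt(expected)   # standardized residuals
    _, _, Vt = np.linalg.svd(S, full_matrices=False)
    w = np.full(Z.shape[1], np.nan)
    w[keep] = Vt[0] / np.sqrt(c[keep])   # column standard coordinates
    return w

def bootstrap_weights(data, n_levels, n_boot=200, seed=0):
    """Resample rows with replacement and re-quantify each replicate."""
    rng = np.random.default_rng(seed)
    ref = quantify(indicator_matrix(data, n_levels))
    n = data.shape[0]
    out = np.empty((n_boot, ref.size))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        w = quantify(indicator_matrix(data[idx], n_levels))
        if np.nansum(w * ref) < 0:       # singular vectors have arbitrary sign
            w = -w
        out[b] = w
    return ref, out

# Synthetic example (invented data): three categorical variables
# driven by one latent trait plus noise.
rng = np.random.default_rng(1)
latent = rng.normal(size=300)
noisy = lambda: latent + rng.normal(scale=0.5, size=300)
data = np.column_stack([
    (noisy() > 0).astype(int),           # 2 categories
    np.digitize(noisy(), [-0.5, 0.5]),   # 3 categories
    np.digitize(noisy(), [-0.5, 0.5]),   # 3 categories
])
n_levels = [2, 3, 3]

ref, boots = bootstrap_weights(data, n_levels)
spread = np.nanstd(boots, axis=0)        # per-category bootstrap standard deviation
print(np.round(ref, 3))
print(np.round(spread, 3))
```

A small bootstrap standard deviation for a category weight, relative to the distance between category weights, would suggest that the quantification is stable under resampling. A uniform or biased subsampling experiment could be sketched in the same way by replacing the row indices `idx` with a subsample drawn without replacement, either uniformly or restricted to selected categories.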

Description

A thesis submitted to the Faculty of Science, University of the Witwatersrand, in fulfillment of the requirements for the degree of Master of Science, 2021
