An investigation of the consistency of Statistics South Africa's employment data between surveys
The purpose of this study is to investigate why different surveys conducted by Statistics South Africa (Stats SA) give different estimates of the percentages of people in the different employment categories. To investigate the different sources of variability, namely surveys done in different years, surveys using different questionnaires, different sample designs and different employment profiles, the following comparisons were made for Gauteng and the Eastern Cape:
• Estimates of employment status over time: Labour Force Survey (LFS) March 2006 and March 2007; LFS September 2006 and September 2007; and General Household Survey (GHS) September 2006 and July 2007.
• Estimates of employment status across surveys: LFS September 2006 and GHS September 2006; and LFS September 2007, GHS July 2007 and Community Survey (CS) October 2007.
To generate a set of comparable estimates across surveys and within surveys over time, this study identifies and addresses the various sources of potential non-comparability. The methodologies used are Chi-squared Automatic Interaction Detection (CHAID) and multinomial logistic regression. These statistical techniques were used to identify variables that are associated with employment status. The predictor variables included in the analysis are age group, highest level of education, marital status, population group, sex and source data. The CHAID results for all data sets show that age group is the most significant predictor on which data on employment status can be segmented. At the root node (the first level of the CHAID tree), the data were partitioned by the categories of age group. Highest level of education, sex, population group and province were significant within the categories of age group. Either province or population group was significant within the age group 20–29 years, depending on the data set being analysed. Sex was most significant within the age group 50–65 years.
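The root-node selection described above can be illustrated with a minimal sketch, assuming a simplified CHAID step: each candidate predictor is cross-tabulated against employment status, a chi-squared test of independence is run, and the predictor with the smallest p-value is chosen for the split. The category merging and Bonferroni adjustment of a full CHAID implementation are omitted. The function names and the toy counts are hypothetical and are not taken from the Stats SA surveys.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable with df degrees of freedom."""
    if x <= 0 or df <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series expansion of the regularised lower incomplete gamma function P(a, z).
    term, total, n = 1.0, 1.0, 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    log_p = a * math.log(z) - z - math.lgamma(a + 1.0) + math.log(total)
    return max(0.0, 1.0 - math.exp(log_p))


def chi2_test(records, predictor, target):
    """Pearson chi-squared test of independence between two categorical fields."""
    counts = Counter((r[predictor], r[target]) for r in records)
    rows = sorted({key[0] for key in counts})
    cols = sorted({key[1] for key in counts})
    n = sum(counts.values())
    row_tot = {a: sum(counts[(a, b)] for b in cols) for a in rows}
    col_tot = {b: sum(counts[(a, b)] for a in rows) for b in cols}
    stat = 0.0
    for a in rows:
        for b in cols:
            expected = row_tot[a] * col_tot[b] / n
            stat += (counts[(a, b)] - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(cols) - 1)
    return stat, df, chi2_sf(stat, df)


def best_split(records, predictors, target):
    """Return the predictor with the smallest chi-squared p-value (simplified CHAID step)."""
    return min(predictors, key=lambda p: chi2_test(records, p, target)[2])


# Hypothetical toy counts: (age group, employment status, number of respondents).
cells = [
    ("15-19", "employed", 5), ("15-19", "unemployed", 5), ("15-19", "not active", 40),
    ("20-29", "employed", 20), ("20-29", "unemployed", 20), ("20-29", "not active", 10),
    ("30-49", "employed", 40), ("30-49", "unemployed", 5), ("30-49", "not active", 5),
]
# Sex alternates within each cell, so it is nearly independent of status by construction.
records = [
    {"age_group": age, "status": status, "sex": "F" if i % 2 == 0 else "M"}
    for age, status, count in cells
    for i in range(count)
]

print(best_split(records, ["sex", "age_group"], "status"))  # prints age_group for this toy data
```

In this toy data age group is strongly associated with status while sex is not, so the sketch selects age group, mirroring the kind of root split reported for the CHAID trees.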
The results of the multinomial logistic regression show several significant interactions, involving from five to seven factors depending on the data set. The logistic regression results were not as good as those of the CHAID analyses, but both techniques give an indication of the relationships between the predictor variables and employment status. When explaining employment status, the analysis of the 2007 CS, LFS and GHS data split on age group. Highest level of education was the most significant predictor when comparing the three data sets. There are differences among the three data sets when explaining employment status. These differences are attributed to the use of different mid-year population estimates, differences between the questionnaire instructions for CS 2007 and those for the other surveys, and the sample sizes of the surveys. There are also significant differences between Gauteng and the Eastern Cape in relation to employment status.
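For reference, the standard baseline-category form of the multinomial logistic model is sketched below. The symbols are assumptions for illustration: K employment categories with category K as the reference, a predictor vector x that would include main effects and interaction terms, and coefficient vectors β_k. The abstract does not state the exact specification used in the thesis.

```latex
\[
P(Y = k \mid \mathbf{x}) =
  \frac{\exp\!\left(\mathbf{x}^{\top}\boldsymbol{\beta}_k\right)}
       {1 + \sum_{j=1}^{K-1} \exp\!\left(\mathbf{x}^{\top}\boldsymbol{\beta}_j\right)},
  \quad k = 1, \dots, K-1,
\qquad
P(Y = K \mid \mathbf{x}) =
  \frac{1}{1 + \sum_{j=1}^{K-1} \exp\!\left(\mathbf{x}^{\top}\boldsymbol{\beta}_j\right)}.
\]
```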
MSc., Faculty of Science, University of the Witwatersrand, 2011
Statistics (South Africa), Government information (South Africa), Labor supply (South Africa, statistics)