Computational efficiency of k-anonymization incorporating clustering
Date
2020
Authors
Netshiunda, Fhulufhelo Emmanuel
Emmanuel, Netshiunda Fhulufhelo
Abstract
Publishing data poses a threat of disclosing the identities of data subjects
and associating them with their sensitive personal information. k-anonymization
is a practical method for anonymizing datasets that are to be made publicly
available. It hides the identities of data subjects by ensuring that
every record of a published dataset has at least k − 1 (k being a natural
number) other records indistinguishable from it with respect to a set of
attributes called quasi-identifiers. To minimize information loss, a clustering
technique is often used to group similar records before k-anonymization is
applied. Processing both the clustering and the k-anonymization using
current algorithms is computationally expensive. This research therefore
focuses on a parallel implementation of a k-anonymization algorithm that
incorporates clustering, with the aim of reducing computation time.
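To make the k-anonymity condition in the abstract concrete, the following is a minimal Python sketch of a check that every record shares its quasi-identifier values with at least k − 1 other records. It is an illustration only and is not taken from the research report: the function name is_k_anonymous, the attribute names (age, zip, diagnosis) and the sample records are all hypothetical, and the records are assumed to be already generalized.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination that occurs
    in `records` is shared by at least k records."""
    # Count how many records fall into each quasi-identifier group.
    group_sizes = Counter(
        tuple(record[attr] for attr in quasi_identifiers)
        for record in records
    )
    # A group of size >= k means each member has >= k - 1 look-alikes.
    return all(size >= k for size in group_sizes.values())


# Hypothetical, already-generalized records (assumed data for illustration).
records = [
    {"age": "20-29", "zip": "20**", "diagnosis": "flu"},
    {"age": "20-29", "zip": "20**", "diagnosis": "asthma"},
    {"age": "30-39", "zip": "21**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "21**", "diagnosis": "diabetes"},
]

print(is_k_anonymous(records, ["age", "zip"], 2))  # True
print(is_k_anonymous(records, ["age", "zip"], 3))  # False
```

Here the sensitive attribute (diagnosis) is deliberately left out of the quasi-identifier list, so only age and zip determine the groups. Each group holds two records, which satisfies k = 2 but not k = 3.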
Description
A research report submitted in partial fulfillment of the requirements for the degree of Master of Science in the field of e-Science in the School of Computer Science and Applied Mathematics, University of the Witwatersrand, Johannesburg, 2020