Chunked extendible arrays and its integration with the global array toolkit for parallel image processing
Date
2016
Authors
Nimako, Gideon
Abstract
Several meetings of the Extremely Large Databases Community for large-scale scientific applications have advocated the use of multidimensional arrays as the appropriate model for representing scientific databases. Scientific databases gradually grow to massive sizes, on the order of terabytes and petabytes. Storing such databases therefore requires efficient dynamic storage schemes in which the bounds of any array dimension can be extended arbitrarily. Conventional multidimensional array representations in today’s programming environments do not extend or shrink their bounds without relocating elements of the data set. In general, extendibility is limited to only one dimension. This thesis presents a technique for storing dense multidimensional arrays by chunks such that the array can be extended along any dimension without compromising the access time of an element. This is done with a computed access mapping function that maps the k-dimensional index onto a linear index of the storage locations. This concept forms the basis for the implementation of an array file of any number of dimensions, where the bounds of the array dimensions can be extended arbitrarily. Such a feature currently exists in the Hierarchical Data Format version 5 (HDF5). However, extending the bound of a dimension in an HDF5 array file can be unusually time-consuming. In our storage scheme for dense array files, such extensions can be performed while elements of the array are still accessed orders of magnitude faster than in HDF5 or conventional array files. We also present Parallel Chunked Extendible Dense Array (PEXTA), a new parallel I/O model for the Global Array Toolkit. PEXTA provides the necessary Application Programming Interface (API) for explicit data transfer between the memory-resident global array and its secondary-storage counterpart, and also allows the persistent array to be extended along any dimension without compromising the access time of an element or of sub-array elements. Such APIs provide a platform for high-speed, parallel hyperspectral image processing without performance degradation, even when the imagery files undergo extensions.
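The abstract describes the computed access mapping function only at a high level. The Python sketch below illustrates one way such a function can work, under stated assumptions; it is not the thesis's implementation. It uses an axial-vector scheme for extendible arrays. Each extension of dimension d appends a new segment of chunks at the end of storage and records three things in dimension d's axial vector: the starting index, the segment's base address, and its multiplicative coefficients. A chunk is located in the most recently created segment among those covering its coordinates, so existing chunks never move. The class `ExtendibleChunkedArray` and its methods are hypothetical names chosen for illustration.

```python
from bisect import bisect_right
from math import prod


class ExtendibleChunkedArray:
    """Hypothetical sketch: axial-vector address map over fixed-size chunks."""

    def __init__(self, chunk_shape, initial_chunks):
        self.chunk_shape = tuple(chunk_shape)
        self.k = len(self.chunk_shape)
        if len(initial_chunks) != self.k or min(initial_chunks) <= 0:
            raise ValueError("need one positive initial bound per dimension")
        self.bounds = list(initial_chunks)          # current extent per dimension, in chunks
        self.segments = []                          # (base address, extended dim, start, coefs)
        self.starts = [[] for _ in range(self.k)]   # axial vector j: sorted start indices
        self.seg_ids = [[] for _ in range(self.k)]  # axial vector j: segment created at each start
        self.n_chunks = 0
        # The initial block is treated as a segment extending dimension 0 from index 0;
        # every other axial vector also points to it at start index 0.
        self._add_segment(0, 0, list(initial_chunks))
        for j in range(1, self.k):
            self.starts[j].append(0)
            self.seg_ids[j].append(0)

    def _add_segment(self, d, start, shape):
        # Row-major coefficients with the extended dimension d outermost.
        order = [d] + [j for j in range(self.k) if j != d]
        coefs = [0] * self.k
        stride = 1
        for j in reversed(order):
            coefs[j] = stride
            stride *= shape[j]
        seg_id = len(self.segments)
        self.segments.append((self.n_chunks, d, start, coefs))
        self.starts[d].append(start)
        self.seg_ids[d].append(seg_id)
        self.n_chunks += stride  # stride now equals the segment's chunk count

    def extend(self, d, by):
        """Append `by` chunk slices along dimension d; existing chunks keep their addresses."""
        if by <= 0:
            raise ValueError("extension must be positive")
        shape = list(self.bounds)
        shape[d] = by
        self._add_segment(d, self.bounds[d], shape)
        self.bounds[d] += by

    def chunk_address(self, idx):
        """Map a k-dimensional chunk index to its linear chunk address."""
        if len(idx) != self.k or any(not 0 <= i < b for i, b in zip(idx, self.bounds)):
            raise IndexError(idx)
        # The chunk lies in the most recent segment among those covering each coordinate.
        seg_id = max(self.seg_ids[j][bisect_right(self.starts[j], i) - 1]
                     for j, i in enumerate(idx))
        base, d, start, coefs = self.segments[seg_id]
        offsets = [i - start if j == d else i for j, i in enumerate(idx)]
        return base + sum(o * c for o, c in zip(offsets, coefs))

    def element_address(self, x):
        """Linear element position: chunk address, then row-major offset inside the chunk."""
        chunk = [xi // c for xi, c in zip(x, self.chunk_shape)]
        local = 0
        for xi, c in zip(x, self.chunk_shape):
            local = local * c + xi % c
        return self.chunk_address(chunk) * prod(self.chunk_shape) + local


a = ExtendibleChunkedArray(chunk_shape=(2, 2), initial_chunks=(2, 2))
before = {(i, j): a.chunk_address((i, j)) for i in range(2) for j in range(2)}
a.extend(1, 1)  # add one column of chunks
a.extend(0, 1)  # add one row of chunks
print(all(a.chunk_address(ix) == addr for ix, addr in before.items()))  # True
print(a.chunk_address((2, 2)), a.element_address((5, 5)))               # 8 35
```

In this sketch an element lookup costs one binary search per dimension over that dimension's extension history. The cost therefore grows with the number of extensions rather than with the array size, and no existing chunk is relocated when any dimension is extended.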
Description
A thesis submitted to the Faculty of Engineering and the Built Environment
in fulfilment of the requirements for the degree of
Doctor of Philosophy, 2016
Online resource (xii, 151 leaves)
Citation
Nimako, Gideon (2016) Chunked extendible arrays and its integration with the global array toolkit for parallel image processing, University of the Witwatersrand, <http://hdl.handle.net/10539/22332>