Model Propagation for High-Parallelism in Data Compression

Date
2023-10
Journal Title
Journal ISSN
Volume Title
Publisher
University of the Witwatersrand, Johannesburg
Abstract
Recent data compression research focuses on the parallelisation of existing algorithms (LZ77, BZIP2 etc.) by exploiting their inherent parallelism. Little work has been performed on parallelising highly sequential algorithms, whose slow compression speeds would benefit the most from parallelism. This dissertation presents a generalised parallelisation approach that can be potentially adopted by any compression algorithms, with model sequentiality in mind. The scheme presents a novel divide-and-conquer approach when dividing the data stream into smaller data blocks for parallelisation. The scheme, branching propagation, is implemented with prediction by partial matching (PPM), an algorithm of the statistical-modelling family known for their serial nature, which is shown to suffer from compression ratio increases when parallelised. A speedup of 5.2-7x has been achieved at 16 threads, with at most a 6.5% increase in size relative to serial performance, while the conventional approach showed up to a 7.5x speedup with an 8.0% increase. The branching propagation approach has been shown to offer better compression ratios over conventional approaches with increasing parallelism (a difference of 11% increase at 256 threads), albeit at slightly slower speeds. To quantify the speedup over ratio penalty, an alternate metric called speedup-to-ratio increase (SRI) is used. This shows that when serial dependency is maintained, branching propagation is superior in standard configurations, which offers substantial speed while minimising the compression ratio penalty relative to the speedup. However, at lower serial dependency, the conventional approach is generally preferable, with 9-16x speedup per 1% increase in compression ratio at maximal speed compared to branching propagation’s 6-13x speedup per 1%.
Description
A Dissertation submitted in fulfillment of the requirements for the degree of Master of Science, in the Faculty of Engineering and Built Environment, in the University of the Witwatersrand, Johannesburg, in the School of Electrical and Information Engineering, in 2023.
Keywords
Data compression, Parallelising highly sequential algorithms, Prediction by partial matching (PPM), Speedup-to-ratio increase (SRI), UCTD
Citation
Lin, Shaw Chian. (2023). Model Propagation for High-Parallelism in Data Compression. [Master's dissertation, University of the Witwatersrand, Johannesburg]. WIReDSpace. https://hdl.handle.net/10539/38910