Lin, Shaw Chian2024-07-122024-07-122023-10Lin, Shaw Chian. (2023). Model Propagation for High-Parallelism in Data Compression. [Master's dissertation, University of the Witwatersrand, Johannesburg]. WIReDSpace. https://hdl.handle.net/10539/38910https://hdl.handle.net/10539/38910A Dissertation submitted in fulfillment of the requirements for the degree of Master of Science, in the Faculty of Engineering and Built Environment, in the University of the Witwatersrand, Johannesburg, in the School of Electrical and Information Engineering, in 2023.Recent data compression research focuses on the parallelisation of existing algorithms (LZ77, BZIP2 etc.) by exploiting their inherent parallelism. Little work has been performed on parallelising highly sequential algorithms, whose slow compression speeds would benefit the most from parallelism. This dissertation presents a generalised parallelisation approach that can be potentially adopted by any compression algorithms, with model sequentiality in mind. The scheme presents a novel divide-and-conquer approach when dividing the data stream into smaller data blocks for parallelisation. The scheme, branching propagation, is implemented with prediction by partial matching (PPM), an algorithm of the statistical-modelling family known for their serial nature, which is shown to suffer from compression ratio increases when parallelised. A speedup of 5.2-7x has been achieved at 16 threads, with at most a 6.5% increase in size relative to serial performance, while the conventional approach showed up to a 7.5x speedup with an 8.0% increase. The branching propagation approach has been shown to offer better compression ratios over conventional approaches with increasing parallelism (a difference of 11% increase at 256 threads), albeit at slightly slower speeds. To quantify the speedup over ratio penalty, an alternate metric called speedup-to-ratio increase (SRI) is used. This shows that when serial dependency is maintained, branching propagation is superior in standard configurations, which offers substantial speed while minimising the compression ratio penalty relative to the speedup. However, at lower serial dependency, the conventional approach is generally preferable, with 9-16x speedup per 1% increase in compression ratio at maximal speed compared to branching propagation’s 6-13x speedup per 1%.en©2023 University of the Witwatersrand, JohannesburgData compressionParallelising highly sequential algorithmsPrediction by partial matching (PPM)Speedup-to-ratio increase (SRI)UCTDSDG-9: Industry, innovation and infrastructureModel Propagation for High-Parallelism in Data CompressionDissertationUniversity of the Witwatersrand, Johannesburg