Model Propagation for High-Parallelism in Data Compression

dc.contributor.authorLin, Shaw Chian
dc.contributor.supervisorCheng, Ling
dc.date.accessioned2024-07-12T15:14:16Z
dc.date.available2024-07-12T15:14:16Z
dc.date.issued2023-10
dc.descriptionA Dissertation submitted in fulfillment of the requirements for the degree of Master of Science, in the Faculty of Engineering and Built Environment, in the University of the Witwatersrand, Johannesburg, in the School of Electrical and Information Engineering, in 2023.
dc.description.abstractRecent data compression research focuses on the parallelisation of existing algorithms (LZ77, BZIP2 etc.) by exploiting their inherent parallelism. Little work has been performed on parallelising highly sequential algorithms, whose slow compression speeds would benefit the most from parallelism. This dissertation presents a generalised parallelisation approach that can be potentially adopted by any compression algorithms, with model sequentiality in mind. The scheme presents a novel divide-and-conquer approach when dividing the data stream into smaller data blocks for parallelisation. The scheme, branching propagation, is implemented with prediction by partial matching (PPM), an algorithm of the statistical-modelling family known for their serial nature, which is shown to suffer from compression ratio increases when parallelised. A speedup of 5.2-7x has been achieved at 16 threads, with at most a 6.5% increase in size relative to serial performance, while the conventional approach showed up to a 7.5x speedup with an 8.0% increase. The branching propagation approach has been shown to offer better compression ratios over conventional approaches with increasing parallelism (a difference of 11% increase at 256 threads), albeit at slightly slower speeds. To quantify the speedup over ratio penalty, an alternate metric called speedup-to-ratio increase (SRI) is used. This shows that when serial dependency is maintained, branching propagation is superior in standard configurations, which offers substantial speed while minimising the compression ratio penalty relative to the speedup. However, at lower serial dependency, the conventional approach is generally preferable, with 9-16x speedup per 1% increase in compression ratio at maximal speed compared to branching propagation’s 6-13x speedup per 1%.
dc.description.submitterMM2024
dc.facultyFaculty of Engineering and the Built Environment
dc.identifier.citationLin, Shaw Chian. (2023). Model Propagation for High-Parallelism in Data Compression. [Master's dissertation, University of the Witwatersrand, Johannesburg]. WIReDSpace. https://hdl.handle.net/10539/38910
dc.identifier.urihttps://hdl.handle.net/10539/38910
dc.language.isoen
dc.publisherUniversity of the Witwatersrand, Johannesburg
dc.rights©2023 University of the Witwatersrand, Johannesburg
dc.rights.holderUniversity of the Witwatersrand, Johannesburg
dc.schoolSchool of Electrical and Information Engineering
dc.subjectData compression
dc.subjectParallelising highly sequential algorithms
dc.subjectPrediction by partial matching (PPM)
dc.subjectSpeedup-to-ratio increase (SRI)
dc.subjectUCTD
dc.subject.otherSDG-9: Industry, innovation and infrastructure
dc.titleModel Propagation for High-Parallelism in Data Compression
dc.typeDissertation
Files
Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Lin_Model propagation_2023.pdf
Size:
887.82 KB
Format:
Adobe Portable Document Format
License bundle
Now showing 1 - 1 of 1
No Thumbnail Available
Name:
license.txt
Size:
2.43 KB
Format:
Item-specific license agreed upon to submission
Description: