A comparative analysis of two image synthesis networks repurposed for video synthesis
Date
2021
Authors
Weiher, Kyle
Abstract
Modern, state-of-the-art video synthesis networks are large, complex, and difficult to train due to steep hardware requirements. Image synthesis networks are far easier to train, however, and thus present an opportunity. We present a comparative analysis of two image synthesis networks repurposed for video synthesis, namely the Cascaded Refinement Network (CRN) and pix2pixHD. We are primarily interested in the temporal consistency between output frames, but we also analyse other aspects such as image quality. The networks are equal in neither features nor performance, with pix2pixHD superior in both regards. We therefore compare both the base network configurations and configurations in which we port components from pix2pixHD to the CRN. Furthermore, we test video-specific features, namely the use of prior frame information in the synthesis and training processes, and the use of optical flow estimation to warp prior frame content directly to the next frame. We find that while the CRN performed far worse than pix2pixHD in their base configurations, the CRN with pix2pixHD features closed the gap. The addition of prior frame information provides the largest improvement to temporal consistency, leading to acceptably smooth synthetic video. Finally, optical flow estimation further improves the performance of pix2pixHD while hindering the CRN.
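The idea of warping prior frame content to the next frame with an optical flow field can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the dissertation: the function names `warp_with_flow` and `temporal_error` are hypothetical, the flow field is supplied by hand rather than estimated by a network, nearest-neighbour sampling stands in for whatever interpolation the author used, and the mean absolute difference is only one simple proxy for temporal consistency.

```python
def warp_with_flow(prev_frame, flow):
    """Backward-warp prev_frame towards the next frame (nearest neighbour).

    Assumed convention: flow[y][x] = (dx, dy) points from pixel (x, y) in the
    next frame back to its source location in prev_frame.
    """
    h, w = len(prev_frame), len(prev_frame[0])
    warped = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            # Clamp source coordinates so border pixels reuse the edge value.
            src_x = min(max(int(round(x + dx)), 0), w - 1)
            src_y = min(max(int(round(y + dy)), 0), h - 1)
            warped[y][x] = prev_frame[src_y][src_x]
    return warped


def temporal_error(frame_a, frame_b):
    """Mean absolute pixel difference between two equally sized frames."""
    h, w = len(frame_b), len(frame_b[0])
    total = sum(
        abs(frame_a[y][x] - frame_b[y][x]) for y in range(h) for x in range(w)
    )
    return total / (h * w)


# Toy example: a single bright pixel moves one step to the right.
prev = [[0.0] * 4 for _ in range(4)]
prev[1][1] = 1.0
nxt = [[0.0] * 4 for _ in range(4)]
nxt[1][2] = 1.0

# Hand-written backward flow: every next-frame pixel came from one pixel left.
flow = [[(-1.0, 0.0)] * 4 for _ in range(4)]

warped = warp_with_flow(prev, flow)
```

In this toy case the warped previous frame matches the next frame exactly, whereas the unwarped previous frame does not. With a real estimated flow field the match would be imperfect, and errors in the flow would be carried into the warped content.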
Description
A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Master of Science, 2021