SummaryNet: two-stream convolutional networks for automatic video summarisation
Date
2020
Authors
Jappie, Ziyad
Abstract
Video summarisation is the task of automatically summarising a video sequence by extracting the “important”
parts of the video so as to give an overview of what has occurred. Solving this problem benefits a myriad
of fields, such as the entertainment industry, sports, e-learning and many more. Video summarisation
carries a distinct inherent difficulty owing to its subjectivity: there is no single correct answer. As such,
it is particularly difficult to define and measure performance in a tangible way. This is in addition to the
other difficulties associated with general video processing.
We present a novel two-stream network framework for automatic video summarisation, which we call
SummaryNet. SummaryNet employs a deep two-stream network to model pertinent spatio-temporal
features by leveraging both RGB and optical flow information. We use the Two-Stream Inflated 3D
ConvNet (I3D) to extract high-level, semantic feature representations as inputs to our SummaryNet
model. Experimental results on common benchmark datasets show that the proposed method achieves
results comparable to or better than state-of-the-art video summarisation methods.
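To make the two-stream idea concrete, the sketch below shows one plausible way such a pipeline could be wired up in PyTorch: pre-extracted I3D feature vectors from an RGB stream and an optical-flow stream are fused by concatenation and scored for importance per video segment. This is a minimal illustration, not the author's code; the class name, feature dimensions, and the scoring head are hypothetical, and the actual SummaryNet architecture is detailed in the dissertation itself.

```python
import torch
import torch.nn as nn

class TwoStreamScorer(nn.Module):
    """Hypothetical fusion head over two-stream I3D features."""
    def __init__(self, feat_dim: int = 1024, hidden: int = 256):
        super().__init__()
        # Fuse the two per-segment I3D feature vectors by concatenation,
        # then regress a scalar importance score for each segment.
        self.head = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),  # importance score in [0, 1]
        )

    def forward(self, rgb_feats: torch.Tensor, flow_feats: torch.Tensor) -> torch.Tensor:
        # rgb_feats, flow_feats: (num_segments, feat_dim), e.g. extracted
        # offline with a pretrained I3D network on RGB frames and optical flow.
        fused = torch.cat([rgb_feats, flow_feats], dim=-1)
        return self.head(fused).squeeze(-1)  # (num_segments,)

# Toy usage with random stand-ins for I3D features of 30 segments.
scorer = TwoStreamScorer()
rgb = torch.randn(30, 1024)
flow = torch.randn(30, 1024)
scores = scorer(rgb, flow)
# Segments with the highest scores would be selected to form the summary.
summary_idx = scores.topk(5).indices
```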
Description
A dissertation submitted to the Faculty of Science, University of the Witwatersrand,
Johannesburg, in fulfilment of the requirements for the degree of Master of Science, 2020
Citation
Jappie, Ziyad (2020) SummaryNet: two-stream convolutional networks for automatic video summarisation, University of the Witwatersrand, Johannesburg, https://hdl.handle.net/10539/30207