Electronic Theses and Dissertations (ETDs) - All submissions
Permanent URI for this community: https://wiredspace.wits.ac.za/handle/10539/45
Search Results (4 results)
Item: Deep learning based semantic segmentation of unstructured outdoor environments (2020). Ndlovu, Nkosinathi.

The past decade has seen increased interest in, and demand for, autonomous vehicles. Complete and successful autonomy of mobile systems such as unmanned ground vehicles (UGVs) depends on perception: the ability of an agent to semantically interpret its operational environment through vision. Deep learning approaches have recently proved consistently more successful than traditional/classical methods in perception and vision tasks. This is primarily because they do not rely on selected hand-crafted features; instead, they adopt a more robust and generalised learned-feature approach through representation learning. Convolutional Neural Networks (CNNs) have been widely used for scene parsing and perception. In this research we focus on using CNN architectures for semantic segmentation of unstructured outdoor environments for autonomous navigation. Our first contribution is a novel dataset for unstructured outdoor domains: the CSIR dataset. We seek to establish whether it is possible to semantically segment an unstructured scene into pre-defined classes such as grass, road, sky, and trees. This is achieved through an exhaustive comparative study of state-of-the-art CNN architectures on this dataset and on a similar additional dataset, the Freiburg Forest dataset. Furthermore, we seek to establish whether there are any benefits to using transfer learning and pre-trained weights when training CNN architectures for semantic segmentation with limited datasets.
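A comparative study of segmentation architectures is typically scored with mean intersection-over-union (mIoU). The abstract does not show the thesis's evaluation code, so the following is only a minimal sketch of the metric in plain NumPy; the function name and toy labels are our own assumptions.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union between two integer label maps.

    Hypothetical helper, not the thesis implementation. Classes absent
    from both prediction and ground truth are skipped so they do not
    distort the mean.
    """
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class appears in neither map
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))

# Toy 2x2 label maps, e.g. grass=0, road=1, sky=2
pred = np.array([[0, 1], [2, 2]])
gt = np.array([[0, 1], [2, 1]])
score = mean_iou(pred, gt, num_classes=3)  # classes score 1, 0.5, 0.5
```

Averaging per-class IoU rather than raw pixel accuracy is the usual choice here because outdoor scenes are dominated by a few large classes (sky, grass) that would otherwise mask failures on small ones.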
Lastly, we identify the important architectural factors necessary for successful semantic segmentation in unstructured outdoor scenes.

Item: SummaryNet: two-stream convolutional networks for automatic video summarisation (2020). Jappie, Ziyad.

Video summarisation is the task of automatically summarising a video sequence by extracting the "important" parts of the video so as to give an overview of what has occurred. Solving this problem benefits a myriad of fields, such as the entertainment industry, sports, and e-learning. Video summarisation has a distinct inherent difficulty due to its subjectivity: there is no single correct answer. As such, it is particularly difficult to define and measure tangible performance, in addition to the other difficulties associated with general video processing. We present a novel two-stream network framework for automatic video summarisation, which we call SummaryNet. SummaryNet employs a deep two-stream network to model pertinent spatio-temporal features by leveraging RGB as well as optical flow information. We use the Two-Stream Inflated 3D ConvNet (I3D) network to extract high-level, semantic feature representations as inputs to our SummaryNet model. Experimental results on common benchmark datasets show that the considered method achieves comparable or better results than state-of-the-art video summarisation methods.

Item: Joint decoding of parallel power line communication and visible light communication systems (2018). Onwuatuelo, Daniel Obinna.

Many indoor applications operate in the narrowband (3 kHz to 148.5 kHz) range, and for such applications power line communication (PLC) and visible light communication (VLC) networks can be naturally connected and adapted to complement each other in order to improve overall system performance in terms of bit error rate (BER) and computational complexity.
In this research, the joint decoding of parallel PLC and VLC systems is proposed and its BER performance is compared to that of the standalone PLC and VLC systems. The joint decoding is applied either at the inner (Viterbi) or at the outer (Reed-Solomon) decoder. The proposed system adopts the PLC G3 physical layer specification, but direct current optical orthogonal frequency division multiplexing (DCO-OFDM) is used in the VLC system to ensure that only positive (unipolar) signals are transmitted. A realistic VLC channel model is adopted in this research by considering the VLC channel as an additive white Gaussian noise (AWGN) channel affected by attenuation, in terms of the angle of orientation between the source and the receiver and the effective surface area of the receiver. Furthermore, the PLC channel is modeled as an AWGN channel with background and impulsive noise generated using the Middleton Class A noise distribution model. It is shown through simulation results and analysis that the proposed jointly decoded system outperforms the PLC and VLC systems in terms of BER performance, depending on the distance of separation between the source and the receiver. Key words: Power line communication (PLC), Visible light communication (VLC), Bit error rate (BER), Joint decoding, Orthogonal frequency division multiplexing (OFDM), DC optical OFDM (DCO-OFDM), Additive white Gaussian noise (AWGN).

Item: Deformable part model with CNN features for facial landmark detection under occlusion (2018). Brink, Hanno.

Detecting and localizing facial regions in images is a fundamental building block of many applications in the fields of affective computing and human-computer interaction. It allows systems to perform a variety of higher-level analyses, such as facial expression recognition. Facial expression recognition is based on the effective extraction of relevant facial features.
Many techniques have been proposed for the robust extraction of these features under a wide variety of poses and occlusion conditions. These techniques include Deformable Part Models (DPMs) and, more recently, deep Convolutional Neural Networks (CNNs). Hybrid models based on DPMs and CNNs have recently been proposed, motivated by the generalization properties of both. In this work we propose a combined system that uses CNN outputs as features for a DPM, with a focus on dealing with occlusion. We also propose a method of face localization that allows occluded regions to be detected and explicitly ignored during the detection step.
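At its core, the hybrid idea in the last abstract — a DPM scoring learned part templates against CNN feature maps instead of hand-crafted features — reduces to a sliding cross-correlation of a part filter over the feature map. The sketch below illustrates only that scoring step in plain NumPy; the function name, shapes, and toy data are our own assumptions, not the thesis code.

```python
import numpy as np

def part_response(feature_map, part_filter):
    """Slide a part filter over a CNN feature map (valid cross-correlation).

    Hypothetical helper, not the thesis implementation.
    feature_map: (H, W, C) activations from a CNN layer.
    part_filter: (h, w, C) learned part template.
    Returns an (H-h+1, W-w+1) response map; in a DPM this response is
    combined with a deformation cost before taking the best location.
    """
    H, W, C = feature_map.shape
    h, w, _ = part_filter.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            out[i, j] = np.sum(feature_map[i:i + h, j:j + w] * part_filter)
    return out

# Toy check: plant the filter pattern at row 2, column 3 and recover it.
rng = np.random.default_rng(0)
filt = rng.standard_normal((3, 3, 4))
fmap = np.zeros((8, 10, 4))
fmap[2:5, 3:6] = filt
resp = part_response(fmap, filt)
loc = np.unravel_index(np.argmax(resp), resp.shape)  # peak at (2, 3)
```

Handling occlusion, as the abstract describes, would then amount to masking out occluded regions of the feature map so they contribute nothing to this score.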