3. Electronic Theses and Dissertations (ETDs) - All submissions
Permanent URI for this community: https://wiredspace.wits.ac.za/handle/10539/45
3 results
Search Results
Item Data scientist : using a competency based approach to explore an emerging role (2018)
Nosarka, Naseema Banu
Purpose: The aim of this research study was to explore the role and competencies of Data Scientists in South Africa as the role starts to emerge. Because the role is new, jobs in this sphere are currently being filled by skilled professionals moving from related areas. The knowledge and skills of Data Scientists were explored in order to examine the role of a Data Scientist and the competencies they should have.
Design/methodology/approach: Published studies on the role of a Data Scientist are limited, as the field of Data Science is still new. The research design was therefore exploratory and used qualitative methods. The data gathered were analysed using thematic analysis. Respondents were drawn from the banking and insurance industries, as these are amongst the first to employ Data Scientists in the real sense of the term in South Africa. Six Data Scientists were interviewed.
Originality/value: Research that focuses on the role of Data Scientists, especially in South Africa, is limited, as most of the research has taken place in developed countries. There is also limited research on the role of a Data Scientist within the banking and insurance industry. This study contributes to practitioner and research knowledge by exploring the emerging role of a Data Scientist in the South African context.
Practical implications: This research improves our understanding of the knowledge and skills Data Scientists should have within the banking and insurance industry. It adds insight into the role that Data Scientists currently undertake by providing information on the specific skills that they report as required.
This research can help to shape education and develop the required skills for individuals who intend to pursue a career as a Data Scientist, and can help managers hire the right people for the position of Data Scientist.

Item Controller-plane workload characterization and forecasting in software-defined networking (2017)
Nkosi, Emmanuel
Software-defined networking (SDN) is the physical separation of the control and data planes in networking devices. A logically centralised controller plane, which uses a network-wide view data structure to control several data plane devices, is another defining attribute of SDN. The centralised controllers and the network-wide view data structure are difficult to scale as the network and the data it carries grow. Solutions proposed to combat this challenge in SDN do not use the statistical properties of the workload, or network traffic, seen by SDN controllers. Hence, the objective of this research is twofold. Firstly, the statistical properties of the controller workload are investigated. Secondly, Autoregressive Integrated Moving Average (ARIMA) and Artificial Neural Network (ANN) models are investigated to establish the feasibility of forecasting the controller workload signal. Representing the state of the controller plane in the network-wide view as forecasts of the controller workload will enable control applications to detect dwindling controller resources and therefore alleviate controller congestion. In addition, realistic statistical traffic models of the controller workload variable are sought for the design and evaluation of SDN controllers. A data center network prototype is created using an SDN network emulator called Mininet and an SDN controller called ONOS. It was found that 1–2% of flows arrive within 10 s of each other and more than 80% have inter-arrival times in the range of 10 s–10ms.
These inter-arrival times were found to follow a beta distribution, which is similar to findings made in Machine Type Communications (MTC). The use of ARIMA and ANN models established that it is feasible to forecast the workload seen by SDN controllers. The accuracy of these models was found to be comparable for continuously valued time series signals. The ANN model was found to be applicable even to discretely valued time series data.

Item An SDN-based firewall shunt for data-intensive science applications (2016)
Miteff, Simeon
Data-intensive research computing requires the capability to transfer files over long distances at high throughput. Stateful firewalls introduce sufficient packet loss to prevent researchers from fully exploiting high bandwidth-delay network links [25]. To work around this challenge, the science DMZ design [19] trades off stateful packet filtering capability for loss-free forwarding via an ordinary Ethernet switch. We propose a novel extension to the science DMZ design, which uses an SDN-based firewall. This report introduces NFShunt, a firewall based on Linux's Netfilter combined with OpenFlow switching. Implemented as an OpenFlow 1.0 controller coupled to Netfilter's connection tracking, NFShunt allows the bypass-switching policy to be expressed as part of an iptables firewall rule-set. Our implementation is described in detail, and the latency of the control-plane mechanism is reported. TCP throughput and packet loss are shown at various round-trip latencies, with comparisons to pure switching as well as to a high-end Cisco firewall. Cost, as well as operations and maintenance aspects, are compared and analysed. The results support reported observations regarding firewall-introduced packet loss, and indicate that the SDN design of NFShunt is a technically viable and cost-effective approach to enhancing a traditional firewall to meet the performance needs of data-intensive researchers.
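
The abstract's claim that small amounts of packet loss prevent full use of high bandwidth-delay links can be illustrated with a rough calculation. The sketch below is not taken from the thesis: it uses the well-known Mathis et al. approximation for steady-state TCP throughput, rate ≤ (MSS / RTT) · (C / √p), and the link speed, segment size, loss rate and round-trip times are all illustrative assumptions.

```python
import math


def bandwidth_delay_product_bytes(link_bps: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep a link of the given speed full."""
    return link_bps * rtt_s / 8


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate upper bound on TCP throughput in bits per second.

    Uses the Mathis et al. model with C = sqrt(3/2), which assumes
    random loss and no delayed ACKs. This is a simplification, not a
    model of any particular firewall.
    """
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    c = math.sqrt(3 / 2)
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


if __name__ == "__main__":
    link_bps = 10e9      # assumed 10 Gbit/s research link
    mss_bytes = 1460     # assumed Ethernet-sized TCP segment
    loss_rate = 1e-4     # assumed loss rate, for illustration only
    for rtt_ms in (1, 10, 100):
        rtt_s = rtt_ms / 1000
        bdp = bandwidth_delay_product_bytes(link_bps, rtt_s)
        rate = mathis_throughput_bps(mss_bytes, rtt_s, loss_rate)
        print(f"RTT {rtt_ms} ms: BDP {bdp / 1e6:.2f} MB, "
              f"throughput bound {rate / 1e6:.1f} Mbit/s")
```

Under this model the bound falls in proportion to the round-trip time, so a loss rate that is tolerable on a local network becomes the limiting factor over long distances.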