The role of graph theory in big data analytics

Mashia, Kellen2020-09-082020-09-082019https://hdl.handle.net/10539/29539A dissertation submitted in the fulfilment of the requirement for the Master of Science in the School of Mathematics, 2019Subgraph isomorphism detection is a computationally expensive problem which determines if there exists a subgraph Gs = (Vs, Es) in a large graph (called the data graph) denoted by G = (VG, EG); which matches some other given smaller graph Q = (VQ, EQ) (called the pattern graph). According to J. R. Ullmann in [19], the subgraph isomorphism problem can be solved through a brute-force enumeration procedure. In this dissertation we extensively study the Ullmann algorithm for subgraph isomorphism detection along with some of the improvements developed in [3]; and then we shall discuss the performance of the algorithm by conducting a simulation analysis for distinct, randomly generated undirected graphs. The algorithm is theoretically compared with the one proposed by L. P. Cordella in [6]. However, we only carried out an efficiency analysis on the Ullmann algorithm due to lack of large sets of experimental data. Much work has been done in this field, however, it still stands to be an open question in computer science to improve the execution run-time of the algorithm. In future research, we intend to revisit the graph and subgraph isomorphism detection problems as a possible PhD research in Computer Science. In chapter one we introduced preliminary definitions which may be of great help in understanding the main work in chapter two. In chapter two we study in detail the Ullmann algorithm for the subgraph isomorphism problem which we then implement on several examples to examine the execution run-time of the algorithm. The purpose of chapter three is to introduce another useful algorithm for subgraph isomorphism detection, called the VF2. In this chapter we study in detail the VF2 algorithm for subgraph isomorphism detection. In chapter four we give a detailed simulation analysis by implementing the Ullmann algorithm on one large data graph G = (VG, EG) of order 30 and size 30, along with 15 randomly generated undirected graphs Q = (VQ, EQ). In this chapter we examine the algorithms efficiency by applying it on all 15 random graphs to search for isomorphic subgraphs in the large data graph. Chapter five is the concluding chapter and chapter six entails the appendices.enThe role of graph theory in big data analyticsThesis