Maximum clique and edge bi-clique problems via matrix decomposition and truncated nuclear norm
Date
2022
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
In network science, cliques and bi-cliques are important structures well suited to several applications, including community detection, clustering of gene expression data, social network analysis, code theory, and bioinformatics.
In this work, we consider the problem of identifying the planted clique, the planted edge biclique in graphs, and the maximum clique and maximum edge bi-clique in random graphs. We have proposed mathematical models for these problems. The model for each problem resembles the matrix decomposition of the adjacency matrix of a given graph. The objective function of one of the proposed mathematical models includes the nuclear norm and a weighted `1-norm of the lowrank and sparse matrix of the decomposition, respectively. The weighted `1-norm has an advantage over the known `1-norm in reducing the error. The use of dynamically changing the weights for the `1-norm has been motivated. We have used proximal operators within the iterates of the ADMM (Alternating Direction Method of Multipliers) algorithm to solve the optimization problem. We have also proposed another mathematical model which is based on the truncated nuclear norm and the weighted `1-norm for finding the planted clique, the planted edge bi-clique in graphs, and the maximum clique and maximum edge bi-clique in random graphs. Furthermore, the optimization problems are solved by a two-step ADMM algorithm. The algorithms have been implemented for both the nuclear norm-based matrix decomposition model as well as the truncated nuclear normbased matrix decomposition model. Convergence of the proposed ADMM algorithms for both models has been provided. One of the main features of our algorithms is that both algorithms require no input from the user other than the adjacency matrix of the input graph. The theoretical guarantee of the planted clique, the maximum clique, the planted edge bi-clique, and the maximum iii edge bi-clique, in the form of the low-rank matrix, has also been established using approximate dual certificates constructed via golfing scheme for the nuclear norm based matrix decomposition model. We have suggested conditions that guarantee the recovery and uniqueness of the solution, as well as a tight bound on the dual matrix that validates optimality conditions.
We have tested our ADMM algorithm for numerous graphs, including graphs in which a clique or a bi-clique is planted and the remaining edges are added with probability p, random graphs where no cliques or bi-cliques are being planted, and real-world graphs. Numerical results for planted cliques and bi-cliques are presented showing clear advantages of our model when compared with two recent mathematical models. Results are also presented for randomly generated graphs with minimal errors. These errors are found using a formula we have proposed based on the sizes of the clique and the bi-clique. Moreover, we have applied our algorithm to real-world graphs for which cliques have been recovered successfully.
We have also tested our algorithm on truncated nuclear norm problem with different graphs, including graphs with a planted clique or bi-clique, and random graphs where no cliques or bicliques are planted. Numerical results for both graphs show clear advantages of our model when compared with the ‘regular’ truncated nuclear norm model.
Description
A thesis submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy to the Faculty of Science, University of the Witwatersrand, 2022
Keywords
Bi-Clique Problems, Matrix Decomposition, Truncated Nuclear Norm