Generalizing regularization of neural networks to correlated parameter spaces

Jarvis, Devon
A common assumption of regularization techniques used with artificial neural networks is that the network parameters are independently distributed. This assumption is made primarily for simplicity or to enforce independence as a constraint on the model parameters. In this work we provide theoretical and empirical results showing that the independence assumption is unreasonable and unhelpful for regularization. We create and evaluate a novel regularization method called Metric regularization, which adapts the degree of regularization for each parameter of the network based on how important the parameter is for reducing the loss on the training data. Importantly, Metric regularization accounts for the impact that a parameter has on the other model parameters when determining how important it is for reducing the loss. Thus, our regularization method adapts to the correlation of the parameters in the model. We provide theoretical results showing that Metric regularization has the Minimum Mean Squared Error property. We also evaluate the utility of Metric regularization empirically and find that it damages the model, which is consequently unable to fit the training data effectively. We instead find that regularization methods that adaptively regularize only the parameters that are unhelpful for fitting the training data improve the generalizability of the networks without hindering performance on the training data. We provide justifications for the apparent disconnect between our theoretical and empirical results for Metric regularization and, in so doing, shed some light on what causes a generalization gap in networks as well as on the impact of the different initialization regimes used when training networks.
A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Master of Science