Exploring the latent space of autoencoders with a metric of artistic style

Rich, William
2021 (repository record dated 2022-08-10)
https://hdl.handle.net/10539/33094

A research report submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in partial fulfilment of the requirements for the degree of Master of Science in Computer Science by Course Work and Research Report, 2021.

This research report uses a metric called style loss to quantitatively explore the latent space of autoencoders. Style loss measures the difference in artistic style between two images [Gatys et al. 2015]. The goal of the research is to use style loss to compare the latent spaces of five autoencoder models, in order to determine whether style loss is useful for understanding how those latent spaces are organised and regularised. The five models are an Adversarial Autoencoder (AAE), a Triplet Variational Autoencoder (TVAE), a β-Variational Autoencoder (β-VAE), a Variational Autoencoder (VAE), and an Autoencoder (AE). The experiments use a data set of famous artworks, with the artistic style (Impressionism, Expressionism, Realism, Romanticism) serving as the class label. For each model, all of the artworks are encoded with that model's encoder network. Pairs of points in the latent space are chosen according to various criteria. For each pair, a linear trajectory between the two points is determined, and 10 equally spaced points are sampled from it. Each point is then decoded with the respective model's decoder network, yielding a series of images, and the style loss is evaluated for each consecutive pair of images in the series. Initial experiments establish a basic comparison of the five latent spaces using Principal Component Analysis (PCA) together with various clustering algorithms. These experiments show that the AAE has spherical covariance in the latent space.
The latent distribution of the TVAE is oblong, and the clustering experiments strongly suggest that its data is grouped according to the class labels. For the β-VAE, VAE, and AE the data manifold is also oblong, but there is no compelling evidence that the data has been clustered according to the class labels. A major finding of the style loss experiments is that two distinct patterns emerge. In the first, seen mainly in the AAE and VAE, the style loss takes a distinctive horseshoe shape; in the second, it takes a wave-like shape. Inspection of the image reconstructions suggests that the horseshoe pattern occurs when the image is transformed along the interpolation, and the wave pattern occurs when there is a cross-dissolve between the images. This indicates that style loss can help in understanding the latent space, since there is a strong correlation between the amount of cross-dissolve and the pattern of style loss along interpolation trajectories.
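The measurement procedure in the abstract (sample 10 equally spaced points on a linear latent trajectory, decode each, then compute style loss between consecutive images) can be sketched with NumPy. This is a minimal sketch under stated assumptions, not the report's implementation: style loss follows the Gram-matrix formulation of Gatys et al. with equal layer weights; `decode` and `extract_features` are hypothetical callables standing in for an autoencoder's decoder network and a pretrained CNN's per-layer activations; and the 10 sampled points are assumed to include both endpoints.

```python
import numpy as np


def gram_matrix(features):
    """Gram matrix of one layer's feature maps, shape (channels, height, width)."""
    n_channels = features.shape[0]
    flat = features.reshape(n_channels, -1)  # N_l x M_l
    return flat @ flat.T


def layer_style_loss(feat_a, feat_b):
    """Per-layer style loss (Gatys et al.): squared Gram difference / (4 N^2 M^2)."""
    n, m = feat_a.shape[0], feat_a[0].size  # channels, spatial positions
    diff = gram_matrix(feat_a) - gram_matrix(feat_b)
    return float(np.sum(diff ** 2) / (4.0 * n ** 2 * m ** 2))


def style_loss(feats_a, feats_b, weights=None):
    """Weighted sum over layers; equal weights are an assumption."""
    if weights is None:
        weights = [1.0 / len(feats_a)] * len(feats_a)
    return sum(w * layer_style_loss(a, b) for w, a, b in zip(weights, feats_a, feats_b))


def linear_trajectory(z_start, z_end, n_points=10):
    """n_points equally spaced latent codes, endpoints included (assumption)."""
    t = np.linspace(0.0, 1.0, n_points)[:, None]
    return (1.0 - t) * z_start + t * z_end


def style_loss_along_trajectory(z_start, z_end, decode, extract_features, n_points=10):
    """Style loss between each consecutive pair of decoded images on the trajectory."""
    images = [decode(z) for z in linear_trajectory(z_start, z_end, n_points)]
    feats = [extract_features(img) for img in images]
    return [style_loss(feats[i], feats[i + 1]) for i in range(len(feats) - 1)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights_matrix = rng.normal(size=(3 * 8 * 8, 4))  # hypothetical toy "decoder"

    def toy_decode(z):
        # Hypothetical stand-in for a trained decoder: latent (4,) -> image (3, 8, 8).
        return np.tanh(weights_matrix @ z).reshape(3, 8, 8)

    def toy_features(img):
        # Hypothetical stand-in for CNN activations at two "layers".
        return [img, img[:, ::2, ::2]]

    losses = style_loss_along_trajectory(
        rng.normal(size=4), rng.normal(size=4), toy_decode, toy_features
    )
    print(len(losses))  # 10 points give 9 consecutive pairs
```

Plotting the returned list against position along the trajectory gives the kind of curve in which a horseshoe or wave-like shape could be inspected; with a real model, `toy_decode` and `toy_features` would be replaced by the trained decoder and a pretrained feature extractor.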