Analyzing the performance and generalisability of incorporating SimCLR into Proximal Policy Optimization in procedurally generated environments
Date
2024
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
University of the Witwatersrand, Johannesburg
Abstract
Multiple approaches to state representation learning have been shown to improve the performance of reinforcement learning agents substantially. When used in reinforcement learning, a known challenge in state representation learning is enabling an agent to represent environment states with similar characteristics in a manner that would allow said agent to comprehend it as such. We propose a novel algorithm that combines contrastive learning with reinforcement learning so that agents learn to group states by common physical characteristics and action preferences during training. We subsequently generalise these learnings to previously encountered environment obstacles. To enable a reinforcement learning agent to use contrastive learning within its environment interaction loop, we propose a state representation learning model that employs contrastive learning to group states using observations coupled with the action the agent chose within its current state. Our approach uses a combination of two algorithms that we augment to demonstrate the effectiveness of combining contrastive learning with reinforcement learning. The state representation model for contrastive learning is a Simple Framework for Contrastive Learning of Visual Representations (SimCLR) by Chen et al. [2020], which we amend to include action values from the chosen reinforcement learning environment. The policy gradient algorithm (PPO) is our chosen reinforcement learning approach for policy learning, which we combine with SimCLR to form our novel algorithm, Action Contrastive Policy Optimization (ACPO).
When combining these augmented algorithms for contrastive reinforcement learning, our results show significant improvement in training performance and generalisation to unseen environment obstacles of similar structure (physical layout of interactive objects) and mechanics (the rules of physics and transition probabilities).
Description
A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in partial fulfillment of the requirements for the degree of Master of Science, Johannesburg 2024
Keywords
Deep learning, Reinforcement learning, Machine learning, Contrastive learning, AI
Citation
Gilbert, Nikhil. (2024). Analyzing the performance and generalisability of incorporating SimCLR into Proximal Policy Optimization in procedurally generated environments [PhD thesis, University of the Witwatersrand, Johannesburg]. WireDSpace.https://hdl.handle.net/10539/41832