Analyzing the performance and generalisability of incorporating SimCLR into Proximal Policy Optimization in procedurally generated environments

Date

2024

Publisher

University of the Witwatersrand, Johannesburg

Abstract

Multiple approaches to state representation learning have been shown to substantially improve the performance of reinforcement learning agents. A known challenge when applying state representation learning to reinforcement learning is enabling an agent to represent environment states that share similar characteristics in a way that allows the agent to recognise them as such. We propose a novel algorithm that combines contrastive learning with reinforcement learning so that agents learn, during training, to group states by common physical characteristics and action preferences, and subsequently generalise these learned groupings to previously unseen environment obstacles. To enable a reinforcement learning agent to use contrastive learning within its environment interaction loop, we propose a state representation learning model that employs contrastive learning to group states using observations coupled with the action the agent chose in its current state. Our approach augments and combines two algorithms to demonstrate the effectiveness of contrastive reinforcement learning. The state representation model is the Simple Framework for Contrastive Learning of Visual Representations (SimCLR) of Chen et al. [2020], which we amend to include action values from the chosen reinforcement learning environment. For policy learning we use Proximal Policy Optimization (PPO), a policy gradient algorithm, which we combine with SimCLR to form our novel algorithm, Action Contrastive Policy Optimization (ACPO). Combining these augmented algorithms, our results show significant improvement in training performance and in generalisation to unseen environment obstacles of similar structure (the physical layout of interactive objects) and mechanics (the rules of physics and transition probabilities).
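The core idea in the abstract — a SimCLR-style contrastive objective in which state embeddings are grouped using the action chosen in each state — can be sketched as an NT-Xent-style loss where two states form a positive pair when the agent took the same action in both. This is a minimal illustrative sketch only, not the dissertation's actual ACPO implementation; the function name, batch construction, and temperature value are assumptions for illustration.

```python
import numpy as np

def action_contrastive_loss(z, actions, temperature=0.5):
    """NT-Xent-style contrastive loss over state embeddings, where two
    states are a positive pair when the agent chose the same action in both.
    z: (N, D) array of projected state embeddings; actions: (N,) ints."""
    # L2-normalise embeddings so dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature            # temperature-scaled similarities
    np.fill_diagonal(sim, -np.inf)         # exclude each state's self-pair
    # Row-wise softmax: probability of matching each other state in batch.
    logits = sim - sim.max(axis=1, keepdims=True)
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    loss, count = 0.0, 0
    for i in range(len(z)):
        pos = actions == actions[i]        # positives: same chosen action
        pos[i] = False
        if pos.any():
            loss += -np.log(probs[i, pos].sum() + 1e-12)
            count += 1
    return loss / max(count, 1)
```

Minimising this loss pulls embeddings of states with a shared action preference together and pushes other states apart; in ACPO terms, such a term would be trained alongside the PPO policy objective on batches drawn from the agent's interaction loop.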

Description

A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in partial fulfillment of the requirements for the degree of Master of Science, Johannesburg 2024

Keywords

Deep learning, Reinforcement learning, Machine learning, Contrastive learning, AI

Citation

Gilbert, Nikhil. (2024). Analyzing the performance and generalisability of incorporating SimCLR into Proximal Policy Optimization in procedurally generated environments [MSc dissertation, University of the Witwatersrand, Johannesburg]. WireDSpace. https://hdl.handle.net/10539/41832
