Analyzing the performance and generalisability of incorporating SimCLR into Proximal Policy Optimization in procedurally generated environments

dc.contributor.authorGilbert, Nikhil
dc.contributor.supervisorRosman, Benjamin
dc.date.accessioned2024-10-23T09:10:06Z
dc.date.available2024-10-23T09:10:06Z
dc.date.issued2024
dc.descriptionA dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in partial fulfilment of the requirements for the degree of Master of Science, 2024
dc.description.abstractMultiple approaches to state representation learning have been shown to substantially improve the performance of reinforcement learning agents. A known challenge when state representation learning is used in reinforcement learning is enabling an agent to represent environment states that share similar characteristics in a way that allows it to recognise them as similar. We propose a novel algorithm that combines contrastive learning with reinforcement learning so that agents learn, during training, to group states by common physical characteristics and action preferences, and we subsequently generalise what is learned to previously unseen environment obstacles. To enable a reinforcement learning agent to use contrastive learning within its environment interaction loop, we propose a state representation learning model that applies contrastive learning to group states, using observations coupled with the action the agent chose in its current state. Our approach combines two algorithms, which we augment to demonstrate the effectiveness of joining contrastive learning with reinforcement learning. The state representation model for contrastive learning is the Simple Framework for Contrastive Learning of Visual Representations (SimCLR) of Chen et al. [2020], which we amend to include action values from the chosen reinforcement learning environment. Proximal Policy Optimization (PPO), a policy gradient algorithm, is our chosen reinforcement learning approach for policy learning; we combine it with SimCLR to form our novel algorithm, Action Contrastive Policy Optimization (ACPO). When these augmented algorithms are combined for contrastive reinforcement learning, our results show significant improvement in training performance and in generalisation to unseen environment obstacles of similar structure (physical layout of interactive objects) and mechanics (the rules of physics and transition probabilities).
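As an illustration of the idea summarised in the abstract, the following sketch (not taken from the dissertation itself) shows how a SimCLR-style contrastive term over action-augmented state embeddings could be added to PPO's clipped objective. The names (ActionAugmentedEncoder, nt_xent_loss, ppo_clip_loss, acpo_loss), the network sizes, and the contrastive weight are illustrative assumptions only; the actual ACPO architecture and hyperparameters are specified in the full text.

import torch
import torch.nn as nn
import torch.nn.functional as F


class ActionAugmentedEncoder(nn.Module):
    # Encodes an observation together with a one-hot action, mirroring the
    # SimCLR encoder but with the chosen action appended to the input
    # (an assumption about how ACPO fuses the two signals).
    def __init__(self, obs_dim: int, n_actions: int, embed_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + n_actions, 128),
            nn.ReLU(),
            nn.Linear(128, embed_dim),
        )

    def forward(self, obs: torch.Tensor, action_onehot: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, action_onehot], dim=-1))


def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    # Standard SimCLR NT-Xent loss over two augmented views z1, z2 of shape (B, D).
    batch = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2B, D)
    sim = z @ z.t() / temperature                         # (2B, 2B) similarities
    sim.fill_diagonal_(float("-inf"))                     # exclude self-similarity
    # Row i's positive is the other view of the same (state, action) pair.
    targets = torch.cat([torch.arange(batch, 2 * batch), torch.arange(0, batch)])
    return F.cross_entropy(sim, targets)


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps: float = 0.2) -> torch.Tensor:
    # PPO clipped surrogate objective (Schulman et al., 2017).
    ratio = torch.exp(new_logp - old_logp)
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()


def acpo_loss(new_logp, old_logp, advantages, z1, z2, contrastive_weight: float = 0.1):
    # Hypothetical combined objective: PPO term plus a weighted contrastive term.
    return ppo_clip_loss(new_logp, old_logp, advantages) + contrastive_weight * nt_xent_loss(z1, z2)

In such a sketch, z1 and z2 would be embeddings of two augmentations of the same observation, each concatenated with the one-hot action the agent selected in that state, and the contrastive weight would be tuned per environment.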
dc.description.submitterMM2024
dc.facultyFaculty of Science
dc.identifierhttps://orcid.org/0000-0001-8781-9331
dc.identifier.citationGilbert, Nikhil. (2024). Analyzing the performance and generalisability of incorporating SimCLR into Proximal Policy Optimization in procedurally generated environments [MSc dissertation, University of the Witwatersrand, Johannesburg]. WireDSpace. https://hdl.handle.net/10539/41832
dc.identifier.urihttps://hdl.handle.net/10539/41832
dc.language.isoen
dc.publisherUniversity of the Witwatersrand, Johannesburg
dc.rights© 2024 University of the Witwatersrand, Johannesburg. All rights reserved. The copyright in this work vests in the University of the Witwatersrand, Johannesburg. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of University of the Witwatersrand, Johannesburg.
dc.schoolSchool of Computer Science and Applied Mathematics
dc.subjectDeep learning
dc.subjectReinforcement learning
dc.subjectMachine learning
dc.subjectContrastive learning
dc.subjectAI
dc.subject.otherSDG-8: Decent work and economic growth
dc.titleAnalyzing the performance and generalisability of incorporating SimCLR into Proximal Policy Optimization in procedurally generated environments
dc.typeDissertation
Files

Original bundle (1 of 1)
Name: Gilbert_Assessment_2024.pdf
Size: 3.54 MB
Format: Adobe Portable Document Format

License bundle (1 of 1)
Name: license.txt
Size: 2.43 KB
Format: Item-specific license agreed upon to submission