Analyzing the performance and generalisability of incorporating SimCLR into Proximal Policy Optimization in procedurally generated environments
dc.contributor.author | Gilbert, Nikhil | |
dc.contributor.supervisor | Rosman, Benjamin | |
dc.date.accessioned | 2024-10-23T09:10:06Z | |
dc.date.available | 2024-10-23T09:10:06Z | |
dc.date.issued | 2024 | |
dc.description | A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in partial fulfilment of the requirements for the degree of Master of Science, Johannesburg, 2024 | |
dc.description.abstract | Multiple approaches to state representation learning have been shown to substantially improve the performance of reinforcement learning agents. A known challenge when using state representation learning in reinforcement learning is enabling an agent to recognise environment states that share similar characteristics and to treat them as related. We propose a novel algorithm that combines contrastive learning with reinforcement learning so that agents learn, during training, to group states by common physical characteristics and action preferences, and then generalise what they have learned to previously unseen environment obstacles. To enable a reinforcement learning agent to use contrastive learning within its environment interaction loop, we propose a state representation learning model that employs contrastive learning to group states using observations paired with the action the agent chose in its current state. Our approach augments and combines two existing algorithms to demonstrate the effectiveness of contrastive reinforcement learning. The state representation model is the Simple Framework for Contrastive Learning of Visual Representations (SimCLR) of Chen et al. [2020], which we amend to include action values from the chosen reinforcement learning environment. Proximal Policy Optimization (PPO) is our chosen policy gradient algorithm, which we combine with SimCLR to form our novel algorithm, Action Contrastive Policy Optimization (ACPO). Our results show significant improvement in training performance and generalisation to unseen environment obstacles of similar structure (the physical layout of interactive objects) and mechanics (the rules of physics and transition probabilities). | |
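The two losses named in the abstract can be sketched as follows. This is an illustrative reconstruction, not the dissertation's code: the NT-Xent loss is SimCLR's published contrastive objective and the clipped surrogate is PPO's published policy objective, but the `temperature`, `eps`, and `beta` values and the idea of summing the two losses with a weight `beta` are assumptions made here for the sketch.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR's NT-Xent loss: rows z1[i] and z2[i] are embeddings of two
    views of the same (state, action) example and form a positive pair."""
    n_pairs = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)     # unit vectors -> cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                       # exclude self-similarity
    n = 2 * n_pairs
    pos = np.roll(np.arange(n), n_pairs)                 # index of each row's positive partner
    logits = sim - sim.max(axis=1, keepdims=True)        # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(n), pos].mean()

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """PPO's clipped surrogate objective (negated, so lower is better).
    `ratio` is pi_new(a|s) / pi_old(a|s) for each sampled transition."""
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return -np.minimum(ratio * advantage, clipped * advantage).mean()

def combined_loss(z1, z2, ratio, advantage, beta=0.1):
    """Hypothetical joint objective: PPO loss plus a weighted contrastive term."""
    return ppo_clip_loss(ratio, advantage) + beta * nt_xent_loss(z1, z2)
```

In the amended SimCLR described in the abstract, each embedding would be produced from an observation concatenated with (e.g. a one-hot encoding of) the chosen action, so that states reached under the same action preference are pulled together in representation space.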
dc.description.submitter | MM2024 | |
dc.faculty | Faculty of Science | |
dc.identifier | https://orcid.org/0000-0001-8781-9331 | |
dc.identifier.citation | Gilbert, Nikhil. (2024). Analyzing the performance and generalisability of incorporating SimCLR into Proximal Policy Optimization in procedurally generated environments [MSc dissertation, University of the Witwatersrand, Johannesburg]. WireDSpace. https://hdl.handle.net/10539/41832 | |
dc.identifier.uri | https://hdl.handle.net/10539/41832 | |
dc.language.iso | en | |
dc.publisher | University of the Witwatersrand, Johannesburg | |
dc.rights | © 2024 University of the Witwatersrand, Johannesburg. All rights reserved. The copyright in this work vests in the University of the Witwatersrand, Johannesburg. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of University of the Witwatersrand, Johannesburg. | |
dc.school | School of Computer Science and Applied Mathematics | |
dc.subject | Deep learning | |
dc.subject | Reinforcement learning | |
dc.subject | Machine learning | |
dc.subject | Contrastive learning | |
dc.subject | AI | |
dc.subject.other | SDG-8: Decent work and economic growth | |
dc.title | Analyzing the performance and generalisability of incorporating SimCLR into Proximal Policy Optimization in procedurally generated environments | |
dc.type | Dissertation |