Analyzing the performance and generalisability of incorporating SimCLR into Proximal Policy Optimization in procedurally generated environments

dc.contributor.authorGilbert, Nikhil
dc.contributor.supervisorRosman, Benjamin
dc.date.accessioned2024-10-23T09:10:06Z
dc.date.available2024-10-23T09:10:06Z
dc.date.issued2024
dc.descriptionA dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in partial fulfilment of the requirements for the degree of Master of Science, 2024
dc.description.abstractMultiple approaches to state representation learning have been shown to substantially improve the performance of reinforcement learning agents. A known challenge when state representation learning is used in reinforcement learning is enabling an agent to represent environment states that share similar characteristics in a way that allows it to recognise them as similar. We propose a novel algorithm that combines contrastive learning with reinforcement learning so that agents learn, during training, to group states by common physical characteristics and action preferences, and we subsequently generalise what is learned to previously unseen environment obstacles. To enable a reinforcement learning agent to use contrastive learning within its environment interaction loop, we propose a state representation learning model that applies contrastive learning to group states, using observations coupled with the action the agent chose in its current state. Our approach combines two algorithms, which we augment to demonstrate the effectiveness of joining contrastive learning with reinforcement learning. The state representation model for contrastive learning is the Simple Framework for Contrastive Learning of Visual Representations (SimCLR) of Chen et al. [2020], which we amend to include action values from the chosen reinforcement learning environment. Proximal Policy Optimization (PPO), a policy gradient algorithm, is our chosen reinforcement learning approach for policy learning; we combine it with SimCLR to form our novel algorithm, Action Contrastive Policy Optimization (ACPO). When these augmented algorithms are combined for contrastive reinforcement learning, our results show significant improvement in training performance and in generalisation to unseen environment obstacles of similar structure (physical layout of interactive objects) and mechanics (the rules of physics and transition probabilities).
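As an illustration of the idea summarised in the abstract, the following sketch (not taken from the dissertation itself) shows how a SimCLR-style contrastive term over action-augmented state embeddings could be added to PPO's clipped objective. The names (ActionAugmentedEncoder, nt_xent_loss, ppo_clip_loss, acpo_loss), the network sizes, and the contrastive weight are illustrative assumptions only; the actual ACPO architecture and hyperparameters are specified in the full text.

import torch
import torch.nn as nn
import torch.nn.functional as F


class ActionAugmentedEncoder(nn.Module):
    # Encodes an observation together with a one-hot action, mirroring the
    # SimCLR encoder but with the chosen action appended to the input
    # (an assumption about how ACPO fuses the two signals).
    def __init__(self, obs_dim: int, n_actions: int, embed_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + n_actions, 128),
            nn.ReLU(),
            nn.Linear(128, embed_dim),
        )

    def forward(self, obs: torch.Tensor, action_onehot: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, action_onehot], dim=-1))


def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    # Standard SimCLR NT-Xent loss over two augmented views z1, z2 of shape (B, D).
    batch = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2B, D)
    sim = z @ z.t() / temperature                         # (2B, 2B) similarities
    sim.fill_diagonal_(float("-inf"))                     # exclude self-similarity
    # Row i's positive is the other view of the same (state, action) pair.
    targets = torch.cat([torch.arange(batch, 2 * batch), torch.arange(0, batch)])
    return F.cross_entropy(sim, targets)


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps: float = 0.2) -> torch.Tensor:
    # PPO clipped surrogate objective (Schulman et al., 2017).
    ratio = torch.exp(new_logp - old_logp)
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()


def acpo_loss(new_logp, old_logp, advantages, z1, z2, contrastive_weight: float = 0.1):
    # Hypothetical combined objective: PPO term plus a weighted contrastive term.
    return ppo_clip_loss(new_logp, old_logp, advantages) + contrastive_weight * nt_xent_loss(z1, z2)

In such a sketch, z1 and z2 would be embeddings of two augmentations of the same observation, each concatenated with the one-hot action the agent selected in that state, and the contrastive weight would be tuned per environment.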
dc.description.submitterMM2024
dc.facultyFaculty of Science
dc.identifierhttps://orcid.org/0000-0001-8781-9331
dc.identifier.citationGilbert, Nikhil. (2024). Analyzing the performance and generalisability of incorporating SimCLR into Proximal Policy Optimization in procedurally generated environments [MSc dissertation, University of the Witwatersrand, Johannesburg]. WireDSpace. https://hdl.handle.net/10539/41832
dc.identifier.urihttps://hdl.handle.net/10539/41832
dc.language.isoen
dc.publisherUniversity of the Witwatersrand, Johannesburg
dc.rights© 2024 University of the Witwatersrand, Johannesburg. All rights reserved. The copyright in this work vests in the University of the Witwatersrand, Johannesburg. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of University of the Witwatersrand, Johannesburg.
dc.schoolSchool of Computer Science and Applied Mathematics
dc.subjectDeep learning
dc.subjectReinforcement learning
dc.subjectMachine learning
dc.subjectContrastive learning
dc.subjectAI
dc.subject.otherSDG-8: Decent work and economic growth
dc.titleAnalyzing the performance and generalisability of incorporating SimCLR into Proximal Policy Optimization in procedurally generated environments
dc.typeDissertation
Files

Original bundle (1 of 1)
Name: Gilbert_Assessment_2024.pdf
Size: 3.54 MB
Format: Adobe Portable Document Format

License bundle (1 of 1)
Name: license.txt
Size: 2.43 KB
Format: Item-specific license agreed upon to submission