Electronic Theses and Dissertations (Masters)
Browsing Electronic Theses and Dissertations (Masters) by Author "Rosman, Benjamin"
A fully-decentralised general-sum approach for multi-agent reinforcement learning using minimal modelling
(University of the Witwatersrand, Johannesburg, 2023-08) Kruger, Marcel Matthew Anthony; Rosman, Benjamin; James, Steven; Shipton, Jarrod
Multi-agent reinforcement learning (MARL) is a prominent area of research in machine learning, extending reinforcement learning to scenarios where multiple agents concurrently learn and interact within the same environment. Most existing methods rely on centralisation during training, while others employ agent modelling. In contrast, we propose a novel method that adapts the role of entropy to assist in fully-decentralised training without explicitly modelling other agents using additional information to which most centralised methods assume access. We augment entropy to encourage more deterministic agents and instead let the non-stationarity inherent in MARL serve as a mode of exploration. We empirically evaluate the performance of our method across five distinct environments, each representing unique challenges. Our assessment encompasses both cooperative and competitive cases. Our findings indicate that the approach of penalising entropy, rather than rewarding it, enables agents to perform at least as well as the prevailing standard of entropy maximisation. Moreover, our alternative approach achieves several of the original objectives of entropy regularisation in reinforcement learning, such as increased sample efficiency and potentially better final rewards.
Whilst entropy has a significant role, our results in the competitive case indicate that position bias is still a considerable challenge.

Analyzing the performance and generalisability of incorporating SimCLR into Proximal Policy Optimization in procedurally generated environments
(University of the Witwatersrand, Johannesburg, 2024) Gilbert, Nikhil; Rosman, Benjamin
Multiple approaches to state representation learning have been shown to improve the performance of reinforcement learning agents substantially. When used in reinforcement learning, a known challenge in state representation learning is enabling an agent to represent environment states with similar characteristics in a manner that would allow said agent to comprehend them as such. We propose a novel algorithm that combines contrastive learning with reinforcement learning so that agents learn to group states by common physical characteristics and action preferences during training. We subsequently generalise these learnings to previously encountered environment obstacles. To enable a reinforcement learning agent to use contrastive learning within its environment interaction loop, we propose a state representation learning model that employs contrastive learning to group states using observations coupled with the action the agent chose within its current state. Our approach uses a combination of two algorithms that we augment to demonstrate the effectiveness of combining contrastive learning with reinforcement learning. The state representation model for contrastive learning is the Simple Framework for Contrastive Learning of Visual Representations (SimCLR) by Chen et al. [2020], which we amend to include action values from the chosen reinforcement learning environment. The policy gradient algorithm Proximal Policy Optimization (PPO) is our chosen reinforcement learning approach for policy learning, which we combine with SimCLR to form our novel algorithm, Action Contrastive Policy Optimization (ACPO).
When combining these augmented algorithms for contrastive reinforcement learning, our results show significant improvement in training performance and generalisation to unseen environment obstacles of similar structure (physical layout of interactive objects) and mechanics (the rules of physics and transition probabilities).

Creating an adaptive collaborative playstyle-aware companion agent
(University of the Witwatersrand, Johannesburg, 2023-09) Arendse, Lindsay John; Rosman, Benjamin
Companion characters in video games play a unique part in enriching player experience. Companion agents support the player as an ally or sidekick and would typically help the player by providing hints or resources, or even by fighting alongside the human player. Players often adopt a certain approach or strategy, referred to as a playstyle, whilst playing video games. Players not only approach challenges in games differently, but also play games differently based on what they find rewarding. Companion agent characters thus have an important role to play by assisting the player in a way which aligns with their playstyle. Existing companion agent approaches fall short and adversely affect the collaborative experience when the companion agent is not able to assist the human player in a manner consistent with their playstyle. Furthermore, if the companion agent cannot assist in real time, player engagement levels are lowered since the player will need to wait for the agent to compute its action, leading to a frustrating player experience. We therefore present a framework for creating companion agents that are adaptive such that they respond in real time with actions that align with the player’s playstyle. Companion agents able to do so are what we refer to as playstyle-aware. Creating a playstyle-aware adaptive agent firstly requires a mechanism for correctly classifying or identifying the player’s playstyle, before attempting to assist the player with a given task.
We present a method which can enable the real-time in-game playstyle classification of players. We contribute a hybrid probabilistic supervised learning framework, using Bayesian inference informed by a K-Nearest Neighbours based likelihood, that is able to classify players in real time at every step within a given game level using only the latest player action or state observation. We empirically evaluate our hybrid classifier against existing work using MiniDungeons, a common benchmark game domain. We further evaluate our approach using real player data from the game Super Mario Bros. We outperform the comparative study, and our results highlight the success of our framework in identifying playstyles in a complex human player setting. The second problem we explore is the problem of assisting the identified playstyle with a suitable action. We formally define this as the ‘Learning to Assist’ problem, where given a set of companion agent policies, we aim to determine the policy which best complements the observed playstyle. An action is complementary if it aligns with the goal of the playstyle. We extend MiniDungeons into a two-player game called Collaborative MiniDungeons which we use to evaluate our companion agent against several comparative baselines. The results from this experiment highlight that companion agents which are able to adapt and assist different playstyles on average bring about a greater player experience when using a playstyle-specific reward function as a proxy for what the players find rewarding. In this way we present an approach for creating adaptive companion agents which are playstyle-aware and able to collaborate with players in real time.

Estimating skills in discrete pursuit-evasion games
(University of the Witwatersrand, Johannesburg, 2023) Gomes, Byron John; Rosman, Benjamin
Game Theory is a well-established field in mathematics, economics, and computer science, with a rich history of studying n-person, zero-sum games.
Researchers have utilized the best computational power of their time to create computational players that are able to beat the best human players at complex two-player, zero-sum games such as Chess and Go. In the field of Reinforcement Learning and Robotics, these types of games are considered useful environments to conduct experiments about agent behavior and learning. In this research report we explore a subset of discrete skill-dependent pursuit-evasion games upon which we build a framework to estimate player skills. In this game environment a player’s skill determines the actions available to them in each state and the transition dynamics resulting from the chosen action. The game offers a simplified representation of more complex games which often have vast state and action spaces, making it difficult to model and analyze player behavior. In this game environment we find that players with incorrect assumptions about an opponent’s skill perform sub-optimally at winning games. Given that knowledge of an opponent’s skill affects player performance, we demonstrate that players can use Bayesian inference to estimate their opponent’s skill, based on the action outcomes of an opponent. We also demonstrate that skill estimation is a valuable exercise for players to undertake and show that the performance of players that estimate their opponent’s skill converges to the performance of players given perfect knowledge of their opponent’s skill.
This research contributes to our understanding of Bayesian skill estimation in skill-dependent pursuit-evasion games, which may be useful in the fields of Multi-agent Reinforcement Learning and Robotics.

Multi-View Ranking: Tasking Transformers to Generate and Validate Solutions to Math Word Problems
(University of the Witwatersrand, Johannesburg, 2023-11) Mzimba, Rifumo; Klein, Richard; Rosman, Benjamin
The recent developments and success of the Transformer model have resulted in the creation of massive language models that have led to significant improvements in the comprehension of natural language. When fine-tuned for downstream natural language processing tasks with limited data, they achieve state-of-the-art performance. However, these robust models lack the ability to reason mathematically. It has been demonstrated that, when fine-tuned on the small-scale Math Word Problems (MWPs) benchmark datasets, these models are not able to generalize. Therefore, to overcome this limitation, this study proposes to augment the generative objective used in the MWP task with complementary objectives that can assist the model in reasoning more deeply about the MWP task. Specifically, we propose a multi-view generation objective that allows the model to understand the generative task as an abstract syntax tree traversal beyond the sequential generation task. In addition, we propose a complementary verification objective to enable the model to develop heuristics that can distinguish between correct and incorrect solutions. These two goals comprise our multi-view ranking (MVR) framework, in which the model is tasked to generate the prefix, infix, and postfix traversals for a given MWP, and then use the verification task to rank the generated expressions. Our experiments show that the verification objective is more effective at choosing the best expression than the widely used beam search.
We further show that when our two objectives are used in conjunction, they can effectively guide our model to learn robust heuristics for the MWP task. In particular, we achieve absolute percentage improvements of 9.7% and 5.3% over our baseline and the state-of-the-art models, respectively, on the SVAMP datasets. Our source code can be found at https://github.com/ProxJ/msc-final.
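The prefix, infix, and postfix views named in the abstract above are three traversals of the same expression tree. The following is a minimal illustrative sketch, not the thesis's implementation: the `Node` class, the traversal helpers, and the sample word problem are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical expression-tree node: a binary operator or a leaf operand."""
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def prefix(node: Optional[Node]) -> List[str]:
    # Operator first, then left and right subtrees.
    if node is None:
        return []
    return [node.value] + prefix(node.left) + prefix(node.right)


def infix(node: Optional[Node]) -> List[str]:
    # Left subtree, operator, right subtree; parentheses keep the order unambiguous.
    if node is None:
        return []
    if node.left is None and node.right is None:
        return [node.value]
    return ["("] + infix(node.left) + [node.value] + infix(node.right) + [")"]


def postfix(node: Optional[Node]) -> List[str]:
    # Left and right subtrees first, operator last.
    if node is None:
        return []
    return postfix(node.left) + postfix(node.right) + [node.value]


# Hypothetical problem: "Tom has 3 bags of 4 apples and eats 2. How many are left?"
tree = Node("-", Node("*", Node("3"), Node("4")), Node("2"))

print(" ".join(prefix(tree)))   # - * 3 4 2
print(" ".join(infix(tree)))    # ( ( 3 * 4 ) - 2 )
print(" ".join(postfix(tree)))  # 3 4 * 2 -
```

All three token sequences encode the same solution, which is why a verifier can, in principle, rank them against one another.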