Browsing by Author "Rosman, Benjamin"
Now showing 1 - 7 of 7
Item A fully-decentralised general-sum approach for multi-agent reinforcement learning using minimal modelling (University of the Witwatersrand, Johannesburg, 2023-08) Kruger, Marcel Matthew Anthony; Rosman, Benjamin; James, Steven; Shipton, Jarrod

Multi-agent reinforcement learning (MARL) is a prominent area of research in machine learning, extending reinforcement learning to scenarios in which multiple agents concurrently learn and interact within the same environment. Most existing methods rely on centralisation during training, while others employ agent modelling. In contrast, we propose a novel method that adapts the role of entropy to assist fully-decentralised training, without explicitly modelling other agents and without the additional information to which most centralised methods assume access. We repurpose the entropy term to encourage more deterministic agents, and instead let the non-stationarity inherent in MARL serve as a mode of exploration. We empirically evaluate the performance of our method across five distinct environments, each representing unique challenges, covering both cooperative and competitive cases. Our findings indicate that penalising entropy, rather than rewarding it, enables agents to perform at least as well as the prevailing standard of entropy maximisation. Moreover, our alternative approach achieves several of the original objectives of entropy regularisation in reinforcement learning, such as increased sample efficiency and potentially better final rewards.
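The sign flip described above, penalising entropy rather than rewarding it, amounts to a one-line change to a standard policy-gradient loss. The following NumPy sketch is illustrative only, not the thesis implementation; all names and the loss form are assumptions.

```python
import numpy as np

def policy_loss(log_probs, advantages, entropy, ent_coef=0.01, penalise=True):
    """Policy-gradient loss with an entropy term.

    Standard entropy regularisation adds an entropy bonus (penalise=False);
    flipping the sign (penalise=True) discourages high entropy, pushing the
    policy towards determinism as in the approach described above.
    """
    pg_loss = -np.mean(log_probs * advantages)   # REINFORCE-style surrogate
    sign = 1.0 if penalise else -1.0
    return pg_loss + sign * ent_coef * np.mean(entropy)
```

With the penalty active, high-entropy policies incur a larger loss, so exploration must instead arise from the non-stationarity of the other learning agents.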
Whilst entropy plays a significant role, our results in the competitive case indicate that position bias remains a considerable challenge.

Item Analyzing the performance and generalisability of incorporating SimCLR into Proximal Policy Optimization in procedurally generated environments (University of the Witwatersrand, Johannesburg, 2024) Gilbert, Nikhil; Rosman, Benjamin

Multiple approaches to state representation learning have been shown to substantially improve the performance of reinforcement learning agents. A known challenge in state representation learning for reinforcement learning is enabling an agent to recognise environment states with similar characteristics as similar. We propose a novel algorithm that combines contrastive learning with reinforcement learning so that agents learn to group states by common physical characteristics and action preferences during training, and then generalise these learned groupings to previously encountered environment obstacles. To enable a reinforcement learning agent to use contrastive learning within its environment interaction loop, we propose a state representation learning model that employs contrastive learning to group states using observations coupled with the action the agent chose in its current state. Our approach combines two algorithms, which we augment to demonstrate the effectiveness of combining contrastive learning with reinforcement learning. The state representation model is the Simple Framework for Contrastive Learning of Visual Representations (SimCLR) of Chen et al. [2020], which we amend to include action values from the chosen reinforcement learning environment. Proximal Policy Optimization (PPO) is our chosen policy-gradient algorithm for policy learning, which we combine with SimCLR to form our novel algorithm, Action Contrastive Policy Optimization (ACPO).
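SimCLR's core training signal is the NT-Xent contrastive loss over paired embeddings; the sketch below shows that loss in NumPy, where each pair of rows would embed two views of the same state-action input (e.g. an augmented observation concatenated with the chosen action, in the spirit of ACPO's coupling of states with actions). This is a minimal illustration, not the thesis code.

```python
import numpy as np

def nt_xent(z1, z2, temperature=0.5):
    """SimCLR-style NT-Xent loss: z1[i] and z2[i] are embeddings of two
    views of the same state-action pair; all other rows act as negatives."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit vectors -> cosine sim
    sim = z @ z.T / temperature
    n = len(z1)
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity
    # the positive for row i is row i+n (and i-n for the second half)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    log_prob = sim[np.arange(2 * n), pos] - np.log(np.sum(np.exp(sim), axis=1))
    return -np.mean(log_prob)
```

The loss is small when matched views land close together and mismatched ones far apart, which is what drives states with shared characteristics and action preferences into common clusters.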
When combining these augmented algorithms for contrastive reinforcement learning, our results show significant improvements in training performance and in generalisation to unseen environment obstacles of similar structure (the physical layout of interactive objects) and mechanics (the rules of physics and transition probabilities).

Item Creating an adaptive collaborative playstyle-aware companion agent (University of the Witwatersrand, Johannesburg, 2023-09) Arendse, Lindsay John; Rosman, Benjamin

Companion characters in video games play a unique part in enriching player experience. Companion agents support the player as an ally or sidekick, typically by providing hints or resources, or even by fighting alongside the human player. Players often adopt a certain approach or strategy, referred to as a playstyle, whilst playing video games. Players not only approach challenges in games differently, but also play games differently based on what they find rewarding. Companion agent characters thus have an important role to play in assisting the player in a way that aligns with their playstyle. Existing companion agent approaches fall short and adversely affect the collaborative experience when the companion agent cannot assist the human player in a manner consistent with their playstyle. Furthermore, if the companion agent cannot assist in real time, player engagement drops, since the player must wait for the agent to compute its action, leading to a frustrating player experience. We therefore present a framework for creating companion agents that are adaptive, responding in real time with actions that align with the player's playstyle. Companion agents able to do so are what we refer to as playstyle-aware. Creating a playstyle-aware adaptive agent firstly requires a mechanism for correctly classifying or identifying the player's style before attempting to assist the player with a given task.
We present a method which enables real-time, in-game playstyle classification of players. We contribute a hybrid probabilistic supervised learning framework, using Bayesian inference informed by a K-Nearest Neighbours based likelihood, that can classify players in real time at every step within a given game level using only the latest player action or state observation. We empirically evaluate our hybrid classifier against existing work using MiniDungeons, a common benchmark game domain, and further evaluate our approach using real player data from the game Super Mario Bros. We outperform the comparative study, and our results highlight the success of our framework in identifying playstyles in a complex human-player setting. The second problem we explore is that of assisting the identified playstyle with a suitable action. We formally define this as the 'Learning to Assist' problem: given a set of companion agent policies, we aim to determine the policy that best complements the observed playstyle, where an action is complementary if it aligns with the goal of the playstyle. We extend MiniDungeons into a two-player game called Collaborative MiniDungeons, which we use to evaluate our companion agent against several comparative baselines. The results from this experiment highlight that companion agents which adapt to and assist different playstyles bring about a greater player experience on average, when using a playstyle-specific reward function as a proxy for what players find rewarding. In this way we present an approach for creating adaptive companion agents that are playstyle-aware and able to collaborate with players in real time.

Item Estimating skills in discrete pursuit-evasion games (University of the Witwatersrand, Johannesburg, 2023) Gomes, Byron John; Rosman, Benjamin

Game Theory is a well-established field in mathematics, economics, and computer science, with a rich history of studying n-person, zero-sum games.
Researchers have used the best computational power of their time to create computational players able to beat the best human players at complex two-player, zero-sum games such as Chess and Go. In the fields of Reinforcement Learning and Robotics, these types of games are considered useful environments for conducting experiments on agent behaviour and learning. In this research report we explore a subset of discrete skill-dependent pursuit-evasion games, upon which we build a framework to estimate player skills. In this game environment a player's skill determines the actions available to them in each state and the transition dynamics resulting from the chosen action. The game offers a simplified representation of more complex games, which often have vast state and action spaces that make it difficult to model and analyse player behaviour. In this game environment we find that players with incorrect assumptions about an opponent's skill perform sub-optimally at winning games. Given that knowledge of an opponent's skill affects player performance, we demonstrate that players can use Bayesian inference to estimate their opponent's skill from the outcomes of the opponent's actions. We also demonstrate that skill estimation is a valuable exercise for players to undertake, and show that the performance of players that estimate their opponent's skill converges to that of players given perfect knowledge of their opponent's skill.
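The Bayesian update described above can be sketched as follows: maintain a belief over a discrete set of candidate skill levels and reweight it by the likelihood of each observed action outcome. This is a generic illustration under an assumed known outcome model, not the report's implementation.

```python
import numpy as np

def update_skill_posterior(prior, outcome, success_prob_by_skill):
    """One Bayesian update of a belief over an opponent's discrete skill.

    prior: probability over candidate skill levels.
    outcome: 1 if the opponent's chosen action succeeded, else 0.
    success_prob_by_skill: assumed P(success | skill) for each level,
    standing in for the skill-dependent transition dynamics.
    """
    p = np.asarray(success_prob_by_skill, dtype=float)
    likelihood = p if outcome == 1 else 1.0 - p
    posterior = np.asarray(prior, dtype=float) * likelihood
    return posterior / posterior.sum()   # renormalise to a distribution
```

Repeated updates concentrate the posterior on the skill level most consistent with the observed outcomes, which is what lets an estimating player approach the performance of one given the opponent's true skill.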
This research contributes to our understanding of Bayesian skill estimation in skill-dependent pursuit-evasion games, which may be useful in the fields of Multi-agent Reinforcement Learning and Robotics.

Item Multi-View Ranking: Tasking Transformers to Generate and Validate Solutions to Math Word Problems (University of the Witwatersrand, Johannesburg, 2023-11) Mzimba, Rifumo; Klein, Richard; Rosman, Benjamin

The recent development and success of the Transformer model have resulted in massive language models that have led to significant improvements in the comprehension of natural language. When fine-tuned for downstream natural language processing tasks with limited data, they achieve state-of-the-art performance. However, these robust models lack the ability to reason mathematically: it has been demonstrated that, when fine-tuned on the small-scale Math Word Problem (MWP) benchmark datasets, they are not able to generalize. To overcome this limitation, this study proposes to augment the generative objective used in the MWP task with complementary objectives that help the model reason more deeply about the task. Specifically, we propose a multi-view generation objective that allows the model to understand the generative task as an abstract syntax tree traversal, beyond sequential generation. In addition, we propose a complementary verification objective that lets the model develop heuristics to distinguish correct from incorrect solutions. These two objectives comprise our Multi-View Ranking (MVR) framework, in which the model is tasked to generate the prefix, infix, and postfix traversals for a given MWP, and then uses the verification task to rank the generated expressions. Our experiments show that the verification objective is more effective at choosing the best expression than the widely used beam search.
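The rank-by-verifier step can be illustrated with a toy stand-in: in MVR the trained verification head scores the generated prefix, infix, and postfix expressions, whereas here `verifier_score` is a hypothetical callable and the candidates are hand-written strings.

```python
def rank_candidates(candidates, verifier_score):
    """Return the candidate expression the verifier scores highest,
    replacing likelihood-based beam-search selection with verifier ranking."""
    return max(candidates, key=verifier_score)

# Toy usage: a hypothetical verifier preferring expressions that evaluate to 6.
best = rank_candidates(["1+2", "2*3", "10-5"],
                       lambda expr: -abs(eval(expr) - 6))
```

The point is the selection criterion: the chosen output is the one the verifier trusts most, not the one the decoder found most probable.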
We further show that when our two objectives are used in conjunction, they effectively guide our model to learn robust heuristics for the MWP task. In particular, we achieve absolute improvements of 9.7% and 5.3% over our baseline and the state-of-the-art models, respectively, on the SVAMP datasets. Our source code can be found at https://github.com/ProxJ/msc-final.

Item Play-style Identification and Player Modelling for Generating Tailored Advice in Video Games (University of the Witwatersrand, Johannesburg, 2023-09) Ingram, Branden Corwin; Rosman, Benjamin; Van Alten, Clint; Klein, Richard

Recent advances in fields such as machine learning have enabled the development of systems that achieve super-human performance in a number of domains, notably complex games such as Go and StarCraft. Given these successes, it is reasonable to ask whether such learned behaviours could be used to improve the performance of humans on the same tasks. However, the types of models used in these systems are typically not easily interpretable, and cannot be directly used to improve a human's performance. Additionally, humans tend to develop stylistic traits based on preference, which aid in solving problems or competing at high levels. This thesis addresses these difficulties by developing an end-to-end pipeline that can provide beneficial advice tailored to a player's style in a video game setting. Towards this end, we demonstrate, firstly, the ability to cluster variable-length multi-dimensional gameplay trajectories with respect to play-style in an unsupervised fashion; secondly, the ability to learn to model an individual player's actions during gameplay; thirdly, the ability to learn policies representative of all the play-styles identified within an environment; and finally, how these components can be combined to generate advice tailored to the individual's style.
This system would be particularly useful for improving tutorial systems, which quickly become redundant when they lack personalisation. The pipeline also serves as a way for developers to gain insights into their player base, enabling more informed decision-making about future feature releases and updates. Players, in turn, gain a useful tool for learning to play better and for identifying the characteristics of their own gameplay as well as that of their opponents. Furthermore, we contend that our approach has the potential to be employed in a broad range of learning domains.

Item The application of machine learning methods to satellite data for the management of invasive water hyacinth (University of the Witwatersrand, Johannesburg, 2023-06) Singh, Geethe; Reynolds, Chevonne; Byrne, Marcus; Rosman, Benjamin

Biological invasions are responsible for some of the most devastating impacts on the world's ecosystems, with freshwater ecosystems among the worst affected. Invasions threaten not only freshwater biodiversity, but also the provision of ecosystem services. Tackling the impact of invasive aquatic alien plant (IAAP) species in freshwater systems is an ongoing challenge. Water hyacinth (Pontederia crassipes, previously Eichhornia crassipes), the worst IAAP, presents a long-standing management challenge that requires detailed and frequently updated information on its distribution, the context that influences its occurrence, and a systematic way to identify effective biocontrol release events. This is particularly urgent in South Africa, where freshwater resources are scarce and under increasing pressure. This research employs recent advances in machine learning (ML), remote sensing, and cloud computing to improve the chances of successful water hyacinth management.
This is achieved by (i) mapping the occurrence of water hyacinth across a large extent, (ii) identifying the factors likely driving the occurrence of the weed at multiple scales, from the waterbody level to the national extent, and (iii) identifying periods for effective biocontrol release. The capacity of these tools demonstrates their potential to facilitate wide-scale, consistent, automated, pre-emptive, data-driven, and evidence-based decision-making for managing water hyacinth. The first chapter is a general introduction to the research problem and research questions. In the second chapter, the research combines a novel image-thresholding method for water detection with an unsupervised method for aquatic vegetation detection and a supervised random forest model, in a hierarchical way, to localise and discriminate water hyacinth from other IAAPs at a national extent. The value of this work is marked by the method's user accuracy of 87% and producer accuracy of 93% in comparison with previous small-scale studies. The results in this chapter also show the sensor-agnostic and temporally consistent capability of the hierarchical approach to monitor water and aquatic vegetation using Sentinel-2 and Landsat-8 over long periods (2013 to present). Lastly, this work demonstrates encouraging results when using a Deep Neural Network (DNN) to detect aquatic vegetation directly, circumventing the need for accurate water extent data. The two chapters that follow (Chapters 3 and 4, described below) each introduce an application that builds on the South African water hyacinth distribution and aquatic vegetation time series derived in Chapter 2. The third chapter uses a species distribution model (SDM) that links climatic, socio-economic, ecological, and hydrological conditions to the presence/absence of water hyacinth throughout South Africa at the waterbody level.
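The shape of the SDM task just described, mapping waterbody-level covariates to presence/absence, can be sketched with a minimal logistic-regression stand-in. The thesis fits a richer model to many more variables; this sketch only shows the input/output structure, and every name in it is illustrative.

```python
import numpy as np

def fit_sdm(X, y, lr=0.1, steps=2000):
    """Minimal species-distribution model: logistic regression from
    environmental covariates X (one row per waterbody) to presence/absence y,
    fitted by gradient descent on the log-loss."""
    X = np.column_stack([np.ones(len(X)), X])      # prepend intercept column
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))           # predicted P(presence)
        w -= lr * X.T @ (p - y) / len(y)           # log-loss gradient step
    return w

def predict_presence(w, X):
    """P(presence) for new covariate rows under the fitted weights."""
    X = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-X @ w))
```

Anything beyond this toy, e.g. the 140 candidate variables or the SHAP attributions applied afterwards, belongs to the full model, not this sketch.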
Thereafter, explainable AI (xAI) methods (specifically SHapley Additive exPlanations, or SHAP) are applied to better understand the factors likely driving the occurrence of water hyacinth. The analysis of 82 variables (of 140 considered) shows that the most common group of drivers associated with the occurrence of water hyacinth in South Africa is climatic (41.4%), followed by natural land-cover categories (32.9%) and socio-economic variables (10.7%), which include artificial land cover. The two least influential groups are hydrological variables (10.4%), including water seasonality, runoff, and flood risk, and ecological variables (4.7%), including riparian soil conditions and interspecies competition. These results suggest the importance of considering landscape context when prioritising the type of weed management to use (mechanical, biological, chemical, or integrated). To enable the prioritisation of suitable biocontrol release dates, the fourth chapter forecasts the 70-day open-water proportion post-release as a reward for effective biocontrol. This enables the simulation of synthetic biocontrol release events under a multi-armed bandit framework, identifying two effective release periods: late spring/early summer (mid-November) and late summer (late February to mid-March). The latter release period was estimated to result in an 8-27% higher average open-water cover post-release than actual biocontrol release events during the study period (May 2018 to July 2020). Hartbeespoort Dam, South Africa, is considered as a case study for improving the pre-existing management strategy used during the biocontrol of water hyacinth. The novel frameworks introduced in this work go a long way towards advancing IAAP species management amid the ongoing drives towards the adoption of artificial intelligence and sustainability for a better future.
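The bandit framing above can be sketched with a simple epsilon-greedy rule over candidate release periods, where each period's recorded rewards are simulated 70-day open-water proportions after release. This is a generic illustration of the framework, not the thesis implementation; period names and reward values are hypothetical.

```python
import random

def choose_release_period(rewards_by_period, epsilon=0.1, rng=random):
    """Epsilon-greedy arm selection: each arm is a candidate biocontrol
    release period, each reward a (simulated) post-release open-water
    proportion. Explores with probability epsilon, else exploits the
    arm with the highest mean reward so far."""
    periods = list(rewards_by_period)
    if rng.random() < epsilon:
        return rng.choice(periods)                              # explore
    return max(periods,
               key=lambda p: sum(rewards_by_period[p]) / len(rewards_by_period[p]))
```

Run repeatedly over simulated releases, this concentrates choices on the periods whose releases leave the most open water, which is how the two effective windows would emerge.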
It goes beyond (i) traditional small-scale and infrequent mapping and (ii) standard SDMs, now including the benefits of spatially explicit model explainability, and (iii) introduces a semi-automated and widely applicable method for exploring potential biocontrol release events. The direct benefits of this work, and the indirect benefits of derivative work, outweigh both its low production costs and those of the equivalent field and lab work. To improve the adoption of modern ML and Earth Observation (EO) tools for invasive species management, some of the developed tools are publicly accessible. In addition, a human-AI symbiosis that combines strengths and compensates for weaknesses is strongly recommended. For each application, directions are provided for future research based on the drawbacks and limitations of the introduced systems. These future efforts will likely increase the adoption of EO-derived products by water managers and improve the reliability of these products.