Electronic Theses and Dissertations (Masters)

Permanent URI for this collection: https://hdl.handle.net/10539/38006

Now showing 1 - 5 of 5

Optimisation of Kick Latency for Enhanced Performance of Robots in the RoboCup Three-Dimensional League through Proximal Policy Optimisation (PPO)
(University of the Witwatersrand, Johannesburg, 2024-07) Nekhumbe, Humbulani Colbert; Ranchod, Pravesh
This study aimed to enhance the kicking ability of Nao robots in the three-dimensional RoboCup simulation by addressing a crucial challenge observed in the University of the Witwatersrand RoboCup team: a noticeable delay and slow movement by the robot when kicking the ball, which left the team vulnerable to losing possession to opposing teams. To overcome this challenge, the study applied Proximal Policy Optimisation (PPO), a method developed by OpenAI. The specific objective was to optimise kick parameters, with a primary emphasis on reducing kick latency, so that kicks are executed swiftly and accurately across various scenarios, such as propelling the ball into the opponent's territory to retain possession and thwart the opponents' manoeuvres. By harnessing the iterative policy improvements of PPO, the successor to Trust Region Policy Optimisation (TRPO), the study refined the kicking behaviour of the Nao robots. The optimisation significantly reduced the observed kick delay, making the robot more agile and more effective in the complex three-dimensional RoboCup simulation environment. The outcomes show substantial progress in reducing kick latency and improving the adaptability of robotic soccer players, opening up possibilities for further exploration of reinforcement learning for autonomous agents.
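The core of PPO is its clipped surrogate objective, which limits how far each update can move the policy away from the one that collected the data. The sketch below is a generic, pure-Python rendering of that objective as it is usually stated in the PPO literature, not the study's kick-optimisation code; the function name and the default clip_eps = 0.2 are our assumptions, and how kick parameters map to actions and rewards is not shown.

```python
import math
from typing import Sequence


def ppo_clip_objective(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Mean PPO clipped surrogate objective (to be maximised).

    logp_new / logp_old: log-probabilities of the taken actions under the
    current policy and under the policy that collected the data.
    advantages: advantage estimates for the same samples.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

For example, a sample whose probability ratio has grown to 1.5 with advantage 1.0 contributes only 1.2 (the clipped value), so the optimiser gains nothing by pushing the ratio further.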
Counting Reward Automata: Exploiting Structure in Reward Functions Expressible in Decidable Formal Languages
(University of the Witwatersrand, Johannesburg, 2024-07) Bester, Tristan; Rosman, Benjamin; James, Steven; Tasse, Geraud Nangue
In general, reinforcement learning agents cannot directly access the environment model; the environment's dynamics and reward models are only accessible through repeated interaction. Since reinforcement learning is well suited to complex environments that are challenging to model, the general assumption that the environment's transition probabilities are unknown is justified. However, because agents cannot discern rewards directly from the environment, reward functions must be designed and implemented for both simulated and real-world environments. As a result, the assumption that the reward model must remain hidden from the agent is unnecessary and detrimental to learning. Prior methods exploit the structure of the reward function to enable more sample-efficient learning. These methods use a variant of the finite-state machine to specify rewards in a way that exposes the internal structure of the reward function. This approach is particularly effective for long-horizon tasks, as it enables counterfactual reasoning with off-policy learning, which significantly improves sample efficiency. However, because these approaches depend on finite-state machines, they can only express a limited class of reward functions. This severely limits their applicability: they cannot model simple tasks such as “fetch a coffee for each person in the office”, which involves counting, one of the many properties finite-state machines cannot model. This work addresses the limited expressiveness of current state machine-based approaches to reward modelling.
Specifically, we introduce a novel approach compatible with any reward function that can be expressed as a well-defined algorithm. We present the counting reward automaton, an abstract machine capable of modelling reward functions expressible in any decidable formal language. Unlike previous approaches to state machine-based reward modelling, which are limited to expressing tasks as regular languages, our framework allows tasks described by decidable formal languages. Our framework is therefore an extremely general approach to reward modelling. This is a significant contribution, as it greatly extends the class of problems that can benefit from the improved learning techniques facilitated by state machine-based reward modelling. We prove that an agent equipped with such an abstract machine is able to solve an extended set of tasks, and we show that this increase in expressive power does not come at the cost of increased automaton complexity. We then introduce several learning algorithms designed to increase sample efficiency by exploiting automaton structure. These algorithms are based on counterfactual reasoning with off-policy reinforcement learning (RL) and use techniques from hierarchical reinforcement learning (HRL) and reward shaping. Finally, we evaluate our approach in several domains requiring long-horizon plans. Empirical results demonstrate that our method outperforms competing approaches in terms of automaton complexity, sample efficiency, and task completion.
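To illustrate why a counter helps, here is a minimal sketch of a counting automaton for the coffee task mentioned above, together with the kind of counterfactual relabelling that state machine-based reward modelling enables. It is our own simplified encoding, assuming a single counter, hypothetical labels (`coffee`, `person`), hypothetical state names and a sparse reward of 1 on completion; it is not the formal definition or the learning algorithms from the dissertation.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Config:
    """Automaton configuration: a control state plus a counter value."""
    state: str
    counter: int


class CoffeeCountingAutomaton:
    """Hypothetical counting reward automaton for 'fetch a coffee for each of n people'.

    Three control states suffice for any n, because the number of people still
    waiting is stored in the counter rather than in extra states.
    """

    def __init__(self, n_people: int) -> None:
        self.n_people = n_people

    def initial(self) -> Config:
        return Config("need_coffee", self.n_people)

    def step(self, cfg: Config, labels: FrozenSet[str]) -> Tuple[Config, float]:
        """Advance on the labels emitted by one environment step; return (next config, reward)."""
        if cfg.state == "need_coffee" and "coffee" in labels:
            return Config("holding", cfg.counter), 0.0
        if cfg.state == "holding" and "person" in labels:
            remaining = cfg.counter - 1  # one more person served
            if remaining == 0:  # zero-test on the counter ends the task
                return Config("done", 0), 1.0
            return Config("need_coffee", remaining), 0.0
        return cfg, 0.0  # no relevant event: configuration unchanged


def counterfactual_experiences(automaton, s, a, s_next, labels, configs):
    """Relabel one environment transition under several automaton configurations.

    Each returned tuple is ((s, cfg), a, reward, (s_next, next_cfg), done) and could be
    added to an off-policy replay buffer.
    """
    out: List[tuple] = []
    for cfg in configs:
        if cfg.state == "done":
            continue  # terminal configurations produce no further experience
        nxt, reward = automaton.step(cfg, labels)
        out.append(((s, cfg), a, reward, (s_next, nxt), nxt.state == "done"))
    return out
```

A plain finite-state machine would need a separate state for every possible count, so its size grows with n; here the same three control states serve every n.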
Creating an adaptive collaborative playstyle-aware companion agent
(University of the Witwatersrand, Johannesburg, 2023-09) Arendse, Lindsay John; Rosman, Benjamin
Companion characters in video games play a unique part in enriching the player experience. Companion agents support the player as an ally or sidekick and typically help by providing hints or resources, or even by fighting alongside the human player. Players often adopt a particular approach or strategy, referred to as a playstyle, when playing video games. Players not only approach challenges in games differently, but also play differently depending on what they find rewarding. Companion agents therefore have an important role: assisting the player in a way that aligns with their playstyle. Existing companion agent approaches fall short, and the collaborative experience suffers when the companion agent cannot assist the human player in a manner consistent with their playstyle. Furthermore, if the companion agent cannot assist in real time, player engagement drops, since the player must wait for the agent to compute its action, which is frustrating. We therefore present a framework for creating adaptive companion agents that respond in real time with actions that align with the player's playstyle. We refer to companion agents able to do so as playstyle-aware. Creating a playstyle-aware adaptive agent first requires a mechanism for correctly classifying the player's playstyle before attempting to assist the player with a given task. We present a method that enables real-time, in-game playstyle classification. We contribute a hybrid probabilistic supervised learning framework, using Bayesian inference with a k-nearest-neighbours-based likelihood, that classifies players in real time at every step within a given game level using only the latest player action or state observation. We empirically evaluate our hybrid classifier against existing work using MiniDungeons, a common benchmark game domain.
We further evaluate our approach using real player data from the game Super Mario Bros. Our approach outperforms the method in our comparative study, and our results highlight the success of our framework in identifying playstyles in a complex human-player setting. The second problem we explore is assisting the identified playstyle with a suitable action. We formally define this as the ‘Learning to Assist’ problem: given a set of companion agent policies, we aim to determine the policy that best complements the observed playstyle. An action is complementary if it aligns with the goal of the playstyle. We extend MiniDungeons into a two-player game called Collaborative MiniDungeons, which we use to evaluate our companion agent against several baselines. The results of this experiment show that companion agents able to adapt to and assist different playstyles bring about a better player experience on average, using a playstyle-specific reward function as a proxy for what players find rewarding. In this way, we present an approach for creating adaptive companion agents that are playstyle-aware and able to collaborate with players in real time.
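The two sub-problems described above, per-step playstyle classification and choosing a complementary companion policy, can be sketched as follows. This is an illustrative reconstruction under our own assumptions: the k-NN likelihood is taken to be Laplace-smoothed class frequencies among the nearest labelled observations, the feature vectors, playstyle labels and return estimates are hypothetical, and selecting the policy with the highest posterior-weighted return is one plausible reading of ‘Learning to Assist’, not necessarily the dissertation's method.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Observation = Sequence[float]
Dataset = List[Tuple[Observation, str]]  # (feature vector, playstyle label)


def knn_likelihood(obs: Observation, data: Dataset, classes: Sequence[str],
                   k: int = 3, alpha: float = 1.0) -> Dict[str, float]:
    """Smoothed class frequencies among the k nearest labelled observations (assumed likelihood)."""
    nearest = sorted(data, key=lambda item: math.dist(obs, item[0]))[:k]
    counts = Counter(label for _, label in nearest)
    denom = k + alpha * len(classes)
    return {c: (counts[c] + alpha) / denom for c in classes}


def bayes_update(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """One recursive Bayesian step: the posterior is proportional to prior times likelihood."""
    unnorm = {c: prior[c] * likelihood[c] for c in prior}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}


def select_companion_policy(posterior: Dict[str, float],
                            returns: Dict[str, Dict[str, float]]) -> str:
    """Pick the companion policy with the highest posterior-weighted estimated return.

    returns[policy][playstyle] is a hypothetical estimate of the playstyle-specific
    reward obtained when that companion policy assists that playstyle.
    """
    return max(returns, key=lambda p: sum(posterior[s] * returns[p][s] for s in posterior))
```

Because each call to `bayes_update` uses only the latest observation and the previous posterior, the classification cost per game step stays constant, which is what makes real-time use plausible.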
A Continuous Reinforcement Learning Approach to Self-Adaptive Particle Swarm Optimisation
(University of the Witwatersrand, Johannesburg, 2023-08) Tilley, Duncan; Cleghorn, Christopher
Particle Swarm Optimisation (PSO) is a popular black-box optimisation technique due to its simple implementation and its surprising ability to perform well on a variety of problems. Unfortunately, PSO is fairly sensitive to the choice of hyper-parameters. For this reason, many self-adaptive techniques have been proposed that attempt both to simplify hyper-parameter selection and to improve the performance of PSO. However, surveys show that many self-adaptive techniques are still outperformed by time-varying techniques, in which coefficient values are simply increased or decreased over time. More recent works have shown the successful application of Reinforcement Learning (RL) to learn self-adaptive control policies for optimisers such as differential evolution, genetic algorithms, and PSO. However, many of these applications were limited to discrete state and action spaces, which severely restricts the choices available to a control policy, given that the PSO coefficients are continuous variables. This dissertation therefore investigates the application of continuous RL techniques to learn a self-adaptive control policy that can make full use of the continuous nature of the PSO coefficients. The dissertation first introduces the RL framework used to learn a continuous control policy by defining the environment, action space, state space, and a number of possible reward functions. An effective learning environment that overcomes the difficulties of continuous RL is then derived through a series of experiments, culminating in a successfully learned continuous control policy. The policy is then shown to perform well, compared with other self-adaptive PSO algorithms, on the benchmark problems used during training. However, further testing on benchmark problems not seen during training suggests that the learned policy may not generalise well to other functions, although this is shown to be a problem for other PSO algorithms as well.
Finally, the dissertation performs a number of experiments to provide insights into the behaviours learned by the continuous control policy.
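The loop below sketches where a self-adaptive control policy plugs into PSO: at each iteration the policy maps a state to continuous coefficients (inertia w, cognitive c1, social c2). It is a generic global-best PSO written for illustration; the two-element state, the sphere test function and the linearly decreasing inertia schedule standing in for a learned policy are our assumptions, not the dissertation's environment, state space or reward functions.

```python
import random
from typing import Callable, Sequence, Tuple

Coefficients = Tuple[float, float, float]  # (inertia w, cognitive c1, social c2)
State = Tuple[float, float]  # hypothetical: (fraction of iterations done, best value so far)


def sphere(x: Sequence[float]) -> float:
    return sum(v * v for v in x)


def time_varying_policy(state: State) -> Coefficients:
    """Placeholder policy: inertia decreases linearly from 0.9 to 0.4.

    A learned continuous RL policy would replace this function.
    """
    progress, _best = state
    return 0.9 - 0.5 * progress, 1.49445, 1.49445


def pso(f: Callable[[Sequence[float]], float], dim: int,
        policy: Callable[[State], Coefficients], n_particles: int = 30,
        iters: int = 200, bound: float = 5.0, seed: int = 0) -> float:
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for t in range(iters):
        # The control policy chooses continuous coefficients once per iteration.
        w, c1, c2 = policy((t / iters, gbest_val))
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:  # update personal and global bests
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest_val
```

Keeping the policy as a plain function of a state makes the time-varying baseline and a learned policy interchangeable, which mirrors the comparison the abstract describes.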
A fully-decentralised general-sum approach for multi-agent reinforcement learning using minimal modelling
(University of the Witwatersrand, Johannesburg, 2023-08) Kruger, Marcel Matthew Anthony; Rosman, Benjamin; James, Steven; Shipton, Jarrod
Multi-agent reinforcement learning (MARL) is a prominent area of research in machine learning, extending reinforcement learning to scenarios where multiple agents concurrently learn and interact within the same environment. Most existing methods rely on centralisation during training, while others employ agent modelling. In contrast, we propose a novel method that adapts the role of entropy to assist fully-decentralised training, without explicitly modelling other agents and without the additional information that most centralised methods assume access to. We modify the entropy term to encourage more deterministic agents and instead let the non-stationarity inherent in MARL serve as a source of exploration. We empirically evaluate our method across five distinct environments, each representing unique challenges, covering both cooperative and competitive settings. Our findings indicate that penalising entropy, rather than rewarding it, enables agents to perform at least as well as the prevailing standard of entropy maximisation. Moreover, our alternative approach achieves several of the original objectives of entropy regularisation in reinforcement learning, such as increased sample efficiency and potentially better final rewards. Whilst entropy plays a significant role, our results in the competitive case indicate that position bias remains a considerable challenge.
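The sign flip at the heart of this approach can be shown on a generic policy-gradient actor loss. This is a single-agent, per-batch sketch under our own assumptions (categorical actions, a plain advantage-weighted log-probability term, hypothetical coefficient values); it is not the dissertation's implementation. In a fully-decentralised setting, each agent would minimise a loss of this shape on its own experience only.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a categorical action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def actor_loss(log_probs_taken: Sequence[float], advantages: Sequence[float],
               action_dists: Sequence[Sequence[float]], entropy_coef: float) -> float:
    """Policy-gradient loss to minimise, with a signed entropy term.

    entropy_coef > 0: the usual entropy bonus (pushes towards higher entropy).
    entropy_coef < 0: an entropy penalty (pushes towards more deterministic policies).
    """
    n = len(advantages)
    pg_term = -sum(lp * adv for lp, adv in zip(log_probs_taken, advantages)) / n
    mean_entropy = sum(entropy(d) for d in action_dists) / n
    return pg_term - entropy_coef * mean_entropy
```

With the same batch, switching `entropy_coef` from +0.01 to -0.01 raises the loss by 0.02 times the mean entropy, so minimisation now favours lower-entropy, more deterministic action distributions.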