Title: Dynamics generalisation in reinforcement learning through the use of adaptive policies
Author: Beukman, Michael
Type: Dissertation
Language: en
Subjects: Reinforcement learning; Robotics
Date accessioned: 2024-01-26
Date available: 2024-01-26
Date issued: 2024
URI: https://hdl.handle.net/10539/37446
Description: A research report submitted in partial fulfilment of the requirements for the degree Master of Science to the Faculty of Science, School of Computer Science and Applied Mathematics, University of the Witwatersrand, Johannesburg, 2023.
Abstract: Reinforcement learning (RL) is a widely used method for training agents to interact with an external environment, and is commonly applied in fields such as robotics. While RL has achieved success in several domains, many methods fail to generalise to scenarios that differ from those encountered during training, a significant limitation that hinders RL's real-world applicability. In this work, we consider the problem of generalising to new transition dynamics, corresponding to cases in which the effects of the agent's actions differ; for instance, walking on a slippery versus a rough floor. To address this problem, we introduce a neural network architecture, the Decision Adapter, which leverages contextual information to modulate the behaviour of an agent depending on the setting it is in. In particular, our method uses the context (information about the current environment, such as the floor's friction) to generate the weights of an adapter module, which in turn influences the agent's actions. This allows an agent, for instance, to act differently when walking on ice compared to gravel. We theoretically show that our approach generalises a prior network architecture, and empirically demonstrate that it achieves superior generalisation performance compared to previous approaches in several environments. Furthermore, we show that our method can be applied to multiple RL algorithms, making it a widely applicable approach to improving generalisation.
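
The abstract describes the Decision Adapter's core mechanism: a hypernetwork maps the context vector (e.g. friction) to the weights of an adapter module that modulates the policy's computation. The sketch below is one minimal way such a context-conditioned adapter could be wired up; it is written in PyTorch, and the class name, layer sizes, single-adapter placement, and residual connection are illustrative assumptions rather than details taken from the dissertation itself.

import torch
import torch.nn as nn

class DecisionAdapterSketch(nn.Module):
    """Illustrative context-conditioned policy: a hypernetwork generates
    the weights of an adapter layer from the context vector."""

    def __init__(self, state_dim, context_dim, hidden_dim, action_dim):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.encoder = nn.Linear(state_dim, hidden_dim)
        # Hypernetwork: context -> flattened weight matrix and bias of a
        # single hidden_dim x hidden_dim adapter layer.
        self.hypernet = nn.Sequential(
            nn.Linear(context_dim, 64),
            nn.ReLU(),
            nn.Linear(64, hidden_dim * hidden_dim + hidden_dim),
        )
        self.head = nn.Linear(hidden_dim, action_dim)

    def forward(self, state, context):
        h = torch.relu(self.encoder(state))               # (B, H)
        params = self.hypernet(context)                   # (B, H*H + H)
        W = params[:, : self.hidden_dim ** 2].view(
            -1, self.hidden_dim, self.hidden_dim)         # (B, H, H)
        b = params[:, self.hidden_dim ** 2:]              # (B, H)
        # Apply the generated adapter with a residual connection, so the
        # context modulates (rather than replaces) the base features.
        h = h + torch.relu(torch.bmm(W, h.unsqueeze(-1)).squeeze(-1) + b)
        return self.head(h)                               # action outputs

# Usage example with arbitrary dimensions: a 2-D context could encode,
# say, floor friction and agent mass.
policy = DecisionAdapterSketch(state_dim=8, context_dim=2,
                               hidden_dim=32, action_dim=4)
out = policy(torch.randn(5, 8), torch.randn(5, 2))        # batch of 5
print(out.shape)                                          # torch.Size([5, 4])

Because the adapter weights are a function of the context, two environments with different dynamics (ice versus gravel) yield different effective networks from the same learned parameters, which is the mechanism the abstract credits for improved generalisation.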