Crafford, Gerrie
2021-12-17
2021-12-17
2021
Crafford, Gerrie (2021) Improving reinforcement learning with ensembles of different learners, University of the Witwatersrand, Johannesburg, <http://hdl.handle.net/10539/32383>
https://hdl.handle.net/10539/32383
A dissertation submitted in partial fulfilment for the degree of Master of Science to the Faculty of Science, School of Computer Science and Applied Mathematics, University of the Witwatersrand, Johannesburg, 2021
Several reinforcement learning methods address the problem of combining multiple different learners to generate a superior learner, from ensemble methods to policy reuse methods. These methods usually assume that each learner uses the same algorithm and/or state representation, and they often require learners to be pre-trained. This assumption prevents very different types of learners, which could complement each other well, from being used together. We propose a novel algorithm, Adaptive Probabilistic Ensemble Learning (APEL), an ensemble learner that combines a set of base reinforcement learners and leverages their different strengths online. It remains agnostic to the inner workings of the base learners, which allows it to combine very different types of learners. The ensemble learner keeps track of the performance of the base learners and probabilistically selects a base learner for each episode according to that historical performance, so that the base learners that perform best on average are selected. Along with a description of the proposed algorithm, we present a theoretical analysis of its behaviour and performance.
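The per-episode selection described above can be sketched roughly as follows. This is an illustration only, not the APEL algorithm itself: the abstract does not state the selection rule, so the softmax over running-average returns, the `temperature` parameter, and the class name `EpisodeEnsembleSelector` are all assumptions made for the example. The selector sees only episode returns, which mirrors the idea of staying agnostic to how each base learner works.

```python
import math
import random


class EpisodeEnsembleSelector:
    """Picks one base learner per episode from running-average returns.

    Illustrative only: the softmax rule and `temperature` are assumptions,
    not the selection rule defined in the dissertation.
    """

    def __init__(self, n_learners, temperature=1.0, seed=None):
        if n_learners < 1:
            raise ValueError("need at least one base learner")
        if temperature <= 0:
            raise ValueError("temperature must be positive")
        self.temperature = temperature
        self.mean_return = [0.0] * n_learners  # running average per learner
        self.episodes = [0] * n_learners       # episodes run per learner
        self.rng = random.Random(seed)

    def probabilities(self):
        # Softmax over average returns; subtract the max for numerical stability.
        top = max(self.mean_return)
        weights = [math.exp((m - top) / self.temperature) for m in self.mean_return]
        total = sum(weights)
        return [w / total for w in weights]

    def select(self):
        # Sample the index of the learner that controls the next episode.
        probs = self.probabilities()
        return self.rng.choices(range(len(probs)), weights=probs, k=1)[0]

    def update(self, learner, episode_return):
        # Incremental mean of the returns observed for this learner.
        self.episodes[learner] += 1
        n = self.episodes[learner]
        self.mean_return[learner] += (episode_return - self.mean_return[learner]) / n


if __name__ == "__main__":
    selector = EpisodeEnsembleSelector(n_learners=3, temperature=0.5, seed=0)
    true_means = [0.2, 1.0, 0.5]  # made-up stand-ins for base learner quality
    for _ in range(500):
        i = selector.select()
        selector.update(i, selector.rng.gauss(true_means[i], 0.1))
    print([round(p, 3) for p in selector.probabilities()])
```

In this sketch a lower `temperature` concentrates selection on the learner with the best average so far, while a higher one keeps sampling the others, which is one way an ensemble could notice when a base learner's performance changes.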
We demonstrate, on grid world navigation tasks, the Cartpole domain, and the Atari Breakout domain, the proposed ensemble learner's ability to select the best base learner on average, to combine the strengths of multiple base learners, including Q-learning, deep Q-network (DQN), Actor-Critic with Experience Replay (ACER), and learners with different state representations, and to adapt to changes in base learner performance. The effect that the ensemble learner's hyperparameter has on its behaviour and performance is also quantified through different experiments.
Online resource (56 leaves)
en
Reinforcement learning
Machine learning
Improving reinforcement learning with ensembles of different learners
Thesis