Improving reinforcement learning with ensembles of different learners

dc.contributor.authorCrafford, Gerrie
dc.date.accessioned2021-12-17T16:00:18Z
dc.date.available2021-12-17T16:00:18Z
dc.date.issued2021
dc.descriptionA dissertation submitted in partial fulfilment for the degree of Master of Science to the Faculty of Science, School of Computer Science and Applied Mathematics, University of the Witwatersrand, Johannesburg, 2021en_ZA
dc.description.abstractDifferent reinforcement learning methods exist to address the problem of combining multiple different learners to generate a superior learner, from ensemble methods to policy reuse methods. These methods usually assume that each learner uses the same algorithm and/or state representation and often require learners to be pre-trained. This assumption prevents very different types of learners, which could potentially complement each other well, from being used together. We propose a novel algorithm, Adaptive Probabilistic Ensemble Learning (APEL), an ensemble learner that combines a set of base reinforcement learners and leverages the strengths of the different base learners online, while remaining agnostic to the inner workings of the base learners, thereby allowing it to combine very different types of learners. The ensemble learner selects the base learners that perform best on average by keeping track of the performance of the base learners and then probabilistically selecting a base learner for each episode according to the historical performance of the base learners. Along with a description of the proposed algorithm, we present a theoretical analysis of its behaviour and performance. We demonstrate the proposed ensemble learner's ability to select the best base learner on average, combine the strengths of multiple base learners, including Q-learning, deep Q-network (DQN), Actor-Critic with Experience Replay (ACER), and learners with different state representations, as well as its ability to adapt to changes in base learner performance on grid world navigation tasks, the Cartpole domain, and the Atari Breakout domain. The effect that the ensemble learner's hyperparameter has on its behaviour and performance is also quantified through different experiments.en_ZA
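The selection scheme the abstract describes (track each base learner's historical performance, then probabilistically pick one learner per episode) can be sketched as follows. This is a minimal illustration only, assuming a softmax over running-average episode returns with a temperature hyperparameter; the class name, the update rule, and the softmax choice are hypothetical simplifications, not the exact APEL algorithm specified in the dissertation.

```python
import math
import random


class ProbabilisticEnsemble:
    """Hypothetical sketch: pick a base learner per episode in
    proportion to a softmax over its running-average return."""

    def __init__(self, learners, temperature=1.0):
        self.learners = learners
        self.temperature = temperature          # exploration hyperparameter (assumed)
        self.mean_return = [0.0] * len(learners)
        self.counts = [0] * len(learners)

    def select(self):
        # Softmax over historical mean returns; subtract the max
        # for numerical stability before exponentiating.
        prefs = [m / self.temperature for m in self.mean_return]
        top = max(prefs)
        weights = [math.exp(p - top) for p in prefs]
        return random.choices(range(len(self.learners)), weights=weights)[0]

    def update(self, idx, episode_return):
        # Incremental running average of returns for learner idx.
        self.counts[idx] += 1
        self.mean_return[idx] += (episode_return - self.mean_return[idx]) / self.counts[idx]
```

Because the ensemble only observes episode returns and learner indices, it stays agnostic to each base learner's internals, which is what allows heterogeneous learners (e.g. tabular Q-learning alongside DQN or ACER) to be combined.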
dc.description.librarianTL (2021)en_ZA
dc.facultyFaculty of Scienceen_ZA
dc.format.extentOnline resource (56 leaves)
dc.identifier.citationCrafford, Gerrie (2021) Improving reinforcement learning with ensembles of different learners, University of the Witwatersrand, Johannesburg, <http://hdl.handle.net/10539/32383>
dc.identifier.urihttps://hdl.handle.net/10539/32383
dc.language.isoenen_ZA
dc.schoolSchool of Computer Science and Applied Mathematicsen_ZA
dc.subject.lcshReinforcement learning
dc.subject.lcshMachine learning
dc.titleImproving reinforcement learning with ensembles of different learnersen_ZA
dc.typeThesisen_ZA
Files
Original bundle
Name: Gerrie Crafford_gj_2288909_thesis_final_submission.pdf
Size: 3.32 MB
Format: Adobe Portable Document Format
Description: Main Work
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission