Progressive skill extraction with a curriculum of tasks

Mawjee, Shahil
When using reinforcement learning to acquire behaviours, solving complex tasks is often infeasible due to the curse of dimensionality and the need to explore the search space fully. One approach humans use to address this is structured curricula, which proceed from simple tasks to more complex ones and allow the learner to leverage knowledge gained on more straightforward tasks when solving harder ones. However, this problem has not yet been solved for artificial learning agents. To achieve this while remaining fully autonomous, an agent needs to be able to learn to select related tasks, learn how tasks are related, and have a mechanism to transfer knowledge between tasks. In this dissertation, we present a training strategy that enables a reinforcement learning agent to progressively and autonomously extract skills (as options, from hierarchical reinforcement learning) for a given curriculum of tasks. With each iteration, the library of options is refined to be more robust for transfer among the tasks in the curriculum. Our training strategy makes use of NPBRS (non-parametric Bayesian reward segmentation), a recently developed technique for extracting skills from demonstrations. We extend NPBRS to operate iteratively, so that skill policies generalise across the range of tasks and duplicate skill policies are not recovered. We then augment the NPBRS output with an inferred initiation set and termination condition for each skill, which transforms the trajectory segments and skill policies into usable options. In our benchmark task in the code-game-world environment, our reinforcement learning agent learned the target task using around 71.32% fewer samples when given access to a library of options learned using a well-designed curriculum.
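The last step described in the abstract turns labelled trajectory segments and skill policies into options, each of which pairs an initiation set and a termination condition with a policy. The sketch below illustrates that conversion under loudly stated assumptions; it is not the dissertation's implementation. Assumed: states are discrete and hashable; each segment arrives already labelled with a skill index (as a segmentation method such as NPBRS would provide); the intra-option policy is a hypothetical tabular majority vote over demonstrated actions, standing in for a learned skill policy; the initiation set is the set of states where segments of that skill began; and the option terminates in states where those segments ended. The names `Option` and `segments_to_options` are illustrative only.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, Hashable, List, Tuple

State = Hashable
Action = Hashable


@dataclass
class Option:
    """An option <I, pi, beta>: initiation set, intra-option policy, termination."""
    initiation_set: FrozenSet[State]      # states where the option may be invoked
    policy: Dict[State, Action]           # hypothetical tabular intra-option policy
    termination_states: FrozenSet[State]  # beta(s) = 1 on these states, 0 elsewhere

    def can_start(self, s: State) -> bool:
        return s in self.initiation_set

    def act(self, s: State) -> Action:
        return self.policy[s]

    def terminates(self, s: State) -> bool:
        # Assumption: also stop if the agent leaves the states the policy covers.
        return s in self.termination_states or s not in self.policy


# A segment: (skill label, [(state, action), ...], state reached after the last action).
Segment = Tuple[int, List[Tuple[State, Action]], State]


def segments_to_options(segments: List[Segment]) -> Dict[int, Option]:
    """Infer one option per skill label from labelled trajectory segments."""
    starts = defaultdict(set)  # skill -> states where its segments began
    ends = defaultdict(set)    # skill -> states where its segments ended
    votes = defaultdict(lambda: defaultdict(Counter))  # skill -> state -> action counts
    for skill, pairs, final_state in segments:
        if not pairs:
            continue  # an empty segment carries no evidence about the skill
        starts[skill].add(pairs[0][0])
        ends[skill].add(final_state)
        for s, a in pairs:
            votes[skill][s][a] += 1

    options = {}
    for skill, per_state in votes.items():
        # Stand-in for a learned skill policy: most frequently demonstrated action.
        policy = {s: counts.most_common(1)[0][0] for s, counts in per_state.items()}
        options[skill] = Option(frozenset(starts[skill]), policy, frozenset(ends[skill]))
    return options
```

For example, two segments labelled 0 that move right from (0, 0) and (0, 1) and end at (0, 2) yield an option that may start in either of those states and terminates at (0, 2). Starting-state and ending-state heuristics are only one simple choice; a classifier over states, or an initiation set covering every state visited in the segments, would be equally plausible alternatives.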
A dissertation submitted in fulfilment of the requirements for the degree of Master of Science in the Faculty of Science, School of Computer Science and Applied Mathematics, University of the Witwatersrand, 2020