A task algebra for agents in reinforcement learning

Date
2020
Authors
Tasse, Geraud Nangue
Abstract
A necessary property for generally intelligent or lifelong-learning agents is the ability to reuse knowledge learned from old tasks to solve new ones. This knowledge reuse can come in the form of zero-shot learning, where prior knowledge is sufficient to immediately solve new tasks, or few-shot learning, where some additional learning is needed to solve new tasks. Of particular interest is the class of tasks that can be solved via zero-shot learning, since it leads to a direct way of generalising over a problem space. One such class appears to be tasks specified by the arbitrary logical composition of already-solved tasks: that is, tasks specified by arbitrary disjunctions (unions), conjunctions (intersections), and complements (negations) of learned tasks. The potential for zero-shot learning here stems from the intuitive understanding that humans seem to have of the union, intersection, and negation of tasks they know. This zero-shot learning problem remains unsolved despite the general success of reinforcement learning in the past decade and the current success of transfer learning methods that compose policies or value functions. One possible cause is that there is no unifying formalism for the disjunction, conjunction, and negation of tasks. This work addresses the problem by first formally defining task composition as operators acting on a set of tasks in an algebraic structure. This provides a structured way of composing tasks and a theoretically rigorous way of studying them. We propose a framework for defining lattice algebras, and Boolean algebras in particular, over the space of tasks. This allows us to formulate new tasks in terms of the negation, disjunction, and conjunction of a set of base tasks. We then show that by learning a new type of goal-oriented value function and restricting the rewards of the tasks, an agent can solve composite tasks with no further learning.
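As an illustrative sketch only (not the thesis's exact construction), the Boolean operators can be realised over goal-oriented value tables by elementwise maximum (disjunction), elementwise minimum (conjunction), and reflection about shared reward bounds (negation). All names, shapes, and values below are hypothetical:

```python
import numpy as np

# Hypothetical goal-oriented value tables Q[state, goal] for two learned base
# tasks A and B. Values are random placeholders, not learned quantities.
r_max, r_min = 2.0, -0.1  # assumed reward bounds shared by all tasks

rng = np.random.default_rng(0)
Q_A = rng.uniform(r_min, r_max, size=(4, 3))
Q_B = rng.uniform(r_min, r_max, size=(4, 3))

def disjunction(Q1, Q2):
    """A OR B: elementwise maximum of goal-oriented values."""
    return np.maximum(Q1, Q2)

def conjunction(Q1, Q2):
    """A AND B: elementwise minimum of goal-oriented values."""
    return np.minimum(Q1, Q2)

def negation(Q):
    """NOT A: reflect values about the shared reward bounds."""
    return (r_max + r_min) - Q

# Compose a new task zero-shot: A AND (NOT B)
Q_new = conjunction(Q_A, negation(Q_B))
V_new = Q_new.max(axis=1)  # state values of the composed task: maximise over goals
```

Note that negation is an involution under this construction (applying it twice recovers the original values), which is what makes the complement operator well-behaved.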
We verify our approach in two domains, including a high-dimensional video game environment requiring function approximation, where an agent first learns a set of base skills and then composes them to solve a super-exponential number of new tasks.
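The "super-exponential" count can be made concrete: n base tasks admit as many distinct Boolean compositions as there are Boolean functions of n variables, namely 2^(2^n). A quick check of the growth:

```python
def num_composite_tasks(n: int) -> int:
    """Number of Boolean functions of n variables: 2**(2**n)."""
    return 2 ** (2 ** n)

# Doubly-exponential growth in the number of base tasks.
print([num_composite_tasks(n) for n in range(1, 5)])  # [4, 16, 256, 65536]
```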
Description
A thesis submitted to the Faculty of Science, University of the Witwatersrand, in fulfilment of the requirements for the degree of Master of Science, 2020