Spectral reinforcement learning
Date
2019-03
Authors
Earle, Adam C.
Abstract
The widespread success of deep learning techniques in recent years has been nothing short of remarkable. These techniques have revolutionized many diverse fields, such as machine vision and natural language processing, driving breakthrough technologies such as driverless cars and language translation. Despite the broad applicability of deep learning methods, successes have been largely confined to perceptual domains. The application of deep learning techniques to control problems, on the other hand, has met with more tempered success. Although deep reinforcement learning (RL) methods have certainly yielded a number of impressive results in specific domains such as chess, Go, and StarCraft, these successes have been equally revealing of the limitations of these approaches. Current state-of-the-art methods are known to be extremely sample inefficient, often requiring millions of times more samples than their human counterparts to solve these problems, and are often limited to solving one task at a time.
At the heart of the success of deep learning techniques is the recursive composition of basic features in the construction of ever more abstract representations. Yet this key insight is utilized only in a limited way in current RL methods, typically in the approximation of an auxiliary value function. Crucially, the action-selection protocol for these methods remains ultimately flat. A richer manifestation of the key insights provided by deep learning would see the agent invoke a hierarchical action-selection protocol in which primitive actions are recursively composed to yield complex behavioural abstractions. In this thesis I propose a new RL method wherein the agent executes a recursive compositional action-selection protocol. This is achieved through the construction of a hierarchical control architecture with a bidirectional execution paradigm, in which higher-level behaviours are derived from the recursive composition of lower-level controllers and, inversely, primitive actions arise from the concurrent execution of multiple high-level controllers. In addition, I develop a fully autonomous procedure for uncovering the hierarchical architecture directly from the agent’s experience in the domain. The novel control architecture allows the agent to uncover and exploit the multiscale structure of real-world tasks, demonstrating powerful multitasking capabilities.
The theory is developed in four phases. Firstly, I present a formalism of multitask learning with compositional policies in the offline setting, highlighting the powerful transfer capabilities of the agent. Secondly, this methodology is extended to the online setting through the construction of an integrated learning problem in which the agent periodically adjusts the relative influence of its constituent policies in response to new experiences. The extension to the online setting requires a mapping between the base-layer states and the constituent policies. Thirdly, I propose an autonomous procedure by which this mapping may be uncovered. This sets the stage for the recursive application of the abstraction procedure. Finally, I define the recursive procedure by which the deep hierarchical control architecture may be constructed, and the corresponding execution model through which it is ultimately actioned.
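As a rough illustration of how constituent policies might be blended with adjustable relative influence (a minimal weighted-mixture sketch, not the thesis's actual composition operator; all names and the mixture rule itself are assumptions for illustration):

```python
import numpy as np

def compose_policies(policies, weights):
    """Blend constituent policies into one action distribution.

    policies: list of probability vectors over the same discrete
    primitive-action set; weights: the relative influence of each
    constituent controller (hypothetical mixture-based composition).
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalise the relative influences
    mixed = sum(wi * p for wi, p in zip(w, policies))
    return mixed / mixed.sum()  # renormalise to a valid distribution

# Two low-level controllers, each a distribution over three primitive actions.
left = np.array([0.7, 0.2, 0.1])
right = np.array([0.1, 0.2, 0.7])

# Equal influence yields the element-wise average of the two policies.
combined = compose_policies([left, right], weights=[1.0, 1.0])
```

Under this toy rule, adjusting `weights` shifts the agent's behaviour continuously between the two constituent controllers, which is one simple way the "relative influence" adjustment described above could be realised.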
Description
A thesis submitted to the Faculty of Science in fulfilment of the requirements for the degree of Doctor of Philosophy (PhD) in Applied Mathematics, School of Computer Science and Applied Mathematics, University of the Witwatersrand, Johannesburg, 2019