Multi-pass deep Q-networks for reinforcement learning with parameterised action spaces

Bester, Craig James


Files

cbester_msc_dissertation.pdf (3.34 MB)

Date

2019

Authors

Bester, Craig James

Abstract

Parameterised actions in reinforcement learning are composed of discrete actions with continuous action-parameters. This provides a framework capable of solving complex domains that require learning high-level action policies with flexible control. Recently, deep Q-networks have been extended to learn over such action spaces with the P-DQN algorithm. However, the method treats all action-parameters as a single joint input to the Q-network, invalidating its theoretical foundations. We demonstrate the disadvantages of this approach and propose two solutions: using split Q-networks, and a novel multi-pass technique. We also propose a weighted-indexed action-parameter loss function to address issues related to the imbalance of sampling and exploration between different parameterised actions. We empirically demonstrate that both our multi-pass algorithm and weighted-indexed loss significantly outperform P-DQN and other previous algorithms in terms of data efficiency and converged policy performance on the Platform, Robot Soccer Goal, and Half Field Offense domains.
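The abstract does not spell out how the multi-pass technique differs from P-DQN's single joint input, so the following is a minimal sketch under stated assumptions, not the dissertation's implementation. It assumes the multi-pass idea means running the Q-network once per discrete action k, with every action-parameter except x_k set to zero, and keeping only output k; greedy action selection then picks argmax_k Q_k together with x_k. The toy network, its sizes, and the names `q_network`, `single_pass_q`, `multi_pass_q` and `select_action` are hypothetical and chosen for illustration only.

```python
import math
import random
from typing import List, Tuple

# Hypothetical toy setting: 3 discrete actions whose action-parameters
# have 1, 2 and 1 continuous dimensions respectively.
PARAM_SIZES = [1, 2, 1]
STATE_DIM = 4
HIDDEN_DIM = 8
NUM_ACTIONS = len(PARAM_SIZES)
INPUT_DIM = STATE_DIM + sum(PARAM_SIZES)

# Start index of each action's slice in the joint action-parameter vector.
OFFSETS = [sum(PARAM_SIZES[:k]) for k in range(NUM_ACTIONS)]

# Fixed random weights stand in for a trained Q-network (assumption).
_rng = random.Random(0)
W1 = [[_rng.gauss(0.0, 0.5) for _ in range(INPUT_DIM)] for _ in range(HIDDEN_DIM)]
W2 = [[_rng.gauss(0.0, 0.5) for _ in range(HIDDEN_DIM)] for _ in range(NUM_ACTIONS)]


def q_network(state: List[float], joint_params: List[float]) -> List[float]:
    """One forward pass: (state, all action-parameters) -> one Q-value per discrete action."""
    z = state + joint_params
    hidden = [math.tanh(sum(w * v for w, v in zip(row, z))) for row in W1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W2]


def single_pass_q(state: List[float], joint_params: List[float]) -> List[float]:
    """Joint-input style: every Q_k is computed from the full action-parameter vector."""
    return q_network(state, joint_params)


def multi_pass_q(state: List[float], joint_params: List[float]) -> List[float]:
    """Pass k zeroes every action-parameter except x_k and keeps only output k."""
    q_values = []
    for k in range(NUM_ACTIONS):
        start, end = OFFSETS[k], OFFSETS[k] + PARAM_SIZES[k]
        masked = [v if start <= i < end else 0.0 for i, v in enumerate(joint_params)]
        q_values.append(q_network(state, masked)[k])
    return q_values


def select_action(state: List[float], joint_params: List[float]) -> Tuple[int, List[float]]:
    """Greedy parameterised action: argmax_k Q_k, paired with that action's parameters x_k."""
    q = multi_pass_q(state, joint_params)
    k = max(range(NUM_ACTIONS), key=lambda i: q[i])
    return k, joint_params[OFFSETS[k]:OFFSETS[k] + PARAM_SIZES[k]]


if __name__ == "__main__":
    state = [0.1, -0.2, 0.3, 0.0]
    x = [0.5, -0.1, 0.7, 0.2]
    x_changed = [0.5, 0.9, -0.4, 0.2]  # only action 1's parameters differ
    print("single-pass Q_0:", single_pass_q(state, x)[0], single_pass_q(state, x_changed)[0])
    print("multi-pass  Q_0:", multi_pass_q(state, x)[0], multi_pass_q(state, x_changed)[0])
    print("greedy action:", select_action(state, x))
```

Running the script shows the point the abstract makes: with a joint input, changing only action 1's parameters also shifts Q_0, whereas in the multi-pass evaluation Q_0 depends only on x_0. The cost is K forward passes per state; in a real network these could be stacked into one batch of K inputs rather than run in a Python loop.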

Description

Dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Master of Science, Johannesburg, June 2019

Citation

Bester, Craig James (2019). Multi-pass deep Q-networks for reinforcement learning with parameterised action spaces. University of the Witwatersrand, Johannesburg. https://hdl.handle.net/10539/29568

URI

https://hdl.handle.net/10539/29568

Collections

ETD Collection
