Multi-pass deep Q-networks for reinforcement learning with parameterised action spaces

Author: Bester, Craig James
Year: 2019
Record date: 2020-09-09
Citation: Bester, Craig James (2019). Multi-pass deep Q-networks for reinforcement learning with parameterised action spaces. University of the Witwatersrand, Johannesburg. https://hdl.handle.net/10539/29568
Description: Dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Master of Science, Johannesburg, June 2019.

Abstract: Parameterised actions in reinforcement learning are composed of discrete actions with continuous action-parameters. This provides a framework capable of solving complex domains that require learning high-level action policies with flexible control. Recently, deep Q-networks have been extended to learn over such action spaces with the P-DQN algorithm. However, the method treats all action-parameters as a single joint input to the Q-network, invalidating its theoretical foundations. We demonstrate the disadvantages of this approach and propose two solutions: using split Q-networks, and a novel multi-pass technique. We also propose a weighted-indexed action-parameter loss function to address issues related to the imbalance of sampling and exploration between different parameterised actions. We empirically demonstrate that both our multi-pass algorithm and weighted-indexed loss significantly outperform P-DQN and other previous algorithms in terms of data efficiency and converged policy performance on the Platform, Robot Soccer Goal, and Half Field Offense domains.

Extent: Online resource (viii, 100 leaves)
Language: en
Subjects: Machine learning; Computer multitasking
Type: Thesis