The effect of simulation bias on action selection in Monte Carlo Tree Search
James, Steven Doron
Monte Carlo Tree Search (MCTS) is a family of directed search algorithms that has gained widespread attention in recent years. It combines a traditional tree-search approach with Monte Carlo simulations, using the outcome of these simulations (also known as playouts or rollouts) to evaluate states in a look-ahead tree. That MCTS does not require an evaluation function makes it particularly well-suited to the game of Go (seen by many to be chess’s successor as a grand challenge of artificial intelligence), with MCTS-based agents recently able to achieve expert-level play on 19×19 boards. Its domain-independent nature also makes it a focus in a variety of other fields, such as Bayesian reinforcement learning and general game-playing. Despite the vast amount of research into MCTS, the dynamics of the algorithm are not yet fully understood. In particular, the effect of using knowledge-heavy or biased simulations in MCTS remains unknown, with interesting results indicating that better-informed rollouts do not necessarily result in stronger agents. This research provides support for the notion that MCTS is well-suited to a class of domains possessing a smoothness property. In these domains, biased rollouts are more likely to produce strong agents. Conversely, any error due to incorrect bias is compounded in non-smooth domains, and in particular for low-variance simulations. This is demonstrated empirically in a number of single-agent domains.
A dissertation submitted to the Faculty of Science, University of the Witwatersrand, in fulfilment of the requirements for the degree of Master of Science. August 2016.
James, Steven Doron (2016) The effect of simulation bias on action selection in Monte Carlo Tree Search, University of the Witwatersrand, Johannesburg, <http://wiredspace.wits.ac.za/handle/10539/21673>