Monte-Carlo Tree Search Parallelisation
International Go Symposium 2012
Francois van Niekerk (francoisvn@ml.sun.ac.za)
August 2012
Collaborators: Steve Kroon, Gert-Jan van Rooyen, Cornelia Inggs
This work was partially supported by the National Research Foundation of South Africa.
Outline
1. Introduction
2. Background: Computer Go, Monte-Carlo Tree Search, Parallelisation
3. Implementation
4. Testing and Results: Multi-Core Parallelisation, Cluster Parallelisation
5. New Developments
6. Conclusions
Introduction
- Top Go programs are currently about 5 dan on KGS.
- Monte-Carlo Tree Search (MCTS) is the dominant Computer Go algorithm.
- MCTS can be parallelised on multi-core and cluster systems.
Computer Go
- Game tree of moves and their follow-ups.
- Exponential tree growth makes brute-force search infeasible.
- An evaluation function is used to avoid growing the tree too far.
Classical Methods
- Emulate human play using expert knowledge.
- Difficult to assimilate new knowledge into an already large body of knowledge.
- Top strength in the single-digit kyu (SDK) range, far from professional level.
Monte-Carlo Tree Search
- Monte-Carlo methods: stochastic simulations (playouts).
- The winrate of playouts starting from a position is used as the value of that position.
- Playouts are combined with a search tree to form Monte-Carlo Tree Search (MCTS).
- MCTS can be broken into four parts: selection, expansion, simulation and backpropagation.
[Figure: one MCTS iteration on an example tree with nodes labelled as fractions (root 4/9), shown step by step: Selection, Expansion, Simulation (playout, result W) and Backpropagation. During backpropagation the new node becomes 1/1 and the nodes on the selected path are updated from 0/1 to 1/2, from 3/5 to 4/6, and at the root from 4/9 to 5/10.]
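The four phases (selection, expansion, simulation, backpropagation) can be sketched roughly as follows. This is a minimal illustration and not the implementation discussed in this talk: it assumes a toy game (players alternately take 1 to 3 stones from a pile, and taking the last stone wins) in place of Go, a UCB1-style selection rule with an arbitrary constant `c`, and hypothetical names such as `Node`, `search` and `best_move`.

```python
import math
import random


class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones          # toy state: stones left in the pile
        self.parent = parent
        self.move = move              # move that led from the parent to this node
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= stones]
        self.wins = 0                 # wins for the player who moved into this node
        self.visits = 0


def uct_child(node, c=1.4):
    # Selection rule: winrate plus a UCB1-style exploration bonus.
    return max(
        node.children,
        key=lambda ch: ch.wins / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def playout(stones, rng):
    # Random playout; returns True if the player to move from `stones` wins.
    moves_made = 0
    while stones > 0:
        stones -= rng.randint(1, min(3, stones))
        moves_made += 1
    # Taking the last stone wins, so the first mover wins on an odd move count.
    return moves_made % 2 == 1


def mcts_iteration(root, rng):
    # 1. Selection: descend while the node is fully expanded and not terminal.
    node = root
    while not node.untried and node.children:
        node = uct_child(node)
    # 2. Expansion: add one child for an untried move.
    if node.untried:
        move = node.untried.pop()
        child = Node(node.stones - move, parent=node, move=move)
        node.children.append(child)
        node = child
    # 3. Simulation: random playout, scored for the player who moved into `node`.
    won = not playout(node.stones, rng)
    # 4. Backpropagation: update the path, flipping the perspective each level.
    while node is not None:
        node.visits += 1
        node.wins += int(won)
        won = not won
        node = node.parent


def search(stones, iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(stones)
    for _ in range(iterations):
        mcts_iteration(root, rng)
    return root


def best_move(root):
    # Choose the most-visited move at the root.
    return max(root.children, key=lambda ch: ch.visits).move
```

For example, `best_move(search(5))` returns the move the search visited most often from a pile of five stones. Each node stores wins and visits, in the same spirit as the fractions shown on the example tree.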
Parallelisation
- MCTS can be improved by improving the algorithm or by increasing the number of playouts.
- Increasing the number of playouts increases playing strength.
- The number of playouts can be increased by increasing the thinking time or the playout rate.
- Parallelisation uses parallel hardware to increase the playout rate, and therefore strength.
- Three parallelisation methods for MCTS: tree, leaf, and root.
Tree Parallelisation
- Single shared tree.
- Well-suited to shared-memory systems, such as multi-core systems.
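As a rough sketch of the idea, several threads can search one shared tree, guarding the statistics with a lock. The example below also applies a virtual loss (one of the additions named in the multi-core results): a visit is counted on the selected path before the playout finishes, so other threads are steered elsewhere in the meantime. All details are assumptions made for illustration: the tree is fixed at one level, the playout is replaced by a hypothetical per-move win probability, perspective flipping is omitted, and a single global lock is used rather than a lock-free design. In standard CPython the threads show the structure only, not a real speedup.

```python
import math
import random
import threading


class Node:
    def __init__(self, win_prob=None):
        self.children = []
        self.wins = 0
        self.visits = 0
        self.win_prob = win_prob      # hypothetical stand-in for a real playout


def select_child(node, c=1.4):
    def score(ch):
        if ch.visits == 0:
            return float("inf")       # try unvisited children first
        exploration = c * math.sqrt(math.log(node.visits + 1) / ch.visits)
        return ch.wins / ch.visits + exploration
    return max(node.children, key=score)


def worker(root, lock, iterations, seed):
    rng = random.Random(seed)
    for _ in range(iterations):
        with lock:
            # Selection on the shared tree.
            path = [root]
            while path[-1].children:
                path.append(select_child(path[-1]))
            # Virtual loss: count the visit now, with no win yet.
            for node in path:
                node.visits += 1
        # Playout runs outside the lock, so threads can simulate concurrently.
        won = rng.random() < path[-1].win_prob
        if won:
            with lock:
                # Backpropagation: turn the virtual loss into a win.
                for node in path:
                    node.wins += 1


def parallel_search(n_threads=4, iterations=500):
    root = Node()
    root.children = [Node(p) for p in (0.2, 0.5, 0.8)]   # assumed win probabilities
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(root, lock, iterations, seed))
        for seed in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return root
```

Calling `parallel_search(4, 500)` returns the shared root after all threads finish; the visit counts of its children show how the search effort was divided among the moves.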
Leaf Parallelisation
- Master and slave nodes.
- Only one tree, on the master.
- Slaves are playout workers.
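A minimal sketch of this arrangement, under stated assumptions: the master owns the tree statistics, and when it reaches a leaf it hands several playouts for that leaf to a pool of workers and adds up the results. Here the workers are threads from Python's `concurrent.futures` rather than separate machines, the playout is a hypothetical stand-in that wins with a given probability, and `evaluate_leaf` and `master_update` are illustrative names.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def playout(win_prob, seed):
    # Stand-in playout run by a worker: wins with probability `win_prob`.
    rng = random.Random(seed)
    return rng.random() < win_prob


def evaluate_leaf(pool, win_prob, n_playouts, base_seed=0):
    # The master dispatches n_playouts playouts of the same leaf to the workers.
    futures = [
        pool.submit(playout, win_prob, base_seed + i) for i in range(n_playouts)
    ]
    results = [f.result() for f in futures]   # wait for every worker
    return sum(results), len(results)         # (wins, visits)


def master_update(leaf_stats, pool, win_prob, n_playouts):
    # Only the master touches the tree statistics.
    wins, visits = evaluate_leaf(pool, win_prob, n_playouts)
    leaf_stats["wins"] += wins
    leaf_stats["visits"] += visits
    return leaf_stats
```

A typical call would be `with ThreadPoolExecutor(max_workers=4) as pool: master_update({"wins": 0, "visits": 0}, pool, 0.5, 8)`. In this sketch the master waits for all workers before continuing, which is the simplest choice but leaves workers idle when playouts differ in length.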
Root Parallelisation
- Each execution node maintains its own tree.
- Each node performs MCTS.
- Periodic sharing of information.
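The periodic sharing step might look like the sketch below. It is heavily simplified and its details are assumptions: each execution node's tree is reduced to root-level statistics per move, the search on each node is replaced by random sampling with hypothetical win probabilities, and sharing means that every node adds the results the other nodes gathered since the last exchange. `SearchNode` and `share` are illustrative names.

```python
import random


class SearchNode:
    """One execution node with its own root-level statistics."""

    def __init__(self, win_prob, seed):
        self.win_prob = win_prob                       # assumed stand-in for playouts
        self.rng = random.Random(seed)
        self.stats = {m: [0, 0] for m in win_prob}     # move -> [wins, visits]
        self.delta = {m: [0, 0] for m in win_prob}     # results not yet shared

    def run_playouts(self, n):
        for _ in range(n):
            move = self.rng.choice(list(self.stats))
            won = self.rng.random() < self.win_prob[move]
            for table in (self.stats, self.delta):
                table[move][0] += int(won)
                table[move][1] += 1


def share(nodes):
    # Sum what every node has gathered since the previous exchange.
    moves = list(nodes[0].stats)
    total = {
        m: [sum(n.delta[m][0] for n in nodes), sum(n.delta[m][1] for n in nodes)]
        for m in moves
    }
    for n in nodes:
        for m in moves:
            # Add only the other nodes' results; the node's own are already counted.
            n.stats[m][0] += total[m][0] - n.delta[m][0]
            n.stats[m][1] += total[m][1] - n.delta[m][1]
            n.delta[m] = [0, 0]
```

Alternating `run_playouts` on every node with a call to `share` gives each node the combined statistics after every exchange. How often to share is a tuning choice: between exchanges each node works from results that are out of date.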
Parallel Effect
- Parallelisation incurs a strength penalty.
- The penalty is due to the change from sequential to parallel execution.
- It is more pronounced if the playout updates are delayed, for example in root parallelisation compared with multi-core parallelisation.
Implementation
- Oakfoam is an open-source, cross-platform MCTS engine for Computer Go.
- Tree parallelisation for multi-core systems.
- Root parallelisation for cluster systems.
Testing and Results
- Test for playout rate increase.
- If increase found, test for strength penalty.
- If strength penalty found, test for overall strength increase.
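The arithmetic behind such tests can be sketched as follows. The slides do not state which statistics were used, so this is only one plausible approach: speedup as a ratio of playout rates, and a winrate with an approximate 95% confidence interval from the normal approximation to the binomial. Both function names are hypothetical.

```python
import math


def speedup(rate_parallel, rate_single):
    # Playout-rate speedup of a parallel run over the single-core run.
    return rate_parallel / rate_single


def winrate_interval(wins, games, z=1.96):
    # Winrate with an approximate confidence interval (normal approximation).
    # Assumes games > 0; z = 1.96 corresponds to roughly 95% confidence.
    p = wins / games
    half_width = z * math.sqrt(p * (1 - p) / games)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)
```

Under this approach, a parallel version would be judged stronger than the single-core baseline only if the lower bound of its winrate against that baseline is above 0.5.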
Multi-Core Parallelisation Results
[Figure: Speedup on 9x9. Speedup (1, 2, 4, 8) against Cores (1, 2, 4, 8). Series: Ideal, No additions, Virtual Loss, Lock-free, Both additions.]
[Figure: Speedup on 19x19. Speedup (1, 2, 4, 8) against Cores (1, 2, 4, 8). Series: Ideal, No additions, Virtual Loss, Lock-free, Both additions.]
Cluster Parallelisation Results
[Figure: Strength Comparison on 9x9. Winrate vs. 1-Core [%] (50 to 100) against Cores/Periods (1, 2, 4, 8, 16). Series: Baseline 10s/move; 10s/move with p = 0.1, p = 0.2 and p = 0.05; 2s/move with p = 0.1, p = 0.2 and p = 0.05.]
[Figure: Strength Comparison on 19x19. Winrate vs. 1-Core [%] (50 to 100) against Cores/Periods (1, 2, 4, 8, 16, 32, 64). Series: Baseline 10s/move; 10s/move with p = 0.1; 2s/move with p = 0.1.]
Overview of Results
- Multi-core: tree parallelisation showed linear scaling up to eight cores (the physical limit in these tests).
- Cluster: root parallelisation for 19x19 showed scaling up to eight nodes, where its strength improvement matched that of ideal scaling on four cores.
New Developments
- Pachi uses virtual wins and losses to improve cluster scaling.
- Depth-First UCT changes MCTS from a best-first to a depth-first search.
- Distributed UCT and Distributed Depth-First UCT use Transposition-table Driven Scheduling to break up the tree across nodes.
- UCT-Treesplit uses Transposition-table Driven Scheduling to break up the MCTS work across nodes.
- So far, only virtual wins and losses have been applied to Computer Go.
Conclusions
- MCTS is the dominant algorithm for Computer Go.
- Parallelisation on multi-core systems scales well.
- Parallelisation on cluster systems is possible, but there is still room for improvement.
- The future of cluster parallelisation holds possibilities.