Foundations of Artificial Intelligence
44. Monte-Carlo Tree Search: Advanced Topics
Malte Helmert and Gabriele Röger
University of Basel
May 22, 2017
Optimality Tree Policy Other Techniques Summary
selection: traverse tree
expansion: grow tree
simulation: play game to final position
backpropagation: update utility estimates
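The four phases above can be sketched as a single loop. The following is a minimal illustration, not the lecture's reference implementation: the toy game (a Nim variant in which players alternately take 1 or 2 stones and whoever takes the last stone wins), the `Node` class, and the uniformly random `tree_policy` placeholder are all assumptions made for this example.

```python
import random

def legal_moves(state):
    stones, _ = state
    return [m for m in (1, 2) if m <= stones]

def apply_move(state, move):
    stones, player = state
    return (stones - move, 1 - player)

def utility(state):
    # Terminal position: the player who just moved took the last stone.
    # Utility is reported from player 0's point of view.
    _, player_to_move = state
    return 1.0 if player_to_move == 1 else 0.0

class Node:
    def __init__(self, state, parent=None, move=None):
        self.state = state            # (stones_left, player_to_move)
        self.parent = parent
        self.move = move              # move leading from parent to this node
        self.children = []
        self.untried = legal_moves(state)
        self.visits = 0               # N(n)
        self.total = 0.0              # sum of backed-up utilities

    def q(self):
        # Utility estimate: average of all backed-up results.
        return self.total / self.visits if self.visits else 0.0

def tree_policy(node, rng):
    # Placeholder (assumption): pick a child uniformly at random.
    # epsilon-greedy, Boltzmann exploration or UCB1 would go here.
    return rng.choice(node.children)

def mcts(root_state, iterations, rng):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # selection: traverse the tree while nodes are fully expanded
        while not node.untried and node.children:
            node = tree_policy(node, rng)
        # expansion: grow the tree by one node
        if node.untried:
            move = node.untried.pop()
            child = Node(apply_move(node.state, move), parent=node, move=move)
            node.children.append(child)
            node = child
        # simulation: play random moves until a final position is reached
        state = node.state
        while legal_moves(state):
            state = apply_move(state, rng.choice(legal_moves(state)))
        result = utility(state)
        # backpropagation: update utility estimates along the path
        while node is not None:
            node.visits += 1
            node.total += result
            node = node.parent
    return root
```

For example, `mcts((5, 0), 200, random.Random(0))` returns the root of a search tree for a 5-stone game with player 0 to move; each child's `q()` is the current utility estimate of the corresponding move.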
every position is expanded eventually and visited infinitely often (given that the game tree is finite)
→ after a finite number of iterations, only true utility values are used in backups
the probability that the optimal move is selected converges to 1
→ in the limit, backups based on iterations where only an optimal policy is followed dominate suboptimal backups
[Figure: ε-greedy example with ε = 0.2]
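As a rough illustration of an ε-greedy tree policy with ε = 0.2, the sketch below assumes one common variant: with probability 1 − ε the move with the best utility estimate is chosen, and otherwise one of the remaining moves is chosen uniformly at random. The function names and the utility estimates in the usage note are hypothetical and not taken from the slides.

```python
import random

def greedy_index(q_values):
    # Index of the move with the highest utility estimate.
    return max(range(len(q_values)), key=lambda i: q_values[i])

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a move index: greedy with probability 1 - epsilon,
    otherwise uniformly among the non-greedy moves (assumed variant)."""
    best = greedy_index(q_values)
    others = [i for i in range(len(q_values)) if i != best]
    if others and rng.random() < epsilon:
        return rng.choice(others)
    return best

def selection_probabilities(q_values, epsilon):
    """Probability of selecting each move under the policy above."""
    best = greedy_index(q_values)
    n_other = len(q_values) - 1
    if n_other == 0:
        return [1.0]
    return [1 - epsilon if i == best else epsilon / n_other
            for i in range(len(q_values))]
```

With three moves and hypothetical estimates `[3.0, 5.0, 0.0]`, `selection_probabilities(..., 0.2)` assigns roughly 0.8 to the second move and 0.1 to each of the others, regardless of how bad the third move looks.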
$P(n) \propto e^{\hat{Q}(n)/\tau}$
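A small sketch of Boltzmann exploration, assuming the usual softmax form in which a move with utility estimate Q̂(n) is selected with probability proportional to exp(Q̂(n)/τ), where τ > 0 is a temperature parameter. The function names are hypothetical; subtracting the maximum before exponentiating is a standard numerical precaution that does not change the resulting probabilities.

```python
import math
import random

def boltzmann_probabilities(q_values, tau):
    """Selection probabilities proportional to exp(Q / tau)."""
    q_max = max(q_values)
    # Shift by the maximum to avoid overflow for large Q / tau.
    weights = [math.exp((q - q_max) / tau) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

def boltzmann_select(q_values, tau, rng=random):
    """Sample a move index according to the Boltzmann distribution."""
    probs = boltzmann_probabilities(q_values, tau)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

A small τ makes the policy nearly greedy, while a large τ makes it nearly uniform; unlike ε-greedy, moves with similar estimates receive similar probabilities.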
[Figure: utility estimates Q̂_k and Q̂_{k+1} with selection probabilities P for moves a1, a2, a3]
the Chernoff-Hoeffding bound gives the probability that a sampled value (here: Q̂(n′)) is far from its true expected value (here: Q∗(n′)) as a function of the number of samples (here: N(n′))
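To make the role of the sample count N(n′) concrete, here is a hedged sketch of UCB1 selection. It assumes the standard UCB1 form, in which a child n′ of node n is scored by Q̂(n′) + sqrt(2 · ln N(n) / N(n′)) with utilities scaled to [0, 1]; the exact formula did not survive in this text, so the constant and the function names are assumptions.

```python
import math

def ucb1_score(q_child, n_parent, n_child, c=math.sqrt(2)):
    """Utility estimate plus an exploration bonus that shrinks as the
    child is sampled more often (assumed standard UCB1 form)."""
    if n_child == 0:
        return math.inf  # unvisited children are tried first
    return q_child + c * math.sqrt(math.log(n_parent) / n_child)

def ucb1_select(children, n_parent):
    """children: list of (utility_estimate, visit_count) pairs.
    Returns the index of the child with the highest UCB1 score."""
    return max(
        range(len(children)),
        key=lambda i: ucb1_score(children[i][0], n_parent, children[i][1]),
    )
```

With `children = [(0.6, 90), (0.5, 10)]` and `n_parent = 100`, the rarely visited second child gets the larger bonus and is selected even though its estimate is lower; with equal visit counts the child with the higher estimate wins.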
allows deep lookaheads
needs more memory
needs initial utility estimate for all children
ε-greedy favors the greedy move and treats all others equally
Boltzmann exploration selects moves proportionally to their utility estimates
UCB1 favors moves that were successful in the past or have been explored rarely