The Gizmo Player (Simon Dollé, Jan Kopcsek, Alper Tunga; Dresden)

SLIDE 1

The Gizmo Player

Faculty: Computer Science, Department: Computer Science, Institute: Intelligent Systems

Dresden, 13.02.2008

Simon Dollé Jan Kopcsek Alper Tunga

SLIDE 2

TU Dresden, 13.02.2008 Gizmo Player Slide 2 of 10

Finding a heuristic function

Two ways to learn a heuristic function:

Deductive

– Analyzing the rules
– Identifying common elements such as game boards or pieces
– Finding patterns

Inductive

– Playing and learning from experience
– Monte Carlo strategy

SLIDE 3

Monte Carlo Strategy

Play random games
Compute the mean of the scores for each move
Use them as a heuristic function
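
The three Monte Carlo steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Gizmo Player's code: the function name `monte_carlo_means`, the callables `legal_moves`, `apply_move`, `is_terminal` and `score`, and the toy single-player game are all hypothetical stand-ins for a real game interface.

```python
import random


def monte_carlo_means(state, legal_moves, apply_move, is_terminal, score,
                      games_per_move=500, rng=None):
    """Return {move: mean final score of random games that start with move}."""
    rng = rng or random.Random()
    means = {}
    for move in legal_moves(state):
        total = 0.0
        for _ in range(games_per_move):
            s = apply_move(state, move)
            # Play the rest of the game with uniformly random moves.
            while not is_terminal(s):
                s = apply_move(s, rng.choice(legal_moves(s)))
            total += score(s)
        means[move] = total / games_per_move
    return means


# Toy single-player game (an assumption, for illustration only):
# a state is (total, turns_left); each move adds 1, 2 or 3 to the total,
# and the final score is the total after the last turn.
def legal_moves(state):
    return [1, 2, 3]


def apply_move(state, move):
    total, turns_left = state
    return (total + move, turns_left - 1)


def is_terminal(state):
    return state[1] == 0


def score(state):
    return state[0]


means = monte_carlo_means((0, 3), legal_moves, apply_move, is_terminal, score,
                          rng=random.Random(0))
best = max(means, key=means.get)  # move with the highest mean score
```

Note that every first move receives the same number of random games, whether or not it looks promising.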

SLIDE 8

Monte Carlo Strategy

Problem:
– The same effort is spent on interesting and uninteresting moves
– Equivalent to playing against a dummy player
UCT Algorithm (Upper Confidence bounds applied to Trees):
An algorithm to balance:
– Further exploration of the interesting parts of the graph (exploitation)
– Exploration of new parts
Makes the random games more realistic

SLIDE 9

UCT Algorithm

As long as there are unexplored moves from our current state, explore them

SLIDE 15

UCT Algorithm

As long as there are unexplored moves from our current state, explore them

Otherwise, choose the one with the highest score, computed from:

h: the heuristic value
n: the number of games through the parent node
n_i: the number of games through the node
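
The slide's scoring formula is not reproduced here. A common choice that fits the symbols h, n and n_i is the UCB1 rule, h + c * sqrt(ln(n) / n_i), and the sketch below assumes that form. The `Node` class, the function `uct_select` and the constant `c` are hypothetical names introduced for illustration; the value of `c` actually used by the Gizmo Player is not known.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    h: float = 0.0   # heuristic value: mean score of the games through this node
    n: int = 0       # number of games through this node
    children: dict = field(default_factory=dict)  # move -> Node


def uct_select(parent, moves, c=math.sqrt(2)):
    """Pick the next move to simulate from `parent`."""
    # Step 1: as long as there are unexplored moves, explore them.
    for move in moves:
        child = parent.children.get(move)
        if child is None or child.n == 0:
            return move

    # Step 2: otherwise take the move with the highest UCB1 score (assumed form).
    # Every child has n_i >= 1 here, so parent.n >= 1 and the log is defined.
    def ucb(move):
        child = parent.children[move]
        return child.h + c * math.sqrt(math.log(parent.n) / child.n)

    return max(moves, key=ucb)
```

The second term shrinks as a node is visited more often, so rarely tried moves get an occasional second look. If scores span a wide range (for example 0 to 100), `c` would need to be scaled to match.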

SLIDE 20

UCT Algorithm

Which move to play?
The one with the highest heuristic value

In multiplayer games:
Store the heuristic value for each player
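
One way to store a heuristic value per player is to keep a list of running means in every node and let each player pick the move that is best for its own entry. The sketch below is an assumption about how this could look; `MultiNode`, `update` and `best_move` are hypothetical names, not taken from the Gizmo Player.

```python
from dataclasses import dataclass, field


@dataclass
class MultiNode:
    n: int = 0                                    # games through this node
    h: list = field(default_factory=list)         # one mean score per player
    children: dict = field(default_factory=dict)  # move -> MultiNode

    def update(self, scores):
        """Fold the final scores of one finished game into the running means."""
        if not self.h:
            self.h = [0.0] * len(scores)
        self.n += 1
        for player, s in enumerate(scores):
            self.h[player] += (s - self.h[player]) / self.n


def best_move(node, player):
    """Move whose child has the highest heuristic value for `player`."""
    return max(node.children, key=lambda m: node.children[m].h[player])


root = MultiNode()
root.children["a"] = MultiNode()
root.children["b"] = MultiNode()
root.children["a"].update([100, 0])
root.children["a"].update([50, 50])
root.children["b"].update([0, 100])
```

With these example statistics, player 0 and player 1 prefer different moves, which is why a single shared value is not enough.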

SLIDE 21

Good points

Heuristic directly linked to the final score
Heuristic converges to the minimax values
Scales with the available time
Easily parallelisable
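
The time scalability can be illustrated with a simple loop that keeps simulating until a deadline and can be stopped after any number of games. This is a sketch under assumptions: `run_for` and `simulate_once` are hypothetical names, and `simulate_once` stands for one random game plus the update of the statistics.

```python
import time


def run_for(budget_seconds, simulate_once):
    """Run simulations until the time budget is used up; return how many ran."""
    deadline = time.monotonic() + budget_seconds
    games = 0
    while time.monotonic() < deadline:
        simulate_once()
        games += 1
    return games


# Usage: a stand-in simulation that only counts its calls.
calls = []
games = run_for(0.02, lambda: calls.append(1))
```

A larger budget simply means more games and therefore better estimates; no other part of the search has to change.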

SLIDE 22

Problems

Simultaneous moves:

– Which rule should be used to explore the nodes?
– Which move to play?

Long games and loops:

– Depth-first search problem
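
The slides do not say how the Gizmo Player dealt with long games and loops. One common mitigation, shown here only as an assumption, is to cap the depth of each random game so that a playout caught in a loop is abandoned instead of running forever. The names `capped_playout` and `max_depth` are hypothetical.

```python
import random


def capped_playout(state, legal_moves, apply_move, is_terminal, score,
                   max_depth=200, rng=None):
    """Play one random game; return its score, or None if the cap is hit."""
    rng = rng or random.Random()
    depth = 0
    while not is_terminal(state):
        if depth >= max_depth:
            return None  # give up: this game contributes no score
        state = apply_move(state, rng.choice(legal_moves(state)))
        depth += 1
    return score(state)


# A game that loops forever: the only move leaves the state unchanged.
looping = capped_playout(0, lambda s: ["stay"], lambda s, m: s,
                         lambda s: False, lambda s: 0, max_depth=10)

# A game that ends after three steps.
finite = capped_playout(0, lambda s: [1], lambda s, m: s + m,
                        lambda s: s >= 3, lambda s: s)
```

How an abandoned game should be counted (ignored, or scored as a draw) is a design decision that this sketch leaves open.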

SLIDE 23

Thank you for your attention, and good luck to your players!