Minimax strategies, alpha-beta pruning
Lirong Xia

How to find good heuristics?
Ø No purely mechanical way
§ more art than science
Ø General guideline: relax constraints
§ e.g., Pacman can pass through the walls
Ø Mimic what you would do
Arc Consistency of a CSP
Ø A simple form of propagation makes sure all arcs are consistent
Ø If V loses a value, neighbors of V need to be rechecked!
Ø Arc consistency detects failure earlier than forward checking
Ø Can be run as a preprocessor or after each assignment
Ø Might be time-consuming
Ø Delete from the tail!
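The "delete from the tail" rule can be sketched as an AC-3-style loop. This is a minimal illustration, not the slides' exact algorithm; the toy constraint X < Y and its domains are made up for the example.

```python
from collections import deque

def revise(domains, tail, head, allowed):
    """Delete from the tail: drop x from domains[tail] if no y in
    domains[head] satisfies the constraint allowed(x, y)."""
    removed = False
    for x in list(domains[tail]):
        if not any(allowed(x, y) for y in domains[head]):
            domains[tail].discard(x)
            removed = True
    return removed

def ac3(domains, constraints):
    """constraints maps each arc (tail, head) to its allowed(x, y) test.
    Returns False as soon as some domain is wiped out."""
    queue = deque(constraints)
    while queue:
        tail, head = queue.popleft()
        if revise(domains, tail, head, constraints[(tail, head)]):
            if not domains[tail]:
                return False  # failure detected earlier than forward checking
            # tail lost a value: recheck every arc whose head is tail
            queue.extend(arc for arc in constraints if arc[1] == tail)
    return True

# Hypothetical toy CSP: X < Y, both domains {1, 2, 3}
domains = {"X": {1, 2, 3}, "Y": {1, 2, 3}}
constraints = {("X", "Y"): lambda x, y: x < y,
               ("Y", "X"): lambda y, x: y > x}
ok = ac3(domains, constraints)
print(ok, domains)  # → True {'X': {1, 2}, 'Y': {2, 3}}
```

Note how removing a value from X's domain re-enqueues every arc pointing into X, which is exactly the "neighbors of V need to be rechecked" step.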
Limitations of Arc Consistency
Ø After running arc consistency:
§ Can have one solution left
§ Can have multiple solutions left
§ Can have no solutions left (and not know it)
“Sum to 2” game
Ø Player 1 moves, then player 2, finally player 1 again
Ø Move = 0 or 1
Ø Player 1 wins if and only if all moves together sum to 2
[Figure: game tree — Player 1 at the root, Player 2 at depth 1, Player 1 at depth 2; each edge is a move of 0 or 1, and each leaf is labeled +1 or −1 for Player 1]
Player 1’s utility is in the leaves; player 2’s utility is the negative of this
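The game is small enough (eight leaves) to solve exactly. A minimal sketch in plain Python, with states represented as the list of moves made so far:

```python
# Minimax value of the "sum to 2" game: Player 1 (MAX) moves first and
# third, Player 2 (MIN) moves second; each move is 0 or 1.
# Leaf utility for Player 1 is +1 iff the three moves sum to 2, else -1.

def value(moves):
    if len(moves) == 3:                    # terminal: all moves made
        return 1 if sum(moves) == 2 else -1
    if len(moves) in (0, 2):               # Player 1 (MAX) to move
        return max(value(moves + [m]) for m in (0, 1))
    return min(value(moves + [m]) for m in (0, 1))  # Player 2 (MIN)

print(value([]))  # → 1: Player 1 can force a win
```

Playing 1 first is the winning move: whatever Player 2 adds, Player 1 can top the sum up to exactly 2. Opening with 0 loses (value([0]) is −1), since Player 2 then plays 0 and the sum can reach at most 1.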
Today's schedule
Ø Adversarial games
Ø Minimax search
Ø Alpha-beta pruning algorithm
Adversarial Games
Ø Deterministic, zero-sum games:
§ Tic-tac-toe, chess, checkers
§ The MAX player maximizes the result
§ The MIN player minimizes the result
Ø Minimax search:
§ A search tree
§ Players alternate turns
§ Each node has a minimax value: best achievable utility against a rational adversary
Computing Minimax Values
Ø This is DFS
Ø Two recursive functions:
§ max-value maxes the values of successors
§ min-value mins the values of successors
Ø def value(state):
    if state is a terminal state: return the state's utility
    if the agent at state is MAX: return max-value(state)
    if the agent at state is MIN: return min-value(state)
Ø def max-value(state):
    v = −∞
    for each successor of state:
        v = max(v, value(successor))
    return v
Ø def min-value(state): symmetric to max-value, with min and v = +∞
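The three functions above translate almost directly into code. A sketch on a hypothetical tuple-based game tree (internal nodes are `(player, [children])`, leaves are plain utilities — this representation is an assumption of the example, not the slides'):

```python
import math

def value(state):
    """Dispatch on terminal / MAX / MIN, as in the slide's pseudocode."""
    if isinstance(state, (int, float)):  # terminal state: return its utility
        return state
    player, _ = state
    return max_value(state) if player == "MAX" else min_value(state)

def max_value(state):
    v = -math.inf
    for successor in state[1]:
        v = max(v, value(successor))     # max over successor values
    return v

def min_value(state):
    v = math.inf
    for successor in state[1]:
        v = min(v, value(successor))     # min over successor values
    return v

# A depth-2 example: three MIN nodes with leaf utilities below a MAX root.
tree = ("MAX", [("MIN", [3, 12, 8]),
                ("MIN", [2, 4, 6]),
                ("MIN", [14, 5, 2])])
print(value(tree))  # → 3 (MIN values are 3, 2, 2; MAX picks 3)
```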
Minimax Example
[Figure: example game tree — the three MIN nodes evaluate to 3, 2, and 2, so the MAX root's minimax value is 3]
Tic-tac-toe Game Tree
Renju
Ø 15×15 board
Ø 5 in a row (horizontal, vertical, or diagonal) wins
Ø No double-3 or double-4 moves for black
Ø Otherwise, black's winning strategy has been computed
  – L. Victor Allis, 1994 (PhD thesis)
Minimax Properties
Ø Time complexity?
§ O(b^m)
Ø Space complexity?
§ O(bm)
Ø For chess, b ≈ 35, m ≈ 100
§ Exact solution is completely infeasible
§ But, do we need to explore the whole tree?
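A quick back-of-the-envelope check of why b ≈ 35, m ≈ 100 makes exact search infeasible:

```python
import math

# Number of leaves in a full chess game tree with branching factor b ≈ 35
# and depth m ≈ 100 (the slide's rough figures): b**m.
b, m = 35, 100
digits = round(m * math.log10(b))
print(f"b**m is a number with about {digits} digits")  # → about 154 digits
```

For comparison, the number of atoms in the observable universe is usually estimated at around 80 digits, so no amount of hardware closes this gap.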
Resource Limits
Ø Cannot search to leaves
Ø Depth-limited search
§ Instead, search a limited depth of the tree
§ Replace terminal utilities with an evaluation function for non-terminal positions
Ø Guarantee of optimal play is gone
Evaluation Functions
Ø Functions that score non-terminals
Ø Ideal function: returns the minimax utility of the position
Ø In practice: typically a weighted linear sum of features:
§ Eval(s) = w1·f1(s) + w2·f2(s) + … + wn·fn(s)
Ø e.g. f1(s) = (# white queens − # black queens), etc.
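A weighted linear evaluation is just a dot product of feature values and weights. In the sketch below, the toy state, the features, and the weights are all hypothetical, chosen only to mirror the queen-count example:

```python
# Eval(s) = w1*f1(s) + ... + wn*fn(s), as a plain weighted sum.

def evaluate(state, features, weights):
    return sum(w * f(state) for f, w in zip(features, weights))

# Hypothetical chess-like state: piece counts per side.
state = {"white_queens": 1, "black_queens": 0,
         "white_pawns": 6, "black_pawns": 8}

features = [
    lambda s: s["white_queens"] - s["black_queens"],  # f1: queen balance
    lambda s: s["white_pawns"] - s["black_pawns"],    # f2: pawn balance
]
weights = [9.0, 1.0]  # made-up material weights

print(evaluate(state, features, weights))  # → 9*1 + 1*(-2) = 7.0
```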
Minimax with limited depth
Ø Suppose you are the MAX player
Ø Given a depth d and the current state
Ø Compute value(state, d), searching down to depth d
§ at depth d, use an evaluation function to estimate the value if the state is non-terminal
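Depth-limited minimax only changes one line of plain minimax: when the depth budget runs out at a non-terminal state, return the evaluation function instead of recursing. A sketch assuming a generic game interface (`is_terminal`, `utility`, `successors`, and `eval_fn` are placeholders supplied by the caller, not names from the slides):

```python
def value(state, depth, is_max, is_terminal, utility, successors, eval_fn):
    if is_terminal(state):
        return utility(state)
    if depth == 0:                  # depth budget exhausted at a non-terminal:
        return eval_fn(state)       # estimate with the evaluation function
    children = [value(s, depth - 1, not is_max,
                      is_terminal, utility, successors, eval_fn)
                for s in successors(state)]
    return max(children) if is_max else min(children)

# Toy tree: internal nodes are (label, [children]); leaves are numbers.
tree = ("root", [("a", [3, 12, 8]), ("b", [2, 4, 6]), ("c", [14, 5, 2])])
is_terminal = lambda s: isinstance(s, int)
utility = lambda s: s
successors = lambda s: s[1]
eval_fn = lambda s: 0               # hypothetical estimate for cut-off nodes

print(value(tree, 2, True, is_terminal, utility, successors, eval_fn))  # → 3
print(value(tree, 1, True, is_terminal, utility, successors, eval_fn))  # → 0
```

With depth 2 the search reaches the true leaves and recovers the exact minimax value; with depth 1 every child is cut off and scored by `eval_fn`, which is exactly where the guarantee of optimal play is lost.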
Improving minimax: pruning
Pruning in Minimax Search
Ø Suppose an ancestor is a MAX node that
§ already has an option at least as good as my current value
§ my future value can only get smaller
Ø Then that ancestor will never choose me, so my remaining children need not be explored
Alpha-beta pruning
Ø Pruning = cutting off parts of the search tree (because you realize you don't need to look at them)
§ When we considered A*, we also pruned large parts of the search tree
Ø Maintain
§ α = value of the best option for the MAX player along the path so far
§ β = value of the best option for the MIN player along the path so far
§ Initialized to α = −∞ and β = +∞
Ø Maintain and update α and β at each node
§ α is updated at MAX player's nodes
§ β is updated at MIN player's nodes
Alpha-Beta Pruning
Ø General configuration
§ We're computing the MIN-VALUE at some node n
§ We're looping over n's children
§ n's value estimate is dropping
§ α is the best value that MAX can get at any choice point along the current path
§ If n becomes worse than α, MAX will avoid it, so we can stop considering n's other children
§ Define β similarly for MIN
§ α is usually smaller than β
Ø Once α ≥ β, return to the upper layer
Alpha-Beta Pruning Example
Ø α is MAX's best alternative here or above
Ø β is MIN's best alternative here or above
[Figure: worked trace of (α, β) down the tree — the root starts at (α, β) = (−∞, +∞); α is raised at MAX nodes, β is lowered at MIN nodes, and a subtree is abandoned as soon as α ≥ β]
Alpha-Beta Pseudocode
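The slide's pseudocode image is not reproduced here. As a substitute, here is a minimal alpha-beta sketch on a hypothetical tuple-based tree (nodes are `("MAX"/"MIN", [children])`, leaves are numbers — an assumption of this example, not the slides' notation):

```python
import math

def alphabeta(state, alpha=-math.inf, beta=math.inf):
    """alpha = best option for MAX on the path so far;
    beta = best option for MIN on the path so far."""
    if isinstance(state, (int, float)):   # terminal: return utility
        return state
    player, children = state
    if player == "MAX":
        v = -math.inf
        for child in children:
            v = max(v, alphabeta(child, alpha, beta))
            alpha = max(alpha, v)         # α updated at MAX nodes
            if alpha >= beta:             # MIN above will never allow this
                break                     # prune remaining children
        return v
    else:
        v = math.inf
        for child in children:
            v = min(v, alphabeta(child, alpha, beta))
            beta = min(beta, v)           # β updated at MIN nodes
            if alpha >= beta:             # MAX above will never choose this
                break                     # prune remaining children
        return v

tree = ("MAX", [("MIN", [3, 12, 8]),
                ("MIN", [2, 4, 6]),       # pruned after seeing 2 (2 < α = 3)
                ("MIN", [14, 5, 2])])
print(alphabeta(tree))  # → 3, the same root value as plain minimax
```

On the middle MIN node, the first leaf (2) drives β below α = 3, so the leaves 4 and 6 are never examined; the root value is unaffected, illustrating that pruning changes only the work done, not the answer at the root.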
Alpha-Beta Pruning Properties
Ø This pruning has no effect on the final result at the root
Ø Values of intermediate nodes might be wrong!
§ Important: children of the root may have the wrong value
Ø Good child ordering improves the effectiveness of pruning
Ø With "perfect ordering":
§ Time complexity drops to O(b^(m/2))
§ Doubles the solvable depth!
§ Your agent looks smarter: more forward-looking with a good evaluation function
§ Full search of, e.g., chess is still hopeless…
Ø Q1: write an evaluation function for (state, action) pairs
§ the evaluation function is for this question only
Ø Q2: minimax search with arbitrary depth and multiple MIN players (ghosts)
§ an evaluation function on states has been implemented for you
Ø Q3: alpha-beta pruning with arbitrary depth and multiple MIN players (ghosts)