Minimax strategies, alpha beta pruning Lirong Xia How to find good - - PowerPoint PPT Presentation



SLIDE 1

Minimax strategies, alpha-beta pruning

Lirong Xia

SLIDE 2

How to find good heuristics?

Ø No really mechanical way
§ More art than science
Ø General guideline: relax constraints
§ e.g., Pacman can pass through the walls
Ø Mimic what you would do

SLIDE 3

Arc Consistency of a CSP

Ø A simple form of propagation makes sure all arcs are consistent:
§ An arc X → Y is consistent iff for every value in the tail X there is some value in the head Y that satisfies the constraint
§ Delete from tail! Values of X without such support are removed
Ø If V loses a value, neighbors of V need to be rechecked!
Ø Arc consistency detects failure earlier than forward checking
Ø Can be run as a preprocessor or after each assignment
Ø Might be time-consuming
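A minimal Python sketch of AC-3, one standard arc-consistency algorithm (the slide does not name a specific one). Assumptions: `constraints` is a dict mapping each directed arc `(x, y)` to a predicate `allowed(vx, vy)`, both directions of every binary constraint are listed, and domains are Python sets pruned in place. The names `revise` and `ac3` and the map-coloring instance are illustrative.

```python
from collections import deque

def revise(domains, constraints, x, y):
    """Delete from the tail x every value with no support in the head y.
    Returns True if x's domain changed."""
    allowed = constraints[(x, y)]
    removed = False
    for vx in list(domains[x]):
        if not any(allowed(vx, vy) for vy in domains[y]):
            domains[x].remove(vx)
            removed = True
    return removed

def ac3(domains, constraints):
    """Make every arc consistent. Returns False as soon as a domain empties."""
    queue = deque(constraints.keys())
    while queue:
        x, y = queue.popleft()
        if revise(domains, constraints, x, y):
            if not domains[x]:
                return False                  # failure detected before any search
            # x lost a value, so arcs pointing into x must be rechecked
            for (z, w) in constraints:
                if w == x and z != y:
                    queue.append((z, x))
    return True

def different(a, b):
    return a != b

domains = {"WA": {"red"}, "NT": {"red", "green"}, "SA": {"red", "green", "blue"}}
constraints = {}
for a, b in [("WA", "NT"), ("WA", "SA"), ("NT", "SA")]:
    constraints[(a, b)] = different
    constraints[(b, a)] = different
print(ac3(domains, constraints), domains)     # True, WA={red}, NT={green}, SA={blue}
```

Re-enqueuing only the arcs that point into the variable that just shrank is how "neighbors of V need to be rechecked" plays out in practice.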

SLIDE 4

Limitations of Arc Consistency

Ø After running arc consistency:
§ Can have one solution left
§ Can have multiple solutions left
§ Can have no solutions left (and not know it)
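To make the last case concrete, here is a hypothetical CSP: three mutually adjacent regions that must receive different colors, with only two colors available. Every arc is already consistent, so arc consistency removes nothing, yet brute-force enumeration finds no solution. The function names are illustrative.

```python
from itertools import product

# Hypothetical instance: a triangle of regions, only two colors.
variables = ["A", "B", "C"]
domains = {v: {"red", "green"} for v in variables}
edges = [("A", "B"), ("B", "C"), ("A", "C")]      # "must differ" constraints

def arc_consistent(domains, edges):
    """True if every value in each tail has a differing value in the head."""
    for a, b in edges + [(b, a) for a, b in edges]:
        for va in domains[a]:
            if not any(va != vb for vb in domains[b]):
                return False
    return True

def solutions(domains, edges):
    """Enumerate all complete assignments that satisfy every constraint."""
    names = list(domains)
    for values in product(*(sorted(domains[n]) for n in names)):
        assignment = dict(zip(names, values))
        if all(assignment[a] != assignment[b] for a, b in edges):
            yield assignment

print(arc_consistent(domains, edges))   # True: nothing to delete
print(list(solutions(domains, edges)))  # []: yet no solution exists
```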

SLIDE 5

“Sum to 2” game

Ø Player 1 moves, then Player 2, and finally Player 1 again
Ø Each move is 0 or 1
Ø Player 1 wins if and only if all moves together sum to 2

[Figure: game tree with Player 1, Player 2, Player 1 levels and a utility at each leaf]

Player 1’s utility is in the leaves; Player 2’s utility is the negative of this.
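The Sum-to-2 game is small enough to solve with plain minimax. This sketch assumes leaf utilities of +1 when Player 1 wins and -1 otherwise (the exact leaf values are not legible in the figure); since Player 2's utility is the negative, Player 2 acts as MIN. The names `TURNS` and `value` are illustrative.

```python
# Player 1 (MAX) moves, then Player 2 (MIN), then Player 1 (MAX) again.
TURNS = ["MAX", "MIN", "MAX"]

def value(moves=()):
    """Minimax value for Player 1 of the position reached by `moves`."""
    if len(moves) == len(TURNS):                  # all three moves made: a leaf
        return 1 if sum(moves) == 2 else -1       # assumed +1 win / -1 loss
    child_values = [value(moves + (m,)) for m in (0, 1)]
    if TURNS[len(moves)] == "MAX":
        return max(child_values)
    return min(child_values)

print(value())       # 1: Player 1 can force a win
print(value((0,)))   # -1: opening with 0 loses against best play
print(value((1,)))   # 1: opening with 1 wins
```

Opening with 1 forces the win: whatever Player 2 plays, Player 1 answers with the complementary move so the total is 2.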

SLIDE 6

Today’s schedule

Ø Adversarial games
Ø Minimax search
Ø Alpha-beta pruning algorithm

SLIDE 7

Adversarial Games

Ø Deterministic, zero-sum games:
§ Tic-tac-toe, chess, checkers
§ The MAX player maximizes the result
§ The MIN player minimizes the result
Ø Minimax search:
§ A search tree
§ Players alternate turns
§ Each node has a minimax value: the best achievable utility against a rational adversary
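The minimax value in the last bullet can be written recursively. The notation $U(s)$ for the utility of a terminal state and $\mathrm{Succ}(s)$ for the successors of $s$ is introduced here for illustration:

```latex
V(s) =
\begin{cases}
U(s) & \text{if } s \text{ is terminal},\\
\max_{s' \in \mathrm{Succ}(s)} V(s') & \text{if MAX moves at } s,\\
\min_{s' \in \mathrm{Succ}(s)} V(s') & \text{if MIN moves at } s.
\end{cases}
```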

SLIDE 8

Computing Minimax Values

Ø This is DFS
Ø Two recursive functions:
§ max-value maxes the values of successors
§ min-value mins the values of successors
Ø def value(state):
§ If the state is a terminal state: return the state’s utility
§ If the agent at the state is MAX: return max-value(state)
§ If the agent at the state is MIN: return min-value(state)
Ø def max-value(state):
§ Initialize max = -∞
§ For each successor of state: compute value(successor) and update max accordingly
§ Return max
Ø def min-value(state): similar to max-value, but initialize min = +∞ and keep the smaller value
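A runnable Python version of the pseudocode above, as a sketch. It assumes a toy tree encoding that is not part of the slides: a leaf is a number (its utility), an internal node is a list of successor nodes, and MAX and MIN alternate from one level to the next. The leaf values in the example are illustrative.

```python
import math

# Hypothetical encoding: a leaf is a number (its utility); an internal node is a
# list of successors. MAX and MIN alternate from one level to the next.

def value(node, is_max):
    if not isinstance(node, list):      # terminal state: return its utility
        return node
    return max_value(node) if is_max else min_value(node)

def max_value(node):
    best = -math.inf                    # initialize max = -infinity
    for child in node:                  # DFS over successors; MIN moves next
        best = max(best, value(child, is_max=False))
    return best

def min_value(node):
    best = math.inf                     # mirror image of max_value
    for child in node:                  # MAX moves next
        best = min(best, value(child, is_max=True))
    return best

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(value(tree, is_max=True))         # 3
```

The two functions call each other through `value`, so the search is exactly a depth-first traversal of the tree.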

SLIDE 9

Minimax Example

[Figure: minimax example tree (node values shown: 3, 2, 2, 3)]

SLIDE 10

Tic-tac-toe Game Tree

SLIDE 11

Renju

• 15×15 board
• Five in a row (horizontal, vertical, or diagonal) wins
• No double-3 or double-4 moves for Black
• Otherwise, a winning strategy for Black was computed

– L. Victor Allis 1994 (PhD thesis)

SLIDE 12

Minimax Properties

Ø Here $b$ is the branching factor and $m$ is the maximum depth of the tree
Ø Time complexity?
§ $O(b^m)$
Ø Space complexity?
§ $O(bm)$
Ø For chess, $b \approx 35$, $m \approx 100$
§ Exact solution is completely infeasible
§ But do we need to explore the whole tree?
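Plugging the chess estimates into these bounds shows why exact search is out of reach (rounded arithmetic):

```latex
b^m \approx 35^{100} = 10^{100 \log_{10} 35} \approx 10^{100 \times 1.544} \approx 10^{154},
\qquad
bm \approx 35 \times 100 = 3500
```

Memory for the depth-first search is therefore modest; time is the real bottleneck.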

SLIDE 13

Resource Limits

Ø Cannot search to leaves
Ø Depth-limited search
§ Instead, search only to a limited depth of the tree
§ Replace terminal utilities with an evaluation function for non-terminal positions
Ø The guarantee of optimal play is gone

SLIDE 14

Evaluation Functions

Ø Functions that score non-terminal positions
Ø Ideal function: returns the minimax utility of the position
Ø In practice: typically a weighted linear sum of features:

$$Eval(s) = w_1 f_1(s) + w_2 f_2(s) + \cdots + w_n f_n(s)$$

Ø e.g., $f_1(s) = (\text{\# white queens} - \text{\# black queens})$, etc.

SLIDE 15

Minimax with limited depth

Ø Suppose you are the MAX player
Ø Given a depth d and the current state
Ø Compute value(state, d), searching down to depth d
§ At depth d, if the state is non-terminal, use an evaluation function to estimate its value
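A sketch of depth-limited minimax. The `Node` class is a hypothetical stand-in for a real game: each node stores its exact utility if it is terminal, or an evaluation-function estimate otherwise. Here `d` counts single moves (plies); the estimates in the example tree are made up.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    estimate: float                               # utility if terminal, else eval-function guess
    children: list = field(default_factory=list)  # empty list means terminal

def value(node, is_max, d):
    """Minimax value of node, searching d plies deep."""
    if not node.children:                         # terminal: exact utility
        return node.estimate
    if d == 0:                                    # depth limit: use the evaluation function
        return node.estimate
    vals = [value(c, not is_max, d - 1) for c in node.children]
    return max(vals) if is_max else min(vals)

# Root is MAX; internal estimates are illustrative.
tree = Node(0, [Node(5, [Node(1), Node(9)]),
                Node(2, [Node(3), Node(4)])])
print(value(tree, True, 1))   # 5: trusts the estimates
print(value(tree, True, 2))   # 3: searches to the real leaves
```

With d = 1 the root trusts the estimates and prefers the left child; with d = 2 it sees that MIN can hold the left child to 1 and switches to the right child. A shallow search with an imperfect evaluation function can pick the wrong move.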

SLIDE 16

Improving minimax: pruning

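SLIDE 17

Pruning in Minimax Search

Ø I am at a MIN node, and an ancestor is a MAX node
§ The ancestor already has an option better than my current solution
§ My future solution can only be smaller, so MAX will never choose me and my remaining children need not be explored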

SLIDE 18

Alpha-beta pruning

Ø Pruning = cutting off parts of the search tree (because you realize you don’t need to look at them)
§ When we considered A*, we also pruned large parts of the search tree
Ø Maintain
§ α = value of the best option for the MAX player along the path so far
§ β = value of the best option for the MIN player along the path so far
§ Initialized to α = -∞ and β = +∞
Ø Maintain and update α and β for each node
§ α is updated at MAX player’s nodes
§ β is updated at MIN player’s nodes

SLIDE 19

Alpha-Beta Pruning

Ø General configuration
§ We’re computing the MIN-VALUE at n
§ We’re looping over n’s children
§ n’s value estimate is dropping
§ α is the best value that MAX can get at any choice point along the current path
§ If n becomes worse than α, MAX will avoid it, so we can stop considering n’s other children
§ Define β similarly for MIN
§ α is usually smaller than β
• Once α ≥ β, stop and return to the upper layer

SLIDE 20

Alpha-Beta Pruning Example

α is MAX’s best alternative here or above
β is MIN’s best alternative here or above

SLIDE 21

Alpha-Beta Pruning Example

[Figure: step-by-step alpha-beta trace with panels "starting α/β", "raising α", "raising α", "lowering β"; each node is annotated with its current α and β, starting from α = -∞, β = +∞]

SLIDE 22

Alpha-Beta Pseudocode
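A minimal Python sketch of alpha-beta pruning in max-value / min-value form. It assumes a toy tree encoding that is not from the slides: a leaf is a number, an internal node is a list of children, and levels alternate MAX and MIN starting with MAX at the root. Pruning happens as soon as the node's value reaches the bound held by the other player, which corresponds to the "once α ≥ β" rule.

```python
import math

def max_value(node, alpha, beta):
    if not isinstance(node, list):      # leaf: its utility
        return node
    v = -math.inf
    for child in node:
        v = max(v, min_value(child, alpha, beta))
        if v >= beta:                   # MIN above will never allow this: prune
            return v
        alpha = max(alpha, v)           # alpha is updated at MAX nodes
    return v

def min_value(node, alpha, beta):
    if not isinstance(node, list):
        return node
    v = math.inf
    for child in node:
        v = min(v, max_value(child, alpha, beta))
        if v <= alpha:                  # MAX above already has something better: prune
            return v
        beta = min(beta, v)             # beta is updated at MIN nodes
    return v

def alpha_beta(root):
    return max_value(root, -math.inf, math.inf)

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alpha_beta(tree))                 # 3, the same as plain minimax
```

In this tree the second MIN node stops after its first leaf (2 ≤ α = 3), so leaves 4 and 6 are never examined.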

SLIDE 23

Alpha-Beta Pruning Properties

Ø This pruning has no effect on the final result at the root
Ø Values of intermediate nodes might be wrong!
§ Important: children of the root may have the wrong value
Ø Good child ordering improves the effectiveness of pruning
Ø With “perfect ordering”:
§ Time complexity drops to $O(b^{m/2})$
§ Doubles the solvable depth!
§ Your agent looks smarter: it is more forward-looking with a good evaluation function
§ Full search of, e.g., chess is still hopeless…
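To see the effect of child ordering, this sketch counts how many leaves alpha-beta evaluates on two orderings of the same hypothetical tree (a leaf is a number, an internal node is a list, MAX at the root). In `good_order` MAX's best child comes first and each MIN node lists its smallest leaf first; `bad_order` reverses both. The root value is identical, but the good ordering evaluates 5 leaves instead of 9.

```python
import math

def alpha_beta_count(tree):
    """Return (minimax value, number of leaves evaluated) for a MAX-rooted tree."""
    leaves = 0

    def max_value(node, alpha, beta):
        nonlocal leaves
        if not isinstance(node, list):
            leaves += 1
            return node
        v = -math.inf
        for child in node:
            v = max(v, min_value(child, alpha, beta))
            if v >= beta:
                return v
            alpha = max(alpha, v)
        return v

    def min_value(node, alpha, beta):
        nonlocal leaves
        if not isinstance(node, list):
            leaves += 1
            return node
        v = math.inf
        for child in node:
            v = min(v, max_value(child, alpha, beta))
            if v <= alpha:
                return v
            beta = min(beta, v)
        return v

    return max_value(tree, -math.inf, math.inf), leaves

good_order = [[3, 8, 12], [2, 4, 6], [2, 5, 14]]
bad_order = [[2, 5, 14], [6, 4, 2], [12, 8, 3]]
print(alpha_beta_count(good_order))   # (3, 5)
print(alpha_beta_count(bad_order))    # (3, 9)
```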

SLIDE 24

Project 2

Ø Q1: write an evaluation function for (state, action) pairs
§ This evaluation function is for this question only
Ø Q2: minimax search with arbitrary depth and multiple MIN players (ghosts)
§ An evaluation function on states has been implemented for you
Ø Q3: alpha-beta pruning with arbitrary depth and multiple MIN players (ghosts)
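Q2 and Q3 generalize minimax to several MIN players. A sketch under stated assumptions: agent 0 is MAX and agents 1 to n-1 are MIN players; one unit of depth is a full round in which every agent moves once; the callables `actions`, `result`, and `evaluate` and the toy game are hypothetical, not the project's actual API.

```python
def multi_agent_minimax(state, depth, num_agents, actions, result, evaluate, agent=0):
    """Agent 0 is MAX; agents 1..num_agents-1 are MIN players (e.g. ghosts).
    `depth` counts full rounds: it drops by one after the last MIN agent moves."""
    legal = actions(state, agent)
    if depth == 0 or not legal:              # cut-off or terminal: evaluate
        return evaluate(state)
    next_agent = (agent + 1) % num_agents
    next_depth = depth - 1 if next_agent == 0 else depth
    values = [
        multi_agent_minimax(result(state, agent, a), next_depth, num_agents,
                            actions, result, evaluate, next_agent)
        for a in legal
    ]
    return max(values) if agent == 0 else min(values)


# Toy game (hypothetical): each agent picks 0 or 1 on its turn; MAX gains its
# own picks and loses one point for every 1 picked by a MIN agent.
def actions(state, agent):
    return [0, 1]

def result(state, agent, action):
    return state + ((agent, action),)

def evaluate(state):
    return sum(a if agent == 0 else -a for agent, a in state)

print(multi_agent_minimax((), 1, 3, actions, result, evaluate))  # -1
```

Each MIN agent takes its own `min` layer before control returns to MAX, so with one MAX and two MIN players a single round is three plies deep.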