Adversarial Search
CE417: Introduction to Artificial Intelligence (slide deck)


SLIDE 1

Adversarial Search

CE417: Introduction to Artificial Intelligence
Sharif University of Technology, Spring 2018
Soleymani

“Artificial Intelligence: A Modern Approach”, 3rd Edition, Chapter 5. Most slides have been adopted from Klein and Abbeel, CS188, UC Berkeley.

SLIDE 2

Outline

 Game as a search problem
 Minimax algorithm
 α-β pruning: ignoring a portion of the search tree
 Time limit problem

 Cut off & Evaluation function

SLIDE 3

Games as search problems

 Games

 Adversarial search problems (goals are in conflict)

 Competitive multi-agent environments

 Games in AI are a specialized kind of game (in the sense of game theory)

SLIDE 4

Adversarial Games

SLIDE 5

Types of Games

 Many different kinds of games!
 Axes:
 Deterministic or stochastic?
 One, two, or more players?
 Zero sum?
 Perfect information (can you see the state)?
 Want algorithms for calculating a strategy (policy) which recommends a move from each state

SLIDE 6

Zero-Sum Games

 Zero-Sum Games

Agents have opposite utilities (values on outcomes)

Lets us think of a single value that one maximizes and the other minimizes

Adversarial, pure competition

 General Games

Agents have independent utilities (values on outcomes)

Cooperation, indifference, competition, and more are all possible

More later on non-zero-sum games

SLIDE 7

Primary assumptions

 We start with these games:

 Two-player

 Turn taking

 agents act alternately

 Zero-sum

 agents’ goals are in conflict: the sum of utility values at the end of the game is zero or constant

 Deterministic
 Perfect information

 fully observable


Examples:

Tic-tac-toe, chess, checkers

SLIDE 8

Deterministic Games

 Many possible formalizations, one is:

 States: S (start at s0)
 Players: P = {1...N} (usually take turns)
 Actions: A (may depend on player / state)
 Transition function: S × A → S
 Terminal test: S → {t, f}
 Terminal utilities: S × P → R

 Solution for a player is a policy: S → A

SLIDE 9

Single-Agent Trees

[Figure: single-agent search tree with terminal values 8, 2, 2, 6, 4, 6, …]

SLIDE 10

Value of a State

Value of a state: the best achievable outcome (utility) from that state

[Figure: tree with non-terminal states above terminal states valued 8, 2, 2, 6, 4, 6, …]

SLIDE 11

Adversarial Search

SLIDE 12

Adversarial Game Trees

[Figure: adversarial game tree with terminal values such as -20, -8, -18, -5, -10, +4, +8]

SLIDE 13

Minimax Values

[Figure: minimax tree distinguishing terminal states, states under the agent’s control (max), and states under the opponent’s control (min), with values such as +8, -10, -5, -8]

SLIDE 14

Tic-Tac-Toe Game Tree

SLIDE 15

Game tree (tic-tac-toe)

 Two players: P1 and P2 (P1 is now searching to find a good move)
 Zero-sum games: P1 gets U(t) and P2 gets C - U(t) for each terminal node t
 Utilities are given from the point of view of P1
 1-ply = half move

P1: X, P2: O

[Figure: tic-tac-toe game tree with plies alternating between P1 and P2]

SLIDE 16

Optimal play


 Opponent is assumed optimal
 The minimax function is used to find the utility of each state
 MAX/MIN wants to maximize/minimize the terminal payoff
 MAX gets U(t) for terminal node t

SLIDE 17

Adversarial Search (Minimax)

 Minimax search:

 A state-space search tree
 Players alternate turns
 Compute each node’s minimax value: the best achievable utility against a rational (optimal) adversary

[Figure: two-ply tree (max over min); terminal values 8, 2, 5, 6 give min-node values 2 and 5 and root value 5. Terminal values are part of the game.]

SLIDE 18

Minimax

 MINIMAX(s) shows the best achievable outcome of being in state s (assumption: optimal opponent)


MINIMAX(s) =
    UTILITY(s, MAX)                                 if TERMINAL_TEST(s)
    max_{a ∈ ACTIONS(s)} MINIMAX(RESULT(s, a))      if PLAYER(s) = MAX
    min_{a ∈ ACTIONS(s)} MINIMAX(RESULT(s, a))      if PLAYER(s) = MIN

Utility of being in state s. [Figure: two-ply example with min-node values 3, 2, 2 and root value 3]

SLIDE 19

Minimax (Cont.)

 Optimal strategy: move to the state with the highest minimax value
 Best achievable payoff against best play
 Maximizes the worst-case outcome for MAX
 It works for zero-sum games

SLIDE 20

Minimax Properties

Optimal against a perfect player. Otherwise?

[Figure: max-over-min tree with terminal values 10, 10, 9, 100; minimax picks the left branch (value 10)]

SLIDE 21

Minimax Implementation

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, min-value(successor))
    return v

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, max-value(successor))
    return v

SLIDE 22

Minimax Implementation (Dispatch)

def value(state):
    if the state is a terminal state: return the state’s utility
    if the next agent is MAX: return max-value(state)
    if the next agent is MIN: return min-value(state)

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor))
    return v
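The dispatch pseudocode above can be made concrete; a minimal runnable sketch follows, where the nested-list tree encoding (leaves are numeric utilities, internal nodes are lists of successors, players strictly alternate with MAX at the root) is an assumption for illustration, not part of the slides.

```python
# Minimal runnable sketch of minimax dispatch. Tree encoding (an
# assumption for illustration): leaves are numeric utilities, internal
# nodes are lists of successor trees; MAX moves at the root.

def value(state, is_max):
    if isinstance(state, (int, float)):   # terminal state: its utility
        return state
    return max_value(state) if is_max else min_value(state)

def max_value(state):
    v = float('-inf')
    for successor in state:
        v = max(v, value(successor, is_max=False))
    return v

def min_value(state):
    v = float('inf')
    for successor in state:
        v = min(v, value(successor, is_max=True))
    return v

# The two-ply example from the earlier slide: terminal values 8, 2, 5, 6
# grouped under two MIN nodes; the root minimax value is 5.
tree = [[8, 2], [5, 6]]
print(value(tree, is_max=True))  # 5
```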

SLIDE 23

Minimax algorithm

Depth-first search:

function MINIMAX_DECISION(state) returns an action
    return arg max_{a ∈ ACTIONS(state)} MIN_VALUE(RESULT(state, a))

function MAX_VALUE(state) returns a utility value
    if TERMINAL_TEST(state) then return UTILITY(state)
    v ← -∞
    for each a in ACTIONS(state) do
        v ← MAX(v, MIN_VALUE(RESULT(state, a)))
    return v

function MIN_VALUE(state) returns a utility value
    if TERMINAL_TEST(state) then return UTILITY(state)
    v ← +∞
    for each a in ACTIONS(state) do
        v ← MIN(v, MAX_VALUE(RESULT(state, a)))
    return v

SLIDE 24

Properties of minimax

 Complete? Yes (when the tree is finite)
 Optimal? Yes (against an optimal opponent)
 Time complexity: O(b^m)
 Space complexity: O(bm) (depth-first exploration)
 For chess, b ≈ 35, m > 50 for reasonable games
 Finding the exact solution is completely infeasible

SLIDE 25

Game Tree Pruning

SLIDE 26

Pruning

 Correct minimax decision without looking at every node in the game tree
 α-β pruning
 Branch & bound algorithm
 Prunes away branches that cannot influence the final decision

SLIDE 27

α-β pruning example

SLIDE 28

α-β pruning example

SLIDE 29

α-β pruning example

SLIDE 30

α-β pruning example

SLIDE 31

α-β pruning example

SLIDE 32

α-β pruning


 Assuming depth-first generation of the tree
 We prune node n when the player has a better choice m at the parent of n or at any ancestor of n
 Two types of pruning (cuts):
 pruning of max nodes (α-cuts)
 pruning of min nodes (β-cuts)

SLIDE 33

Alpha-Beta Pruning

 General configuration (MIN version)

We’re computing the MIN-VALUE at some node n

We’re looping over n’s children

Who cares about n’s value? MAX

Let a be the best value that MAX can get at any choice point along the current path from the root

If n becomes worse than a, MAX will avoid it, so we can stop considering n’s other children (it’s already bad enough that it won’t be played)

 MAX version is symmetric

[Figure: alternating MAX/MIN tree; a is MAX’s best value at a choice point on the path to the root, n a MIN node currently being expanded]

SLIDE 34

α-β pruning (another example)

[Figure: tree annotated with bounds such as ≤2 and ≥5 obtained during the search; leaf values include 1, 5, 2, 3]

SLIDE 35

Why is it called α-β?

 α: value of the best (highest) choice found so far at any choice point along the path for MAX
 β: value of the best (lowest) choice found so far at any choice point along the path for MIN
 α and β are updated during the search process
 For a MAX node, once the value of this node is known to be more than the current β (v ≥ β), its remaining branches are pruned.
 For a MIN node, once the value of this node is known to be less than the current α (v ≤ α), its remaining branches are pruned.

SLIDE 36

Alpha-Beta Implementation

α: MAX’s best option on path to root
β: MIN’s best option on path to root

def max-value(state, α, β):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor, α, β))
        if v ≥ β return v
        α = max(α, v)
    return v

def min-value(state, α, β):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor, α, β))
        if v ≤ α return v
        β = min(β, v)
    return v
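A runnable sketch of the same idea, using the nested-list tree encoding (leaves are utilities, internal nodes are lists of successors, MAX at the root), which is an assumption for illustration:

```python
# Runnable sketch of alpha-beta search on a nested-list game tree
# (leaves are utilities, internal nodes are lists of successors).

def alphabeta(state, is_max, alpha=float('-inf'), beta=float('inf')):
    if isinstance(state, (int, float)):
        return state                      # terminal: return utility
    if is_max:
        v = float('-inf')
        for succ in state:
            v = max(v, alphabeta(succ, False, alpha, beta))
            if v >= beta:                 # a MIN ancestor already has ≤ β
                return v                  # prune remaining successors
            alpha = max(alpha, v)
    else:
        v = float('inf')
        for succ in state:
            v = min(v, alphabeta(succ, True, alpha, beta))
            if v <= alpha:                # a MAX ancestor already has ≥ α
                return v                  # prune remaining successors
            beta = min(beta, v)
    return v

# Classic three-branch example: after the first MIN node returns 3,
# the second MIN node is abandoned as soon as the leaf 2 is seen.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree, is_max=True))  # 3
```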

SLIDE 37

function ALPHA_BETA_SEARCH(state) returns an action
    v ← MAX_VALUE(state, -∞, +∞)
    return the action in ACTIONS(state) with value v

function MAX_VALUE(state, α, β) returns a utility value
    if TERMINAL_TEST(state) then return UTILITY(state)
    v ← -∞
    for each a in ACTIONS(state) do
        v ← MAX(v, MIN_VALUE(RESULT(state, a), α, β))
        if v ≥ β then return v
        α ← MAX(α, v)
    return v

function MIN_VALUE(state, α, β) returns a utility value
    if TERMINAL_TEST(state) then return UTILITY(state)
    v ← +∞
    for each a in ACTIONS(state) do
        v ← MIN(v, MAX_VALUE(RESULT(state, a), α, β))
        if v ≤ α then return v
        β ← MIN(β, v)
    return v

SLIDE 38

α-β progress

SLIDE 39

Order of moves

 Good move ordering improves the effectiveness of pruning
 Best order: time complexity is O(b^{m/2})
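The effect of ordering can be seen by counting leaf evaluations; the 9-leaf example tree below is my own (not from the slides), with the same leaves ordered badly versus best-first:

```python
# Illustration: alpha-beta finds the same minimax value either way,
# but a good move ordering lets it evaluate far fewer leaves.

def alphabeta_counted(state, is_max, alpha, beta, count):
    if isinstance(state, (int, float)):
        count[0] += 1                     # one more leaf evaluated
        return state
    v = float('-inf') if is_max else float('inf')
    for succ in state:
        w = alphabeta_counted(succ, not is_max, alpha, beta, count)
        if is_max:
            v = max(v, w)
            if v >= beta:
                return v                  # cutoff
            alpha = max(alpha, v)
        else:
            v = min(v, w)
            if v <= alpha:
                return v                  # cutoff
            beta = min(beta, v)
    return v

def run(tree):
    count = [0]
    v = alphabeta_counted(tree, True, float('-inf'), float('inf'), count)
    return v, count[0]

# Same 9-leaf tree; only the order of each MIN node's children differs.
badly_ordered = [[12, 8, 3], [4, 6, 2], [14, 5, 2]]
well_ordered  = [[3, 12, 8], [2, 4, 6], [2, 14, 5]]
print(run(badly_ordered))  # (3, 9): every leaf is evaluated
print(run(well_ordered))   # (3, 5): four leaves are pruned
```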

SLIDE 40

Computational time limit (example)

 100 secs is allowed for each move (game rule)
 10^4 nodes/sec (processor speed)
 We can explore just 10^6 nodes for each move
 b^m = 10^6, b = 35 ⟹ m = 4
 4-ply look-ahead is a hopeless chess player!
 α-β reaches about depth 8: a decent chess program
 It is still hopeless

SLIDE 41

Resource Limits

SLIDE 42

Resource Limits

 Problem: In realistic games, we cannot search to the leaves!
 Solution: Depth-limited search
 Instead, search only to a limited depth in the tree
 Replace terminal utilities with an evaluation function for non-terminal positions
 Guarantee of optimal play is gone
 More plies makes a BIG difference
 Use iterative deepening for an anytime algorithm

[Figure: depth-limited tree with “?” at the cutoff leaves; example min/max values -1, -2, 4, 9]

SLIDE 43

Computational time limit: Solution


 We must make a decision even when finding the optimal move is infeasible.
 Cut off the search and apply a heuristic evaluation function
 Cutoff test: turns non-terminal nodes into terminal leaves
 A cutoff test instead of the terminal test (e.g., depth limit)
 Evaluation function: estimated desirability of a state
 A heuristic evaluation function instead of the utility function
 This approach does not guarantee optimality.

SLIDE 44

Depth Matters

 Evaluation functions are always imperfect
 The deeper in the tree the evaluation function is buried, the less the quality of the evaluation function matters
 An important example of the tradeoff between complexity of features and complexity of computation

SLIDE 45

Heuristic minimax


H_MINIMAX(s, d) =
    EVAL(s, MAX)                                         if CUTOFF_TEST(s, d)
    max_{a ∈ ACTIONS(s)} H_MINIMAX(RESULT(s, a), d + 1)  if PLAYER(s) = MAX
    min_{a ∈ ACTIONS(s)} H_MINIMAX(RESULT(s, a), d + 1)  if PLAYER(s) = MIN
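H-MINIMAX can be sketched in a few lines of runnable code; the toy game here (states are integers, each player adds 1 or 2, EVAL returns the state value) is entirely an assumption for illustration:

```python
# Runnable sketch of H-MINIMAX with a depth cutoff. The game is a
# stand-in for illustration: states are integers, each move adds 1 or 2,
# and the heuristic EVAL simply scores a state by its value.

CUTOFF_DEPTH = 3

def cutoff_test(state, depth):
    return depth >= CUTOFF_DEPTH

def evaluate(state):
    return state                 # heuristic estimate of desirability

def actions(state):
    return [1, 2]

def result(state, a):
    return state + a

def h_minimax(state, depth, is_max):
    if cutoff_test(state, depth):
        return evaluate(state)
    values = [h_minimax(result(state, a), depth + 1, not is_max)
              for a in actions(state)]
    return max(values) if is_max else min(values)

# MAX adds 2, MIN adds 1, MAX adds 2: heuristic value 5 at the cutoff.
print(h_minimax(0, 0, True))  # 5
```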

SLIDE 46

Evaluation Functions

SLIDE 47

Evaluation functions


 For terminal states, it should order them in the same way as the true utility function.
 For non-terminal states, it should be strongly correlated with the actual chances of winning.
 It must not require high computational cost.

SLIDE 48

Evaluation functions based on features


 Example: features for evaluation of chess states
 Number of each kind of piece: number of white pawns, black pawns, white queens, black queens, etc.
 King safety
 Good pawn structure
 …

SLIDE 49

Evaluation Functions

 Evaluation functions score non-terminals in depth-limited search
 Ideal function: returns the actual minimax value of the position
 In practice: typically a weighted linear sum of features
 e.g. f1(s) = (num white queens – num black queens), etc.
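A weighted linear sum of features can be sketched as follows; the specific features, weights, and position encoding are assumptions for illustration (a real engine would use many more features):

```python
# Sketch of a weighted linear evaluation function, Eval(s) = Σ wi·fi(s).
# Feature extractors, weights, and the dict position encoding are
# hypothetical, chosen only to illustrate the structure.

def queen_balance(pos):
    # f1: (num white queens - num black queens), as in the slide
    return pos['wq'] - pos['bq']

def pawn_balance(pos):
    return pos['wp'] - pos['bp']

FEATURES = [queen_balance, pawn_balance]
WEIGHTS  = [9.0, 1.0]     # a queen is worth roughly nine pawns

def evaluate(pos):
    return sum(w * f(pos) for w, f in zip(WEIGHTS, FEATURES))

pos = {'wq': 1, 'bq': 1, 'wp': 6, 'bp': 4}
print(evaluate(pos))  # 2.0: white is up two pawns
```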

SLIDE 50

Cutting off search: simple depth limit

 Problem 1: non-quiescent positions
 A few more plies can make a big difference in the evaluation value
 Problem 2: horizon effect
 Delaying tactics against an opponent’s move that causes serious unavoidable damage (because of pushing the damage beyond the horizon that the player can see)

SLIDE 51

Evaluation for Pacman

[Demo: thrashing d=2, thrashing d=2 (fixed evaluation function), smart ghosts coordinate (L6D6,7,8,10)]

SLIDE 52

Why Pacman Starves

 A danger of replanning agents!

He knows his score will go up by eating the dot now (west, east)

He knows his score will go up just as much by eating the dot later (east, west)

There are no point-scoring opportunities after eating the dot (within the horizon, two here)

Therefore, waiting seems just as good as eating: he may go east, then back west in the next round of replanning!

A better evaluation function can solve this problem

SLIDE 53

Uncertain Outcomes

SLIDE 54

Worst-Case vs. Average Case (One player Game)

[Figure: the max-over-min tree with terminal values 10, 10, 9, 100 revisited]

Idea: Uncertain outcomes controlled by chance, not an adversary!

SLIDE 55

Expectimax Search (One Player Game)

 Why wouldn’t we know what the result of an action will be?
 Explicit randomness: rolling dice
 Unpredictable opponents: the ghosts respond randomly
 Actions can fail: when moving a robot, wheels might slip
 Values should now reflect average-case (expectimax) outcomes, not worst-case outcomes
 Expectimax search: compute the average score under optimal play
 Max nodes as in minimax search
 Chance nodes are like min nodes but the outcome is uncertain
 Calculate their expected utilities
 i.e. take the weighted average (expectation) of children
 Later, we’ll learn how to formalize the underlying uncertain-result problems as Markov Decision Processes

[Figure: max node over chance nodes with values 10, 4, 5, 7; terminal values include 10, 10, 9, 100]

SLIDE 56

Probabilities

SLIDE 57

Reminder: Probabilities

A random variable represents an event whose outcome is unknown

A probability distribution is an assignment of weights to outcomes

Example: Traffic on freeway

Random variable: T = whether there’s traffic

Outcomes: T in {none, light, heavy}

Distribution: P(T=none) = 0.25, P(T=light) = 0.50, P(T=heavy) = 0.25

Some laws of probability (more later):

Probabilities are always non-negative

Probabilities over all possible outcomes sum to one

As we get more evidence, probabilities may change:

P(T=heavy) = 0.25, P(T=heavy | Hour=8am) = 0.60

We’ll talk about methods for reasoning and updating probabilities later

SLIDE 58

Reminder: Expectations

 The expected value of a function of a random variable is the average, weighted by the probability distribution over outcomes
 Example: How long to get to the airport?

Time: 20 min, 30 min, or 60 min, with probability 0.25, 0.50, 0.25
Expected time: 0.25 × 20 + 0.50 × 30 + 0.25 × 60 = 35 min
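The airport arithmetic is just a probability-weighted sum, which is a one-liner in code:

```python
# The airport example as code: expectation = Σ probability × outcome.
times = [20, 30, 60]          # minutes
probs = [0.25, 0.50, 0.25]    # must sum to one

expected = sum(p * t for p, t in zip(probs, times))
print(expected)  # 35.0
```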

SLIDE 59

What Probabilities to Use?

 In expectimax search, we have a probabilistic model of how the opponent (or environment) will behave in any state
 The model could be a simple uniform distribution (roll a die)
 The model could be sophisticated and require a great deal of computation
 We have a chance node for any outcome out of our control: opponent or environment
 The model might say that adversarial actions are likely!
 For now, assume each chance node magically comes along with probabilities that specify the distribution over its outcomes

Having a probabilistic belief about another agent’s action does not mean that the agent is flipping any coins!


SLIDE 61

Expectimax Pseudocode: One Player

def value(state):
    if the state is a terminal state: return the state’s utility
    if the next agent is MAX: return max-value(state)
    if the next agent is EXP: return exp-value(state)

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v

def exp-value(state):
    initialize v = 0
    for each successor of state:
        p = probability(successor)
        v += p * value(successor)
    return v

SLIDE 62

Expectimax Pseudocode

def exp-value(state):
    initialize v = 0
    for each successor of state:
        p = probability(successor)
        v += p * value(successor)
    return v

Example: successors with values 8, 24, -12 and probabilities 1/2, 1/3, 1/6:
v = (1/2)(8) + (1/3)(24) + (1/6)(-12) = 10
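The worked example above runs as code too; the tuple tree encoding (a node is a number, ('max', children), or ('exp', [(probability, child), ...])) is an assumption for illustration:

```python
# Runnable sketch of one-player expectimax. Tuple tree encoding (an
# assumption for illustration): a node is a terminal number,
# ('max', [children]), or ('exp', [(probability, child), ...]).

def value(state):
    if isinstance(state, (int, float)):
        return state
    kind, children = state
    if kind == 'max':
        return max(value(c) for c in children)
    if kind == 'exp':
        # probability-weighted average over outcomes
        return sum(p * value(c) for p, c in children)
    raise ValueError(kind)

# The chance node worked out above: (1/2)(8) + (1/3)(24) + (1/6)(-12)
chance = ('exp', [(1/2, 8), (1/3, 24), (1/6, -12)])
print(value(chance))  # expected value 10 (up to float rounding)

# A max node choosing between two chance nodes.
root = ('max', [chance, ('exp', [(0.5, 5), (0.5, 7)])])
print(value(root))    # picks the left chance node (right is worth 6)
```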

SLIDE 63

Expectimax Example

[Figure: expectimax tree over leaf values 12, 9, 6, 3, 2, 15, 4, 6]
SLIDE 64

Expectimax Pruning?

[Figure: partially expanded expectimax tree with leaves 12, 9, 3, 2. Unlike minimax, a chance node cannot be pruned without bounds on the leaf values: any unseen outcome can still change the expected value.]
SLIDE 65

Depth-Limited Expectimax

[Figure: depth-limited expectimax; values at the cutoff (e.g. 492, 362) are estimates of the true expectimax values (which would require a lot of work to compute)]

SLIDE 66

Modeling Assumptions

SLIDE 67

The Dangers of Optimism and Pessimism

Dangerous optimism: assuming chance when the world is adversarial
Dangerous pessimism: assuming the worst case when it’s not likely

SLIDE 68

Assumptions vs. Reality

Results from playing 5 games:

                     Adversarial Ghost          Random Ghost
Minimax Pacman       Won 5/5, avg. score 483    Won 5/5, avg. score 493
Expectimax Pacman    Won 1/5, avg. score -303   Won 5/5, avg. score 503

Pacman used depth 4 search with an eval function that avoids trouble
Ghost used depth 2 search with an eval function that seeks Pacman

SLIDE 69

Mixed Layer Types

 E.g. Backgammon
 Expectiminimax
 Environment is an extra “random agent” player that moves after each min/max agent
 Each node computes the appropriate combination of its children

SLIDE 70

Stochastic games: Backgammon

SLIDE 71

Stochastic games


 Expected utility: chance nodes take the average (expectation) over all possible outcomes.
 This is consistent with the definition of rational agents trying to maximize expected utility.

[Figure: chance nodes over leaf values 2, 3, 1, 4 with expected values 2.1 and 1.3: the average of the values weighted by their probabilities]

SLIDE 72

Stochastic games


EXPECT_MINIMAX(s) =
    UTILITY(s, MAX)                                        if TERMINAL_TEST(s)
    max_{a ∈ ACTIONS(s)} EXPECT_MINIMAX(RESULT(s, a))      if PLAYER(s) = MAX
    min_{a ∈ ACTIONS(s)} EXPECT_MINIMAX(RESULT(s, a))      if PLAYER(s) = MIN
    Σ_r P(r) · EXPECT_MINIMAX(RESULT(s, r))                if PLAYER(s) = CHANCE

(where r ranges over the possible chance outcomes, e.g. dice rolls)
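All three node types can be combined in one short recursion; the tuple tree encoding and the example values below are assumptions for illustration:

```python
# Sketch of EXPECT_MINIMAX on a tuple encoding (an assumption for
# illustration): a node is a terminal number, ('max'|'min', [children]),
# or ('chance', [(probability, child), ...]).

def expect_minimax(state):
    if isinstance(state, (int, float)):
        return state
    kind, children = state
    if kind == 'max':
        return max(expect_minimax(c) for c in children)
    if kind == 'min':
        return min(expect_minimax(c) for c in children)
    # chance node: probability-weighted average over outcomes
    return sum(p * expect_minimax(c) for p, c in children)

# MAX chooses between two dice-roll chance nodes over MIN nodes.
root = ('max', [
    ('chance', [(0.5, ('min', [2, 3])), (0.5, ('min', [1, 4]))]),  # 1.5
    ('chance', [(0.5, ('min', [0, 8])), (0.5, ('min', [5, 6]))]),  # 2.5
])
print(expect_minimax(root))  # 2.5
```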

SLIDE 73

What Utilities to Use?

 For worst-case minimax reasoning, the terminal function’s scale doesn’t matter
 We just want better states to have higher evaluations (get the ordering right)
 We call this insensitivity to monotonic transformations
 For average-case expectimax reasoning, we need magnitudes to be meaningful

[Figure: leaf values 0, 40, 20, 30 and their monotonic transformation x^2: 0, 1600, 400, 900]
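The point can be checked directly with the slide’s numbers (0, 40, 20, 30 and their squares); the grouping into two branches is my own choice for illustration:

```python
# Squaring leaf values (a monotonic transformation) leaves the minimax
# choice unchanged but can flip the expectimax choice, so magnitudes
# matter for expectimax.

def minimax_choice(branches):
    # each branch is a list of leaf values at a MIN node
    return max(range(len(branches)), key=lambda i: min(branches[i]))

def expectimax_choice(branches):
    # uniform chance nodes instead of MIN nodes
    return max(range(len(branches)),
               key=lambda i: sum(branches[i]) / len(branches[i]))

branches = [[0, 40], [20, 30]]
squared  = [[v * v for v in b] for b in branches]

print(minimax_choice(branches), minimax_choice(squared))        # 1 1
print(expectimax_choice(branches), expectimax_choice(squared))  # 1 0
```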

SLIDE 74

Evaluation functions for stochastic games

 An order-preserving transformation on leaf values is not a sufficient condition.
 The evaluation function must be a positive linear transformation of the expected utility of a position.

SLIDE 75

Properties of search space for stochastic games


 Time complexity: O(b^m n^m), where n is the number of distinct chance outcomes
 Backgammon: b ≈ 20 (can be up to 4000 for double dice rolls), n = 21 (number of different possible dice rolls)
 3 plies is manageable (≈ 10^8 nodes)
 The probability of reaching a given node decreases enormously with increasing depth (multiplying probabilities)
 Forming detailed plans of action may be pointless
 Limiting depth is not so damaging, particularly when the probability values (for each non-deterministic situation) are close to each other
 But pruning is not straightforward.

SLIDE 76

Example: Backgammon

 Dice rolls increase b: 21 possible rolls with 2 dice
 Backgammon ≈ 20 legal moves
 Depth 2 = 20 × (21 × 20)^3 = 1.2 × 10^9
 As depth increases, the probability of reaching a given search node shrinks
 So the usefulness of search is diminished
 So limiting depth is less damaging
 But pruning is trickier…
 Historic AI: TD-Gammon uses depth-2 search + a very good evaluation function + reinforcement learning

Image: Wikipedia


1st AI world champion in any game!

SLIDE 77

Other Game Types

SLIDE 78

Multi-Agent Utilities

 What if the game is not zero-sum, or has multiple players?
 Generalization of minimax:

Terminals have utility tuples

Node values are also utility tuples

Each player maximizes its own component

Can give rise to cooperation and competition dynamically…

[Figure: three-player game tree with terminal utility triples such as (1,6,6), (7,1,2), (6,1,2), (7,2,1), (5,1,7), (1,5,2), (7,7,1), (5,2,5)]
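This generalization (sometimes called max^n) is a small change to minimax: values become tuples and each player maximizes its own component. The tree encoding and example triples below are assumptions for illustration:

```python
# Sketch of the multi-player generalization of minimax: node values are
# utility tuples, and the player to move picks the child whose tuple is
# best in their own component. Tree encoding is an assumption: leaves
# are utility tuples, internal nodes are lists of children.

def maxn(state, player, num_players):
    if isinstance(state, tuple):          # terminal: utility tuple
        return state
    nxt = (player + 1) % num_players      # players take turns in order
    return max((maxn(child, nxt, num_players) for child in state),
               key=lambda utils: utils[player])

# Two-ply, three-player example (player 0 at the root, player 1 below).
tree = [[(1, 6, 6), (7, 1, 2)], [(6, 1, 2), (7, 2, 1)]]
print(maxn(tree, player=0, num_players=3))  # (7, 2, 1)
```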

SLIDE 79

State-of-the-art game programs

 Chess (b ≈ 35)
 In 1997, Deep Blue defeated Kasparov.
 It ran on a parallel computer doing alpha-beta search.
 It reaches depth 14 plies routinely.
 Techniques to extend the effective search depth
 Hydra: reaches depth 18 plies using more heuristics.
 Checkers (b < 10)
 Chinook (which ran on a regular PC and uses alpha-beta search) ended the 40-year reign of human world champion Tinsley in 1994.
 Since 2007, Chinook has been able to play perfectly by using alpha-beta search combined with a database of 39 trillion endgame positions.

SLIDE 80

State-of-the-art game programs (Cont.)


 Othello (b is usually between 5 and 15)
 Logistello defeated the human world champion six games to none in 1997.
 Human champions are no match for computers at Othello.
 Go (b > 300)
 Human champions refuse to compete against computers (current programs are still at advanced amateur level).
 MOGO avoided alpha-beta search and used Monte Carlo rollouts.
 AlphaGo (2016) has beaten professionals without handicaps.

SLIDE 81

State-of-the-art game programs (Cont.)


 Backgammon (stochastic)
 TD-Gammon (1992) was competitive with top human players.
 Depth 2 or 3 search along with a good evaluation function developed by learning methods
 Bridge (partially observable, multiplayer)
 In 1998, GIB was 12th in a field of 35 in the par contest at the human world championship.
 In 2005, Jack defeated three out of seven top champion pairs. Overall, it lost by a small margin.
 Scrabble (partially observable & stochastic)
 In 2006, Quackle defeated the former world champion 3-2.