
Adversarial Search

CS 486/686, May 19, 2005, University of Waterloo

Introduction

So far we have studied environments where there is only a single agent
Today we look at what happens when we are in a setting where multiple agents are planning against each other
– Game theory: zero-sum games

Outline

Games
Minimax search
Evaluation functions
Alpha-beta pruning
Coping with chance
Game programs

Games

Games are one of the oldest, most well-studied domains in AI

Why?
– They are fun
– Games are usually easy to represent and the rules are clear
– State spaces can be very large (so more challenging than “toy problems”)
    In chess the search tree has ~$10^{154}$ nodes
– Like the “real world” in that decisions have to be made and time is vitally important
– Easy to determine when a program is doing well
    i.e. it wins

Types of games

Perfect vs imperfect information
– Perfect info means that you can see the entire state of the game
– Chess, checkers, othello, go, …
– Imperfect info games include scrabble, poker, most card games

Deterministic vs stochastic
– Chess is deterministic
– Backgammon is stochastic

Games as search problems

Consider a 2-player perfect information game
– State: board configuration plus the player whose turn it is to move
– Successor function: given a state, returns a list of (move, state) pairs, indicating a legal move and the resulting board
– Terminal state: states where there is a win/loss/draw
– Utility function: assigns a numerical value to terminal states (e.g. in chess +1 for a win, -1 for a loss, 0 for a draw)
– Solution: a strategy (way of picking moves) that wins the game
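The four ingredients above can be sketched for tic-tac-toe. This is only an illustration: the board encoding (a 9-tuple of "X", "O" or " ") and the function names `initial_state`, `successors`, `is_terminal` and `utility` are assumptions made for this sketch, not part of the course material. Utilities are given from X's point of view.

```python
from typing import List, Optional, Tuple

Board = Tuple[str, ...]      # 9 cells, each "X", "O" or " " (assumed encoding)
State = Tuple[Board, str]    # board configuration plus the player to move

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals


def initial_state() -> State:
    """Empty board, X moves first."""
    return (" ",) * 9, "X"


def winner(board: Board) -> Optional[str]:
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def successors(state: State) -> List[Tuple[int, State]]:
    """Successor function: (move, state) pairs for every legal move."""
    board, player = state
    other = "O" if player == "X" else "X"
    result = []
    for i, cell in enumerate(board):
        if cell == " ":
            new_board = board[:i] + (player,) + board[i + 1:]
            result.append((i, (new_board, other)))
    return result


def is_terminal(state: State) -> bool:
    """Terminal test: somebody has won, or the board is full (draw)."""
    board, _ = state
    return winner(board) is not None or " " not in board


def utility(state: State) -> int:
    """Utility of a terminal state: +1 if X won, -1 if O won, 0 for a draw."""
    return {"X": 1, "O": -1, None: 0}[winner(state[0])]
```

From the empty board `successors(initial_state())` yields nine (move, state) pairs, one per free square.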

Game search challenge

What makes game search challenging?
– There is an opponent!
– The opponent is malicious: it wants to win (i.e. it is trying to make you lose)
– We need to take this into account when choosing moves

Simulate the opponent’s behaviour in our search
Notation: One player is called MAX (who wants to maximize its utility) and one player is called MIN (who wants to minimize MAX’s utility)

Example: Tic-Tac-Toe

[Figure: partial tic-tac-toe search tree. Levels alternate between MAX (X) and MIN (O) down to TERMINAL states, which are labelled with utilities such as −1 and +1]

MAX’s job is to use the search tree to determine the best move

Optimal strategies

In standard search the optimal solution is a sequence of moves leading to a winning terminal state
But MIN has something to say about this
Strategy (from MAX’s perspective):
– Specify a move for the initial state, specify a move for all possible states arising from MIN’s response, then all possible responses to all of MIN’s responses to MAX’s previous move, …

Optimal strategies

Want to find the optimal strategy

[Figure: game tree; only the label “ply” survives]

Minimax algorithm

Returns action corresponding to best possible move
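The algorithm itself did not survive on this slide, so the following is a minimal sketch of textbook minimax rather than the slide's own pseudocode. It assumes a game tree given explicitly as nested lists (a number is a terminal utility for MAX, a list is an internal node); the tree values and the names `minimax_value` and `minimax_decision` are illustrative.

```python
from typing import List, Union

Tree = Union[int, float, List["Tree"]]   # leaf utility, or list of child subtrees


def minimax_value(node: Tree, maximizing: bool) -> float:
    """Back up utilities: MAX takes the largest child value, MIN the smallest."""
    if not isinstance(node, list):       # terminal state: return its utility
        return node
    values = [minimax_value(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def minimax_decision(root: List[Tree]) -> int:
    """Index of the root move whose subtree has the best minimax value for MAX."""
    return max(range(len(root)), key=lambda i: minimax_value(root[i], False))


# Illustrative 2-ply tree: MAX chooses a subtree, then MIN chooses a leaf.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax_decision(tree), minimax_value(tree, True))   # prints: 0 3
```

MIN would answer the three root moves with 3, 2 and 2 respectively, so MAX's best move is the first one, with value 3.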

Properties of Minimax

Complete if tree is finite
Time complexity: $O(b^m)$
Space complexity: $O(bm)$ (it is DFS)
Optimal against an optimal opponent
– If MIN does not play optimally then we might be able to do better following a different strategy

b is the branching factor, m is depth of the tree

Minimax and multi-player games

[Figure: three-player game tree with players A, B and C to move in turn. Each node carries a utility vector such as (1, 2, 6), one entry per player; the value backed up to the root is (1, 2, 6)]

Cannot handle alliances, side payments, …

Can we now write a program that will play chess reasonably well?
– For chess b~35 and m~100
– Do we really need to look at all those nodes?
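A quick back-of-the-envelope check shows why the answer matters. With the figures above, the $O(b^m)$ bound for minimax works out to roughly the search-tree size quoted for chess earlier in the deck. The sketch below only does that arithmetic.

```python
import math

b, m = 35, 100                       # branching factor and depth quoted for chess
log10_nodes = m * math.log10(b)      # log10 of b**m
print(f"b^m is about 10^{log10_nodes:.0f}")   # prints: b^m is about 10^154
```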

Alpha-Beta Pruning

No!
– If we are smart (and lucky) we can do pruning
    Eliminate large parts of the tree from consideration

Alpha-Beta pruning applied to a minimax tree
– Returns the same decision as minimax
– Prunes branches that cannot influence final decision

Alpha-Beta Pruning

Alpha:
– Value of best (highest value) choice we have found so far on the path for MAX
Beta:
– Value of best (lowest value) choice we have found so far on the path for MIN

Update alpha and beta as search continues
Prune as soon as the value of the current node …

Properties of Alpha-Beta

Pruning does not affect the final result
– You prune parts of the tree that you would never reach in actual play
The order in which moves are evaluated is important
– With bad move ordering, it will prune nothing
– With perfect node ordering, it can reduce time complexity to $O(b^{m/2})$
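A minimal sketch of alpha-beta on an explicit tree may help make the two properties concrete. It assumes the same nested-list tree encoding as a plain minimax sketch would use (numbers are terminal utilities, lists are internal nodes); the `visited` list is a hypothetical extra used only to show which leaves are actually examined.

```python
import math
from typing import List, Optional, Union

Tree = Union[int, float, List["Tree"]]   # leaf utility, or list of child subtrees


def alphabeta(node: Tree, maximizing: bool,
              alpha: float = -math.inf, beta: float = math.inf,
              visited: Optional[List[float]] = None) -> float:
    """Minimax value of node, skipping branches that cannot change the result."""
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)         # record every leaf we actually evaluate
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)    # best choice found so far for MAX
            if alpha >= beta:            # MIN higher up would never allow this
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)          # best choice found so far for MIN
        if alpha >= beta:                # MAX higher up already has better
            break
    return value


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]   # illustrative values
seen: List[float] = []
print(alphabeta(tree, True, visited=seen), seen)
# prints: 3 [3, 12, 8, 2, 14, 5, 2]
```

The result equals the minimax value, but the leaves 4 and 6 are never examined. Move ordering matters: if the last subtree is reordered to `[2, 5, 14]`, its first leaf already triggers the cutoff and only five of the nine leaves are visited.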

Real-time decisions

Alpha-beta can be a huge improvement over minimax
– Still not good enough, as we need to search all the way to terminal states for at least part of the search space
– Need to make a decision about a move quickly

Heuristic evaluation function + cutoff test

Evaluation functions

Apply an evaluation function to a state
– If terminal state, function returns actual utility
– If non-terminal, function returns estimate of the expected utility (i.e. the chance of winning from that state)
– Function must be fast to compute

Evaluation functions

Evaluation functions can be given by the designer of the program (using expert knowledge) or learned from experience
If features can be judged independently, a weighted linear function is good
– $w_1 f_1(s) + w_2 f_2(s) + \dots + w_n f_n(s)$ with s as board state
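As a sketch of such a weighted linear function, the example below scores a chess position by material only. Everything specific here is an assumption for illustration: the state is a hypothetical dictionary of piece counts (upper case for MAX, lower case for MIN), each feature is a piece-count difference, and the weights are the conventional 1/3/3/5/9 material values rather than anything given in the slides.

```python
from typing import Callable, Dict, Sequence

State = Dict[str, int]   # hypothetical state: piece counts, e.g. {"P": 8, "p": 7}


def material_feature(piece: str) -> Callable[[State], int]:
    """Feature f_i(s): MAX's count of one piece type minus MIN's count."""
    return lambda s: s.get(piece.upper(), 0) - s.get(piece.lower(), 0)


FEATURES = [material_feature(p) for p in "PNBRQ"]
WEIGHTS = [1.0, 3.0, 3.0, 5.0, 9.0]   # conventional material values (assumption)


def evaluate(state: State,
             weights: Sequence[float] = WEIGHTS,
             features: Sequence[Callable[[State], int]] = FEATURES) -> float:
    """Weighted linear evaluation: w_1 f_1(s) + ... + w_n f_n(s)."""
    return sum(w * f(state) for w, f in zip(weights, features))


# MAX is a pawn up but a queen down: 1*1 + 5*0 + 9*(-1)
s = {"P": 8, "p": 7, "R": 2, "r": 2, "Q": 0, "q": 1}
print(evaluate(s))   # prints: -8.0
```

Each term is a cheap count, so the whole function stays fast to compute, which is what a cutoff search needs.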

Cutting off search

Instead of searching until we find a terminal state, we can cut search sooner and apply the evaluation function

When?
– Arbitrarily (but deeper is better)
– Quiescent states
    States that are “stable”: not going to change value (by a lot) in the near future
– Singular extensions
    Searching deeper when you have a move that is “clearly better” (e.g. moving the king out of check)
    Can be used to avoid the horizon effect
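The simplest of these options, a fixed depth limit, can be sketched as follows. The game is a hypothetical toy chosen to keep the example short (players alternately take 1 or 2 stones from a pile, and whoever takes the last stone wins); the evaluation function returns the true utility at terminal states and a neutral estimate of 0 elsewhere. None of this comes from the slides.

```python
from typing import List, Tuple

State = Tuple[int, bool]     # (stones left, is it MAX's turn?) in a toy game


def successors(state: State) -> List[State]:
    """A move removes 1 or 2 stones and passes the turn."""
    n, max_turn = state
    return [(n - k, not max_turn) for k in (1, 2) if k <= n]


def is_terminal(state: State) -> bool:
    return state[0] == 0


def evaluate(state: State) -> float:
    """True utility at terminal states, a neutral estimate (0) elsewhere."""
    n, max_turn = state
    if n == 0:
        # The player to move has no stones left: the other player took the last one.
        return -1.0 if max_turn else 1.0
    return 0.0


def cutoff_minimax(state: State, limit: int, depth: int = 0) -> float:
    """Minimax that stops at terminal states or at the depth limit (cutoff test)."""
    if is_terminal(state) or depth >= limit:
        return evaluate(state)
    values = [cutoff_minimax(s, limit, depth + 1) for s in successors(state)]
    return max(values) if state[1] else min(values)


start = (4, True)
print(cutoff_minimax(start, limit=2))    # prints: 0.0  (too shallow to see the win)
print(cutoff_minimax(start, limit=10))   # prints: 1.0  (full search finds the win)
```

With a limit of 2 plies the search only sees the neutral estimates, while the deeper search reaches terminal states and finds that MAX can force a win, a small instance of "deeper is better".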

Cutting off search

How deep do we need to search?
– Novice chess human player
    5-ply (minimax)
– Master chess human player
    10-ply (alpha-beta)
– Grandmaster chess human player
    14-ply + a fantastic evaluation function, opening and endgame databases, …; special-purpose hardware would be nice but is no longer really needed (Fritz)

Stochastic games

In games like Backgammon chance plays a role

[Figure: backgammon game tree with MAX, CHANCE, MIN and CHANCE layers down to TERMINAL states. Chance branches are labelled with dice rolls and their probabilities, e.g. 1,1 with 1/36 and 1,2 with 1/18, through 6,5 with 1/18 and 6,6 with 1/36]

Stochastic games

Need to consider best/worst cases + probability they will occur
Recall: Expected value of a random variable x
    $E[x] = \sum_{x \in X} P(x)\,x$
Expectiminimax is like minimax, but at chance nodes compute the expected value
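A minimal sketch of expectiminimax on an explicit tree follows. The node encoding is an assumption made for this example: a number is a terminal utility, `("max", children)` and `("min", children)` are player nodes, and `("chance", [(probability, child), ...])` is a chance node. The leaf values are illustrative.

```python
def expectiminimax(node) -> float:
    """Minimax with chance nodes, whose value is the expected value of their children."""
    if not isinstance(node, tuple):      # terminal state: a plain number
        return float(node)
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    if kind == "chance":                 # E[x] = sum over outcomes of P(x) * x
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError(f"unknown node kind: {kind!r}")


# MAX picks one of two chance nodes; each outcome (probability 0.5) leads to a MIN node.
tree = ("max", [
    ("chance", [(0.5, ("min", [2, 4])), (0.5, ("min", [7, 4]))]),
    ("chance", [(0.5, ("min", [6, 0])), (0.5, ("min", [5, -2]))]),
])
print(expectiminimax(tree))   # prints: 3.0
```

The MIN nodes back up 2, 4, 0 and −2, the chance nodes average these to 3 and −1, and MAX picks the first move.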

Expectiminimax

[Figure: expectiminimax tree with MAX, CHANCE and MIN levels. Every chance branch has probability 0.5, and the two chance nodes evaluate to 3 and −1]

Expectiminimax

[Figure: two expectiminimax trees with MAX, DICE and MIN levels and branch probabilities .9 and .1. With leaf values 2, 3, 1, 4 the chance nodes evaluate to 2.1 and 1.3; with leaf values 20, 30, 1, 400 they evaluate to 21 and 40.9]

WARNING: exact values do matter! Order-preserving transformations of the evaluation function can change the choice of moves. Must have positive linear transformations only
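The warning can be checked numerically with the values from the figure. The sketch below treats each MAX move as a chance node over two backed-up values with probabilities .9 and .1; the helper names and the particular linear map (10v + 5) are assumptions for illustration.

```python
def expected(values, probs=(0.9, 0.1)):
    """Expected value at a chance node with the given outcome probabilities."""
    return sum(p * v for p, v in zip(probs, values))


def best_move(moves):
    """Index of the MAX move whose chance node has the highest expected value."""
    scores = [expected(values) for values in moves]
    return scores.index(max(scores))


original = [(2, 3), (1, 4)]                 # values from the left-hand tree
monotone = [(20, 30), (1, 400)]             # same ordering of values, but not linear
linear = [tuple(10 * v + 5 for v in move)   # positive linear transformation
          for move in original]

print(best_move(original), best_move(monotone), best_move(linear))
# prints: 0 1 0
```

The order-preserving map 1, 2, 3, 4 → 1, 20, 30, 400 flips MAX's choice from the first move to the second, while the positive linear map leaves it unchanged.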


Some Game Programs

Checkers: Tinsley vs. Chinook

Mr. Tinsley suffered his 4th and 5th losses ever against Chinook

Checkers

Chinook: http://www.cs.ualberta.ca/~chinook
– World Man-Machine Checkers Champion
– Alpha-beta search
– Opening database
– Its secret weapon: Endgame database
    Precomputed database of all 444 billion positions with 8 or fewer pieces, each with perfect win/loss/draw info
    Perfect knowledge into the search
– Checkers is now dominated by computers

Chess: Kasparov vs. Deep Blue

                Kasparov               Deep Blue
Height          5’10”                  6’5”
Weight          176 lbs                2,400 lbs
Age             34 years               4 years
Computers       50 billion neurons     32 RISC processors + 256 VLSI chess engines
Speed           2 pos/sec              200,000,000 pos/sec
Knowledge       Extensive              Primitive
Power Source    Electrical/chemical    Electrical
Ego             Enormous               None

Jonathan Schaeffer

1997: Deep Blue wins the match with 2 wins, 1 loss, and 3 draws

Chess

Its secret:
– Specialized chess processor + special-purpose memory optimization
– Very sophisticated evaluation function
    Expert features and hand-tuned weights
– Opening and closing books
– Alpha-beta + improvements (searching up to 40 ply deep!)
– Search over 200 million positions per second (though lots of these possible moves are silly moves by human standards…)

Chess

There are now programs running on PCs that are on par with human champions
– Deep Junior vs Kasparov in 2003: 3–3 tie
– Deep Junior: 8 CPU, 8GB RAM, Windows 2000, 2,000,000 pos/second

Is Chess still a human game or have computers conquered it?

Backgammon

TD-Gammon (Gerry Tesauro at IBM)
One of the top players in the world
But only searches two moves ahead!
Its secret: One amazing evaluation function
– Neural network trained with reinforcement learning during ~1 million games played against itself
– Humans play backgammon differently now, based on what TD-Gammon learned about the game
– Very cool AI ☺

Othello: Murakami vs. Logistello

Takeshi Murakami, World Othello Champion
1997: The Logistello software crushed Murakami by 6 games to 0

Jonathan Schaeffer

Othello/Reversi

Logistello (Michael Buro from U of Alberta)
Human world champion crushed by the program
– Humans no match for machine
Its secret: Evaluation function
– Automatically discovered and tuned knowledge
    Samples patterns to see if their presence in a position can be correlated with success
    Tuned 1.5 million parameters using self-play games with feedback

Bridge

GIB (Matt Ginsberg, U of Oregon)
– World’s first expert-level bridge-playing program (finished 12th in human world championship in 1998)
– Humans are still doing better, but the gap is narrowing quickly
Its secrets:
– Does simulations for each decision
    Deals cards to opponents consistent with available information
    Chooses action that maximizes expected return
    Plus other tricks…

Go: Goemate vs. ??

Name: Chen Zhixing
Profession: Retired
Computer skills: self-taught programmer
Author of Goemate (one of the best Go programs available today)
Gave Goemate a 9 stone handicap and still easily beat the program, thereby winning $15,000

Jonathan Schaeffer

Go has too high a branching factor for existing search techniques (b~100)
Current and future software must rely on huge databases and pattern-recognition techniques
Need to make strategic decisions
– Which battle is worth fighting?

Summary

Games pose lots of fascinating challenges for AI researchers
Minimax search allows us to play optimally against an optimal opponent
Alpha-beta pruning allows us to reduce the search space
A good evaluation function is key to doing well
Games are fun


Next class

We will begin reasoning under