
CS486/686 Lecture Slides (c) 2005 K. Larson and P. Poupart


Adversarial Search

CS 486/686, May 19, 2005, University of Waterloo


Introduction

  • So far we have studied environments where there is only a single agent
  • Today we look at what happens in a setting where multiple agents are planning against each other

– Game theory: zero-sum games


Outline

  • Games
  • Minimax search
  • Evaluation functions
  • Alpha-beta pruning
  • Coping with chance
  • Game programs


Games

  • Games are one of the oldest, most well-studied domains in AI
  • Why?

– They are fun
– Games are usually easy to represent and the rules are clear
– State spaces can be very large (so more challenging than "toy problems")

  • In chess the search tree has ~10^154 nodes

– Like the "real world" in that decisions have to be made and time is vitally important
– Easy to determine when a program is doing well

  • i.e. it wins


Types of games

  • Perfect vs imperfect information

– Perfect info means that you can see the entire state of the game
– Chess, checkers, Othello, Go, …
– Imperfect info games include Scrabble, poker, most card games

  • Deterministic vs stochastic

– Chess is deterministic
– Backgammon is stochastic


Games as search problems

  • Consider a 2-player perfect information game

– State: board configuration plus the player whose turn it is to move
– Successor function: given a state, returns a list of (move, state) pairs, each indicating a legal move and the resulting board
– Terminal state: states where there is a win/loss/draw
– Utility function: assigns a numerical value to terminal states (e.g. in chess +1 for a win, -1 for a loss, 0 for a draw)
– Solution: a strategy (way of picking moves) that wins the game
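
The components listed above can be sketched for Tic-Tac-Toe. This is only an illustration and not part of the original slides: the tuple-based board encoding and the names `successors`, `is_terminal`, `utility` and `winner` are assumptions made for the example.

```python
from typing import List, Optional, Tuple

# A state is (board, player): board is a tuple of 9 cells holding "X", "O" or None,
# and player is whoever moves next. (Hypothetical encoding, chosen for illustration.)
Board = Tuple[Optional[str], ...]
State = Tuple[Board, str]

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

INITIAL: State = ((None,) * 9, "X")


def winner(board: Board) -> Optional[str]:
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None


def is_terminal(state: State) -> bool:
    """Terminal state: someone has won, or the board is full (draw)."""
    board, _ = state
    return winner(board) is not None or all(cell is not None for cell in board)


def successors(state: State) -> List[Tuple[int, State]]:
    """Successor function: (move, state) pairs, one per legal move."""
    board, player = state
    if is_terminal(state):
        return []
    other = "O" if player == "X" else "X"
    result = []
    for i, cell in enumerate(board):
        if cell is None:
            new_board = board[:i] + (player,) + board[i + 1:]
            result.append((i, (new_board, other)))
    return result


def utility(state: State) -> int:
    """Utility of a terminal state from MAX's (X's) side: +1 win, -1 loss, 0 draw."""
    w = winner(state[0])
    return 1 if w == "X" else -1 if w == "O" else 0
```

A strategy is then any rule that maps each state where it is our turn to one of the moves returned by `successors`.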


Game search challenge

  • What makes game search challenging?

– There is an opponent!
– The opponent is malicious: it wants to win (i.e. it is trying to make you lose)
– We need to take this into account when choosing moves

  • Simulate the opponent's behaviour in our search
  • Notation: one player is called MAX (who wants to maximize its utility) and the other is called MIN (who wants to minimize MAX's utility)


Example: Tic-Tac-Toe

[Figure: partial Tic-Tac-Toe game tree. Plies alternate between MAX (X) and MIN (O) down to TERMINAL states, which are labelled with their utility (e.g. −1, +1).]

MAX's job is to use the search tree to determine the best move


Optimal strategies

  • In standard search the optimal solution is a sequence of moves leading to a winning terminal state
  • But MIN has something to say about this
  • Strategy (from MAX's perspective):

– Specify a move for the initial state, then a move for every possible state arising from MIN's response, then a move for every possible response by MIN to those moves, and so on


Optimal strategies

  • Want to find the optimal strategy

– One that leads to outcomes at least as good as any other strategy, given that MIN is playing optimally
– Equilibrium (game theory)
– Zero-sum games of perfect information are "easy games" from a game-theoretic perspective


Minimax Value

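\[
\text{MINIMAX-VALUE}(n) =
\begin{cases}
\text{Utility}(n) & \text{if } n \text{ is a terminal state} \\
\max_{s \in \text{Succ}(n)} \text{MINIMAX-VALUE}(s) & \text{if } n \text{ is a MAX node} \\
\min_{s \in \text{Succ}(n)} \text{MINIMAX-VALUE}(s) & \text{if } n \text{ is a MIN node}
\end{cases}
\]

[Figure: two-ply game tree. The MAX root A has actions a1, a2, a3 leading to MIN nodes B, C, D, whose actions are b1..b3, c1..c3, d1..d3. Leaf utilities are 3, 12, 8 under B; 2, 4, 6 under C; 14, 5, 2 under D. Minimax values: B = 3, C = 2, D = 2, so A = 3.]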


Minimax algorithm

Returns the action corresponding to the best possible move
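
The slide's pseudocode figure did not survive extraction, so the following is a minimal sketch of the idea rather than the authors' listing. It assumes a tree written as nested Python lists with numbers at the leaves; `minimax_value` and `minimax_decision` are illustrative names. The example tree is the two-ply tree from the Minimax Value slide, with the leaves 4 and 6 under the second MIN node taken from the standard textbook version of that figure.

```python
from typing import List, Union

# A tree is either a number (terminal utility) or a list of subtrees.
Tree = Union[float, List["Tree"]]


def minimax_value(node: Tree, is_max: bool) -> float:
    """Utility at terminals, max over children at MAX nodes, min at MIN nodes."""
    if not isinstance(node, list):
        return node
    values = [minimax_value(child, not is_max) for child in node]
    return max(values) if is_max else min(values)


def minimax_decision(root: List[Tree]) -> int:
    """Index of the root action whose resulting MIN node has the highest value."""
    values = [minimax_value(child, is_max=False) for child in root]
    return max(range(len(root)), key=lambda i: values[i])


# MAX root with three MIN children (leaf values partly assumed, see above).
EXAMPLE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]

print(minimax_value(EXAMPLE, is_max=True))  # 3
print(minimax_decision(EXAMPLE))            # 0, i.e. the first action
```

The recursion is depth-first, which is where the linear space bound for minimax comes from.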


Properties of Minimax

  • Complete if tree is finite
  • Time complexity: O(b^m)
  • Space complexity: O(bm) (it is DFS)
  • Optimal against an optimal opponent

– If MIN does not play optimally then we might be able to do better following a different strategy

b is the branching factor, m is the depth of the tree


Minimax and multi-player games

[Figure: three-player game tree with players A, B, C, A to move at successive levels. Each leaf carries a utility vector (u_A, u_B, u_C), and each internal node backs up the child vector that is best for the player to move. The root value is (1, 2, 6).]

Cannot handle alliances, side payments, …
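
A minimal sketch of the vector-valued backup for three players follows. The leaf vectors are the ones that appear in the figure; the binary tree shape and the name `multi_minimax` are assumptions made for this illustration.

```python
from typing import List, Tuple, Union

Utility = Tuple[int, ...]             # one utility per player: (u_A, u_B, u_C)
Tree = Union[Utility, List["Tree"]]   # a leaf is a utility vector


def multi_minimax(node: Tree, player: int, n_players: int) -> Utility:
    """Back up the child vector that maximizes the utility of the player to move."""
    if isinstance(node, tuple):
        return node
    next_player = (player + 1) % n_players
    children = [multi_minimax(child, next_player, n_players) for child in node]
    return max(children, key=lambda u: u[player])


# A moves at the root, then B, then C (tree shape assumed to be binary).
TREE = [
    [[(1, 2, 6), (4, 2, 3)], [(6, 1, 2), (7, 4, -1)]],
    [[(5, -1, -1), (-1, 5, 2)], [(7, 7, -1), (5, 4, 5)]],
]

print(multi_minimax(TREE, player=0, n_players=3))  # (1, 2, 6)
```

Each player here looks only at its own component, which is why this backup cannot represent alliances or side payments.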


  • Can we now write a program that will play chess reasonably well?

– For chess b~35 and m~100
– Do we really need to look at all those nodes?
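
To get a feel for these numbers, here is a quick back-of-the-envelope calculation. The search speed of 200 million positions per second is borrowed from the Deep Blue figures later in these slides and is used only for scale.

```python
import math

b, m = 35, 100                      # branching factor and depth quoted above
log10_nodes = m * math.log10(b)     # log10(b**m), avoids printing a 155-digit integer
print(f"b^m is about 10^{log10_nodes:.0f}")   # b^m is about 10^154

positions_per_second = 200_000_000  # assumed search speed, for scale only
seconds_per_year = 3600 * 24 * 365
log10_years = log10_nodes - math.log10(positions_per_second * seconds_per_year)
print(f"exhaustive search would take about 10^{log10_years:.0f} years")
```

This matches the ~10^154 node count quoted earlier for the chess search tree, so exhaustive minimax is clearly out of reach.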


Alpha-Beta Pruning

  • No!

– If we are smart (and lucky) we can do pruning

  • Eliminate large parts of the tree from consideration
  • Alpha-beta pruning applied to a minimax tree

– Returns the same decision as minimax
– Prunes branches that cannot influence the final decision


Alpha-Beta Pruning

  • Alpha:

– Value of the best (highest-value) choice we have found so far on the path for MAX

  • Beta:

– Value of the best (lowest-value) choice we have found so far on the path for MIN

  • Update alpha and beta as the search continues
  • Prune as soon as the value of the current node is known to be worse than the current alpha (for MAX) or beta (for MIN)
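
The alpha and beta bookkeeping described above can be sketched as follows. This is a minimal illustration on nested-list trees, not the authors' code; the `visited` parameter is an addition made here to record which leaves are actually evaluated, and the leaves 4 and 6 in the example tree are assumed from the standard textbook figure.

```python
import math
from typing import List, Optional, Union

Tree = Union[float, List["Tree"]]


def alphabeta(node: Tree, is_max: bool,
              alpha: float = -math.inf, beta: float = math.inf,
              visited: Optional[List[float]] = None) -> float:
    """Minimax value of node, skipping children that cannot change the decision.

    alpha: best value found so far for MAX on the path to this node.
    beta:  best value found so far for MIN on the path to this node.
    visited: optional list that collects every leaf actually evaluated.
    """
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)
        return node
    if is_max:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:   # MIN higher up will never allow this node
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if beta <= alpha:       # MAX higher up already has something at least as good
            break
    return value


EXAMPLE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
seen: List[float] = []
print(alphabeta(EXAMPLE, True, visited=seen))  # 3
print(seen)                                    # [3, 12, 8, 2, 14, 5, 2]
```

The leaves 4 and 6 never appear in `seen`: once the second MIN node is known to be worth at most 2, MAX's existing option worth 3 makes the rest of that subtree irrelevant.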


Alpha-Beta example

[Figure sequence: alpha-beta on a two-ply tree with a MAX root and three MIN children. Each node is annotated with the interval [lower bound, upper bound] on its value.]

  • Start: the root is [-inf, inf]. The first leaf of the first MIN node is 3, so that node becomes [-inf, 3]
  • Its second leaf is 12: the first MIN node stays at [-inf, 3]
  • Its third leaf is 8: the first MIN node is exactly [3, 3], so the root becomes [3, inf]
  • The first leaf of the second MIN node is 2, so that node becomes [-inf, 2]
  • MAX already has a choice worth 3, so the second MIN node cannot matter: prune its remaining children
  • The first leaf of the third MIN node is 14: that node becomes [-inf, 14] and the root [3, 14]
  • Its second leaf is 5: the third MIN node becomes [-inf, 5] and the root [3, 5]
  • Its third leaf is 2: the third MIN node is exactly [2, 2], and the root is [3, 3]


Properties of Alpha-Beta

  • Pruning does not affect the final result

– You prune parts of the tree that you would never reach in actual play

  • The order in which moves are evaluated is important

– With bad move ordering, alpha-beta will prune nothing
– With perfect node ordering, it can reduce time complexity to O(b^(m/2))
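
The effect of move ordering can be checked empirically. The sketch below is an illustration under stated assumptions: it builds a uniform random tree, then compares the number of leaves alpha-beta evaluates in the given order against a best-first order. The best-first order is produced by an oracle that already knows every minimax value, which a real program does not have; all function names are hypothetical.

```python
import math
import random
from typing import List, Tuple, Union

Tree = Union[float, List["Tree"]]


def random_tree(b: int, depth: int, rng: random.Random) -> Tree:
    """Uniform tree with branching factor b and random leaf utilities."""
    if depth == 0:
        return rng.random()
    return [random_tree(b, depth - 1, rng) for _ in range(b)]


def minimax(node: Tree, is_max: bool) -> float:
    if not isinstance(node, list):
        return node
    values = [minimax(child, not is_max) for child in node]
    return max(values) if is_max else min(values)


def order_best_first(node: Tree, is_max: bool) -> Tree:
    """Reorder children so the best move for the player to move comes first."""
    if not isinstance(node, list):
        return node
    kids = [order_best_first(child, not is_max) for child in node]
    # Children belong to the other player, so they are valued with not is_max.
    return sorted(kids, key=lambda c: minimax(c, not is_max), reverse=is_max)


def count_leaves(node: Tree, is_max: bool, alpha: float = -math.inf,
                 beta: float = math.inf) -> Tuple[float, int]:
    """Alpha-beta that returns (value, number of leaves evaluated)."""
    if not isinstance(node, list):
        return node, 1
    value = -math.inf if is_max else math.inf
    count = 0
    for child in node:
        v, c = count_leaves(child, not is_max, alpha, beta)
        count += c
        if is_max:
            value = max(value, v)
            alpha = max(alpha, value)
        else:
            value = min(value, v)
            beta = min(beta, value)
        if alpha >= beta:
            break
    return value, count


b, m = 4, 6
tree = random_tree(b, m, random.Random(0))
print("all leaves:      ", b ** m)
print("given order:     ", count_leaves(tree, True)[1])
print("best-first order:", count_leaves(order_best_first(tree, True), True)[1])
```

With best-first ordering the count should come out at b^⌈m/2⌉ + b^⌊m/2⌋ − 1, which is 127 of the 4096 leaves for b = 4 and m = 6. That is the source of the O(b^(m/2)) bound: the effective branching factor drops from b to about √b.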


Real-time decisions

  • Alpha-beta can be a huge improvement over minimax

– Still not good enough, as we need to search all the way to terminal states for at least part of the search space
– Need to make a decision about a move quickly

  • Heuristic evaluation function + cutoff test


Evaluation functions

  • Apply an evaluation function to a state

– If terminal state, function returns actual utility
– If non-terminal, function returns an estimate of the expected utility (i.e. the chance of winning from that state)
– Function must be fast to compute


Evaluation functions

  • Evaluation functions can be given by the designer of the program (using expert knowledge) or learned from experience
  • If features can be judged independently, a weighted linear function is good

– w_1 f_1(s) + w_2 f_2(s) + … + w_n f_n(s), with s the board state
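
As an illustration of a weighted linear evaluation function, the sketch below scores a chess position by material only. The board summary (piece counts per side), the feature definitions and the conventional 1/3/3/5/9 piece weights are all assumptions made for this example; a real program would use many more features.

```python
from typing import Callable, Dict, List

# Hypothetical board summary: piece counts for one side, e.g. {"P": 8, "N": 2}.
Counts = Dict[str, int]
Feature = Callable[[Counts, Counts], float]


def material_feature(piece: str) -> Feature:
    """Feature f_i(s): our count of one piece type minus the opponent's."""
    return lambda mine, theirs: mine.get(piece, 0) - theirs.get(piece, 0)


# One feature per piece type, with assumed weights w_i
# (pawn 1, knight 3, bishop 3, rook 5, queen 9).
FEATURES: List[Feature] = [material_feature(p) for p in "PNBRQ"]
WEIGHTS: List[float] = [1.0, 3.0, 3.0, 5.0, 9.0]


def evaluate(mine: Counts, theirs: Counts) -> float:
    """Weighted linear evaluation: w_1*f_1(s) + w_2*f_2(s) + ... + w_n*f_n(s)."""
    return sum(w * f(mine, theirs) for w, f in zip(WEIGHTS, FEATURES))


# We are up a pawn and a queen but down a rook: 1*1 + 5*(-1) + 9*1 = 5.
print(evaluate({"P": 8, "Q": 1}, {"P": 7, "R": 1}))  # 5.0
```

Learning an evaluation function from experience amounts to tuning `WEIGHTS` (and possibly the features) rather than fixing them by hand.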


Cutting off search

  • Instead of searching until we find a terminal state, we can cut the search off sooner and apply the evaluation function
  • When?

– Arbitrarily (but deeper is better)
– Quiescent states

  • States that are "stable": not going to change value (by a lot) in the near future

– Singular extensions

  • Searching deeper when you have a move that is "clearly better" (e.g. moving the king out of check)
  • Can be used to avoid the horizon effect
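
A fixed depth limit is the simplest cutoff test. The sketch below applies it to a toy subtraction game, chosen here only because it fits in a few lines: players alternately take 1 to 3 stones and whoever takes the last stone wins. The game, the heuristic in `evaluate` and all function names are assumptions made for this illustration.

```python
from typing import Tuple

State = Tuple[int, bool]   # (stones left, True if MAX is to move)


def is_terminal(state: State) -> bool:
    return state[0] == 0


def utility(state: State) -> float:
    """Whoever took the last stone won: if MAX is to move on an empty pile, MIN won."""
    return -1.0 if state[1] else 1.0


def evaluate(state: State) -> float:
    """Estimate for non-terminal states (assumed heuristic): the player to move
    is favoured unless the pile is a multiple of 4."""
    stones, max_to_move = state
    mover_ahead = stones % 4 != 0
    return 0.5 if mover_ahead == max_to_move else -0.5


def cutoff_value(state: State, depth: int, limit: int) -> float:
    """Minimax that stops at terminal states or when the cutoff test fires."""
    if is_terminal(state):
        return utility(state)
    if depth >= limit:                      # cutoff test: fixed depth limit
        return evaluate(state)
    stones, max_to_move = state
    values = [cutoff_value((stones - k, not max_to_move), depth + 1, limit)
              for k in (1, 2, 3) if k <= stones]
    return max(values) if max_to_move else min(values)


def best_move(stones: int, limit: int) -> int:
    """Number of stones MAX should take, searching `limit` plies ahead."""
    moves = [k for k in (1, 2, 3) if k <= stones]
    return max(moves, key=lambda k: cutoff_value((stones - k, False), 1, limit))


print(best_move(10, limit=2))  # 2, leaving a pile of 8
```

Replacing the fixed `depth >= limit` test with a quiescence check or a singular extension changes only the cutoff condition; the rest of the search stays the same.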


Cutting off search

  • How deep do we need to search?

– Novice human chess player

  • 5-ply (minimax)

– Master human chess player

  • 10-ply (alpha-beta)

– Grandmaster human chess player

  • 14-ply + a fantastic evaluation function, opening and endgame databases, …; special-purpose hardware would be nice but is no longer really needed (Fritz)


Stochastic games

  • In games like backgammon, chance plays a role

[Figure: backgammon game tree in which CHANCE nodes sit between the MAX and MIN levels. Each chance branch is a dice roll with its probability: 1/36 for doubles such as 1,1 and 6,6, and 1/18 for other rolls such as 1,2 and 6,5. TERMINAL states carry utilities such as 2, 1, −1.]


Stochastic games

  • Need to consider best/worst cases + the probability they will occur
  • Recall: expected value of a random variable x

\[ E[x] = \sum_{x \in X} P(x)\, x \]

  • Expectiminimax is like minimax, but at chance nodes it computes the expected value
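
Written in the same style as MINIMAX-VALUE, the expectiminimax value can be stated as below. This is the standard textbook form, not taken from the slides; P(s) denotes the probability that the chance node's outcome leads to successor s.

```latex
\[
\text{EXPECTIMINIMAX}(n) =
\begin{cases}
\text{Utility}(n) & \text{if } n \text{ is a terminal state} \\
\max_{s \in \text{Succ}(n)} \text{EXPECTIMINIMAX}(s) & \text{if } n \text{ is a MAX node} \\
\min_{s \in \text{Succ}(n)} \text{EXPECTIMINIMAX}(s) & \text{if } n \text{ is a MIN node} \\
\sum_{s \in \text{Succ}(n)} P(s)\, \text{EXPECTIMINIMAX}(s) & \text{if } n \text{ is a chance node}
\end{cases}
\]
```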


Expectiminimax

[Figure: expectiminimax tree with MAX at the root, two CHANCE nodes with values 3 and −1, and MIN nodes below them, each reached with probability 0.5.]


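Expectiminimax

[Figure: two expectiminimax trees (DICE, MIN and MAX levels) with chance probabilities .9 and .1. With MIN values 2, 3 and 1, 4 the two chance nodes are worth 2.1 and 1.3. After the order-preserving rescaling to 20, 30 and 1, 400 they are worth 21 and 40.9, so the preferred move changes.]

WARNING: exact values do matter! Order-preserving transformations of the evaluation function can change the choice of moves. Only positive linear transformations are guaranteed to preserve it.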
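
The warning can be reproduced in a few lines. The sketch below is an illustration, with a hypothetical tuple encoding for nodes, that evaluates the two trees from the figure: the leaf values keep the same ordering in both, yet the best move flips.

```python
from typing import List, Tuple, Union

# Node encodings (hypothetical, for illustration):
#   number                        -> terminal utility or evaluation
#   ("max", [children])           -> MAX node
#   ("min", [children])           -> MIN node
#   ("chance", [(p, child), ...]) -> chance node with outcome probabilities
Node = Union[float, Tuple[str, list]]


def expectiminimax(node: Node) -> float:
    if not isinstance(node, tuple):
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    return sum(p * expectiminimax(c) for p, c in children)   # chance node


def tree(v: List[float]) -> Node:
    """Two moves for MAX; each leads to a chance node with outcomes .9 and .1."""
    def move(a: float, b: float) -> Node:
        return ("chance", [(0.9, ("min", [a, a])), (0.1, ("min", [b, b]))])
    return ("max", [move(v[0], v[1]), move(v[2], v[3])])


def best_move(root: Node) -> int:
    values = [expectiminimax(child) for child in root[1]]
    return max(range(len(values)), key=lambda i: values[i])


original = tree([2, 3, 1, 4])        # chance values about 2.1 and 1.3
rescaled = tree([20, 30, 1, 400])    # same leaf ordering, values about 21 and 40.9
print(best_move(original), best_move(rescaled))  # 0 1
```

Plain minimax depends only on comparisons between values, so it is unaffected by such a rescaling; the expectation at chance nodes is what makes the magnitudes matter.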


Some Game Programs


Checkers: Tinsley vs. Chinook

  • Mr. Tinsley suffered his 4th and 5th losses ever against Chinook


Checkers

  • Chinook: http://www.cs.ualberta.ca/~chinook

– World Man-Machine Checkers Champion
– Alpha-beta search
– Opening database
– Its secret weapon: endgame database

  • Precomputed database of all 444 billion positions with 8 or fewer pieces, each with perfect win/loss/draw info
  • Perfect knowledge into the search

– Checkers is now dominated by computers


Chess: Kasparov vs. Deep Blue

                Kasparov                Deep Blue
Height          5'10"                   6'5"
Weight          176 lbs                 2,400 lbs
Age             34 years                4 years
Computers       50 billion neurons      32 RISC processors + 256 VLSI chess engines
Speed           2 pos/sec               200,000,000 pos/sec
Knowledge       Extensive               Primitive
Power Source    Electrical/chemical     Electrical
Ego             Enormous                None

Jonathan Schaeffer

1997: Deep Blue wins the match 3.5 to 2.5 (2 wins, 1 loss, and 3 draws)


Chess

  • Its secret:

– Specialized chess processor + special-purpose memory optimization
– Very sophisticated evaluation function

  • Expert features and hand-tuned weights

– Opening and closing books
– Alpha-beta + improvements (searching up to 40 ply deep!)
– Searches over 200 million positions per second (though lots of these possible moves are silly moves by human standards…)


Chess

  • There are now programs running on PCs that are on par with human champions

– Deep Junior vs Kasparov in 2003: 3/3 tie
– Deep Junior: 8 CPUs, 8 GB RAM, Windows 2000, 2,000,000 pos/sec

  • Is chess still a human game or have computers conquered it?


Backgammon

  • TD-Gammon (Gerry Tesauro at IBM)
  • One of the top players in the world
  • But only searches two moves ahead!
  • Its secret: one amazing evaluation function

– Neural network trained with reinforcement learning during ~1 million games played against itself
– Humans play backgammon differently now, based on what TD-Gammon learned about the game
– Very cool AI ☺


Othello: Murakami vs. Logistello

Takeshi Murakami, World Othello Champion
1997: The Logistello software crushed Murakami by 6 games to 0

Jonathan Schaeffer


Othello/Reversi

  • Logistello (Michael Buro from U of Alberta)
  • Human world champion crushed by the program

– Humans no match for machine

  • Its secret: evaluation function

– Automatically discovered and tuned knowledge

  • Samples patterns to see if their presence in a position can be correlated with success
  • Tuned 1.5 million parameters using self-play games with feedback


Bridge

  • GIB (Matt Ginsberg, U of Oregon)

– World's first expert-level bridge-playing program (finished 12th in the human world championship in 1998)
– Humans are still doing better, but the gap is narrowing quickly

  • Its secrets:

– Does simulations for each decision

  • Deals cards to opponents consistent with available information
  • Chooses the action that maximizes expected return
  • Plus other tricks…


Go: Goemate vs. ??

Name: Chen Zhixing
Profession: Retired
Computer skills: self-taught programmer
Author of Goemate (arguably the strongest Go program)
Gave Goemate a 9 stone handicap and still easily beat the program, thereby winning $15,000

Jonathan Schaeffer

  • Go has too high a branching factor for existing search techniques (b~100)
  • Current and future software must rely on huge databases and pattern-recognition techniques
  • Need to make strategic decisions: which battle is worth fighting?


Summary

  • Games pose lots of fascinating challenges for AI researchers
  • Minimax search allows us to play optimally against an optimal opponent
  • Alpha-beta pruning allows us to reduce the search space
  • A good evaluation function is key to doing well
  • Games are fun


Next class

  • We will begin reasoning under uncertainty

– Chapter 13