
SLIDE 1

CS344M Autonomous Multiagent Systems

Todd Hester, Department of Computer Science, The University of Texas at Austin

SLIDE 2

Good Afternoon, Colleagues

Are there any questions?

Todd Hester

SLIDE 3

Logistics

Readings


SLIDE 4

Logistics

Readings

– Specify which papers you read!


SLIDE 5

Logistics

Readings

– Specify which papers you read!
– 2 case studies and 1 TDP


SLIDE 6

Logistics

Readings

– Specify which papers you read!
– 2 case studies and 1 TDP

How to read a research paper


SLIDE 7

Logistics

Readings

– Specify which papers you read!
– 2 case studies and 1 TDP

How to read a research paper

– Some have too few details...


SLIDE 8

Logistics

Readings

– Specify which papers you read!
– 2 case studies and 1 TDP

How to read a research paper

– Some have too few details...
– Others have too many.


SLIDE 9

Logistics

Readings

– Specify which papers you read!
– 2 case studies and 1 TDP

How to read a research paper

– Some have too few details...
– Others have too many.

Next week’s readings posted


SLIDE 10

Logistics

Readings

– Specify which papers you read!
– 2 case studies and 1 TDP

How to read a research paper

– Some have too few details...
– Others have too many.

Next week’s readings posted
Use the undergrad writing center!

– Friday afternoon workshops (3 p.m.)


SLIDE 11

Overview of the Readings

Darwin: genetic programming approach


SLIDE 12

Overview of the Readings

Darwin: genetic programming approach
Stone and McAllester: Architecture for action selection


SLIDE 13

Overview of the Readings

Darwin: genetic programming approach
Stone and McAllester: Architecture for action selection
Riley et al: Coach competition, extracting models


SLIDE 14

Overview of the Readings

Darwin: genetic programming approach
Stone and McAllester: Architecture for action selection
Riley et al: Coach competition, extracting models
Kuhlmann et al: Learning for coaching


SLIDE 15

Overview of the Readings

Darwin: genetic programming approach
Stone and McAllester: Architecture for action selection
Riley et al: Coach competition, extracting models
Kuhlmann et al: Learning for coaching
Withopf and Riedmiller: Reinforcement learning


SLIDE 16

Overview of the Readings

Darwin: genetic programming approach
Stone and McAllester: Architecture for action selection
Riley et al: Coach competition, extracting models
Kuhlmann et al: Learning for coaching
Withopf and Riedmiller: Reinforcement learning
MacAlpine et al: UT Austin Villa 2011


SLIDE 17

Overview of the Readings

Darwin: genetic programming approach
Stone and McAllester: Architecture for action selection
Riley et al: Coach competition, extracting models
Kuhlmann et al: Learning for coaching
Withopf and Riedmiller: Reinforcement learning
MacAlpine et al: UT Austin Villa 2011
Barrett et al: SPL Kicking strategy


SLIDE 18

Evolutionary Computation

Motivated by biological evolution: GA, GP


SLIDE 19

Evolutionary Computation

Motivated by biological evolution: GA, GP
Search through a space


SLIDE 20

Evolutionary Computation

Motivated by biological evolution: GA, GP
Search through a space

− Need a representation, fitness function
− Probabilistically apply search operators to set of points in search space


SLIDE 21

Evolutionary Computation

Motivated by biological evolution: GA, GP
Search through a space

− Need a representation, fitness function
− Probabilistically apply search operators to set of points in search space

Randomized, parallel hill-climbing through space


SLIDE 22

Evolutionary Computation

Motivated by biological evolution: GA, GP
Search through a space

− Need a representation, fitness function
− Probabilistically apply search operators to set of points in search space

Randomized, parallel hill-climbing through space
Learning is an optimization problem (fitness)


SLIDE 23

Evolutionary Computation

Motivated by biological evolution: GA, GP
Search through a space

− Need a representation, fitness function
− Probabilistically apply search operators to set of points in search space

Randomized, parallel hill-climbing through space
Learning is an optimization problem (fitness)

Some slides from Machine Learning [Mitchell, 1997]
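
To make the "representation, fitness function, search operators" recipe concrete, here is a minimal genetic algorithm sketch in Python. It is an illustration under assumptions, not code from the readings: the bit-string representation, tournament selection, one-point crossover, elitism, and every parameter value (`length`, `pop_size`, `generations`, `p_mut`) are hypothetical choices.

```python
import random


def evolve(fitness, length=20, pop_size=30, generations=40, p_mut=0.02, seed=0):
    """Tiny genetic algorithm over bit strings (all settings are illustrative)."""
    rng = random.Random(seed)
    # Representation: each individual is a list of 0/1 genes.
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = [max(map(fitness, pop))]  # best fitness seen in each generation

    for _ in range(generations):
        def pick():
            # Tournament selection: the fitter of two random individuals.
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b

        nxt = [max(pop, key=fitness)]  # elitism: keep the current best
        while len(nxt) < pop_size:
            mom, dad = pick(), pick()
            cut = rng.randrange(1, length)      # one-point crossover
            child = mom[:cut] + dad[cut:]
            # Mutation: flip each bit with probability p_mut.
            child = [bit ^ (rng.random() < p_mut) for bit in child]
            nxt.append(child)
        pop = nxt
        history.append(max(map(fitness, pop)))

    return max(pop, key=fitness), history


# Example fitness: count the ones in the string.
best, history = evolve(sum)
```

Because the best individual is always copied forward, `history` never decreases, which is the "hill-climbing" flavor mentioned above; the population supplies the "parallel" part and the random operators the "randomized" part.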


SLIDE 24

Darwin United

More ambitious follow-up to Luke, 97 (made 2nd round)


SLIDE 25

Darwin United

More ambitious follow-up to Luke, 97 (made 2nd round)
Motivated in part by Peter’s detailed team construction


SLIDE 26

Darwin United

More ambitious follow-up to Luke, 97 (made 2nd round)
Motivated in part by Peter’s detailed team construction
Evolves whole teams — lexicographic fitness function


SLIDE 27

Darwin United

More ambitious follow-up to Luke, 97 (made 2nd round)
Motivated in part by Peter’s detailed team construction
Evolves whole teams — lexicographic fitness function
Evolved on huge (at the time) hypercube


SLIDE 28

Darwin United

More ambitious follow-up to Luke, 97 (made 2nd round)
Motivated in part by Peter’s detailed team construction
Evolves whole teams — lexicographic fitness function
Evolved on huge (at the time) hypercube
Lots of spinning, but figured out dribbling, offsides


SLIDE 29

Darwin United

More ambitious follow-up to Luke, 97 (made 2nd round)
Motivated in part by Peter’s detailed team construction
Evolves whole teams — lexicographic fitness function
Evolved on huge (at the time) hypercube
Lots of spinning, but figured out dribbling, offsides
1-1-1 record. Tied a good team, but didn’t advance


SLIDE 30

Darwin United

More ambitious follow-up to Luke, 97 (made 2nd round)
Motivated in part by Peter’s detailed team construction
Evolves whole teams — lexicographic fitness function
Evolved on huge (at the time) hypercube
Lots of spinning, but figured out dribbling, offsides
1-1-1 record. Tied a good team, but didn’t advance
Success of the method, but not pursued
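
A lexicographic fitness function ranks teams on a first criterion and only looks at the next criterion to break ties. The sketch below shows the idea with Python tuple comparison. The criteria names (`wins`, `goals`, `kicks`) and their order are assumptions made for illustration; they are not taken from the Darwin United paper.

```python
def lexicographic_fitness(stats):
    """Return a tuple that Python compares element by element, left to right.

    The criteria and their priority order are hypothetical.
    """
    return (stats["wins"], stats["goals"], stats["kicks"])


teams = {
    "A": {"wins": 1, "goals": 0, "kicks": 2},
    "B": {"wins": 0, "goals": 9, "kicks": 50},  # many goals, but fewer wins
    "C": {"wins": 1, "goals": 0, "kicks": 3},   # ties A until the last criterion
}

# Rank team names from fittest to least fit.
ranking = sorted(teams, key=lambda name: lexicographic_fitness(teams[name]),
                 reverse=True)
```

Here `ranking` is `["C", "A", "B"]`: wins dominate everything else, and kicks only matter between A and C because they tie on wins and goals.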


SLIDE 31

Architecture for Action Selection

(other slides, video)


SLIDE 32

Architecture for Action Selection

(other slides, video)
downsides


SLIDE 33

Architecture for Action Selection

(other slides, video)
downsides
Keepaway


SLIDE 34

Coaching

Learn best strategy to play a fixed team


SLIDE 35

Coaching

Learn best strategy to play a fixed team
Give high level advice to players at low frequency


SLIDE 36

Coaching

Learn best strategy to play a fixed team
Give high level advice to players at low frequency
Focus on learning formations


SLIDE 37

Coaching

Learn best strategy to play a fixed team
Give high level advice to players at low frequency
Focus on learning formations
Learn when successful teams passed/kicked


SLIDE 38

Coaching

Learn best strategy to play a fixed team
Give high level advice to players at low frequency
Focus on learning formations
Learn when successful teams passed/kicked
Learn when opponent will pass and try to block


SLIDE 39

Coaching

Learn best strategy to play a fixed team
Give high level advice to players at low frequency
Focus on learning formations
Learn when successful teams passed/kicked
Learn when opponent will pass and try to block
What if players switch roles?


SLIDE 40

Coaching

Learn best strategy to play a fixed team
Give high level advice to players at low frequency
Focus on learning formations
Learn when successful teams passed/kicked
Learn when opponent will pass and try to block
What if players switch roles?
Why just imitate another team?


SLIDE 41

Coaching

Learn best strategy to play a fixed team
Give high level advice to players at low frequency
Focus on learning formations
Learn when successful teams passed/kicked
Learn when opponent will pass and try to block
What if players switch roles?
Why just imitate another team?
Other slides


SLIDE 42

Reinforcement Learning

RL Slides


SLIDE 43

Reinforcement Learning

RL Slides
Extend to grid soccer


SLIDE 44

Reinforcement Learning

RL Slides
Extend to grid soccer
Large state space, joint actions
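
The "joint actions" point can be made concrete with a small sketch: with k moves per player, n players have k^n joint actions, and a centralized learner needs one Q-value per (state, joint action) pair. The code below is a plain tabular Q-learning update over joint actions, offered as an assumption-laden illustration rather than the method of the reading; the move set, state labels, and the `alpha` and `gamma` values are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Hypothetical per-player move set for a grid world.
MOVES = ("north", "south", "east", "west", "stay")


def joint_actions(n_agents):
    """All combinations of one move per agent: len(MOVES) ** n_agents of them."""
    return list(product(MOVES, repeat=n_agents))


def q_update(q, state, joint, reward, next_state, n_agents, alpha=0.5, gamma=0.9):
    """One Q-learning step where the 'action' is a whole joint action."""
    best_next = max(q[(next_state, a)] for a in joint_actions(n_agents))
    q[(state, joint)] += alpha * (reward + gamma * best_next - q[(state, joint)])
    return q[(state, joint)]


q = defaultdict(float)  # unseen (state, joint action) pairs start at 0.0
new_value = q_update(q, "s0", ("east", "stay"), 1.0, "s1", n_agents=2)
```

Two agents already give 25 joint actions and three give 125, so the table grows quickly even before the state space is counted.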



SLIDE 46

UT Austin Villa 2011

Other slides


SLIDE 47

UT Austin Villa 2011

Other slides
Why not use CMA-ES on role positions as well?


SLIDE 48

UT Austin Villa 2011

Other slides
Why not use CMA-ES on role positions as well?
Changes for 2012?


SLIDE 49

Kicking Under Uncertainty

Used by our SPL team


SLIDE 50

Kicking Under Uncertainty

Used by our SPL team
Kick engine to kick at various distances/headings


SLIDE 51

Kicking Under Uncertainty

Used by our SPL team
Kick engine to kick at various distances/headings
Adjust to seen ball location


SLIDE 52

Kicking Under Uncertainty

Used by our SPL team
Kick engine to kick at various distances/headings
Adjust to seen ball location
Select first kick that moves ball up field


SLIDE 53

Kicking Under Uncertainty

Used by our SPL team
Kick engine to kick at various distances/headings
Adjust to seen ball location
Select first kick that moves ball up field
Figure


SLIDE 54

Kicking Under Uncertainty

Used by our SPL team
Kick engine to kick at various distances/headings
Adjust to seen ball location
Select first kick that moves ball up field
Figure
Emphasis on quickness


SLIDE 55

Kicking Under Uncertainty

Used by our SPL team
Kick engine to kick at various distances/headings
Adjust to seen ball location
Select first kick that moves ball up field
Figure
Emphasis on quickness
Now: Better model of opponents -> Know if we have more time
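
"Select first kick that moves ball up field" can be read as a first-acceptable search over an ordered list of candidate kicks, which favors quickness over finding the best kick. The sketch below illustrates that reading only. The `Kick` type, the coordinate frame (+x toward the opponent goal), and the `min_gain` and `half_width` thresholds are all assumptions, not details of the SPL team's kick engine.

```python
import math
from typing import NamedTuple


class Kick(NamedTuple):
    distance: float  # expected ball travel, in meters (hypothetical unit)
    heading: float   # radians; 0 points straight at the opponent goal (+x)


def select_kick(ball_x, ball_y, kicks, min_gain=0.5, half_width=2.0):
    """Return the first kick that advances the ball and keeps it in bounds.

    Kicks are tried in list order; nothing is ranked or optimized.
    Returns None when no candidate qualifies.
    """
    for kick in kicks:
        x = ball_x + kick.distance * math.cos(kick.heading)
        y = ball_y + kick.distance * math.sin(kick.heading)
        if x - ball_x >= min_gain and abs(y) <= half_width:
            return kick
    return None


candidates = [
    Kick(3.0, math.pi),      # backwards: rejected
    Kick(3.0, math.pi / 2),  # sideways: no forward gain, rejected
    Kick(1.0, 0.0),          # short forward kick: first acceptable
    Kick(3.0, 0.0),          # longer forward kick: never examined
]
chosen = select_kick(0.0, 0.0, candidates)
```

The short forward kick is chosen even though a longer one is available, which matches the emphasis on quickness. A better opponent model, as the last bullet suggests, is what would justify spending time on the longer search.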



SLIDE 57

Learning Commentary

David Chen and Ray Mooney


SLIDE 58

Coordination Graphs

n agents, each choose an action Ai


SLIDE 59

Coordination Graphs

n agents, each choose an action Ai
A = A1 × . . . × An


SLIDE 60

Coordination Graphs

n agents, each choose an action Ai
A = A1 × . . . × An
Ri : A → ℝ


SLIDE 61

Coordination Graphs

n agents, each choose an action Ai
A = A1 × . . . × An
Ri : A → ℝ
Coordination problem: R1 = . . . = Rn = R


SLIDE 62

Coordination Graphs

n agents, each choose an action Ai
A = A1 × . . . × An
Ri : A → ℝ
Coordination problem: R1 = . . . = Rn = R
Nash equilibrium: no agent could do better given what others are doing.


SLIDE 63

Coordination Graphs

n agents, each choose an action Ai
A = A1 × . . . × An
Ri : A → ℝ
Coordination problem: R1 = . . . = Rn = R
Nash equilibrium: no agent could do better given what others are doing.
May be more than one (chicken)
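
The Nash equilibrium definition can be checked by brute force on a small game. The sketch below enumerates joint actions in the game of chicken and keeps those where no single agent gains by deviating. The payoff numbers are assumed for illustration; only their ordering matters.

```python
from itertools import product

ACTIONS = ("swerve", "straight")

# PAYOFF[joint] = (reward to agent 0, reward to agent 1); values are hypothetical.
PAYOFF = {
    ("swerve", "swerve"): (0, 0),
    ("swerve", "straight"): (-1, 1),
    ("straight", "swerve"): (1, -1),
    ("straight", "straight"): (-10, -10),
}


def is_nash(joint, payoff, actions):
    """True if no agent can do better by changing only its own action."""
    for i in range(len(joint)):
        for alt in actions:
            deviation = joint[:i] + (alt,) + joint[i + 1:]
            if payoff[deviation][i] > payoff[joint][i]:
                return False
    return True


def pure_nash_equilibria(payoff, actions, n_agents=2):
    """All pure-strategy joint actions that pass the is_nash check."""
    return [j for j in product(actions, repeat=n_agents)
            if is_nash(j, payoff, actions)]


equilibria = pure_nash_equilibria(PAYOFF, ACTIONS)
```

With these payoffs there are two pure equilibria, one for each choice of which agent swerves, so the agents still have to agree on which one to play.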


SLIDE 64

Example from the paper

Understand the rule syntax


SLIDE 65

Example from the paper

Understand the rule syntax
Form the coordination graph


SLIDE 66

Example from the paper

Understand the rule syntax
Form the coordination graph
First eliminate rules based on context


SLIDE 67

Example from the paper

Understand the rule syntax
Form the coordination graph
First eliminate rules based on context
What does it mean for G3 to collect all relevant rules?


SLIDE 68

Example from the paper

Understand the rule syntax
Form the coordination graph
First eliminate rules based on context
What does it mean for G3 to collect all relevant rules?
What does it mean for G3 to maximize over all actions of a1 and a2?


SLIDE 69

Example from the paper

Understand the rule syntax
Form the coordination graph
First eliminate rules based on context
What does it mean for G3 to collect all relevant rules?
What does it mean for G3 to maximize over all actions of a1 and a2?

How are the results propagated back?


SLIDE 70

Example from the paper

Understand the rule syntax
Form the coordination graph
First eliminate rules based on context
What does it mean for G3 to collect all relevant rules?
What does it mean for G3 to maximize over all actions of a1 and a2?

How are the results propagated back?
Let’s try again with G1 eliminated first
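
The collect, maximize, and propagate-back steps asked about above can be sketched as max-sum variable elimination. This is a simplified illustration, not the paper's example: it uses dense pairwise payoff tables instead of value rules, binary actions, and made-up payoff numbers. Only the agent names G1, G2, and G3 come from the slide.

```python
from itertools import product


def eliminate(factors, order, domains):
    """Max-sum variable elimination over payoff tables.

    factors: list of (scope, table); scope is a tuple of agent names and
             table maps a tuple of their actions to a payoff.
    order:   agents in the order they are eliminated.
    domains: agent name -> tuple of available actions.
    Returns (best total payoff, dict of agent -> chosen action).
    """
    factors = list(factors)
    responses = []
    for agent in order:
        # Collect every payoff function that mentions this agent.
        involved = [f for f in factors if agent in f[0]]
        factors = [f for f in factors if agent not in f[0]]
        scope = tuple(sorted({v for s, _ in involved for v in s} - {agent}))
        table, policy = {}, {}
        for context in product(*(domains[v] for v in scope)):
            assign = dict(zip(scope, context))

            def total(action):
                assign[agent] = action
                return sum(t[tuple(assign[v] for v in s)] for s, t in involved)

            # Maximize over own action for each combination of neighbor actions.
            policy[context] = max(domains[agent], key=total)
            table[context] = total(policy[context])
        # Hand the resulting conditional payoff to the remaining neighbors.
        factors.append((scope, table))
        responses.append((agent, scope, policy))

    value = sum(t[()] for _, t in factors)
    # Propagate back: fix actions in reverse elimination order.
    joint = {}
    for agent, scope, policy in reversed(responses):
        joint[agent] = policy[tuple(joint[v] for v in scope)]
    return value, joint


DOMAINS = {"G1": (0, 1), "G2": (0, 1), "G3": (0, 1)}
FACTORS = [  # hypothetical pairwise payoffs
    (("G1", "G2"), {(0, 0): 3, (0, 1): 0, (1, 0): 1, (1, 1): 4}),
    (("G1", "G3"), {(0, 0): 2, (0, 1): 5, (1, 0): 0, (1, 1): 1}),
    (("G2", "G3"), {(0, 0): 0, (0, 1): 2, (1, 0): 3, (1, 1): 0}),
]
g3_first = eliminate(FACTORS, ["G3", "G2", "G1"], DOMAINS)
g1_first = eliminate(FACTORS, ["G1", "G2", "G3"], DOMAINS)
```

Both elimination orders return the same optimal joint action here. The order changes the intermediate tables, and therefore the cost of the computation, but not the optimal value.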


SLIDE 71

Application to soccer

Make the world discrete by assigning roles, using high-level predicates


SLIDE 72

Application to soccer

Make the world discrete by assigning roles, using high-level predicates

Assume global state information


SLIDE 73

Application to soccer

Make the world discrete by assigning roles, using high-level predicates

Assume global state information
Finds pass sequences and starts players moving ahead of time.


SLIDE 74

Application to soccer

Make the world discrete by assigning roles, using high-level predicates

Assume global state information
Finds pass sequences and starts players moving ahead of time.

Note the results: with and without coordination.


SLIDE 75

Reactive Deliberation

A hybrid approach
Executor: carry out reactive behaviors
Deliberator: evaluate possible high-level schemas with parameters; generate bids
Deliberator takes time, but something keeps happening always.

In effect: deliberator commits to schema for some time
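
A minimal sketch of the executor/deliberator split, assuming a fixed commitment window: the deliberator scores each schema with a bid and picks the highest, and the executor keeps acting on the committed schema every tick until the window ends. The schema names, bid formulas, toy world model, and `commit_for` value are all hypothetical.

```python
def deliberate(world, schemas):
    """Collect one bid per schema and return (bid, name) for the highest bid."""
    bids = [(bid_fn(world), name) for name, bid_fn in schemas.items()]
    return max(bids)


def run(ticks, schemas, commit_for=5):
    """Act on every tick; re-deliberate only when the commitment expires."""
    log = []
    current, remaining = None, 0
    for t in range(ticks):
        # Toy world: the ball approaches until tick 10, then moves away.
        world = {"tick": t, "ball_dist": abs(10 - t)}
        if remaining == 0:
            _, current = deliberate(world, schemas)
            remaining = commit_for
        log.append(current)  # the executor acts on the committed schema
        remaining -= 1
    return log


SCHEMAS = {
    "chase_ball": lambda w: 1.0 / (1.0 + w["ball_dist"]),  # bids high near the ball
    "defend": lambda w: 0.3,                               # constant fallback bid
}
log = run(20, SCHEMAS)
```

At ticks 8 and 9 the `chase_ball` bid would already beat `defend`, but the agent stays with `defend` until the next deliberation at tick 10. That delay is the cost of commitment, and the benefit is that some behavior is always running.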
