Modular multitask reinforcement learning with policy sketches. Jacob Andreas. PowerPoint PPT Presentation

SLIDE 1

Modular multitask reinforcement learning with policy sketches

Jacob Andreas, Sergey Levine and Dan Klein

SLIDE 2

The learning problem

make planks

SLIDE 3

The learning problem

make planks, make sticks

SLIDE 4

Learning from sketches

get wood, use saw; get wood, use axe

SLIDE 5

The options framework

SLIDE 6

The options framework

+1

SLIDE 7

The options framework

+1

SLIDE 8

The options framework

[Sutton et al. 99, Bacon & Precup 16]

SLIDE 9

Learning from intermediate rewards

[Kearns & Singh 02, Kulkarni et al. 16]

[Figure: intermediate rewards r]

SLIDE 10

Learning from demonstrations

[Stolle & Precup 02, Fox & Krishnan et al. 16]

SLIDE 11

Learning from policy sketches

get wood, use saw

SLIDE 12

Why sketches?

Easy to collect
Portable

Crafting environment
make plank: get wood, use toolshed
make stick: get wood, use workbench
make cloth: get grass, use factory
make rope: get grass, use toolshed
make bridge: get iron, get wood
make bed∗: get wood, use toolshed
make axe∗: get wood, use workbench
make shears: get wood, use workbench
get gold: get iron, get wood
get gem: get wood, use workbench

SLIDE 13

Learning from policy sketches

SLIDE 14

Learning from policy sketches

make planks: get wood, use saw

SLIDE 15

Learning from policy sketches

make sticks: get wood, use axe

SLIDE 16

Learning from policy sketches

get wood, use saw; get wood, use axe

πa πb

[e.g. Branavan et al. 09, Oh et al. 17, Hermann et al. 17]

SLIDE 17

Learning from policy sketches

get wood, use saw; get wood, use axe

SLIDE 18

get wood → π1, use saw → π2
get wood → π1, use axe → π3
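The alignment on this slide (one subpolicy per distinct sketch symbol, shared across tasks) can be illustrated with a minimal Python sketch. This is not code from the psketch repository; the function name `assign_subpolicies` is hypothetical, and indices start at 1 only to match the π1, π2, π3 labels.

```python
from typing import Dict, List


def assign_subpolicies(sketches: Dict[str, List[str]]) -> Dict[str, List[int]]:
    """Map every sketch symbol to a shared subpolicy index.

    The same symbol (e.g. "get wood") always receives the same index, so the
    corresponding subpolicy is reused by every task whose sketch mentions it.
    """
    registry: Dict[str, int] = {}  # symbol -> subpolicy index, numbered from 1
    assignments: Dict[str, List[int]] = {}
    for task, sketch in sketches.items():
        indices = []
        for symbol in sketch:
            if symbol not in registry:
                registry[symbol] = len(registry) + 1
            indices.append(registry[symbol])
        assignments[task] = indices
    return assignments


sketches = {
    "make planks": ["get wood", "use saw"],
    "make sticks": ["get wood", "use axe"],
}
print(assign_subpolicies(sketches))
# {'make planks': [1, 2], 'make sticks': [1, 3]}
```

Because both sketches begin with get wood, both tasks route that step to the same subpolicy, so experience from either task can train it.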

SLIDE 19

get wood → π1, use saw → π2
get wood → π1, use axe → π3

SLIDE 20

get wood → π1

SLIDE 21

Policy representation

get wood → π1

SLIDE 22

Policy representation

get wood → π1

???

SLIDE 23

Policy representation

SLIDE 24

Policy representation

SLIDE 25

Policy representation

SLIDE 26

Policy representation

get wood → π1

Action probabilities
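The slides do not spell out the subpolicy architecture. As a minimal sketch, assume each subpolicy scores every action with a linear function of the state features and normalizes the scores with a softmax to obtain action probabilities. The names `action_probabilities` and `get_wood_weights` and the weight values are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def action_probabilities(weights: Sequence[Sequence[float]],
                         features: Sequence[float]) -> List[float]:
    """Hypothetical linear subpolicy: one weight vector per action.

    Each action's score is the dot product of its weight vector with the
    state features; the softmax turns the scores into a distribution.
    """
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return softmax(scores)


# A toy "get wood" subpolicy with three actions and two state features.
get_wood_weights = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
print(action_probabilities(get_wood_weights, [2.0, 0.0]))
```

A neural network would replace the dot products with a learned feature extractor; the final softmax over actions is the part the slide's "Action probabilities" label refers to.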

SLIDE 27

Policy search

$$\sum_{\text{tasks}} \sum_{\text{steps}} \nabla \log \pi(a_t \mid s_t)\,(r_t - b)$$

where $a_t$ is the action, $s_t$ the state, $r_t$ the reward and $b$ the baseline.
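The estimator above can be computed directly for a toy policy. This sketch assumes a tabular softmax policy (one logit per state and action), which keeps the gradient of log π in closed form; the function and argument names are hypothetical, and $r_t$ is used exactly as labeled on the slide (many REINFORCE variants use the return from step t onward instead).

```python
import math
from typing import Dict, List, Tuple

Step = Tuple[str, int, float]  # (state, action index, reward r_t)


def softmax(logits: List[float]) -> List[float]:
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient(theta: Dict[str, List[float]],
                    trajectories: Dict[str, List[Step]],
                    baseline: float) -> Dict[str, List[float]]:
    """Sum over tasks and steps of grad log pi(a_t | s_t) * (r_t - b).

    theta is a tabular softmax policy (state -> action logits), so
    d log pi(a|s) / d theta[s][k] = 1[k == a] - pi(k|s).
    """
    grad = {s: [0.0] * len(logits) for s, logits in theta.items()}
    for steps in trajectories.values():          # sum over tasks
        for state, action, reward in steps:      # sum over steps
            probs = softmax(theta[state])
            advantage = reward - baseline
            for k, p in enumerate(probs):
                indicator = 1.0 if k == action else 0.0
                grad[state][k] += (indicator - p) * advantage
    return grad


theta = {"s0": [0.0, 0.0]}
trajectories = {"make planks": [("s0", 0, 1.0)], "make sticks": [("s0", 1, 0.0)]}
print(policy_gradient(theta, trajectories, baseline=0.5))
# {'s0': [0.5, -0.5]}
```

The result is an ascent direction: adding a small multiple of it to θ makes the above-baseline action more likely and the below-baseline action less likely.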

SLIDE 28

Policy search

$$\sum_{\text{tasks}} \sum_{\text{steps}} \nabla \log \pi(a_t \mid s_t)\,(r_t - b)$$

get wood

SLIDE 29

Policy search

$$\sum_{\text{tasks}} \sum_{\text{steps}} \nabla \log \pi(a_t \mid s_t)\,(r_t - b)$$

use axe

SLIDE 30

Policy search

$$\sum_{\text{tasks}} \sum_{\text{steps}} \nabla \log \pi(a_t \mid s_t)\,(r_t - b)$$

[Figure: subpolicy reward .40]

SLIDE 31

Improving policy search

SLIDE 32

Improving policy search

$$\sum_{\text{tasks}} \sum_{\text{steps}} \nabla \log \pi(a_t \mid s_t)\,(r_t - b)$$

where $a_t$ is the action, $s_t$ the state, $r_t$ the reward and $b$ the baseline.

SLIDE 33

Improving policy search

A separate baseline for each (subpolicy, task) pair:

$$\nabla \log \pi(a_t \mid s_t)\,(r_t - b_{\text{subpolicy},\,\text{task}})$$

(use saw, make planks), (use saw, make nails)
(use axe, make planks), (use axe, make nails)
(get iron, make planks), (get iron, make nails)
(get wood, make planks), (get wood, make nails)
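The change on this slide replaces the single baseline b with one per (subpolicy, task) pair. The slides do not say how each baseline is estimated, so this sketch assumes an exponential moving average of observed rewards; the class name `PairBaseline` and the `rate` parameter are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Tuple


class PairBaseline:
    """Running-average baseline kept separately for each (subpolicy, task) pair."""

    def __init__(self, rate: float = 0.1) -> None:
        self.rate = rate  # step size of the exponential moving average
        self.values: DefaultDict[Tuple[str, str], float] = defaultdict(float)

    def advantage(self, subpolicy: str, task: str, reward: float) -> float:
        """Return r_t - b[subpolicy, task], then move b toward the observed reward."""
        key = (subpolicy, task)
        adv = reward - self.values[key]
        self.values[key] += self.rate * adv
        return adv


baseline = PairBaseline(rate=0.5)
print(baseline.advantage("use saw", "make planks", 1.0))  # 1.0
print(baseline.advantage("use saw", "make planks", 1.0))  # 0.5
print(baseline.advantage("use saw", "make nails", 0.0))   # 0.0
```

Read this way, a subpolicy such as use saw is compared against what it typically earns inside the current task, not against its average over all tasks; the returned advantage would replace $(r_t - b)$ in the gradient.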

SLIDE 34

Improving policy search

$$\sum_{\text{tasks}} \sum_{\text{steps}} \nabla \log \pi(a_t \mid s_t)\,(r_t - b_{\text{subpolicy},\,\text{task}})$$

[Figure: reward .40 (subpolicy) vs. .89 (task)]

SLIDE 35

Do sketches help?

SLIDE 36

The maze navigation task

SLIDE 37

The maze navigation task

SLIDE 38

The maze navigation task

[Figure: reward vs. episodes (×10^6) for Unsupervised, Sketches: modular and Sketches: joint]

SLIDE 39

The mini-craft task

SLIDE 40

The mini-craft task

SLIDE 41

The mini-craft task

[Figure: reward vs. episodes (×10^6) for Unsupervised, Sketches: modular and Sketches: joint]

SLIDE 42

The cliff-walking task

SLIDE 43

The cliff-walking task

[Figure: log reward vs. timesteps (×10^8) for Unsupervised, Sketches: modular and Sketches: joint]

SLIDE 44

Zero-shot generalization

What if I see a sketch I’ve never seen before?

get iron, use axe
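Zero-shot execution of a new sketch amounts to running already-trained subpolicies in the order the sketch lists them. A minimal sketch in the spirit of the options framework: each subpolicy is assumed to signal completion by returning a hypothetical `STOP` value, after which control passes to the next symbol. The toy subpolicies, the `step` transition function and the `max_steps` cap are illustrative assumptions, not the psketch implementation.

```python
from typing import Callable, Dict, List, Tuple

STOP = "STOP"  # hypothetical sentinel a subpolicy returns when its subtask is done
State = Dict[str, int]
Subpolicy = Callable[[State], str]


def run_sketch(sketch: List[str],
               subpolicies: Dict[str, Subpolicy],
               state: State,
               step: Callable[[State, str], State],
               max_steps: int = 100) -> Tuple[State, List[str]]:
    """Execute a sketch by running each named subpolicy until it returns STOP.

    Control passes to the next sketch symbol when the current subpolicy stops
    or after max_steps actions, whichever comes first.
    """
    actions: List[str] = []
    for symbol in sketch:
        policy = subpolicies[symbol]
        for _ in range(max_steps):
            action = policy(state)
            if action == STOP:
                break
            state = step(state, action)
            actions.append(action)
    return state, actions


# Toy subpolicies, standing in for ones learned from other sketches.
subpolicies: Dict[str, Subpolicy] = {
    "get iron": lambda s: "mine" if s["iron"] < 1 else STOP,
    "use axe": lambda s: "chop" if s["chopped"] < 1 else STOP,
}


def step(state: State, action: str) -> State:
    """Toy transition: each action increments one inventory counter."""
    new_state = dict(state)
    if action == "mine":
        new_state["iron"] += 1
    elif action == "chop":
        new_state["chopped"] += 1
    return new_state


final_state, actions = run_sketch(["get iron", "use axe"], subpolicies,
                                  {"iron": 0, "chopped": 0}, step)
print(actions)  # ['mine', 'chop']
```

No new parameters are learned for the unseen sketch; only the order in which existing subpolicies are invoked changes.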

SLIDE 45

Zero-shot generalization

What if I see a sketch I’ve never seen before?

[Figure: bar chart (axis 25 to 100) comparing Modular and Joint under Multitask and Zero-shot evaluation; bar values shown: 49, 89, 77, 1]

SLIDE 46

Zero-shot generalization

What if I see a sketch I’ve never seen before?

[Figure: bar chart (axis 25 to 100) comparing Modular and Joint under Multitask and Zero-shot evaluation; bar values shown: 49, 89, 77, 1]

SLIDE 47

Fast adaptation

What if I don’t get a sketch at test time?

???

SLIDE 48

Fast adaptation

What if I don’t get a sketch at test time?

[Figure: bar chart (axis 25 to 100) comparing Sketches and Unsupervised under Multitask and Adaptation evaluation; bar values shown: 47, 89, 77, 1]

SLIDE 49

Fast adaptation

What if I don’t get a sketch at test time?

[Figure: bar chart (axis 25 to 100) comparing Sketches and Unsupervised under Multitask and Adaptation evaluation; bar values shown: 47, 89, 76, 42]

SLIDE 50

Conclusions

SLIDE 51

A tiny bit of data goes a long way

Crafting environment
make plank: get wood, use toolshed
make stick: get wood, use workbench
make cloth: get grass, use factory
make rope: get grass, use toolshed
make bridge: get iron, get wood, use factory
make bed∗: get wood, use toolshed, get grass, use workbench
make axe∗: get wood, use workbench, get iron, use toolshed
make shears: get wood, use workbench, get iron, use workbench
get gold: get iron, get wood, use factory, use bridge
get gem: get wood, use workbench, get iron, use toolshed, use axe
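To make the slide's point concrete, here is a quick count over the sketches listed above (the ∗ markers are dropped from the task names). The dictionary simply transcribes the slide; the variable names and counting code are illustrative.

```python
from typing import Dict, List, Set

# Task -> sketch, transcribed from the "Crafting environment" list on this slide.
CRAFTING_SKETCHES: Dict[str, List[str]] = {
    "make plank": ["get wood", "use toolshed"],
    "make stick": ["get wood", "use workbench"],
    "make cloth": ["get grass", "use factory"],
    "make rope": ["get grass", "use toolshed"],
    "make bridge": ["get iron", "get wood", "use factory"],
    "make bed": ["get wood", "use toolshed", "get grass", "use workbench"],
    "make axe": ["get wood", "use workbench", "get iron", "use toolshed"],
    "make shears": ["get wood", "use workbench", "get iron", "use workbench"],
    "get gold": ["get iron", "get wood", "use factory", "use bridge"],
    "get gem": ["get wood", "use workbench", "get iron", "use toolshed", "use axe"],
}

# Distinct subtask symbols across all sketches, and the total annotation size.
subtasks: Set[str] = {symbol for sketch in CRAFTING_SKETCHES.values() for symbol in sketch}
total_symbols = sum(len(sketch) for sketch in CRAFTING_SKETCHES.values())

print(len(CRAFTING_SKETCHES), "tasks")     # 10 tasks
print(len(subtasks), "distinct subtasks")  # 8 distinct subtasks
print(total_symbols, "sketch symbols")     # 32 sketch symbols
```

Ten tasks are annotated with 32 sketch symbols drawn from only eight distinct subtasks, so each subtask label is reused across several tasks.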

SLIDE 52

Crafting environment
make plank: get wood, use toolshed
make stick: get wood, use workbench
make cloth: get grass, use factory
make rope: get grass, use toolshed
make bridge: get iron, get wood, use factory
make bed∗: get wood, use toolshed, get grass, use workbench
make axe∗: get wood, use workbench, get iron, use toolshed
make shears: get wood, use workbench, get iron, use workbench
get gold: get iron, get wood, use factory, use bridge
get gem: get wood, use workbench, get iron, use toolshed, use axe

A tiny bit of data goes a long way

SLIDE 53

Thank you!

https://github.com/jacobandreas/psketch