Modular multitask reinforcement learning with policy sketches
Jacob Andreas, Sergey Levine and Dan Klein
The learning problem
make planks
make sticks
Learning from sketches
make planks: get wood, use saw
make sticks: get wood, use axe
The options framework
+1
[Sutton et al. 99, Bacon & Precup 16]
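As background for the options framework cited above, an option is usually described as an initiation condition, an internal policy, and a termination condition. The sketch below is a minimal illustration under assumptions: the `Option` class, the `run_option` helper, the integer states, and the deterministic stopping rule are all hypothetical simplifications, not code from the talk.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Option:
    """Hypothetical container for the three parts of an option."""
    can_start: Callable[[int], bool]    # initiation condition
    policy: Callable[[int], int]        # maps a state to a low-level action
    should_stop: Callable[[int], bool]  # termination condition (deterministic here)


def run_option(option: Option, state: int,
               step: Callable[[int, int], int], max_steps: int = 100) -> List[int]:
    """Follow the option's policy until it terminates; return the visited states."""
    if not option.can_start(state):
        raise ValueError("option cannot be started in this state")
    visited = [state]
    for _ in range(max_steps):
        if option.should_stop(state):
            break
        state = step(state, option.policy(state))
        visited.append(state)
    return visited


# Toy corridor (an assumption): states are integers, the only action is +1.
walk_right = Option(
    can_start=lambda s: s < 3,
    policy=lambda s: 1,
    should_stop=lambda s: s >= 3,
)
path = run_option(walk_right, 0, step=lambda s, a: s + a)
```

In the general framework termination is a probability rather than a yes/no test; the boolean version keeps the example short.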
Learning from intermediate rewards
[Kearns & Singh 02, Kulkarni et al. 16]
Learning from demonstrations
[Stolle & Precup 02, Fox & Krishnan et al. 16]
Learning from policy sketches
get wood, use saw
Why sketches?
Easy to collect
Portable
Crafting environment
make plank: get wood, use toolshed
make stick: get wood, use workbench
make cloth: get grass, use factory
make rope: get grass, use toolshed
make bridge: get iron, get wood
make bed∗: get wood, use toolshed
make axe∗: get wood, use workbench
make shears: get wood, use workbench
get gold: get iron, get wood
get gem: get wood, use workbench
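The task list above can be read as a mapping from each task to a short sequence of subtask symbols. A minimal sketch of that data structure follows, using only the first four tasks; the names `SKETCHES`, `subtask_vocabulary`, and `tasks_sharing` are hypothetical and chosen for illustration.

```python
# Task -> sketch, copied from the first four rows of the crafting list.
SKETCHES = {
    "make plank": ["get wood", "use toolshed"],
    "make stick": ["get wood", "use workbench"],
    "make cloth": ["get grass", "use factory"],
    "make rope": ["get grass", "use toolshed"],
}


def subtask_vocabulary(sketches):
    """Collect the distinct subtask symbols that appear in any sketch."""
    return sorted({symbol for sketch in sketches.values() for symbol in sketch})


def tasks_sharing(sketches, symbol):
    """List the tasks whose sketch mentions the given subtask symbol."""
    return sorted(task for task, sketch in sketches.items() if symbol in sketch)


vocabulary = subtask_vocabulary(SKETCHES)
wood_tasks = tasks_sharing(SKETCHES, "get wood")
```

Four tasks here draw on only five symbols, which is what lets one subpolicy per symbol be reused across tasks.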
Learning from policy sketches
make planks: get wood, use saw
make sticks: get wood, use axe
[e.g. Branavan et al. 09, Oh et al. 17, Hermann et al. 17]
Policy representation
get wood
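One way to picture a modular policy is as a controller that runs one subpolicy per sketch symbol and moves on when the current subpolicy signals that it is done. The sketch below assumes an explicit `STOP` action for that hand-off and uses scripted stand-ins for learned subpolicies; `STOP`, `scripted`, `run_sketch`, and the action names are hypothetical.

```python
STOP = "STOP"  # assumed special action that hands control to the next subpolicy


def scripted(actions):
    """Hypothetical stand-in for a learned subpolicy: replay actions, then STOP."""
    def subpolicy(local_step):
        return actions[local_step] if local_step < len(actions) else STOP
    return subpolicy


def run_sketch(sketch, subpolicies, max_steps=50):
    """Execute the subpolicies named by the sketch, one after another."""
    trace = []
    index = 0       # position in the sketch
    local_step = 0  # steps taken by the current subpolicy
    while index < len(sketch) and len(trace) < max_steps:
        symbol = sketch[index]
        action = subpolicies[symbol](local_step)
        if action == STOP:
            index += 1
            local_step = 0
        else:
            trace.append((symbol, action))
            local_step += 1
    return trace


subpolicies = {
    "get wood": scripted(["left", "up"]),
    "use saw": scripted(["use"]),
}
trace = run_sketch(["get wood", "use saw"], subpolicies)
```

A real subpolicy would condition on the environment state; the scripted version only shows the sequencing.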
Policy search
\[ \sum_{\text{tasks}} \sum_{\text{steps}} \nabla \log \pi(a \mid s) \, (r_t - b) \]
a: action, s: state, r_t: reward, b: baseline
get wood
use axe
SUBPOLICY: reward .40
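The policy-search update above sums the gradient of the log-probability of each action, weighted by reward minus baseline, over tasks and steps. A minimal sketch for a softmax policy follows. It assumes a state-free policy parameterised directly by a list of logits and a constant baseline, both simplifications for illustration; the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_softmax(logits, action):
    """Gradient of log pi(action) with respect to the logits: onehot - probs."""
    probs = softmax(logits)
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]


def policy_gradient(episodes, logits, baseline):
    """Sum grad log pi(a) * (r_t - b) over all episodes and all steps."""
    grad = [0.0] * len(logits)
    for episode in episodes:            # outer sum: tasks
        for action, reward in episode:  # inner sum: steps
            g = grad_log_softmax(logits, action)
            for i in range(len(grad)):
                grad[i] += g[i] * (reward - baseline)
    return grad


# One episode with one step: action 0 received reward 1.0.
grad = policy_gradient([[(0, 1.0)]], logits=[0.0, 0.0], baseline=0.5)
```

Subtracting the baseline leaves the expected gradient unchanged and reduces its variance, which is why the choice of b matters in practice.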
Improving policy search
\[ \sum_{\text{tasks}} \sum_{\text{steps}} \nabla \log \pi(a \mid s) \, (r_t - b) \]
a: action, s: state, r_t: reward, b: baseline
[Figure: the gradient term for each subpolicy (use saw, use axe, get iron, get wood), with baselines shown for the tasks make planks and make nails]
SUBPOLICY: reward .40
TASK: reward .89
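The SUBPOLICY versus TASK comparison suggests that the baseline is better tied to the task than to the subpolicy, since the same subpolicy can appear in tasks with very different expected rewards. The sketch below computes one baseline per task and the resulting advantages. Using the mean observed reward as the baseline is an assumption made for illustration (the slides do not say how b is estimated), and the function names and rollout values are hypothetical.

```python
from collections import defaultdict


def task_baselines(rollouts):
    """Mean reward per task, used here as a simple per-task baseline."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for task, reward in rollouts:
        totals[task] += reward
        counts[task] += 1
    return {task: totals[task] / counts[task] for task in totals}


def advantages(rollouts, baselines):
    """Reward minus the baseline of the task that produced it."""
    return [reward - baselines[task] for task, reward in rollouts]


# Made-up rewards for two tasks that could share a subpolicy.
rollouts = [
    ("make planks", 1.0),
    ("make planks", 0.0),
    ("make nails", 0.0),
    ("make nails", 0.0),
]
baselines = task_baselines(rollouts)
adv = advantages(rollouts, baselines)
```

With a single baseline shared across both tasks, every "make nails" rollout would look worse than average even though nothing about the shared subpolicy changed.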
Do sketches help?
The maze navigation task
[Plot: reward vs. episodes (x 10^6); curves: Unsupervised, Sketches: modular, Sketches: joint]
The mini-craft task
[Plot: reward vs. episodes (x 10^6); curves: Unsupervised, Sketches: modular, Sketches: joint]
The cliff-walking task
[Plot: log reward vs. timesteps (x 10^8); curves: Unsupervised, Sketches: modular, Sketches: joint]
Zero-shot generalization
What if I see a sketch I’ve never seen before?
get iron, use axe
[Bar chart: Modular vs. Joint on Multitask and Zero-shot; axis ticks 25 to 100; bar values 49, 89, 77, 1]
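Zero-shot use of a new sketch amounts to looking up an already-trained subpolicy for each symbol and chaining them, with no further learning. A minimal sketch follows, assuming subpolicies are stored in a dictionary keyed by symbol; `compose`, `LIBRARY`, and the placeholder values are hypothetical.

```python
def compose(sketch, library):
    """Return the trained subpolicies for an unseen sketch, in order."""
    missing = [symbol for symbol in sketch if symbol not in library]
    if missing:
        raise KeyError(f"no trained subpolicy for: {missing}")
    return [library[symbol] for symbol in sketch]


# Placeholder strings stand in for trained subpolicy objects.
LIBRARY = {
    "get wood": "pi_get_wood",
    "get iron": "pi_get_iron",
    "use axe": "pi_use_axe",
}

# A sketch never seen in training, built from known symbols.
plan = compose(["get iron", "use axe"], LIBRARY)
```

The lookup only works when every symbol in the new sketch was seen during training, so the function fails loudly on an unknown symbol.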
Fast adaptation
What if I don’t get a sketch at test time?
[Bar chart: Sketches vs. Unsupervised on Multitask and Adaptation; axis ticks 25 to 100; bar values 47, 89, 76, 42]
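When no sketch is given, one simple option is to search over short sequences of known subtask symbols and keep the one that scores best. The sketch below uses exhaustive enumeration, which is a swapped-in technique chosen for brevity and not necessarily the adaptation method used in the talk; `best_sketch` and the `evaluate` callback (assumed to return the average reward of running a candidate sketch) are hypothetical.

```python
from itertools import product


def best_sketch(symbols, evaluate, max_length=2):
    """Try every sketch up to max_length symbols; return the best and its score.

    Assumes symbols is non-empty and max_length is at least 1.
    """
    best, best_score = None, float("-inf")
    for length in range(1, max_length + 1):
        for candidate in product(symbols, repeat=length):
            score = evaluate(candidate)
            if score > best_score:
                best, best_score = candidate, score
    return list(best), best_score


def toy_evaluate(candidate):
    """Made-up scorer: only one sketch solves the hidden task."""
    return 1.0 if list(candidate) == ["get wood", "use axe"] else 0.0


found, score = best_sketch(["get wood", "use axe", "use saw"], toy_evaluate)
```

The number of candidates grows exponentially with sketch length, so enumeration is only practical for very short sketches.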
Conclusions
A tiny bit of data goes a long way
Crafting environment
make plank: get wood, use toolshed
make stick: get wood, use workbench
make cloth: get grass, use factory
make rope: get grass, use toolshed
make bridge: get iron, get wood, use factory
make bed∗: get wood, use toolshed, get grass, use workbench
make axe∗: get wood, use workbench, get iron, use toolshed
make shears: get wood, use workbench, get iron, use workbench
get gold: get iron, get wood, use factory, use bridge
get gem: get wood, use workbench, get iron, use toolshed, use axe
Thank you!
https://github.com/jacobandreas/psketch