Learning to Plan with Logical Automata




SLIDE 1

Learning to Plan with Logical Automata

Brandon Araki¹*, Kiran Vodrahalli²*, Thomas Leech¹,³, Mark Donahue³, Cristian-Ioan Vasile¹, Daniela Rus¹

¹Massachusetts Institute of Technology ²Columbia University ³MIT Lincoln Laboratory

*Equal contributors

SLIDE 2

SLIDE 3

Goals

Learn to plan in an environment with rules

  1. Learn the rules in a form that humans can easily interpret
  2. Incorporate the rules into planning so that modifying the rules results in predictable changes in behavior

SLIDE 4

Packing a Lunchbox

Pack a burger or a sandwich; then pack a banana

SLIDE 5

Goal 1 – Interpretability

Pack a burger or a sandwich; then pack a banana

[Figure: the rules drawn as a finite state automaton. State labels: Initial State, Picked up, Packed, Picked up, Packed, GOAL!]
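The rule on this slide can be written down as a small transition table. The following is a minimal sketch, not the authors' representation: the state names (`initial`, `holding_main`, and so on) and event names (`pick_burger`, `pack`, and so on) are hypothetical labels chosen to mirror the "Picked up" and "Packed" stages in the figure.

```python
# Hypothetical encoding of "pack a burger or a sandwich; then pack a banana".
# All state and event names are illustrative assumptions, not taken from the slides.
TRANSITIONS = {
    ("initial", "pick_burger"): "holding_main",
    ("initial", "pick_sandwich"): "holding_main",   # the "or" branch
    ("holding_main", "pack"): "main_packed",
    ("main_packed", "pick_banana"): "holding_banana",
    ("holding_banana", "pack"): "goal",
}


def run_fsa(events, state="initial"):
    """Advance the automaton; an event with no matching edge leaves the state unchanged."""
    for event in events:
        state = TRANSITIONS.get((state, event), state)
    return state


def satisfies_rule(events):
    """True if the event sequence drives the automaton to the goal state."""
    return run_fsa(events) == "goal"


if __name__ == "__main__":
    print(satisfies_rule(["pick_sandwich", "pack", "pick_banana", "pack"]))
    print(satisfies_rule(["pick_banana", "pack", "pick_burger", "pack"]))
```

Because each edge is one readable entry, a person can check the rule by inspecting the table, which is the sense of interpretability this slide is after.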

SLIDE 6

Factoring the Environment

Rules: Pack sandwich or burger; then pack banana. Avoid obstacles.

The environment is factored into a low-level MDP and a high-level MDP.

SLIDE 7

Representing the Environment

Rules: Pack sandwich or burger; then pack banana. Avoid obstacles.

  • Discrete 2D gridworld
  • Finite state automaton

[Figure: the finite state automaton for the rules. State labels: Initial State, Picked up, Packed, Picked up, Packed]

SLIDE 8

Goal 2 – Manipulability

Incorporate FSA into planning

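[Figure: the finite state automaton (state labels: Initial State, Picked up, Packed, Picked up, Packed) shown with its transition matrix. Matrix axis labels: S0, S1, S2, S3, G, T. Panel labels: S0, Ø]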

SLIDE 9

Differentiable Recursive Planning

  • Learn reward
  • Learn transitions
  • Learn transitions of FSA

One value iteration network (VIN) for each FSA state

Based on Tamar, Aviv, et al. "Value iteration networks." Advances in Neural Information Processing Systems. 2016.
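To make "one VIN for each FSA state" concrete, here is a tabular stand-in: plain value iteration over the product of a gridworld and an FSA, keeping one value map per FSA state. It is a sketch under stated assumptions, not the authors' network. Nothing is learned here, moves are deterministic, and the reward of 1 for entering the goal state is an assumption. The names `product_value_iteration`, `props`, and `fsa_next` are hypothetical.

```python
import numpy as np


def product_value_iteration(props, fsa_next, goal, gamma=0.9, iters=50):
    """Tabular value iteration on gridworld x FSA (illustrative, not a VIN).

    props:    2D int array, the proposition index that holds in each cell (0 = none).
    fsa_next: int array [n_fsa_states, n_props], next FSA state for each proposition.
    goal:     index of the absorbing goal FSA state.
    Returns V of shape [n_fsa_states, H, W]: one value map per FSA state.
    """
    n_states = fsa_next.shape[0]
    H, W = props.shape
    V = np.zeros((n_states, H, W))
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    for _ in range(iters):
        V_new = np.zeros_like(V)
        for s in range(n_states):
            if s == goal:
                continue  # goal is absorbing; its value stays 0
            for r in range(H):
                for c in range(W):
                    best = -np.inf
                    for dr, dc in moves:
                        # Moves that would leave the grid keep the agent in place.
                        nr = min(max(r + dr, 0), H - 1)
                        nc = min(max(c + dc, 0), W - 1)
                        # The proposition in the destination cell drives the FSA.
                        s2 = fsa_next[s, props[nr, nc]]
                        reward = 1.0 if s2 == goal else 0.0  # assumed reward
                        best = max(best, reward + gamma * V[s2, nr, nc])
                    V_new[s, r, c] = best
        V = V_new
    return V


# Toy example: a 1x3 corridor with proposition "a" in the middle and "b" on the right.
props = np.array([[0, 1, 2]])
# FSA: state 0 --a--> state 1 --b--> state 2 (goal).
fsa_next = np.array([[0, 1, 0],
                     [1, 1, 2],
                     [2, 2, 2]])
V = product_value_iteration(props, fsa_next, goal=2)
```

The recursion shows up in the backup: the value map for one FSA state reads from the value map of whichever FSA state the next proposition leads to.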

SLIDE 10

Experiments – Interpretability

[Figure: learned FSA transition matrix. Axes: Propositions, FSA States (S0, S1, S2, S3, G, T). Panel labels: S0, Ø]

SLIDE 11

Experiments – Interpretability

[Figure: learned FSA transition matrix. Axis labels: S0, S1, S2, S3, G, T. Panel labels: S0, Ø]

Picking up the sandwich or the hamburger causes a transition to the next state
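Reading rules off a learned transition matrix can be sketched as follows. The tensor layout `tm[p, i, j]` (weight for moving from state `i` to state `j` when proposition `p` holds), the proposition names, and the 0.5 threshold are all assumptions made for illustration. The numbers below are invented toy values, not results from the experiments.

```python
import numpy as np

# State labels as they appear on the slide's matrix axis.
STATES = ["S0", "S1", "S2", "S3", "G", "T"]


def describe_transitions(tm, prop_names, threshold=0.5):
    """List the non-self transitions that the matrix assigns weight >= threshold.

    tm[p, i, j] is an assumed layout: proposition p, source state i, target state j.
    """
    rules = []
    for p, name in enumerate(prop_names):
        for i, src in enumerate(STATES):
            j = int(np.argmax(tm[p, i]))
            if j != i and tm[p, i, j] >= threshold:
                rules.append(f"{name}: {src} -> {STATES[j]}")
    return rules


# Toy matrix: every proposition is a self-loop except in S0.
tm = np.tile(np.eye(len(STATES)), (2, 1, 1))
tm[0, 0] = [0.10, 0.90, 0.0, 0.0, 0.0, 0.0]  # hypothetical "sandwich" row for S0
tm[1, 0] = [0.05, 0.95, 0.0, 0.0, 0.0, 0.0]  # hypothetical "burger" row for S0

for rule in describe_transitions(tm, ["sandwich", "burger"]):
    print(rule)
```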

SLIDE 12

Experiments – Manipulability

We can modify the FSA so that the agent picks up only the burger and not the sandwich.

[Figure: the finite state automaton being modified. State labels: Initial State, Picked up, Packed, Picked up, Packed]
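The kind of edit described here can be illustrated on a plain transition table: removing a single edge changes which event sequences reach the goal, and nothing else. This is a minimal sketch, not the authors' procedure. The mapping of S0 to S3 and G onto the pick-up and pack stages, and the event names, are assumptions.

```python
def run_fsa(transitions, events, state="S0"):
    """Advance the automaton; an event with no matching edge leaves the state unchanged."""
    for event in events:
        state = transitions.get((state, event), state)
    return state


# Assumed meaning of the labels: S1 = main item picked up, S2 = main item packed,
# S3 = banana picked up, G = banana packed (goal).
original = {
    ("S0", "pick_burger"): "S1",
    ("S0", "pick_sandwich"): "S1",
    ("S1", "pack"): "S2",
    ("S2", "pick_banana"): "S3",
    ("S3", "pack"): "G",
}

# The modification: drop the sandwich edge so only the burger advances the automaton.
modified = {edge: target for edge, target in original.items()
            if edge != ("S0", "pick_sandwich")}

sandwich_run = ["pick_sandwich", "pack", "pick_banana", "pack"]
burger_run = ["pick_burger", "pack", "pick_banana", "pack"]

print(run_fsa(original, sandwich_run))
print(run_fsa(modified, sandwich_run))
print(run_fsa(modified, burger_run))
```

A planner that only earns reward through the automaton would then have no reason to pick up the sandwich, which is the predictable change in behavior stated in the goals.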

SLIDE 14

Experiments – Manipulability

We can modify the FSA so that the agent picks up only the burger and not the sandwich.

[Figure: FSA transition matrix. Axis labels: S0, S1, S2, S3, G, T. Panel labels: S0, Ø]

SLIDE 17

Learning to Plan with Logical Automata

Brandon Araki¹*, Kiran Vodrahalli²*, Thomas Leech¹,³, Mark Donahue³, Cristian-Ioan Vasile¹, Daniela Rus¹

¹Massachusetts Institute of Technology ²Columbia University ³MIT Lincoln Laboratory

*Equal contributors
