Target-driven Visual Navigation in Indoor Scenes Using Deep Reinforcement Learning


SLIDE 1

Target-driven Visual Navigation in Indoor Scenes Using Deep Reinforcement Learning [Zhu et al. 2017]

A. James E. Cagalawan
james.cagalawan@gmail.com
University of Waterloo

June 27, 2018

(Note: Videos in the slide deck used in the presentation have been replaced with YouTube links.)

REINFORCEMENT LEARNING FOR ROBOTICS

SLIDE 2

TARGET-DRIVEN VISUAL NAVIGATION IN INDOOR SCENES USING DEEP REINFORCEMENT LEARNING – Paper Authors

Yuke Zhu¹, Roozbeh Mottaghi², Eric Kolve², Joseph J. Lim¹, Abhinav Gupta²,³, Li Fei-Fei¹, Ali Farhadi²,⁴

¹Stanford University  ²Allen Institute for AI  ³Carnegie Mellon University  ⁴University of Washington

SLIDE 3

MOTIVATION – Navigating the Grid World

(Grid-world figure: legend shows free space, occupied space, agent, goal, and danger cells.)

SLIDE 4

MOTIVATION – Navigating the Grid World – Assign Numerical Rewards

Numerical rewards can be assigned to the goal and the dangerous grid cells.

(Figure annotations: +1 at the goal, −100 at the danger cell.)
SLIDE 5

MOTIVATION – Navigating the Grid World – Learn a Policy

Numerical rewards can be assigned to the goal and the dangerous grid cells. A policy can then be learned from those rewards.
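
The reward-then-policy loop sketched on these slides can be made concrete with tabular Q-learning. Below is a minimal sketch on a hypothetical five-cell corridor; the layout, the +1 goal reward, and the hyperparameters are illustrative choices, not taken from the slides:

```python
import random

# Hypothetical corridor grid world: cells 0..4, goal at cell 4 pays +1.
GOAL = 4
N_CELLS = 5
ACTIONS = [-1, 1]                 # move left / move right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1  # learning rate, discount, exploration

Q = {(s, a): 0.0 for s in range(N_CELLS) for a in ACTIONS}

def step(state, action):
    """Apply an action; reaching the goal pays +1 and ends the episode."""
    nxt = min(max(state + action, 0), N_CELLS - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

random.seed(0)
for _ in range(500):                       # training episodes
    s = random.randrange(N_CELLS - 1)      # start anywhere except the goal
    done = False
    while not done:
        if random.random() < EPS:          # epsilon-greedy exploration
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r, done = step(s, a)
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        best_next = 0.0 if done else max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

# Greedy policy extracted from the learned Q-values.
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_CELLS)}
```

After enough episodes the greedy policy points every free cell toward the goal, which is exactly the "learn a policy" step above.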

SLIDE 6

MOTIVATION – Visual Navigation Applications: Treasure Hunting

  • Visual navigation using robots has many applications.
  • Treasure hunting.
SLIDE 7

MOTIVATION – Visual Navigation Applications: Robot Soccer

  • Visual navigation using robots has many applications.
  • Treasure hunting.
  • Soccer robots getting to the ball first.

SLIDE 8

MOTIVATION – Visual Navigation Applications: Pizza Delivery

  • Visual navigation using robots has many applications.
  • Treasure hunting.
  • Soccer robots getting to the ball first.
  • Drones delivering pizzas to your lecture hall.

SLIDE 9

MOTIVATION – Visual Navigation Applications: Self-Driving Cars

  • Visual navigation using robots has many applications.
  • Treasure hunting.
  • Soccer robots getting to the ball first.
  • Drones delivering pizzas to your lecture hall.
  • Autonomous cars driving people to their homes.

SLIDE 10

MOTIVATION – Visual Navigation Applications: Search and Rescue

  • Visual navigation using robots has many applications.
  • Treasure hunting.
  • Soccer robots getting to the ball first.
  • Drones delivering pizzas to your lecture hall.
  • Autonomous cars driving people to their homes.
  • Search and rescue robots finding missing people.

SLIDE 11

MOTIVATION – Visual Navigation Applications: Domestic Robots

  • Visual navigation using robots has many applications.
  • Treasure hunting.
  • Soccer robots getting to the ball first.
  • Drones delivering pizzas to your lecture hall.
  • Autonomous cars driving people to their homes.
  • Search and rescue robots finding missing people.
  • Domestic robots navigating their way around houses to help with chores.

SLIDE 12

PROBLEM – Domestic Robot: Setup

  • Multiple locations in an indoor scene that our robot must navigate to.
  • Actions consist of moving forwards and backwards and turning left and right.

SLIDE 13

PROBLEM – Domestic Robot: Problems

  • Multiple locations in an indoor scene that our robot must navigate to.
  • Actions consist of moving forwards and backwards and turning left and right.
  • Problem 1: Navigating to multiple targets.
  • Problem 2: Using high-dimensional visual inputs is challenging and time-consuming to train.
  • Problem 3: Training on a real robot is expensive.

SLIDE 14

PROBLEM – Multi Target: Can’t We Just Use Q-Learning?

  • We can already navigate grid mazes using Q-learning by assigning rewards for finding a target.
  • Assigning rewards to multiple locations on the grid does not allow specification of different targets.

SLIDE 15

PROBLEM – Multi Target: Can’t We Just Use Q-Learning? Let’s Try It

  • We can already navigate grid mazes using Q-learning by assigning rewards for finding a target.
  • Assigning rewards to multiple locations on the grid does not allow specification of different targets.

(Figure annotations: +1 at each of the three targets.)

SLIDE 16

PROBLEM – Multi Target: Can’t We Just Use Q-Learning? Uh-oh

  • We can already navigate grid mazes using Q-learning by assigning rewards for finding a target.
  • Assigning rewards to multiple locations on the grid does not allow specification of different targets.
  • Would end up at a target, but not any specific target.

(Figure annotations: +1 at each of the three targets.)

SLIDE 17

PROBLEM – Multi Target: A Policy for Every Target

  • We can already navigate grid mazes using Q-learning by assigning rewards for finding a target.
  • Assigning rewards to multiple locations on the grid does not allow specification of different targets.
  • Would end up at a target, but not any specific target.
  • Could train multiple policies, but that wouldn’t scale with the number of targets.
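
The scaling issue above is what motivates a target-driven formulation: condition one policy on both the state and the requested target. As a toy stand-in for the paper's deep architecture (everything here is a hypothetical tabular example), a Q-table keyed by (state, target, action) on a corridor with goals at both ends shows the idea:

```python
import random

# Hypothetical corridor: cells 0..4, with two possible targets (cells 0 and 4).
# Keying the Q-table by (state, target, action) lets ONE learner reach
# whichever goal is requested, instead of one policy per target.
N_CELLS = 5
TARGETS = [0, 4]
ACTIONS = [-1, 1]
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

Q = {(s, t, a): 0.0
     for s in range(N_CELLS) for t in TARGETS for a in ACTIONS}

def step(state, target, action):
    """Apply an action; +1 and episode end only at the REQUESTED target."""
    nxt = min(max(state + action, 0), N_CELLS - 1)
    return nxt, (1.0 if nxt == target else 0.0), nxt == target

random.seed(0)
for _ in range(2000):
    t = random.choice(TARGETS)                       # requested target varies
    s = random.choice([c for c in range(N_CELLS) if c != t])
    done = False
    while not done:
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, t, act)])
        s2, r, done = step(s, t, a)
        best_next = 0.0 if done else max(Q[(s2, t, a2)] for a2 in ACTIONS)
        Q[(s, t, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, t, a)])
        s = s2
```

The same table now walks left when cell 0 is requested and right when cell 4 is, with no per-target retraining.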

SLIDE 18

PROBLEM – From Navigation to Visual Navigation

(Diagram: sensor image and target image in, step/turn action out.)

SLIDE 19

PROBLEM – Visual Navigation Decomposition: Overview

(Pipeline diagram: Image → Perception → World Modelling → Planning → Action.)

“Autonomous Mobile Robots” by Roland Siegwart and Illah R. Nourbakhsh (2011)

The visual navigation problem can be broken up into pieces, with specialized algorithms solving each piece.

SLIDE 20

PROBLEM – Visual Navigation Decomposition: Image

SLIDE 21

PROBLEM – Visual Navigation Decomposition: Perception – Localization and Mapping

SLIDE 22

PROBLEM – Visual Navigation Decomposition: Perception – Object Detection

SLIDE 23

PROBLEM – Visual Navigation Decomposition: World Modelling

SLIDE 24

PROBLEM – Visual Navigation Decomposition: Planning – Choosing the End Position

SLIDE 25

PROBLEM – Visual Navigation Decomposition: Planning – Searching for a Path

SLIDE 26

PROBLEM – Visual Navigation Decomposition: Action

SLIDE 27

PROBLEM – Visual Navigation Decomposition: Result

This decomposition is effective, but each step requires a different algorithm.

SLIDE 28

PROBLEM – Visual Navigation with Reinforcement Learning

Design a deep reinforcement learning architecture to handle visual navigation from raw pixels.

(Pipeline diagram: Image → Reinforcement Learning → Action.)

SLIDE 29

PROBLEM – Robot Learning: Reinforcement Learning with a Robot

SLIDE 30

PROBLEM – Robot Learning: Data Efficiency and Transfer Learning

Idea: Train in simulation first, then fine-tune the learned policy on a real robot.

SLIDE 31

PROBLEM – Robot Learning: Goal

Design a deep reinforcement learning architecture to handle visual navigation from raw pixels with high data efficiency.

SLIDE 32

IMPLEMENTATION DETAILS – Architecture Overview

  • Similar to the actor-critic A3C method, which outputs a policy and a value while running multiple threads.
  • Train a different target on each thread, rather than copies of the same target.
  • Use a fixed ResNet-50 pretrained on ImageNet to generate embeddings for the observation and the target.
  • Fuse the embeddings into a feature vector to get an action and a value.
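
The embedding-fusion step can be sketched in NumPy. Apart from the 2048-d ResNet-50 embedding size, everything below is an illustrative assumption: the layer widths, the four actions, the random weights, and the random vectors standing in for the frozen ResNet outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

EMB, FUSED, N_ACTIONS = 2048, 512, 4   # 2048 = ResNet-50 output; rest assumed

# Stand-ins for the frozen ResNet-50 embeddings of the two images.
obs_emb = rng.standard_normal(EMB)     # current observation
tgt_emb = rng.standard_normal(EMB)     # target image

# Siamese idea: the SAME projection weights are applied to both embeddings.
W_shared = rng.standard_normal((FUSED, EMB)) / np.sqrt(EMB)
obs_feat = np.maximum(W_shared @ obs_emb, 0.0)   # ReLU
tgt_feat = np.maximum(W_shared @ tgt_emb, 0.0)

# Fuse, then split into actor (policy) and critic (value) heads,
# mirroring the A3C-style outputs described on the slide.
fused = np.concatenate([obs_feat, tgt_feat])
W_pi = rng.standard_normal((N_ACTIONS, 2 * FUSED)) / np.sqrt(2 * FUSED)
W_v = rng.standard_normal((1, 2 * FUSED)) / np.sqrt(2 * FUSED)

logits = W_pi @ fused
policy = np.exp(logits - logits.max())
policy /= policy.sum()                 # softmax over the discrete actions
value = (W_v @ fused).item()           # scalar state value
```

The siamese aspect is that the same projection `W_shared` is applied to both the observation and the target embedding before fusion.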

SLIDE 33

IMPLEMENTATION DETAILS – AI2-THOR Simulation Environment

  • The House Of inteRactions (THOR).
  • 32 scenes of household environments.
  • Can freely move around a 3D environment.
  • High fidelity compared to other simulators.

(Comparison images: AI2-THOR, Virtual KITTI, Synthia.)

Video: https://youtu.be/SmBxMDiOrvs?t=18

SLIDE 34

IMPLEMENTATION DETAILS – Handling Multiple Scenes

  • A single layer for all scenes might not generalize as well.
  • Train a different set of weights for every scene in a vine-like model.
  • Discretized the scenes with a constant step length of 0.5 m and turning angle of 90°, effectively a grid, when training and running for simplicity.
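
The discretization above can be sketched as a pose update. Only the 0.5 m step and 90° turn come from the slide; the pose representation and action names here are hypothetical:

```python
import math

STEP, TURN = 0.5, 90   # constant step length (m) and turning angle (degrees)

def act(pose, action):
    """Apply one discrete action to a pose (x, y, heading in degrees)."""
    x, y, h = pose
    if action == "turn_left":
        return (x, y, (h + TURN) % 360)
    if action == "turn_right":
        return (x, y, (h - TURN) % 360)
    dx = STEP * math.cos(math.radians(h))
    dy = STEP * math.sin(math.radians(h))
    if action == "backward":
        dx, dy = -dx, -dy
    # Rounding keeps the reachable positions on an exact 0.5 m grid.
    return (round(x + dx, 3), round(y + dy, 3), h)

pose = (0.0, 0.0, 0)           # start at the origin, facing +x
pose = act(pose, "forward")    # (0.5, 0.0, 0)
pose = act(pose, "turn_left")  # (0.5, 0.0, 90)
pose = act(pose, "forward")    # (0.5, 0.5, 90)
```

Because headings are multiples of 90° and steps are constant, every reachable pose lies on a grid, which is what makes the discretized training setup tractable.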

SLIDE 35

EXPERIMENTS – Navigation Results: Performance Of Our Method Against Baselines

(Table I from the paper: quantitative comparison of the target-driven method against baselines.)

SLIDE 36

EXPERIMENTS – Navigation Results: Data Efficiency – Target-driven Method is More Data Efficient

SLIDE 37

EXPERIMENTS – Navigation Results: t-SNE Embedding – It Learned an Implicit Map of the Environment

(Figure: bird’s-eye view of the scene alongside a t-SNE visualization of the embedding vectors.)

SLIDE 38

EXPERIMENTS – Generalization Across Targets: The Method Generalizes Across Targets

SLIDE 39

EXPERIMENTS – Generalization Across Scenes: Training On More Scenes Reduces Trajectory Length Over Training Frames

SLIDE 40

EXPERIMENTS – Method Works In Continuous World

  • The method was evaluated with continuous action spaces.
  • The robot could now collide with objects, and its movement was noisy, no longer always aligning with a grid.
  • Result: Required 50M more training frames to train on a single target.
  • Could reach the door in an average of 15 steps.
  • A random agent took 719 steps.

Video: https://youtu.be/SmBxMDiOrvs?t=169

SLIDE 41

EXPERIMENTS – Robot Experiment: Video

Video: https://youtu.be/SmBxMDiOrvs?t=180

SLIDE 42

EXPERIMENTS – Summary of Contributions and Results

  • Designed a new deep reinforcement learning architecture using a siamese network with scene-specific layers.
  • Generalizes to multiple scenes and targets.
  • Works with continuous actions.
  • The policy trained in simulation can be run in the real world.

Video: https://youtu.be/SmBxMDiOrvs?t=98

SLIDE 43

CONCLUSIONS – Future Work

  • Physical interaction and object manipulation.
  • Situation: Moving obstacles out of the path.
  • Situation: Opening containers to find objects.
  • Situation: Turning on the lights when the robot enters a dark room and can’t see.

SLIDE 44

CREDIT

Thanks to “Twemoji” from Twitter, used under the CC-BY 4.0 license.

  • “School” image changed to carry University of Waterloo logo.
  • “Quadcopter” created from “Helicopter” and “Robot Face” images.
  • “Goose” created from “Duck” image.
  • “Red Robot” created from “Robot” image.
  • “Dumbbell” created from “Nuts and Bolt” image.

Thank You

SLIDE 45

RECAP

  • Motivation
    • Navigating the Grid World, Assign Numerical Rewards, Learn a Policy
    • Applications: Treasure Hunting, Robot Soccer, Pizza Delivery, Self-Driving Cars, Search and Rescue, Domestic Robots
  • Problem
    • Domestic Robot
    • Multi Target: Can’t We Just Use Q-Learning?
    • From Navigation to Visual Navigation
    • Visual Navigation Decomposition: Image, Perception, World Modelling, Planning, Action, Results
    • Goal of Visual Navigation
    • Robot Learning Considerations
  • Neural Network Architecture
    • Embedding of Scene and Target using ResNet
    • Siamese Network
    • Scene-Specific Layers
  • AI2-THOR Simulator
    • 32 household scenes with interactions; trained on a discretized representation with no interaction required
  • Experiments
    • Shorter Average Trajectory Length, More Data Efficient, Training on More Targets Improves Success, Training on More Scenes Reduces Trajectory Length, Works in Continuous World, Transfer Learning Works
  • Future Work
    • More Sophisticated Interaction