Scene Navigation by Knowledge Graph and Interaction
Mohammad Rastegari
ICCV, Oct. 2019
Task: Navigate to Television
[Figure: sequence of agent views while searching for "Television"; actions taken: Move Forward, Move Forward, Rotate Right, Done]
• 120 scenes
• Room types:
    • Kitchen
    • Living room
    • Bedroom
    • Bathroom
• Each room class has 30 scenes
    • Training: 20 rooms/class
    • Testing: 5 rooms/class
Challenges
• Normally we relocate a seen object in a seen scene
• The main challenges are:
    • Generalizing to unseen scenes
    • Generalizing to unseen objects
Using Prior Knowledge
[Figure labels: Apple, Coffee machine, Cup, Mango]
Knowledge Graph
Scene Prior
[Figure: knowledge graph of objects linked by spatial relations ("next to", "on", "next to/on"). Nodes: Plate, Table, Sandwich, Sink, Painting, Remote, Coffee Machine, Cabinet, TV, Mug, Bowl, Counter, Microwave, Laptop, Box, Toaster]
Scene Prior Graph
[Figure: Remote, "next to", Television]
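A scene prior graph like the one above can be stored as an adjacency matrix before it is fed to a graph network. The following is a minimal sketch, not the presenter's code: the `build_adjacency` helper is hypothetical, relations are treated as undirected (an assumption), and only the Remote / Television triple comes from the slide; the second triple is purely illustrative.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def build_adjacency(triples: List[Triple]) -> Tuple[List[str], List[List[float]]]:
    """Turn relation triples into a sorted node list and a symmetric 0/1 adjacency matrix."""
    nodes = sorted({name for subj, _, obj in triples for name in (subj, obj)})
    index = {name: i for i, name in enumerate(nodes)}
    adj = [[0.0] * len(nodes) for _ in nodes]
    for subj, _, obj in triples:
        i, j = index[subj], index[obj]
        adj[i][j] = 1.0
        adj[j][i] = 1.0  # assumption: relations are undirected
    return nodes, adj


triples = [
    ("Remote", "next to", "Television"),  # from the slide
    ("Mug", "on", "Table"),               # illustrative only
]
nodes, adj = build_adjacency(triples)
for name, row in zip(nodes, adj):
    print(f"{name:<12}", row)
```

The relation label is dropped here for simplicity; a richer representation could keep one matrix per relation type.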
Architecture Flow
[Figure: components shown: Environment, History frames, ResNet-50, FC (512), word embedding of the target "Television", FC (512), Graph Convolutional Network over the graph (Remote, "next to", Television), FC (512), Joint Embedding, Actor-Critic Model with MLP, Value and Policy outputs, Action Sampler]
Architecture Flow with Scene Prior Graph
[Figure: the same architecture diagram, with the scene prior graph branch: graph (Remote, "next to", Television), Graph Convolutional Network, FC (512), Joint Embedding]
Graph Convolutional Network (GCN)
\[ H^{(l+1)} = f\left(\hat{A} H^{(l)} W^{(l)}\right) \]
• \hat{A}: Normalized adjacency matrix
• H^{(l)}: Node features at the l-th layer
• W^{(l)}: Learnable parameters at the l-th layer
• f: Activation function (e.g. ReLU)
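The layer rule above can be traced by hand on a tiny graph. The sketch below uses plain Python lists and assumes the common symmetric normalization with self-loops, \hat{A} = D^{-1/2}(A + I)D^{-1/2}; the slide only says "normalized", so that choice, the helper names, and the toy numbers are all assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def normalize_adjacency(adj: Matrix) -> Matrix:
    """Assumed normalization: D^{-1/2} (A + I) D^{-1/2}."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def relu(x: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in x]


def gcn_layer(a_hat: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One layer: H^(l+1) = ReLU(A_hat @ H^(l) @ W^(l))."""
    return relu(matmul(matmul(a_hat, h), w))


adj = [[0.0, 1.0], [1.0, 0.0]]   # two connected nodes
a_hat = normalize_adjacency(adj)  # every entry becomes 0.5
h0 = [[1.0, 0.0], [0.0, 1.0]]     # toy node features
w0 = [[1.0, -1.0], [0.5, 2.0]]    # toy weights
h1 = gcn_layer(a_hat, h0, w0)
print(h1)
```

Because both nodes share the same normalized neighborhood in this toy graph, they end up with identical features after one layer, which shows how the layer mixes information between neighbors.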
GCN for Scene Navigation
[Figure labels: graph nodes such as "Fridge" and "Toaster"; ResNet-50; 1000 class score; FC (512); 512 + 512 concat; 3 Layers]
The knowledge graph is updated over time according to the recent observations.
Action Space
• Move Ahead
• Move Back
• Rotate Right
• Rotate Left
• Stop
We include the Stop action and expect the agent to issue it when it reaches the target. This makes learning more challenging.
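One way to see why the Stop action makes the task harder is to compare success criteria with and without it. The sketch below is an illustration under stated assumptions, not the evaluation code behind the talk: `episode_succeeded` and the per-step `target_reached` flags are hypothetical stand-ins for whatever check the environment performs.

```python
from enum import Enum
from typing import List


class Action(Enum):
    MOVE_AHEAD = "MoveAhead"
    MOVE_BACK = "MoveBack"
    ROTATE_RIGHT = "RotateRight"
    ROTATE_LEFT = "RotateLeft"
    STOP = "Stop"


def episode_succeeded(actions: List[Action], target_reached: List[bool], require_stop: bool) -> bool:
    """target_reached[i] says whether the agent is at the target when actions[i] is issued.

    With require_stop, the episode ends at the first Stop and succeeds only if the
    target is reached at that moment. Without it, reaching the target at any step counts.
    """
    for action, reached in zip(actions, target_reached):
        if require_stop:
            if action is Action.STOP:
                return reached
        elif reached:
            return True
    return False


# The agent passes the target at step 2 but never issues Stop there.
actions = [Action.MOVE_AHEAD, Action.MOVE_AHEAD, Action.MOVE_AHEAD]
reached = [False, True, False]
print(episode_succeeded(actions, reached, require_stop=False))  # True
print(episode_succeeded(actions, reached, require_stop=True))   # False
```

Under this assumed criterion, an agent that wanders past the target gets credit only in the setting without Stop, which is consistent with the much lower numbers reported when Stop is required.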
Seen Scenes, Novel Objects
Bedroom | Mirror
Living room | Painting
Kitchen | Toaster
Kitchen | Microwave
Unseen Scenes, Known Objects
Bathroom | Soap
Bedroom | Lamp
Bedroom | Light Switch
Kitchen | Cabinet
Unseen Scenes, Novel Objects
Bathroom | Towel
Kitchen | Microwave
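Evaluation Metrics
• Success Rate (SR)
    • The ratio of successful navigations toward the object over N episodes
• Success weighted by Path Length (SPL)
    • The ratio of successful navigations toward the object weighted by the path length over N episodes, considering both Success Rate and path length:
      \[ \mathrm{SPL} = \frac{1}{N} \sum_{i=1}^{N} S_i \frac{L_i}{\max(P_i, L_i)} \]
      where N is the number of episodes, S_i is a binary success indicator for episode i, P_i represents the length of the path the agent took, and L_i is the shortest path length to the target.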
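Both metrics are easy to compute from per-episode records. The following is a minimal sketch with made-up episode data; the function names are illustrative, and path lengths are assumed to be positive.

```python
from typing import List


def success_rate(successes: List[int]) -> float:
    """SR: fraction of episodes that ended in success."""
    return sum(successes) / len(successes)


def spl(successes: List[int], path_lengths: List[float], shortest_lengths: List[float]) -> float:
    """SPL: (1/N) * sum_i S_i * L_i / max(P_i, L_i); assumes positive lengths."""
    total = 0.0
    for s, p, l in zip(successes, path_lengths, shortest_lengths):
        if s:
            total += l / max(p, l)
    return total / len(successes)


# Made-up data for four episodes.
S = [1, 1, 0, 1]               # success indicators
P = [10.0, 20.0, 15.0, 5.0]    # path lengths taken by the agent
L = [10.0, 10.0, 5.0, 5.0]     # shortest path lengths

print(success_rate(S))  # 0.75
print(spl(S, P, L))     # 0.625
```

The second episode succeeds but takes twice the shortest path, so it contributes 0.5 rather than 1 to SPL; this is why SPL is never higher than SR.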
(SPL / SR) without STOP action (250 episodes)

Setting                        Method   Kitchen       Living room   Bedroom       Bathroom      Avg.
Seen scenes, Known objects     Random   17.9 / 33.1   12.1 / 30.5   16.8 / 51.2   24.5 / 34.6   17.8 / 37.3
                               A3C      79.9 / 86.7   38.8 / 57.6   87.8 / 89.5   93.7 / 96.6   75.0 / 82.5
                               Ours     83.5 / 88.2   46.4 / 64.4   90.6 / 92.7   93.6 / 96.5   78.5 / 85.5
Seen scenes, Novel objects     Random   10.0 / 23.1    8.0 / 18.5   17.3 / 35.2   11.2 / 32.2   11.6 / 27.2
                               A3C      20.2 / 38.8   24.2 / 46.5   23.5 / 35.8   50.2 / 74.6   29.5 / 48.9
                               Ours     22.9 / 53.6   39.5 / 66.5   26.1 / 38.9   50.5 / 78.6   34.7 / 59.4
Unseen scenes, Known objects   Random   27.3 / 45.2    5.6 / 16.6   13.1 / 34.5   36.0 / 49.1   20.5 / 36.3
                               A3C      39.5 / 56.2   12.0 / 31.8   22.5 / 49.2   47.4 / 60.2   30.3 / 49.3
                               Ours     46.2 / 62.5   13.8 / 40.6   26.5 / 58.6   51.5 / 65.8   34.5 / 56.9
Unseen scenes, Novel objects   Random   21.3 / 44.3    3.3 / 22.9   25.8 / 47.8   25.5 / 48.9   19.0 / 41.0
                               A3C      26.1 / 56.3    9.4 / 25.1   28.2 / 54.0   33.8 / 90.7   24.4 / 56.5
                               Ours     38.5 / 62.5   13.7 / 40.3   30.1 / 63.1   39.2 / 93.6   30.4 / 64.9

Table 2: Results without termination (stop) action. SPL / Success rate (%) is shown. We compare …
(SPL / SR) with STOP action

Setting                        Method   Kitchen       Living room   Bedroom       Bathroom      Avg.
Seen scenes, Known objects     Random    2.4 / 3.5     1.1 / 1.7     1.8 / 2.7     3.2 / 4.8     2.1 / 3.1
                               A3C      38.5 / 51.0    9.7 / 15.1    6.8 / 11.5   69.1 / 81.0   31.1 / 39.6
                               Ours     58.6 / 72.7   12.4 / 18.6   41.6 / 52.4   71.3 / 83.0   46.0 / 56.7
Seen scenes, Novel objects     Random    0.9 / 1.3     0.8 / 1.2     2.3 / 3.4     1.4 / 2.1     1.4 / 2.0
                               A3C       2.1 / 4.9     3.2 / 4.8     0.5 / 1.7    17.1 / 28.5    5.7 / 9.9
                               Ours      3.2 / 6.1     9.8 / 16.2    6.2 / 8.6    24.7 / 37.3   11.0 / 17.1
Unseen scenes, Known objects   Random    4.1 / 5.9     0.9 / 1.3     1.6 / 2.4     4.2 / 6.2     2.7 / 3.9
                               A3C      11.5 / 18.8    0.5 / 2.5     2.2 / 3.8     8.6 / 18.7    5.7 / 10.4
                               Ours     12.7 / 20.5    1.0 / 4.0     4.5 / 11.0    8.7 / 21.1    6.7 / 13.4
Unseen scenes, Novel objects   Random    2.0 / 2.8     0.6 / 1.0     2.0 / 2.8     2.7 / 3.9     1.8 / 2.6
                               A3C       2.2 / 7.5     2.5 / 4.4     1.3 / 4.4     3.4 / 9.3     2.4 / 5.9
                               Ours      3.3 / 12.7    2.8 / 5.3     2.0 / 6.3     4.1 / 12.2    3.1 / 8.5

Table 1: Results using termination (stop) action. SPL / Success rate (%) is shown. We compare …
[Figure: two approaches contrasted. Traditional Training followed by Traditional Inference, versus Learning to Adapt followed by Adaptation During Inference]
[Figure: training flow. Initial Model Parameters → Initialize Model → Take k steps → Compute Self-Supervised Interaction Loss → Compute Adapted Parameters → Complete Navigation Episode → Compute Supervised Navigation Loss → Backprop to Update Initialization]
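The training flow on this slide has the shape of a gradient-based meta-learning loop: adapt with a self-supervised loss, evaluate with a supervised loss, then backpropagate through the adaptation to update the initialization. The toy sketch below illustrates only that structure. It uses a single scalar parameter and two quadratic stand-in losses, which are assumptions chosen for illustration; the method in the talk uses a navigation policy network and losses computed from agent experience, and all names and constants here are hypothetical.

```python
def interaction_grad(theta: float, a: float) -> float:
    """Gradient of the stand-in self-supervised loss (theta - a)^2."""
    return 2.0 * (theta - a)


def navigation_grad(theta: float, b: float) -> float:
    """Gradient of the stand-in supervised loss (theta - b)^2."""
    return 2.0 * (theta - b)


def adapt(theta: float, a: float, alpha: float) -> float:
    """Inner step: one gradient step on the interaction loss."""
    return theta - alpha * interaction_grad(theta, a)


def meta_train(theta: float, a: float, b: float,
               alpha: float = 0.1, beta: float = 0.1, steps: int = 200) -> float:
    """Outer loop: update the initialization so the adapted parameter does well on navigation."""
    for _ in range(steps):
        adapted = adapt(theta, a, alpha)
        # Chain rule through the inner step: d(adapted)/d(theta) = 1 - 2 * alpha.
        meta_grad = navigation_grad(adapted, b) * (1.0 - 2.0 * alpha)
        theta -= beta * meta_grad
    return theta


theta = meta_train(0.0, a=1.0, b=3.0)
adapted = adapt(theta, a=1.0, alpha=0.1)
print(round(theta, 4), round(adapted, 4))
```

In this toy setting the learned initialization settles where one interaction-loss step lands on the navigation optimum, which mirrors the idea of training an initialization that adapts well rather than one that is good before adaptation.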
Learning to Learn how to Learn
[Figure labels: Inference; Navigation Gradient (supervised); Learned Interaction Gradient (self-supervised)]
[Figure: the same training flow with a learned loss. Boxes shown: Initial Model Parameters; Initialize Model; Take k steps; Compute Self-Supervised Interaction Loss via Neural Network; Loss Parameters; Compute Adapted Parameters; Complete Navigation Episode; Compute Supervised Navigation Loss]
[Figure: model architecture. Legend: Forward Pass; Navigation-Gradient (Training only); Interaction-Gradient (Training and Inference). Components: Current observation; ResNet18 (Frozen); Image Feature; Pointwise Conv; Target Object Class (e.g. Laptop); GloVe Embedding; FC; Tile; LSTM unrolled over time steps 0, 1, 2, …; example actions Turn Left, Look Down, Move Forward; Concatenated policy and hidden states; 1D Temporal Conv]
Results
[Figure: SPL and Success charts comparing Baseline, Handcrafted Loss, and Learned Loss]
• Training Scenes: 80
• Validation Scenes: 20
• Test Scenes: 20
• Equal Split of Kitchen, Living Room, Bedroom, Bathroom
Goal: Navigate to Book
Thank you !!!!!