Capsule Architectures (Sara Sabour, Google Brain / University of Toronto) - presentation transcript




SLIDE 1

Capsule Architectures

Sara Sabour

Google Brain, University of Toronto

Neural Architects Workshop 28th October, ICCV 2019

SLIDE 2

Joint work with

  • Geoff Hinton @Google Brain
  • Nicholas Frosst @Google Brain
  • Adam Kosiorek @Oxford University
  • Yee Whye Teh @Oxford & DeepMind

SLIDE 3

Idea: Agreement
Why: Viewpoint
How: Iterative algorithm, or Optimization

SLIDE 4

Idea 101: Agreement and Capsules


SLIDE 5

Close look at a typical non-linearity


1. Each neuron is multiplied by a trainable parameter.


SLIDE 7

Close look at a typical non-linearity


1. Each neuron is multiplied by a trainable parameter.
2. The incoming votes are summed.

SLIDE 8

Close look at a typical non-linearity


1. Each neuron is multiplied by a trainable parameter.
2. The incoming votes are summed.
3. A nonlinearity (ReLU) is applied where a higher sum means more activated.

SLIDE 9

Close look at a typical non-linearity


Consider these three cases of incoming votes: (1, 1, 2, 1, 1); (10, 1, 2, 3, 4); (1, 2, 2, 2, 2)

1. Each neuron is multiplied by a trainable parameter.
2. The incoming votes are summed.
3. A nonlinearity (ReLU) is applied where a higher sum means more activated.
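Numerically, the three steps above amount to a weighted sum followed by a ReLU. A minimal sketch; grouping the slide's numbers into three five-vote cases, and using unit weights, are assumptions for illustration:

```python
# Standard neuron non-linearity: weighted sum of incoming votes, then ReLU.
def neuron(votes, weights=None):
    if weights is None:
        weights = [1.0] * len(votes)  # unit weights, assumed for illustration
    s = sum(w * v for w, v in zip(weights, votes))  # step 2: sum the votes
    return max(0.0, s)                              # step 3: ReLU

cases = [
    [1, 1, 2, 1, 1],   # small, agreeing votes
    [10, 1, 2, 3, 4],  # one confident "shouter"
    [1, 2, 2, 2, 2],   # coordinated mass
]
for c in cases:
    print(c, "->", neuron(c))
```

The shouter case wins purely because its sum is largest, which is exactly what the next slides argue against.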

SLIDE 10

Close look at a typical non-linearity


Consider these three cases of incoming votes: (1, 1, 2, 1, 1); (10, 1, 2, 3, 4); (1, 2, 2, 2, 2)

Dictatorship: support comes from a confident shouter!

1. Each neuron is multiplied by a trainable parameter.
2. The incoming votes are summed.
3. A nonlinearity (ReLU) is applied where a higher sum means more activated.

[Figure: the example vote sets passed through SUM, with sums 8, 9, 20]

SLIDE 11

Agreement: Invariance


1. Each neuron is multiplied by a trainable parameter.
2. Do they agree with each other?

Example votes: (1, 1, 2, 1, 1); (10, 1, 2, 3, 4); (1, 2, 2, 2, 2)

Democracy: support comes from a coordinated mass!

SUM + ReLU  ->  Count of agreeing votes

The same cases scaled by 5: (5, 5, 10, 5, 5); (50, 5, 10, 15, 20); (5, 10, 10, 10, 10)

SLIDE 12

Agreement, enhanced: Invariance, Equivariance


1. Each neuron is multiplied by a trainable parameter.
2. Do they agree with each other?
3. What are they agreeing upon?

Example votes: (1, 1, 2, 1, 1); (10, 1, 2, 3, 4); (1, 2, 2, 2, 2)

No loss of information! If everything is multiplied by 5, what they are agreeing upon will also be multiplied by 5.
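The "multiplied by 5" claim can be checked with a toy agreement function; taking the largest group of identical votes is an assumed stand-in for the slide's informal "Agree? On?" picture:

```python
from collections import Counter

def agreement(votes):
    """Return (presence, value): how many votes form the largest identical
    group, and what value that group agrees upon."""
    value, count = Counter(votes).most_common(1)[0]
    return count, value

votes = [1, 2, 2, 2, 2]
print(agreement(votes))                    # (4, 2): four votes agree on 2
print(agreement([5 * v for v in votes]))   # (4, 10): same count, value scaled by 5
```

The count (presence) is invariant to the scaling, while the agreed-upon value is equivariant: no information is lost.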

SLIDE 13

Agreement, what do we get? Invariance, Equivariance


1. Each neuron is multiplied by a trainable parameter.
2. Do they agree with each other?
3. What are they agreeing upon?

Example votes: (1, 1, 2, 1, 1); (10, 1, 2, 3, 4); (1, 2, 2, 2, 2)

Training with this non-linearity

  • Counting: non-differentiable
  • Similarity function: differentiable
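Since exact counting is non-differentiable, a similarity function can replace it for training. One possible choice, a Gaussian kernel around the vote mean; the specific kernel is an assumption, not a formula from the talk:

```python
import math

def soft_agreement(votes, sigma=1.0):
    """Differentiable stand-in for counting: sum of Gaussian similarities
    of each vote to the mean of all votes."""
    mu = sum(votes) / len(votes)
    return sum(math.exp(-((v - mu) ** 2) / (2 * sigma ** 2)) for v in votes)

print(soft_agreement([1, 2, 2, 2, 2]))   # high: the votes nearly coincide
print(soft_agreement([10, 1, 2, 3, 4]))  # low: the votes are spread out
```

Unlike a hard count, this score has gradients with respect to every vote, so it can sit inside a network trained by backpropagation.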
SLIDE 14

Multi-Dimensional, Enhanced Agreement: Stronger Invariance, Stronger Equivariance


Example 2D votes: (1,0) (1,1) (2,8) (1,1) (1,2); (1,0) (2,1) (2,1) (2,1) (2,1); (10,0) (1,1) (2,2) (3,3) (4,4)

Stronger and more robust agreement finding.

1. Each neuron is multiplied by a trainable parameter.
2. Do they agree with each other?
3. What are they agreeing upon?

SLIDE 15

Recap


  • Base idea: Agreement non-linearity (how many are the same, rather than who is larger)
  • Enhancements:
    ○ Presence + Value
    ○ Multi-Dimensional Value

Example 2D votes: (1,0) (1,1) (2,8) (1,1) (1,2); (1,0) (2,1) (2,1) (2,1) (2,1); (10,0) (1,1) (2,2) (3,3) (4,4)

New neurons: Capsules

SLIDE 16

Recap: Capsules


  • Base idea: Agreement non-linearity (how many are the same, rather than who is larger)
  • Enhancements:
    ○ Presence + Value
    ○ Multi-Dimensional Value

A network of Capsules:

  • Each capsule has whether it is present and how it is present.
  • Each capsule gets activated if incoming votes agree.

SLIDE 17

Use Case: Computer Vision


SLIDE 18

Which one is a house?

SLIDE 19

Which one is a house?

1. Both the parts should exist.
   ○ Image 1 is not a house.
2. How the roof and the walls exist should match a common house.
   ○ Images 2 & 3 are not houses.

[Figure: three candidate images, labeled 1, 2, 3]

SLIDE 20

What stays constant?

The relation between a part and the whole stays constant. [Figure: camera coordinate frame]

SLIDE 21

What stays constant?

The relation between a part and the whole stays constant: between the Roof arrows and the House arrows. [Figure: camera coordinate frame]

SLIDE 22

What stays constant?

The relation between a part and the whole stays constant: between the Roof arrows and the House arrows.

Given the Roof arrow transformation, output the House arrow transformations.
SLIDE 23

What stays constant?

The relation between a part and the whole stays constant: between the Wall arrows and the House arrows.

Given the Wall arrow transformation, output the House arrow transformation.
SLIDE 24

Recap

  • Input to the layer: how to transform the Camera arrows into Roof and Wall arrows.
  • Output of the layer: how to transform the Camera arrows into House arrows.
  • What we learn: how to transform the transformations.
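This recap can be sketched with 2x2 rotation matrices standing in for the full pose matrices a real capsule layer would use; the angles, the two parts, and the variable names are illustrative assumptions:

```python
import numpy as np

def rot(theta):
    """2x2 rotation matrix, a simplified stand-in for a full pose matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

house = rot(np.pi / 4)                    # true camera -> house pose (unknown to the layer)

# Input to the layer: camera -> part poses.
camera_to_roof = house @ rot(np.pi / 6)   # roof sits at +30 deg relative to the house
camera_to_wall = house @ rot(-np.pi / 3)  # wall sits at -60 deg relative to the house

# What the layer learns: the fixed part -> whole relations.
roof_to_house = rot(-np.pi / 6)
wall_to_house = rot(np.pi / 3)

# Output: each part predicts the camera -> house pose; agreement means "house".
pred_roof = camera_to_roof @ roof_to_house
pred_wall = camera_to_wall @ wall_to_house
print(np.allclose(pred_roof, pred_wall))  # True: both parts vote for the same pose
print(np.allclose(pred_roof, house))      # True: and it is the correct house pose
```

The learned relations are constant across viewpoints: rotating the whole scene rotates both part poses, and both predictions rotate with them.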

SLIDE 25

What stays constant?

The relation between a part and the whole stays constant: between the part arrows and the House arrows. Compare the House arrow predictions.

SLIDE 26

Network of Capsules for Computer Vision


Each Capsule represents a part or an object.
  ○ The presence of a capsule represents whether that entity exists in the image.
  ○ The value of a capsule carries the spatial position of how that entity exists, i.e. the transformation between the coordinate frame of the camera and the entity.
  ○ The trainable parameter between two capsules is the transformation between their coordinate-frame transformations as a part and a whole.

SLIDE 27

Capsule Network

Same trained transformation works for all viewpoints of input.
  ○ Input is transformed, and so the value of the output capsule is transformed accordingly. Value is viewpoint-equivariant.
  ○ The agreement of parts would not change. Presence is viewpoint-invariant.



SLIDE 28

How: Iterative routing


SLIDE 29

EM routing for Gaussian Capsules

Geoff Hinton, Nick Frosst: Matrix Capsules with EM Routing

2D capsules:
  ○ Position shows their 2D value
  ○ Radius shows their presence
  ○ What is the value and presence of next-layer capsules?

[Figure: Layer L and Layer L+1]

SLIDE 30


[Figure: each capsule's value is transformed into votes for the next layer]

Is there any Agreement?

SLIDE 31


Agreement (M step): use Euclidean distance to find the clusters. Expectation Maximization for fitting a Mixture of Gaussians.

SLIDE 32


Agreement (M step)

[Figure: transformed votes; Euclidean distances to the cluster means]

SLIDE 33


Assignment (E step)

[Figure: transformed votes assigned to the clusters]

SLIDE 34


Agreement (M step)

[Figure: cluster means re-estimated from the transformed votes]

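The E/M alternation on these slides can be sketched as a toy routing loop. Exactly two output capsules, spherical fixed-variance Gaussians, and a deterministic initialization are simplifying assumptions; the paper's routing also re-estimates variances and mixes in capsule activations:

```python
import numpy as np

def em_routing(votes, n_iters=3, sigma=1.0):
    """Toy EM routing: fit two spherical Gaussians to the (already
    transformed) votes. Returns cluster means (output values) and the
    per-cluster responsibility mass (a stand-in for output presence)."""
    means = np.stack([votes[0], votes[-1]])  # deterministic init: first & last vote
    for _ in range(n_iters):
        # E step: soft-assign each vote to the clusters by distance.
        d2 = ((votes[:, None, :] - means[None, :, :]) ** 2).sum(-1)
        r = np.exp(-d2 / (2 * sigma ** 2))
        r /= r.sum(axis=1, keepdims=True)
        # M step: re-estimate each cluster mean from its soft-assigned votes.
        means = (r[:, :, None] * votes[:, None, :]).sum(0) / r.sum(0)[:, None]
    return means, r.sum(0)

# Three votes agree near (0,0) and two near (5,5): the layer should activate
# two higher-level capsules with presences ~3 and ~2.
votes = np.array([[0., 0.], [0.1, 0.], [0., 0.1], [5., 5.], [5.1, 5.]])
means, presence = em_routing(votes)
print(np.round(means, 2), np.round(presence, 2))
```

Tightly clustered votes yield a large responsibility mass (presence) and a cluster mean (value) for the corresponding higher-level capsule.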

SLIDE 36

Routing in action

[Figure: routing in action over Iterations 1, 2, and 3]

SLIDE 37

Viewpoint generalization, test error %:

            CNN     Capsule
Azimuth     20%     13.5%
Elevation   17.8%   12.3%

Code available at: https://github.com/google-research/google-research/tree/master/capsule_em

SLIDE 38


Agreement Finding

Iterative Routing

  • Opt-Caps & SVD-Caps [1, 2]
  • G-Caps & SOVNET [3, 4]

○ Explicit group equivariance

  • EncapNet [5]

○ Sinkhorn iteration

[1] Dilin Wang and Qiang Liu. An optimization view on dynamic routing between capsules. 2018.
[2] Mohammad Taha Bahadori. Spectral capsule networks. 2018.
[3] Jan Eric Lenssen, Matthias Fey, and Pascal Libuschewski. Group equivariant capsule networks. NIPS 2018.
[4] Anonymous ICLR 2020 submission.
[5] Hongyang Li, Xiaoyang Guo, Bo Dai, Wanli Ouyang, and Xiaogang Wang. Neural network encapsulation. ECCV 2018.

SLIDE 39

Can we learn a neural network to do the clustering, rather than running an explicit clustering algorithm?


SLIDE 40

Learn a cluster finder

[Figure: "Previously" (explicit clustering) vs "Now" (a neural network finds the clusters); the agreement objective should still hold]

SLIDE 41

[Figure: a neural network maps the part capsules (X's) through a linear transform]

Learn a cluster finder: optimize the mixture-model log-likelihood.

  • Each layer is an autoencoder with a single linear decoder.
  • A whole capsule gives predictions for its part capsules.
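The mixture-model objective can be sketched for a single layer; equal mixture weights and isotropic Gaussians are simplifying assumptions, and `part_log_likelihood` is a hypothetical helper, not the SCAE API:

```python
import numpy as np

def part_log_likelihood(parts, object_preds, sigma=0.1):
    """Each part is explained as a mixture over the predictions the whole
    (object) capsules make for it: equal weights, isotropic Gaussians."""
    total = 0.0
    for p in parts:
        d2 = ((object_preds - p) ** 2).sum(-1)
        comp = np.exp(-d2 / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)
        total += np.log(comp.mean() + 1e-12)  # log of the equal-weight mixture
    return total

parts = np.array([[0.0, 0.0], [1.0, 1.0]])
good_preds = np.array([[0.0, 0.0], [1.0, 1.0]])  # objects predict the parts well
bad_preds = np.array([[3.0, 3.0], [4.0, 4.0]])   # objects predict the parts badly
print(part_log_likelihood(parts, good_preds) > part_log_likelihood(parts, bad_preds))
```

Maximizing this likelihood pushes the whole capsules' linear decoders to predict the observed part capsules, which is how the layer learns to find clusters without an explicit EM loop.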

SLIDE 42

Stacked Capsule Autoencoder (Adam Kosiorek et al., NeurIPS 2019)

  • Part Capsule Autoencoder: infers the parts' presences & values; learned templates reassemble the image (image likelihood).
  • Object Capsule Autoencoder: predicts objects; each part is explained as a mixture of object predictions (part likelihood).

Unsupervised!

SLIDE 43

SCAE on MNIST Unsupervised

Train with 24 object capsules. Clustering the capsule presences gives 98.7% accuracy. No image augmentation.

[Figure: t-SNE of capsule presences]

SLIDE 44

MNIST: Part Capsules


[Figure: reconstructions; learned templates; affine-transformed templates; part-capsule reconstructions; object-capsule reconstructions; their overlap]
SLIDE 45


Finding Constellations

  • Two squares and a triangle
  • Patterns might be absent
  • Visualizing the mixture-model assignments

Error:
  • Best: 2.8%
  • Average: 4.0%
  • Baseline: 26.0%

SLIDE 46

Discussion & Future Work

  • Introduced Capsule Networks with agreement.
  • Capsule Networks can model viewpoint more efficiently.

  ○ Better viewpoint generalization.
  ○ Better unsupervised training.

  • Future directions

  ○ The background.
  ○ The texture.


SLIDE 47

Questions