SLIDE 1

Chair of Network Architectures and Services Department of Informatics Technical University of Munich

Capsule Networks - An Overview

Luca Dombetzki

July 13, 2018

Advisor: Marton Kajo

slide-2
SLIDE 2

Overview

  • Introduction
  • Convolutional Neural Networks
  • Capsule Networks
  • Discussion
  • Conclusion
  • Bibliography
  • Appendix

  • L. Dombetzki — Capsule Networks

SLIDE 3

Introduction

  • Introduction
  • Convolutional Neural Networks
  • Capsule Networks
  • Discussion
  • Conclusion
  • Bibliography
  • Appendix

SLIDE 4

Introduction

Motivation

Figure 1: figure from [12]

Both images are seen as "face" by a typical Convolutional Neural Network ⇒ Capsule Networks

SLIDE 5

Introduction

Where does AI come from?

Figure 2: A neuron as part of a Multi-Layer Neural Network [21]

Designed after human brain

  • Advancement in modeling with math
  • Performance gains with GPUs
  • Deep Learning - leverage both

BUT not like human brain anymore

  • Blackbox system
  • Requires huge amounts of data
  • Very probabilistic
SLIDE 6

Introduction

Who is Geoffrey E. Hinton?

“The pooling operation used in convolutional neural networks is a big mistake and the fact that it works so well is a disaster.”
- Geoffrey E. Hinton (2014) [7]

  • Professor at the University of Toronto
  • Working at Google Brain
  • Major advancements in AI [13]
  • Research on Capsule Networks:
      • Based on biological research
      • Understanding human vision (1981) [9]
      • Talks explaining his motivation [8]
      • Dynamic Routing Between Capsules (2017) [19]
      • Matrix Capsules with EM-Routing (2018) [6]

Figure 3: Geoffrey E. Hinton [24]

SLIDE 7

Convolutional Neural Networks

  • Introduction
  • Convolutional Neural Networks
  • Capsule Networks
  • Discussion
  • Conclusion
  • Bibliography
  • Appendix

SLIDE 8

Convolutional Neural Networks

What are CNNs?

Figure 4: Typical architecture of a CNN [16]

SLIDE 9

Convolutional Neural Networks

Convolution and kernels

Figure 5: Convolution operation [11]
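The convolution operation in Figure 5 can be sketched in a few lines of NumPy; this is an illustrative "valid" convolution (the image, kernel values, and helper name are made up for the example):

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """'Valid' 2D convolution (technically cross-correlation, as in most
    deep-learning frameworks): slide the kernel over the image and take
    the elementwise weighted sum at each position."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.array([[1, 2, 3, 0],
                  [0, 1, 2, 3],
                  [3, 0, 1, 2],
                  [2, 3, 0, 1]], dtype=float)
edge = np.array([[1, 0, -1],
                 [1, 0, -1],
                 [1, 0, -1]], dtype=float)  # vertical edge-detecting kernel
out = convolve2d_valid(image, edge)
print(out.shape)  # (2, 2) — a 3x3 kernel shrinks a 4x4 input to 2x2
```

A learned CNN kernel plays exactly this role: each output position measures how strongly the local patch matches the kernel's pattern.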

SLIDE 10

Convolutional Neural Networks

Activation functions

Figure 6: Sigmoid and Rectified Linear Unit (ReLU) [20]

A single neuron computes $\sigma\left(w_0 + \sum_{i=1}^{n} w_i x_i\right)$

Figure 7: A single neuron [21]
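The neuron of Figure 7 can be written directly from the formula above; this is a minimal sketch (weights and inputs are made-up values):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Rectified Linear Unit: passes positives, zeroes negatives."""
    return np.maximum(0.0, z)

def neuron(x, w, w0, activation=sigmoid):
    """A single neuron: activation(w0 + sum_i w_i * x_i)."""
    return activation(w0 + np.dot(w, x))

x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -0.25, 0.1])
print(neuron(x, w, w0=0.0))                    # sigmoid(0.3) ≈ 0.574
print(neuron(x, w, w0=0.0, activation=relu))   # 0.3
```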
SLIDE 11

Convolutional Neural Networks

Pooling as a form of routing

Routing

  • find important nodes (inputs)
  • group together
  • give to next layer

Pooling

  • reduces input data
  • next layer can “see” more than the previous
  • enables detecting full objects through locational invariance
  • static routing

Figure 8: Max pooling example [2]
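The max-pooling step of Figure 8 can be sketched in NumPy; note how each 2x2 window keeps only its strongest activation and discards where in the window it came from — exactly the locational invariance (and information loss) discussed above:

```python
import numpy as np

def max_pool2x2(x):
    """2x2 max pooling with stride 2: for each non-overlapping 2x2
    window, keep only the largest activation."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 1],
              [0, 1, 5, 6],
              [2, 3, 7, 8]])
print(max_pool2x2(x))
# [[4 2]
#  [3 8]]
```

This is the "static routing": the winner is forwarded regardless of what the surrounding capsule-level context is, which is precisely Hinton's objection.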

SLIDE 12

Convolutional Neural Networks

How CNNs see the world

Figure 9: Feature detections of a CNN [15]

SLIDE 13

Convolutional Neural Networks

Problems of pooling

Figure 10: Distorted face from [12]

Geoffrey E. Hinton’s arguments against pooling [8]

  • Unnatural
  • No use of the linear structure of vision
  • Static instead of dynamic routing
  • Invariance instead of Equivariance
SLIDE 14

Convolutional Neural Networks

What does a neuron represent?

Figure 11: Face detection with a CNN, from [10]

SLIDE 15

Capsule Networks

  • Introduction
  • Convolutional Neural Networks
  • Capsule Networks
  • Discussion
  • Conclusion
  • Bibliography
  • Appendix

SLIDE 16

Capsule Networks

Hinton’s idea

Figure 12: Hierarchical modeling in Computer Graphics [5]

Build a network to perform inverse graphics

  • propagate probability and pose of features
  • dynamic routing based on pose information
  • introduce concept of an entity into the network’s architecture

⇒ The capsule

SLIDE 17

Capsule Networks

An abstract view on capsules

Figure 13: Capsule face detection, from [10]

SLIDE 18

Capsule Networks

The capsule - a group of neurons

  Before: layer of neurons — input = n values, output = a value
  After: layer of neuron groups — input = n vectors, output = a vector

  • A capsule learns parameters (skew, scale, rotation, etc.)
  • n-dimensional capsule = n-dimensional output vector ⇒ n parameters ≙ pose
  • probability = ||vector_out||
SLIDE 19

Capsule Networks

Architecture - The CapsNet

Figure 14: Capsule Network Architecture as described in [19]

  Layer              Function
  Conv1              Convolutional layer
  PrimaryCaps        Convolutional squashing capsules
  DigitCaps          Normal (digit) capsules
  Class predictions  Length of each DigitCapsule
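As a sketch, the spatial sizes of this pipeline for a 28x28 MNIST input can be checked with the standard convolution-output formula; the 9x9 kernels, strides 1 and 2, and 32 channels of 8D primary capsules are the values described in [19]:

```python
# Shapes flowing through the CapsNet of [19] for a 28x28 MNIST image:
#   Conv1:       256 9x9 kernels, stride 1          -> 20x20x256
#   PrimaryCaps: 9x9 conv, stride 2, 32 channels
#                of 8D capsules                     -> 6x6x32 = 1152 capsules
#   DigitCaps:   10 capsules of 16D, routed from all 1152 primary capsules
def conv_out(size, kernel, stride):
    """Output size of a 'valid' convolution along one spatial axis."""
    return (size - kernel) // stride + 1

s1 = conv_out(28, 9, 1)      # 20
s2 = conv_out(s1, 9, 2)      # 6
num_primary = s2 * s2 * 32   # 1152 primary capsules, each an 8D vector
print(s1, s2, num_primary)   # 20 6 1152
```

The class prediction is then simply the length of each of the 10 DigitCaps vectors.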

SLIDE 20

Capsule Networks

Routing-by-agreement - the idea

Figure 15: capsule agreement [4]

SLIDE 21

Capsule Networks

Routing by agreement

Phenomenon: “coincidence filtering”

  • high-dimensional pose-parameter space
  • similar poses by chance are very unlikely (curse of dimensionality)

Clustering the inputs based on their pose — repeat n times:

  1. find the mean vector of the cluster
  2. weight all inputs based on their distance to this mean
  3. normalize the weights

Figure 16: weighted clustering [4]
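The three clustering steps can be sketched as below. This is only an illustration of the idea: the `exp(-distance)` weighting kernel and the toy poses are assumptions for the example, not the routing rule of [19] (which weights by dot-product agreement):

```python
import numpy as np

def cluster_step(poses, weights):
    """One iteration of weighted pose clustering:
    1. mean of the cluster, 2. distance-based weights, 3. normalize."""
    mean = np.average(poses, axis=0, weights=weights)  # 1. weighted mean
    dist = np.linalg.norm(poses - mean, axis=1)
    w = np.exp(-dist)                                  # 2. closer poses weigh more
    return mean, w / w.sum()                           # 3. normalized weights

rng = np.random.default_rng(1)
agreeing = rng.normal(loc=1.0, scale=0.05, size=(5, 4))   # five similar poses
outlier = rng.normal(loc=-2.0, scale=0.05, size=(1, 4))   # one coincidence-free pose
poses = np.vstack([agreeing, outlier])

w = np.ones(6) / 6
for _ in range(3):
    mean, w = cluster_step(poses, w)
print(w.round(3))  # the outlier's weight collapses toward 0
```

Because similar poses almost never agree by chance in a high-dimensional space, surviving clusters are strong evidence for a real higher-level entity.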

SLIDE 22

Capsule Networks

How to train the network: Margin loss + Reconstruction (decoder) network

Figure 17: Capsule Network architectures [19]

  Goal                 Loss function         Learning
  Parameter learning   Reconstruction loss   Unsupervised
  Classification       Margin loss           Supervised

Reconstruction loss

  • reconstruct the digit from the active capsule (all others masked)

Margin loss

  • detection: ||v|| ≥ 0.9
  • no detection: ||v|| ≤ 0.1
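The margin loss with these thresholds can be sketched as follows; m+ = 0.9 and m- = 0.1 match the slide, and the down-weighting factor λ = 0.5 for absent classes is the value used in [19]:

```python
import numpy as np

def margin_loss(v_norms, targets, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Margin loss over capsule lengths.
    v_norms: lengths of the output capsules, shape (num_classes,)
    targets: one-hot vector, T_k = 1 iff class k is present."""
    # present classes are pushed above m_pos, absent ones below m_neg
    present = targets * np.maximum(0.0, m_pos - v_norms) ** 2
    absent = lam * (1 - targets) * np.maximum(0.0, v_norms - m_neg) ** 2
    return float(np.sum(present + absent))

# a confident, correct prediction incurs zero loss
v = np.array([0.95, 0.05, 0.08])
t = np.array([1.0, 0.0, 0.0])
print(margin_loss(v, t))  # 0.0
```

An under-confident correct capsule (e.g. length 0.5) or an over-active wrong capsule is penalized quadratically by how far it violates its margin.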
SLIDE 23

Capsule Networks

How does it perform? - Parameter Effects

  • Scale and thickness
  • Localized part
  • Stroke thickness
  • Localized skew
  • Width and translation
  • Localized part

Figure 18: Effects of capsule parameters on reconstruction [19]

SLIDE 24

Capsule Networks

How does it perform? - MultiMNIST

  Model     Routing   Rec. loss   MNIST (%)       MultiMNIST (%)
  CNN       -         -           0.39            8.1
  CapsNet   1         no          0.34 ± 0.032    -
  CapsNet   1         yes         0.29 ± 0.011    7.5
  CapsNet   3         no          0.35 ± 0.036    -
  CapsNet   3         yes         0.25 ± 0.005    5.2

Figure 19: Capsule Network results on MultiMNIST, reconstructions R:(·, ·) against labels L:(·, ·) [19]

SLIDE 25

Capsule Networks

How does it perform? - MultiMNIST

Network was forced to reconstruct false predictions

Figure 20: Forced reconstructions, predicted pairs *R:(·, ·) against true labels L:(·, ·) [19]

SLIDE 26

Capsule Networks

Further research

  Authors            Contribution
  Hinton et al.      Pose capsules and EM-routing [6]
  Xi et al.          Hyperparameter tuning for complex data [25]
  Phaye et al.       Skip connections [17]
  Rawlinson et al.   Unsupervised training [18]
  Bahadori et al.    New routing (Eigen-decomposition) [3]
  Wang et al.        Optimized routing (KL regularization) [22]

SLIDE 27

Discussion

  • Introduction
  • Convolutional Neural Networks
  • Capsule Networks
  • Discussion
  • Conclusion
  • Bibliography
  • Appendix

SLIDE 28

Discussion

Superior to CNNs?

Advantages

  • Viewpoint invariance
  • Less training data needed
  • Fewer parameters
  • Better generalization
  • Robustness to white-box attacks
  • Validatability

Challenges

  • Scalability
  • “Explain everything”
  • Entity-based structure
  • Loss functions
  • Crowding
  • Unoptimized implementation

SLIDE 29

Discussion

CapsNets for real world problems

Figure 21: Results from Afshar et. al [1]

  Authors              Application                    Benefit
  Afshar et al. [1]    Brain tumor classification     Less training data
  Wang et al. [23]     Sentiment analysis with RNNs   State-of-the-art performance
  LaLonde et al. [14]  Medical image segmentation     Parameter reduction by 95.4%

SLIDE 30

Conclusion

  • Introduction
  • Convolutional Neural Networks
  • Capsule Networks
  • Discussion
  • Conclusion
  • Bibliography
  • Appendix

SLIDE 31

Conclusion

Conclusion

Big step towards human vision

  • Novel network architecture
  • Inverse graphics through pose vector capsules
  • Dynamic routing via routing-by-agreement
  • Multiple significant advantages
  • Early development phase

But not comparable to CNNs in “mainstream areas”

SLIDE 32

Questions?

Figure 22: [20]

SLIDE 33

Bibliography

  • Introduction
  • Convolutional Neural Networks
  • Capsule Networks
  • Discussion
  • Conclusion
  • Bibliography
  • Appendix

SLIDE 34

Bibliography

[1] P. Afshar, A. Mohammadi, and K. N. Plataniotis. Brain tumor type classification via capsule networks. CoRR, abs/1802.10200, 2018.

[2] Aphex34. Convolutional neural network - max pooling. https://en.wikipedia.org/wiki/Convolutional_neural_network#Max_pooling_shape; last accessed on 2018/06/14.

[3] M. T. Bahadori. Spectral capsule networks. 2018.

[4] N. Bourdakos. Understanding capsule networks - AI's alluring new architecture. https://medium.freecodecamp.org/understanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc; last accessed on 2018/07/05.

[5] D. J. Eck. Introduction to computer graphics: Hierarchical modeling. http://math.hws.edu/graphicsbook/c2/s4.html; last accessed on 2018/07/05.

[6] G. Hinton, S. Sabour, and N. Frosst. Matrix capsules with EM routing. 2018.

[7] G. E. Hinton. Ask Me Anything on reddit. https://www.reddit.com/r/MachineLearning/comments/2lmo0l/ama_geoffrey_hinton/clyj4jv/; last accessed on 2018/07/03.

SLIDE 35

Bibliography

[8] G. E. Hinton. What is wrong with convolutional neural nets? Talk recorded on YouTube, https://youtu.be/rTawFwUvnLE; last accessed on 2018/06/14.

[9] G. F. Hinton and F. Cambridge. Shape representation in parallel systems. 1981.

[10] J. Hui. Understanding dynamic routing between capsules. https://jhui.github.io/2017/11/03/Dynamic-Routing-Between-Capsules/; last accessed on 2018/07/05.

[11] H. Kazemi. Image filtering. http://machinelearninguru.com/computer_vision/basics/convolution/image_convolution_1.html; last accessed on 2018/07/05.

[12] T. Kothari. Uncovering the intuition behind capsule networks and inverse graphics. https://hackernoon.com/uncovering-the-intuition-behind-capsule-networks-and-inverse-graphics-part-i-7412d121798d; last accessed on 2018/06/14.

[13] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.

[14] R. LaLonde and U. Bagci. Capsules for object segmentation. ArXiv e-prints, Apr. 2018.

SLIDE 36

Bibliography

[15] H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 609–616. ACM, 2009.

[16] Mathworks. Convolutional neural network. https://www.mathworks.com/solutions/deep-learning/convolutional-neural-network.html; last accessed on 2018/06/14.

[17] S. S. R. Phaye, A. Sikka, A. Dhall, and D. Bathula. Dense and diverse capsule networks: Making the capsules learn better. arXiv preprint arXiv:1805.04001, 2018.

[18] D. Rawlinson, A. Ahmed, and G. Kowadlo. Sparse unsupervised capsules generalize better. CoRR, abs/1804.06094, 2018.

[19] S. Sabour, N. Frosst, and G. E. Hinton. Dynamic routing between capsules. In Advances in Neural Information Processing Systems, pages 3859–3869, 2017.

[20] S. Sharma. Activation functions: Neural networks. https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6; last accessed on 2018/07/05.

[21] P. Veličković. TikZ figure collection. https://github.com/PetarV-/TikZ/tree/master/Multilayerperceptron; last accessed on 2018/07/05.

SLIDE 37

Bibliography

[22] D. Wang and Q. Liu. An optimization view on dynamic routing between capsules. 2018.

[23] Y. Wang, A. Sun, J. Han, Y. Liu, and X. Zhu. Sentiment analysis by capsules. In Proceedings of the 2018 World Wide Web Conference on World Wide Web, pages 1165–1174. International World Wide Web Conferences Steering Committee, 2018.

[24] N. Wolchover. As machines get smarter, evidence they learn like us. https://www.quantamagazine.org/as-machines-get-smarter-evidence-they-learn-like-us-20130723/; last accessed on 2018/07/05.

[25] E. Xi, S. Bing, and Y. Jin. Capsule network performance on complex data. ArXiv e-prints, Dec. 2017.

SLIDE 38

Appendix

  • Introduction
  • Convolutional Neural Networks
  • Capsule Networks
  • Discussion
  • Conclusion
  • Bibliography
  • Appendix

SLIDE 39

Appendix

Improvements: EM-Routing

  • Hinton et al. [6]
  • 4x4 pose matrix capsules
  • Expectation-Maximization routing
  • Performance on smallNORB dataset: CNN 2.56%, CapsNet 1.4% error
  • Testing on unseen viewpoints
SLIDE 40

Appendix

Routing-by-agreement algorithm

1: procedure ROUTING(û_j|i, r, l)
2:   for all capsule i in layer l and capsule j in layer (l + 1): b_ij ← 0
3:   for r iterations do
4:     for all capsule i in layer l: c_i ← softmax(b_i)
5:     for all capsule j in layer (l + 1): s_j ← Σ_i c_ij û_j|i
6:     for all capsule j in layer (l + 1): v_j ← squash(s_j)
7:     for all capsule i in layer l and capsule j in layer (l + 1): b_ij ← b_ij + û_j|i · v_j
8:   return v_j
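The routing procedure can be sketched directly in NumPy. This is a toy illustration of the algorithm from [19], not an optimized implementation; the capsule counts and dimensions in the example are made up:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Squashing non-linearity: keeps direction, maps the norm into [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def routing(u_hat, r=3):
    """Dynamic routing between capsules.
    u_hat: prediction vectors û_j|i, shape (num_in, num_out, dim_out)
    r:     number of routing iterations
    Returns the output capsule vectors v, shape (num_out, dim_out)."""
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))                # routing logits b_ij
    for _ in range(r):
        # c_i = softmax over output capsules j, per input capsule i
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        s = np.einsum('ij,ijd->jd', c, u_hat)      # s_j = Σ_i c_ij û_j|i
        v = squash(s)                              # v_j = squash(s_j)
        b = b + np.einsum('ijd,jd->ij', u_hat, v)  # b_ij += û_j|i · v_j
    return v

# toy example: 6 input capsules predicting 2 output capsules of dimension 4
rng = np.random.default_rng(0)
u_hat = rng.normal(size=(6, 2, 4))
v = routing(u_hat, r=3)
print(v.shape)                       # (2, 4)
print(np.linalg.norm(v, axis=-1))    # every norm < 1, usable as probability
```

Inputs whose predictions agree with the emerging output vector get their logits (and thus coupling coefficients) increased, which is the "agreement" in routing-by-agreement.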

SLIDE 41

Appendix

More on capsules

SLIDE 42

Appendix

Math

Squashing function

$v_j = \dfrac{\|s_j\|^2}{1 + \|s_j\|^2} \dfrac{s_j}{\|s_j\|}$   (1)

Full capsule connection

$s_j = \sum_i c_{ij} \hat{u}_{j|i}, \qquad \hat{u}_{j|i} = W_{ij} u_i$   (2)

Routing softmax

$c_{ij} = \dfrac{\exp(b_{ij})}{\sum_k \exp(b_{ik})}$   (3)

Margin loss

$L_k = T_k \max(0, m^+ - \|v_k\|)^2 + \lambda (1 - T_k) \max(0, \|v_k\| - m^-)^2$   (4)
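A quick numeric check of the squashing function (1): it maps short vectors to near-zero length and long vectors to length just below 1, which is what lets a capsule's norm act as a probability (the test vectors are arbitrary):

```python
import numpy as np

def squash(s):
    """Equation (1): v = (||s||^2 / (1 + ||s||^2)) * s / ||s||."""
    n2 = np.dot(s, s)
    return (n2 / (1.0 + n2)) * s / np.sqrt(n2)

short = squash(np.array([0.1, 0.0]))
long_ = squash(np.array([100.0, 0.0]))
print(np.linalg.norm(short))  # ≈ 0.0099 (low probability)
print(np.linalg.norm(long_))  # ≈ 0.9999 (high probability)
```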

SLIDE 43

Appendix

Pooling is unnatural
