Understanding Convolutional Neural Networks
David Stutz
July 24th, 2014
Table of Contents

1 Motivation
2 Neural Networks and Network Training
3 Convolutional Networks
4 Understanding Convolutional Networks
5 Conclusion
Motivation
◮ They accept images as raw input (preserving spatial information),
◮ and build up (learn) a hierarchy of features (no hand-crafted features necessary).
◮ However, the learned features are not well understood – an unsatisfactory state for evaluation and research!
Neural Networks and Network Training
Neural Networks and Network Training - Multilayer Perceptrons
Given an input $x \in \mathbb{R}^D$, the first layer computes its output $y^{(1)} = (y^{(1)}_1, \ldots, y^{(1)}_{m^{(1)}})$ as

$$y^{(1)}_i = f\big(z^{(1)}_i\big), \quad z^{(1)}_i = \sum_{j=1}^{D} w^{(1)}_{i,j} x_j + w^{(1)}_{i,0},$$

where $f$ is an activation function and the $w^{(1)}_{i,j}$ are adjustable weights.
The bias $w^{(1)}_{i,0}$ can be absorbed into the sum by introducing a constant unit $x_0 := 1$:

$$z^{(1)}_i = \sum_{j=0}^{D} w^{(1)}_{i,j} x_j.$$
In general, layer $l$ computes its output $y^{(l)} = (y^{(l)}_1, \ldots, y^{(l)}_{m^{(l)}})$ as follows:

$$y^{(l)}_i = f\big(z^{(l)}_i\big), \quad z^{(l)}_i = \sum_{j=1}^{m^{(l-1)}} w^{(l)}_{i,j} y^{(l-1)}_j + w^{(l)}_{i,0}.$$
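As a minimal illustration (not from the slides; the layer sizes and the tanh activation are arbitrary choices), a single layer of this form in NumPy:

import numpy as np

def layer_forward(y_prev, W, b, f=np.tanh):
    # Computes y^(l) = f(z^(l)) with z^(l)_i = sum_j W[i, j] * y_prev[j] + b[i].
    # W holds the weights w^(l)_{i,j}; b holds the biases w^(l)_{i,0}.
    z = W @ y_prev + b
    return f(z)

# Example: D = 4 inputs, a first layer with m^(1) = 3 units.
rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W1, b1 = rng.standard_normal((3, 4)), np.zeros(3)
y1 = layer_forward(x, W1, b1)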
In total, an $(L+1)$-layer perceptron with $C := m^{(L+1)}$ output units represents a function $y(\cdot, w) : \mathbb{R}^D \to \mathbb{R}^C$.
Figure: Network graph of an $(L+1)$-layer perceptron with $D$ input units, hidden layers of sizes $m^{(1)}, \ldots, m^{(L)}$, and $C$ output units (numbered $1, 2, \ldots$ within each layer).
◮ Non-linear activation functions will increase the expressive power; a common choice is the logistic sigmoid shown below.
◮ Depending on the application: For classification we may want to interpret the outputs as posterior probabilities, which motivates the softmax activation below.
Figure: The logistic sigmoid $\sigma(z) = \frac{1}{1 + \exp(-z)}$ as an example of a non-linear activation function.
The softmax activation function allows the outputs to be interpreted as posterior probabilities:

$$\sigma\big(z^{(L+1)}, i\big) = \frac{\exp\big(z^{(L+1)}_i\big)}{\sum_{k=1}^{C} \exp\big(z^{(L+1)}_k\big)}.$$
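As a sketch (not part of the slides), the softmax can be implemented as follows; subtracting the maximum before exponentiating is a standard trick that avoids overflow and does not change the result:

import numpy as np

def softmax(z):
    # sigma(z, i) = exp(z_i) / sum_k exp(z_k), computed in a numerically stable way.
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

# The outputs are non-negative and sum to one, so they can be read as
# posterior probabilities over the C classes.
print(softmax(np.array([2.0, 1.0, 0.1])))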
Neural Networks and Network Training - Network Training

Network training adjusts the weights $w$ such that the network solves a given task, usually
◮ regression,
◮ or classification.
Given a training set $\{(x_n, t_n) : 1 \leq n \leq N\}$, training minimizes an error measure of the form

$$E(w) = \sum_{n=1}^{N} E_n(w),$$

where $E_n(w)$ measures the error of the network output $y(x_n, w)$ over the $C$ output units, e.g. the sum-of-squared error.
For stochastic training, the error (and its gradient) is estimated on a random subset $M \subseteq \{1, \ldots, N\}$ of the training set:

$$E(w) \approx \sum_{n \in M} E_n(w).$$
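A sketch of the mini-batch estimate (all names are illustrative; the squared error is just one possible choice of $E_n$):

import numpy as np

def minibatch_error(w, X, T, predict, batch_size=32, rng=np.random.default_rng()):
    # Estimate E(w) = sum_n E_n(w) on a random subset M of the N training samples.
    M = rng.choice(len(X), size=batch_size, replace=False)
    # Sum-of-squared errors, accumulated over the mini-batch only.
    return sum(np.sum((predict(X[n], w) - T[n]) ** 2) for n in M)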
◮ $E_n(w)$ may be highly non-linear with many poor local minima.
◮ Therefore, training proceeds iteratively: let $w[0]$ be an initial guess for the weights (several initialization heuristics are available),
◮ and let $w[t]$ be the weights at iteration $t$.
Gradient descent then iteratively updates the weights by taking steps in the direction of the negative gradient:

$$w[t+1] = w[t] - \gamma \frac{\partial E_n}{\partial w[t]},$$

where $\gamma$ is the learning rate.
◮ But how do we evaluate the gradient $\frac{\partial E_n}{\partial w[t]}$ in iteration $[t+1]$?
◮ "Error backpropagation" makes it possible to evaluate $\frac{\partial E_n}{\partial w[t]}$ in $\mathcal{O}(W)$, where $W$ is the number of weights!
◮ See the original paper "Learning Representations by Back-Propagating Errors" by Rumelhart et al. for details.
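A minimal sketch of backpropagation combined with the gradient descent update, for a network with one tanh hidden layer, linear outputs, and sum-of-squared error (all names are illustrative; the slides only state the $\mathcal{O}(W)$ property):

import numpy as np

def backprop_step(x, t, W1, W2, gamma=0.1):
    # One update w[t+1] = w[t] - gamma * dE_n/dw for y(x, w) = W2 tanh(W1 x)
    # with E_n = ||y - t||^2 / 2. Forward pass, keeping intermediate values:
    y1 = np.tanh(W1 @ x)
    y2 = W2 @ y1
    # Backward pass: propagate errors layer by layer.
    delta2 = y2 - t                           # dE_n/dz for the linear output layer
    delta1 = (W2.T @ delta2) * (1 - y1 ** 2)  # tanh'(z) = 1 - tanh(z)^2
    # Each weight is touched exactly once -- hence the cost O(W).
    W2 -= gamma * np.outer(delta2, y1)
    W1 -= gamma * np.outer(delta1, x)
    return W1, W2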
Neural Networks and Network Training - Deep Learning

◮ Deep networks learn feature hierarchies – no hand-crafted features necessary anymore!
◮ But: The error measure represents a highly non-convex, "potentially intractable" optimization problem.
Several techniques help to make training deep networks feasible:
◮ Different activation functions offer faster learning, for example rectified linear units;
◮ unsupervised pre-training can be done layer-wise;
◮ ...
◮ See "Learning Deep Architectures for AI" by Y. Bengio [Ben09] for a detailed discussion.
Neural Networks and Network Training - Summary

Multilayer perceptrons
◮ allow tailoring the architecture (layers, activation functions) to the problem at hand;
◮ can be trained using gradient descent and error backpropagation;
◮ can be used for learning feature hierarchies (deep learning).
Convolutional Networks
Convolutional Networks - Notions

Figure: Illustration of a network with two hidden layers $h_1$ and $h_2$.
Convolutional Networks - Convolutional Layer

Convolutional networks
◮ can handle raw image input,
◮ and generate a hierarchy of feature maps.
Layer $l$ of a convolutional network computes $m_1^{(l)}$ feature maps $Y_i^{(l)}$ of size $m_2^{(l)} \times m_3^{(l)}$:

$$Y_i^{(l)} = B_i^{(l)} + \sum_{j=1}^{m_1^{(l-1)}} K_{i,j}^{(l)} * Y_j^{(l-1)},$$

where $B_i^{(l)}$ is a bias matrix and the $K_{i,j}^{(l)}$ are the filters to be learned.
◮ The size $m_2^{(l)} \times m_3^{(l)}$ of the feature maps depends on the filter size and the handling of borders.
◮ The weights $w_{i,j}^{(l)}$ are hidden in the bias matrices $B_i^{(l)}$ and the filters $K_{i,j}^{(l)}$.
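As a sketch of the feature map computation (not from the slides; SciPy's convolve2d performs a true convolution, matching the $*$ operator above, and the "valid" border handling and all shapes are illustrative assumptions):

import numpy as np
from scipy.signal import convolve2d

def conv_layer(Y_prev, K, B):
    # Y_prev: feature maps of layer l-1, shape (m1_prev, H, W).
    # K: filters K^(l)_{i,j}, shape (m1, m1_prev, h, w).
    # B: bias matrices B^(l)_i, shape (m1, H - h + 1, W - w + 1).
    # Returns Y^(l)_i = B^(l)_i + sum_j K^(l)_{i,j} * Y^(l-1)_j.
    return np.stack([
        B[i] + sum(convolve2d(Y_prev[j], K[i, j], mode="valid")
                   for j in range(Y_prev.shape[0]))
        for i in range(K.shape[0])
    ])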
Convolutional Networks - Non-Linearity Layer

A non-linearity layer applies an activation function $f$ elementwise to each feature map:

$$Y_i^{(l)} = f\big(Y_i^{(l-1)}\big),$$

leaving the number and size of the feature maps unchanged: $m_1^{(l)} = m_1^{(l-1)}$, $m_2^{(l)} = m_2^{(l-1)}$, $m_3^{(l)} = m_3^{(l-1)}$.
Convolutional Networks - Subsampling and Pooling Layer

A subsampling and pooling layer reduces the size $m_2^{(l)} \times m_3^{(l)}$ of the feature maps while keeping their number: $m_1^{(l)} = m_1^{(l-1)}$.
◮ For example, by placing windows at non-overlapping positions within each feature map and keeping only one value (e.g. the maximum) per window.
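Max pooling over non-overlapping windows as a sketch (the 2x2 window is an arbitrary choice, not from the slides):

import numpy as np

def max_pool(Y, k=2):
    # Keep the maximum of each non-overlapping k x k window of a feature map Y.
    # H and W must be divisible by k; the output has shape (H // k, W // k).
    H, W = Y.shape
    return Y.reshape(H // k, k, W // k, k).max(axis=(1, 3))

Y = np.arange(16.0).reshape(4, 4)
print(max_pool(Y))  # the feature map shrinks by a factor of k per dimension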
Convolutional Networks - Architectures

◮ LeCun et al. [LKF10] and Jarrett et al. [JKRL09] give a review of possible layer combinations and architectures.
◮ There are different implementations available; see Krizhevsky et al. for a fast GPU implementation.
Convolutional Networks - Summary

Convolutional networks combine three types of layers:
◮ convolutional layers;
◮ non-linearity layers;
◮ and subsampling layers.
Understanding Convolutional Networks
◮ But: The learned feature hierarchy is not well understood.
◮ Feature activations after the first convolutional layer can be backprojected to pixel level directly; for higher layers this is not straightforward.
Understanding Convolutional Networks - Deconvolutional Networks

Deconvolutional networks operate
◮ by convolving the input image with a set of filters – like convolutional networks;
◮ however, they are fully unsupervised.
Layer $l$ of a deconvolutional network tries to reconstruct the feature maps of layer $(l-1)$ from its own feature maps and filters:

$$\sum_{i=1}^{m_1^{(l)}} K_{j,i}^{(l)} * Y_i^{(l)} \overset{!}{=} Y_j^{(l-1)}.$$

In contrast to convolutional networks, deconvolutional networks
◮ are unsupervised by definition;
◮ need to learn both feature activations and filters.
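A sketch of the reconstruction such a layer aims for (not from the slides; the names are illustrative, and "full" convolution is chosen here so the reconstruction regains the size of the previous layer's maps under valid-convolution encoding):

import numpy as np
from scipy.signal import convolve2d

def reconstruct(Y_l, K):
    # Y_l: feature maps of layer l, shape (m1, H, W).
    # K: filters K^(l)_{j,i}, shape (m1_prev, m1, h, w).
    # Returns the approximation of Y^(l-1)_j as sum_i K^(l)_{j,i} * Y^(l)_i.
    return np.stack([
        sum(convolve2d(Y_l[i], K[j, i], mode="full")
            for i in range(Y_l.shape[0]))
        for j in range(K.shape[0])
    ])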
◮ See "Deconvolutional Networks" by Zeiler et al. [ZKTF10] for details.
Understanding Convolutional Networks - Visualization

To visualize a trained convolutional network, Zeiler and Fergus [ZF13] attach a deconvolutional network to it and backproject feature activations to pixel level:
◮ the filters are already learned – no training necessary.
Figure: Activations of layer 3 backprojected to pixel level [ZF13]. (a) Images. (b) Activations.
Figure: Activations of layer 4 backprojected to pixel level [ZF13]. (a) Images. (b) Activations.
Conclusion
◮ [ZF13] use deconvolutional networks to visualize feature activations;
◮ this makes it possible to analyze the feature hierarchy and to increase performance –
◮ for example by adjusting the filter size and subsampling scheme.