Hybrid Deep Learning Topology for Image Classification
15.07.2019
Petru Radu petru.radu@ness.com
27th Summer School on Image Processing, Timisoara 2019
Classical neural networks employed in image classification have a large number of parameters, which makes it impossible to train such a system without a sufficient number of training examples. With Convolutional Neural Networks (CNNs), however, the task of training the whole network from scratch can be carried out using existing, large enough datasets such as ImageNet.
One important aspect of deep learning is understanding the underlying working principles of a model that was designed to solve a certain problem. A very popular deep neural network model is VGG (*). VGG stands for Visual Geometry Group, the research group that proposed it. One of the main benchmarks in image classification was the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) (**), which was evaluated on the ImageNet dataset.
(*) K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition”, ICLR 2015
(**) Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg and Li Fei-Fei, “ImageNet Large Scale Visual Recognition Challenge”, IJCV, 2015
VGG obtained top results at ILSVRC 2014, winning the localization task and placing second in the classification task. The VGG architecture had as its starting point the LeNet model: a convolution layer followed by a pooling layer, then another convolution followed by another pooling layer, and a couple of fully connected layers.
As may be intuitively noticed, there are multiple types of VGG, depending on customized configurations of the base topology. Two of the best-known VGG models are VGG16, which has 16 weight layers, and VGG19, which has 19. VGG16:
VGG16 parameter breakdown: 38.7 K, 221.4 K, 1.4 M, 5.9 M and 7 M parameters in the five convolutional blocks, plus 123 M parameters in the fully connected layers, for a total of approximately 138 M parameters.
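These figures can be checked directly; below is a minimal sketch, assuming TensorFlow/Keras is available (the layer-by-layer counts sum to the per-block numbers quoted above):

```python
# Instantiate the standard VGG16 topology (no pretrained weights needed)
# and print the parameter count of every layer, plus the ~138 M total.
from tensorflow.keras.applications import VGG16

model = VGG16(weights=None)  # standard 1000-class ImageNet configuration

for layer in model.layers:
    if layer.count_params() > 0:
        print(f"{layer.name:20s} {layer.count_params():>12,d}")
print(f"{'TOTAL':20s} {model.count_params():>12,d}")
```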
Two of the most significant research questions in AI:
(a) How can an already trained network be retrained for new classes?
(b) How can the complexity of a network be reduced whilst maintaining its accuracy?
Assume the database that needs to be used contains images of cats and dogs. If the network has not been trained on cats and dogs and the VGG returns the label of a house, the engineer knows for sure that this is false.
Question (a) is addressed by transfer learning; question (b) is the subject of the conducted work presented in the following slides.
One could think of a deep neural network as a combination of two pieces: a feature transformer and a linear model that works on those transformed features. By retraining the final part of the network, i.e. the classifier, on the original data augmented with new classes of images, the task of transfer learning is achieved. In the case of VGG16, training only the final 1-3 dense layers, while keeping the feature transformer weights fixed, achieves the capability of adding new classes to the output labels.
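A minimal transfer-learning sketch, assuming TensorFlow/Keras and a labelled dataset (the head size and the two-class example are assumptions, not taken from the slides):

```python
import tensorflow as tf

NUM_CLASSES = 2  # e.g. cats vs dogs; placeholder value

# Frozen VGG16 convolutional base acts as the fixed feature transformer.
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False

# Only this small classifier head is trained on the new classes.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5)  # trains the head only
```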
The underlying assumption is that the weights of the feature transformer are highly transferable, i.e. the features they compute generalize well across related tasks. Only the final part, which contains the fully connected layers, needs to be retrained.
BUT: the final part (i.e. the fully connected layers) contains a significant proportion of the weights of a deep learning architecture, due to the flattening operation. In VGG16, the fully connected layers alone hold 123 M of the 138 M parameters.
Backpropagation is used to efficiently calculate the gradient of an objective function with respect to each model parameter (*).
(*) https://towardsdatascience.com/the-problem-with-back-propagation-13aa84aabd71
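For reference, the weight update that these gradients drive in plain gradient descent (the notation below is standard, not taken from the slides):

```latex
% For every weight w_i, with objective function L and learning rate \eta:
w_i \leftarrow w_i - \eta \, \frac{\partial L}{\partial w_i}
```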
Metaheuristic optimization algorithms, such as Harmony Search (*), may be applied to several aspects of a neural network:
– network connection weights
– network architecture (network topology, transfer function)
– network learning algorithms
(*) A. Lee et al., “Determination of Optimal Initial Weights of an ANN by Using the Harmony Search Algorithm: Application to Breakwater Armor Stones”, Applied Sciences, 2016
Evolutionary algorithms are population-based metaheuristics inspired by natural biological evolution. One widely used representative is CMA-ES (Covariance Matrix Adaptation Evolution Strategy), which samples candidate solutions from a normal distribution.
Generic evolutionary algorithm flow:
1. Initialize the population.
2. Assess the objective function for every individual.
3. Select elitist individuals.
4. Apply the evolutionary operators: crossover and mutation.
5. If the stopping criterion is not met, go to step 2; otherwise stop.
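The loop above in code; a toy sketch in which the objective, the operators and all constants are illustrative assumptions:

```python
import random

POP_SIZE, GENOME_LEN, GENERATIONS, N_ELITE = 50, 10, 100, 10

def objective(genome):
    return sum(x * x for x in genome)  # toy objective; lower is better

def crossover(a, b):
    cut = random.randrange(1, GENOME_LEN)  # single-point crossover
    return a[:cut] + b[cut:]

def mutate(genome, rate=0.1, scale=0.5):
    return [x + random.gauss(0, scale) if random.random() < rate else x
            for x in genome]

# 1. Initialize the population with random genomes.
population = [[random.uniform(-5, 5) for _ in range(GENOME_LEN)]
              for _ in range(POP_SIZE)]

for _ in range(GENERATIONS):
    population.sort(key=objective)      # 2. assess the objective function
    elites = population[:N_ELITE]       # 3. select elitist individuals
    # 4. crossover + mutation produce the rest of the next generation.
    offspring = [mutate(crossover(random.choice(elites), random.choice(elites)))
                 for _ in range(POP_SIZE - N_ELITE)]
    population = elites + offspring     # 5. loop until the budget is spent

population.sort(key=objective)
print("best fitness:", objective(population[0]))
```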
CMA-ES can adaptively increase or decrease the search space for the next generation. At every iteration it samples a population of candidate solutions from its search distribution; the mean and covariance of that distribution are then updated based on the current iteration: the covariance adaptation exploits the similarity of the best solutions to each other.
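A heavily simplified sketch of this idea (not a faithful CMA-ES implementation; names and constants are assumptions): sample from a multivariate normal, keep the best candidates, and re-estimate the mean and covariance from them so that the search distribution expands or shrinks adaptively.

```python
import numpy as np

def sphere(x):
    return float(np.sum(x ** 2))  # toy objective, minimum at the origin

DIM, N_SAMPLES, N_BEST, ITERATIONS = 5, 40, 10, 60
mean = np.random.uniform(-3, 3, DIM)
cov = np.eye(DIM)

for _ in range(ITERATIONS):
    # Sample a population of candidate solutions from a normal distribution.
    candidates = np.random.multivariate_normal(mean, cov, N_SAMPLES)
    scores = np.array([sphere(c) for c in candidates])
    best = candidates[np.argsort(scores)[:N_BEST]]
    mean = best.mean(axis=0)  # move the distribution toward the best solutions
    # Re-estimate the covariance from the similarity of the best solutions
    # to each other; a small ridge keeps the matrix positive definite.
    cov = np.cov(best, rowvar=False) + 1e-6 * np.eye(DIM)

print("objective at final mean:", sphere(mean))
```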
Optimising all the weights of a deep network with an evolutionary algorithm is highly unlikely to converge to the global minimum. The search is therefore restricted to the convolutional kernels, rather than the dense layers at the end of the network. Two strategies were considered:
– Serial optimisation: convolution filters are optimised individually, one at a time.
– Parallel optimisation: multiple convolutional kernels are optimized in parallel and get plugged into the network architecture when the process is finished.
The k-Nearest Neighbours (k-NN) classifier assigns a class label to an input feature vector according to the class label held by the majority of its nearest neighbours within a reference set of prototype features.
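A minimal k-NN sketch with NumPy (all names are illustrative): label a query point by majority vote among its k nearest prototypes.

```python
import numpy as np
from collections import Counter

def knn_predict(query, prototypes, labels, k=3):
    # Euclidean distances from the query to every prototype feature vector.
    dists = np.linalg.norm(prototypes - query, axis=1)
    nearest = np.argsort(dists)[:k]
    # Majority vote over the labels of the k nearest prototypes.
    return Counter(labels[nearest]).most_common(1)[0][0]

prototypes = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.1], [4.9, 5.0]])
labels = np.array([0, 0, 1, 1])
print(knn_predict(np.array([4.8, 5.2]), prototypes, labels, k=3))  # -> 1
```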
The k-NN may be attached at different points of the CNN architecture to reduce its complexity. In this example, the CNN has 5 layers: 2 convolutional layers and 3 fully connected layers. The CNN – k-NN hybrid approach reduces the number of layers by replacing the fully connected part with a k-NN trained on the convolutional activation maps.
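A sketch of the slicing idea under assumed layer sizes (the slides describe the real topology only at a high level): the dense layers are dropped and a scikit-learn k-NN is fitted on the flattened activation maps. In practice the convolutional weights would first be trained, or optimized with an evolutionary algorithm as above.

```python
import tensorflow as tf
from sklearn.neighbors import KNeighborsClassifier

# Assumed 2-conv / 3-dense topology, per the example in the text.
full_cnn = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Slice point: keep everything up to (and including) the Flatten layer.
extractor = tf.keras.Sequential(full_cnn.layers[:5])

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0
x_test = x_test[..., None] / 255.0

# Fit the k-NN on convolutional activation maps instead of dense layers.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(extractor.predict(x_train[:4000]), y_train[:4000])
print("hybrid accuracy:",
      knn.score(extractor.predict(x_test[:1000]), y_test[:1000]))
```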
The MNIST database of handwritten digits contains a training set of 60,000 examples and a test set of 10,000 examples. Each example is a grayscale image of 28x28 pixels.
The convolutional kernels were optimised with the CMA-ES and PSO evolutionary algorithms. The k-NN classifier is trained on the activation maps of a convolutional layer.
The trained model is exported to an architecture file which holds the optimized weights. The complexity of the network was reduced by gradually reducing the number of neurons in the first fully connected layer:
Neurons in first F.C. layer | Number of parameters | Inference engine size
128 | 1.2 million | 4.5 MB
64 | 122 k | 0.5 MB
32 | 70 k | 273 KB
16 | 44.6 k | 174 KB
8 | 31.7 k | 124 KB
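A sketch of how such a sweep can be reproduced; the topology below is an assumption (2 convolutional + 3 dense layers, per the earlier example), so the printed counts only approximate the table:

```python
import tensorflow as tf

def build_cnn(first_dense_neurons):
    # Same assumed topology as before, with a variable-width first dense layer.
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(first_dense_neurons, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

for width in (128, 64, 32, 16, 8):
    print(f"first dense = {width:3d} -> {build_cnn(width).count_params():,} params")
```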
Training was performed on an NVIDIA GeForce GTX 1050 Ti GPU with 768 CUDA cores; the training time for the network on 5 epochs is only 45 seconds. A CPU with 6 cores running at 4.7 GHz was used for the PSO optimization. After eliminating the first fully connected layer and attaching the k-NN, the inference engine size is only 74 KB, which is a small fraction of the 4.5 MB full-size engine.
Network size | Accuracy, Train/Test 4000/1000, 5 epochs | Accuracy, Train/Test 60000/10000, 5 epochs | Accuracy, Train/Test …
1.2 million | 97.5 | 99.91 | 98
122 k | 97 | 98.96 | 97.5
70 k | 97 | 98.98 | 97.5
44.6 k | 96.1 | 98.65 | 97.4
31.7 k | 87.5 | 97.8 | 96.5
18.8 k (CMA-ES) | 97.5 | 98.57 | 97.5
18.8 k (PSO) | 97.3 | 98.52 | 97.3
Edge computing moves inference closer to the vision of ubiquitous computing, where computing systems weave themselves into various aspects of everyday life and vanish into the background.
Figure: edge computing information flow.
In progress: comparing the running speed of the original neural network architecture on an Intel Neural Compute Stick 2 vs. the k-NN slicing approach on a generic laptop.
Conclusions: the complexity of a CNN can be substantially reduced by eliminating the fully connected layers and training a generic classifier, such as k-NN, on the activation maps of the convolutional layers. The classification accuracy obtained by the reduced network optimized with the evolutionary algorithms is comparable to the classification accuracy obtained by training the full-size CNN using the classical backpropagation approach.