Convolutional Neural Networks
08, 10 & 17 Nov, 2016
- J. Ezequiel Soto S.
Image Processing 2016
- Prof. Luiz Velho
Summary & References
08/11 ImageNet Classification with Deep Convolutional Neural Networks
2012, Krizhevsky et al. [source]
10/11 Going Deeper with Convolutions
2015, Szegedy et al. [source]
17/11 Painting Style Transfer for Head Portraits using Convolutional Neural Networks
2016, Selim & Elgharib [source]
+ CS231n: Convolutional Neural Networks for Visual Recognition
Stanford University Course Notes
+ Very Deep Convolutional Networks for Large-Scale Image Recognition
2015, Simonyan & Zisserman [source]
– ReLU nonlinearity
– Parallel GPU training
– Local Response Normalization
– Overlapping pooling
– Data augmentation
– Dropout
Machine learning methods for object recognition need:
– Larger datasets
– Powerful learning methods
– Better techniques against overfitting
Available datasets:
– NORB, CIFAR (small)
– LabelMe: ~100k segmented & labeled images
– ImageNet: >15M labeled high-resolution images in 22k categories
→ Convolutional Neural Networks
CNNs build strong, largely correct priors about natural images:
– Stationarity of statistics
– Locality of pixel dependencies
Compared with standard feed-forward networks:
– Capacity can be controlled (depth and breadth)
– Far fewer connections and parameters, but still a lot...
– Easier to train
– Yet historically prohibitively expensive to apply at large scale to high-resolution images
What made them practical:
→ GPUs with highly optimized 2D convolution implementations
→ Datasets large enough (like ImageNet) to train without severe overfitting
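The "fewer parameters" claim can be checked with a back-of-the-envelope count. A sketch comparing a fully connected layer against a convolutional layer on the same input (the layer sizes are illustrative, chosen to match AlexNet's first layer):

```python
# Parameter count: fully connected vs. convolutional layer
# on the same 224x224x3 input (sizes are illustrative).

h, w, c = 224, 224, 3          # input: height, width, channels
n_out = 96                     # number of output feature maps / units

# Fully connected: every output unit sees every input value.
fc_params = (h * w * c) * n_out + n_out          # weights + biases

# Convolutional: each of the n_out filters is 11x11x3,
# shared across all spatial positions.
k = 11
conv_params = (k * k * c) * n_out + n_out        # weights + biases

print(f"fully connected: {fc_params:,} parameters")
print(f"convolutional:   {conv_params:,} parameters")
```

Weight sharing across spatial positions is what makes the convolutional count hundreds of times smaller here.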
ILSVRC Challenges
2010: Classification with 1000 categories
2011: Classification; classification + localization
2012: Classification; classification + localization; fine-grained classification (100+ dog categories) * WINNER: Krizhevsky et al., 2012
2013: PASCAL-style detection on fully labeled data for 200 categories; classification with 1000 categories; classification + localization with 1000 categories
2014: PASCAL-style detection on fully labeled data for 200 categories; classification + localization with 1000 categories * WINNERS: GoogLeNet (Szegedy et al., 2015) and VGG (Simonyan & Zisserman, 2015)
2015: Object detection for 200 fully labeled categories; object localization for 1000 categories; object detection from video for 30 fully labeled categories; scene classification for 401 categories
2016: Object localization for 1000 categories; object detection for 200 fully labeled categories; object detection from video for 30 fully labeled categories; scene classification for 365 scene categories; scene parsing for 150 stuff and discrete object categories
Source: http://image-net.org/challenges/LSVRC/
8 learned layers:
– 5 convolutional (Conv)
– 3 fully connected (FC)
Conv1 → Conv2 → Conv3 → Conv4 → Conv5 → FC1 → FC2 → FC3
Rectified Linear Units (ReLU): f(x) = max(0, x), instead of saturating nonlinearities such as tanh(x).
→ Figure: training a 4-layer CNN on CIFAR-10 (no regularization) with different activation functions: ReLUs reach 25% training error several times faster than tanh units.
ELU? Open debate...
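A minimal sketch of the two nonlinearities being compared (function names are mine):

```python
import numpy as np

def tanh(x):
    """Saturating nonlinearity: gradient vanishes for large |x|."""
    return np.tanh(x)

def relu(x):
    """Non-saturating Rectified Linear Unit: f(x) = max(0, x)."""
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))   # negative inputs are clipped to zero
print(tanh(x))   # output saturates toward -1 / +1
```

The gradient of ReLU is exactly 1 for x > 0, so it does not shrink for large activations the way the tanh gradient does, which is the usual explanation for the faster convergence in the figure.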
Parallel training on 2 GPUs: half the kernels on each GPU, communicating only at certain layers (Conv3 and the FC layers see all maps from the previous layer; Conv2, Conv4 and Conv5 see only maps on their own GPU).
– Easy with modern GPUs: they can read and write each other's memory directly
– Reduces error rates by 1.7% (top-1) and 1.2% (top-5) compared with a one-GPU net with half as many kernels per layer
Local Response Normalization: brightness-like normalization across adjacent kernel maps aids generalization →
bᵢ(x,y) = aᵢ(x,y) / ( k + α Σⱼ aⱼ(x,y)² )^β, with the sum running over n adjacent kernel maps j centered at i, at the same spatial position
k = 2, n = 5, α = 10⁻⁴, β = 0.75
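A sketch of this normalization in NumPy with the constants above (assumes activations stored channels-first, shape (C, H, W); function name is mine):

```python
import numpy as np

def local_response_norm(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Local Response Normalization across adjacent channels.

    a: activations, shape (C, H, W).
    Each value is divided by (k + alpha * sum of squares over the
    n neighboring channels at the same position) ** beta.
    """
    C = a.shape[0]
    b = np.empty_like(a, dtype=float)
    for i in range(C):
        lo = max(0, i - n // 2)           # clamp the window at the edges
        hi = min(C - 1, i + n // 2)
        denom = (k + alpha * np.sum(a[lo:hi + 1] ** 2, axis=0)) ** beta
        b[i] = a[i] / denom
    return b

acts = np.ones((8, 4, 4))                 # toy activations
normed = local_response_norm(acts)
print(normed.shape)                       # same shape as the input
```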
Overlapping pooling: pooling grid with stride s smaller than the pooling window size z (s = 2, z = 3) → reduces top-1 / top-5 error rates by 0.4% / 0.3% versus non-overlapping pooling (s = z = 2)
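A sketch of overlapping max pooling, shown on a 1-D signal for brevity (function name is mine; the network applies the same idea over 2-D feature maps):

```python
import numpy as np

def max_pool_1d(x, size=3, stride=2):
    """Max pooling where the window (size) exceeds the stride,
    so consecutive windows overlap by (size - stride) elements."""
    out = []
    for start in range(0, len(x) - size + 1, stride):
        out.append(np.max(x[start:start + size]))
    return np.array(out)

x = np.array([1, 3, 2, 5, 4, 6, 0, 7, 1])
print(max_pool_1d(x))   # windows: [1,3,2] [2,5,4] [4,6,0] [0,7,1]
```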
Model:
Maximize the multinomial logistic regression objective: maximize the average, across training cases, of the log-probability of the correct label under the prediction distribution.
~60 million parameters
Conv1 → Conv2 → Conv3 → Conv4 → Conv5 → FC1 → FC2 → FC3
Conv1: 96 kernels, 11×11×3, stride 4 (LRN + pooling)
Conv2: 256 kernels, 5×5×48 (LRN + pooling)
Conv3: 384 kernels, 3×3×256
Conv4: 384 kernels, 3×3×192
Conv5: 256 kernels, 3×3×192 (pooling)
FC1, FC2: 4096 neurons each (dropout)
FC3: 1000-way softmax
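The ~60 million figure can be sanity-checked by counting weights and biases layer by layer (a sketch; the kernel depths 48 and 192 reflect the two-GPU split, and the FC1 input assumes the flattened 6×6×256 Conv5 output):

```python
# Approximate AlexNet parameter count, layer by layer.
layers = {
    # name: (number of kernels/units, weights per kernel/unit)
    "Conv1": (96,   11 * 11 * 3),
    "Conv2": (256,  5 * 5 * 48),
    "Conv3": (384,  3 * 3 * 256),
    "Conv4": (384,  3 * 3 * 192),
    "Conv5": (256,  3 * 3 * 192),
    "FC1":   (4096, 6 * 6 * 256),   # flattened Conv5 output
    "FC2":   (4096, 4096),
    "FC3":   (1000, 4096),
}

total = 0
for name, (n, w) in layers.items():
    params = n * w + n              # weights + one bias per kernel/unit
    total += params
    print(f"{name}: {params:,}")

print(f"total: {total:,}")          # on the order of 60 million
```

Note how the fully connected layers, FC1 especially, dominate the total: the convolutional layers together account for only a few million parameters.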
Each label imposes only a few bits of constraint on the mapping from image to label
→ Not enough to prevent overfitting with ~60M parameters
Data augmentation: label-preserving transformations
– Image translation and horizontal reflection:
Train on random 224×224 patches of the 256×256 images plus their horizontal reflections → training set grows by a factor of 2048 (32×32 positions × 2 reflections). At test time, average the predictions over ten patches: the four corners and the center, plus their reflections.
– Changes in the intensity and color of illumination:
Alter RGB intensities along the principal components of the 3×3 covariance matrix of RGB pixel values:
I′ₓᵧ = [Iᴿₓᵧ, Iᴳₓᵧ, Iᴮₓᵧ]ᵀ + [p₁, p₂, p₃][α₁λ₁, α₂λ₂, α₃λ₃]ᵀ
where pᵢ, λᵢ are the eigenvectors and eigenvalues of the covariance matrix, and each αᵢ is a random Gaussian drawn anew for each training use of the image.
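A sketch of this PCA color augmentation in NumPy (assumes a float image of shape (H, W, 3); the function name is mine, and the 0.1 standard deviation is an assumption about the Gaussian's scale):

```python
import numpy as np

def pca_color_augment(img, rng, sigma=0.1):
    """Add random multiples of the RGB principal components
    to every pixel of a float image of shape (H, W, 3)."""
    pixels = img.reshape(-1, 3)
    cov = np.cov(pixels, rowvar=False)        # 3x3 RGB covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # lambda_i, p_i (columns)
    alphas = rng.normal(0.0, sigma, size=3)   # one draw per image use
    shift = eigvecs @ (alphas * eigvals)      # [p1,p2,p3] @ [a1*l1, ...]
    return img + shift                        # same shift at every pixel

rng = np.random.default_rng(0)
img = rng.random((8, 8, 3))
out = pca_color_augment(img, rng)
print(out.shape)
```

Because the same 3-vector is added to every pixel, the transformation changes the overall illumination color while leaving spatial structure intact.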
Dropout: set the output of each hidden neuron to zero with probability 0.5 (dropped neurons take no part in the forward pass or in back-propagation).
– Combining the predictions of many models is effective against overfitting, but training many large networks is too expensive
– Dropout gives a similar ensemble effect at roughly 2× the training time of a single model
– Reduces co-adaptation of neighboring neurons; each neuron is forced to learn more robust features
– At test time: use all neurons but multiply their outputs by 0.5
– Substantially reduces overfitting, but roughly doubles the time to converge
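A sketch of this test-time-scaling form of dropout (function names are mine; modern frameworks usually use "inverted" dropout, which scales at training time instead):

```python
import numpy as np

def dropout_train(activations, rng, p_drop=0.5):
    """Zero each activation independently with probability p_drop."""
    mask = rng.random(activations.shape) >= p_drop
    return activations * mask

def dropout_test(activations, p_drop=0.5):
    """At test time keep all neurons but scale outputs by (1 - p_drop),
    approximating the average over all dropped-out sub-networks."""
    return activations * (1.0 - p_drop)

rng = np.random.default_rng(0)
a = np.ones(10)
print(dropout_train(a, rng))   # roughly half the entries zeroed
print(dropout_test(a))         # every entry scaled to 0.5
```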
Stochastic gradient descent:
– Batch (Dᵢ) size: 128 images
– Momentum: 0.9
– Weight decay: 0.0005
– Learning rate ϵ: start at 0.01, divide by 10 when the validation error rate stops improving (reduced 3 times before termination)
Initialization:
– Weights ~ N(0, 0.01)
– Biases = 1 for Conv2, Conv4, Conv5 and all FC layers; = 0 everywhere else
→ positive biases accelerate early learning by giving the ReLUs non-zero inputs
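The update rule these hyperparameters plug into, written for a single weight array (a sketch; the function name is mine):

```python
import numpy as np

def sgd_step(w, v, grad, lr=0.01, momentum=0.9, weight_decay=0.0005):
    """One SGD update with momentum and weight decay:
       v <- momentum * v - weight_decay * lr * w - lr * grad
       w <- w + v
    """
    v = momentum * v - weight_decay * lr * w - lr * grad
    w = w + v
    return w, v

w = np.array([1.0, -2.0])        # toy weights
v = np.zeros_like(w)             # momentum buffer
grad = np.array([0.5, -0.5])     # toy gradient of the loss w.r.t. w
w, v = sgd_step(w, v, grad)
print(w, v)
```

Note the weight-decay term acts directly on the weights, pulling them toward zero independently of the gradient of the data loss.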
(* Pre-training with an extra sixth convolutional layer (Conv6) on the ImageNet 2011 Fall release: 15M images in 22k categories)
– Frequency- and orientation-selective filters
– GPU specialization (independent of initialization)
Euclidean distance between the 4096-dimensional feature vectors of the last hidden layer groups semantically similar images (unlike L2 distance on raw pixels) → train auto-encoders on these codes for efficient retrieval?
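A sketch of retrieval by feature-space distance, with random vectors standing in for the 4096-D codes (names are mine):

```python
import numpy as np

def nearest(query, database):
    """Index of the database vector closest to `query` in L2 distance."""
    dists = np.linalg.norm(database - query, axis=1)
    return int(np.argmin(dists))

rng = np.random.default_rng(1)
codes = rng.normal(size=(5, 4096))   # five stored feature vectors
# A query that is a slightly perturbed copy of code #3:
query = codes[3] + 0.01 * rng.normal(size=4096)
print(nearest(query, codes))
```

In high dimensions, distinct random codes are far apart while a perturbed copy stays close, which is why nearest-neighbor lookup in the feature space is reliable.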