BBM413 Fundamentals of Image Processing Introduction to Deep Learning
Erkut Erdem Hacettepe University Computer Vision Lab (HUCVL)
What is deep learning?
"Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction."
− Yann LeCun, Yoshua Bengio and Geoff Hinton, "Deep Learning", Nature, Vol. 521, 2015
A brief history:
− Perceptron (Rosenblatt, Psychological Review, Vol. 65, 1958)
− Backpropagation (Rumelhart, Hinton, Williams, 1986; Werbos, 1988)
− Convolutional neural networks (LeCun, 1989)
− Long Short-Term Memory (LSTM) (Hochreiter & Schmidhuber, 1997)
Adapted from Joan Bruna
Deep learning is about learning suitable features (weights) directly from data, rather than hand-designing them. In practice it relies on large-scale supervised learning to achieve good results.
G. E. Hinton and R. R. Salakhutdinov, "Reducing the Dimensionality of Data with Neural Networks", Science, Vol. 313, 28 July 2006.
Image classification
(Figure: example ImageNet top-5 predictions for the easiest vs. hardest classes; outputs include scale, T-shirt, steel drum, drumstick, mud turtle, giant panda.)
ILSVRC: 1K categories; performance measured by classification error.
ILSVRC 2012 results:
Team | % Error
Supervision (Toronto) | 15.3
ISI (Tokyo) | 26.1
VGG (Oxford) | 26.9
XRCE/INRIA | 27.0
UvA (Amsterdam) | 29.6
INRIA/LEAR | 33.4
The winning entry (SuperVision/AlexNet) was a deep convolutional network (convolutional and max-pooling layers) trained with dropout.
(Figure: ILSVRC results over the years, comparing CNN-based and non-CNN-based entries.)
Applications: speech recognition, machine translation, self-driving cars, game playing, robotics, genomics, audio generation — and many more…
− D. Amodei et al., "Deep Speech 2: End-to-End Speech Recognition in English and Mandarin", CoRR 2015
− M.-T. Luong et al., "Effective Approaches to Attention-based Neural Machine Translation", EMNLP 2015
− M. Bojarski et al., "End to End Learning for Self-Driving Cars", CoRR 2016
− D. Silver et al., "Mastering the game of Go with deep neural networks and tree search", Nature 529, 2016
− L. Pinto and A. Gupta, "Supersizing Self-supervision: Learning to Grasp from 50K Tries and 700 Robot Hours", ICRA 2016
− H. Y. Xiong et al., "The human splicing code reveals new insights into the genetic determinants of disease", Science 347, 2015
− "Groove: Generation of Realistic Accompaniments from Single Song Recordings", IJCAI 2015
(Figure: sequence-to-sequence machine translation, "I am a student" → "Je suis étudiant".)
Slide credit: Neil Lawrence
Year | Breakthrough in AI | Datasets (First Available) | Algorithms (First Proposed)
1994 | Human-level spontaneous speech recognition | Spoken Wall Street Journal articles and other texts (1991) | Hidden Markov Model (1984)
1997 | IBM Deep Blue defeated Garry Kasparov | 700,000 Grandmaster chess games, aka "The Extended Book" (1991) | Negascout planning algorithm (1983)
2005 | Google's Arabic- and Chinese-to-English translation | 1.8 trillion tokens from Google Web and News pages (collected in 2005) | Statistical machine translation algorithm (1988)
2011 | IBM Watson became the world Jeopardy! champion | 8.6 million documents from Wikipedia, Wiktionary, and Project Gutenberg (updated in 2010) | Mixture-of-Experts (1991)
2014 | Google's GoogLeNet object classification at near-human performance | ImageNet corpus of 1.5 million labeled images and 1,000 object categories (2010) | Convolutional Neural Networks (1989)
2015 | Google's DeepMind achieved human parity in playing 29 Atari games by learning general control from video | Arcade Learning Environment dataset of over 50 Atari games (2013) | Q-learning (1992)
Average no. of years to breakthrough: 3 years (datasets), 18 years (algorithms)
Table credit: Quant Quanto
N. Srivastava et al., "Dropout: A Simple Way to Prevent Neural Networks from Overfitting", JMLR Vol. 15, No. 1, 2014
S. Ioffe and C. Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift", ICML 2015
The artificial neuron
(Figure: a neuron with inputs x0 … xn, weights w0 … wn, and bias b; a weighted sum followed by a non-linearity.)
Neuron pre-activation: a(x) = b + Σ_i w_i x_i = b + wᵀx
Neuron output activation: h(x) = g(a(x)) = g(b + Σ_i w_i x_i)
where w are the connection weights, b is the bias, and g(·) is the activation function. The range of h(x) is determined by g(·).
The bias b only changes the position of the ridge.
Image credit: Pascal Vincent
Linear activation function: g(a) = a
− No non-linear transformation, no input squashing
Sigmoid activation function: g(a) = sigm(a) = 1 / (1 + exp(−a))
− Squashes the neuron's output between 0 and 1
− Always positive, bounded, strictly increasing
Hyperbolic tangent (tanh) activation function: g(a) = tanh(a) = (exp(a) − exp(−a)) / (exp(a) + exp(−a)) = (exp(2a) − 1) / (exp(2a) + 1)
− Squashes the neuron's output between −1 and 1
− Can be positive or negative, bounded, strictly increasing
Rectified linear (ReLU) activation function: g(a) = max(0, a)
− Bounded below by 0 (always non-negative), not upper bounded
− Monotonically increasing; tends to produce units with sparse activities
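To make these concrete, a minimal NumPy sketch of a single neuron with the activation functions above (variable names and values are illustrative):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))      # squashes into (0, 1)

def tanh(a):
    return np.tanh(a)                    # squashes into (-1, 1)

def relu(a):
    return np.maximum(0.0, a)            # non-negative, sparse activations

def neuron(x, w, b, g=sigmoid):
    """Single artificial neuron: h(x) = g(b + w.x)."""
    a = b + np.dot(w, x)                 # pre-activation a(x) = b + w^T x
    return g(a)

x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.2])
print(neuron(x, w, b=0.3))               # try g=tanh or g=relu as well
```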
Output layer for multi-class classification (one output unit per class)
(Figure: network with inputs x0 … xn feeding a softmax output layer.)
o(a) = softmax(a) = [ exp(a_1)/Σ_c exp(a_c), …, exp(a_C)/Σ_c exp(a_c) ]ᵀ
Each output estimates the class-conditional probability p(y = c | x); the outputs are positive and sum to one.
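A small sketch of the softmax computation; subtracting the max before exponentiating is a standard numerical-stability trick, not something the slide specifies:

```python
import numpy as np

def softmax(a):
    """o(a)_c = exp(a_c) / sum_c' exp(a_c').
    Subtracting max(a) avoids overflow and does not change the result."""
    e = np.exp(a - np.max(a))
    return e / e.sum()

a = np.array([2.0, 1.0, 0.1])   # pre-activations, one per class
p = softmax(a)
print(p, p.sum())               # class probabilities; they sum to 1
```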
Single hidden layer neural network
(Figure: inputs x0 … xn, hidden units h0 … hn, output layer.)
Hidden layer pre-activation: a(x)_i = b^(1)_i + Σ_j W^(1)_{i,j} x_j
Hidden layer activation: h^(1)(x) = g(a(x))
Output: f(x) = o( b^(2) + w^(2)ᵀ h^(1)(x) )
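A minimal sketch of this single-hidden-layer forward pass, assuming tanh hidden units and illustrative shapes:

```python
import numpy as np

def forward_1hidden(x, W1, b1, w2, b2, g=np.tanh):
    """f(x) = b2 + w2^T g(b1 + W1 x); feed the result to o(.), e.g. softmax."""
    a1 = b1 + W1 @ x          # hidden pre-activation a(x)
    h1 = g(a1)                # hidden activation h^(1)(x)
    return b2 + w2 @ h1       # output pre-activation

rng = np.random.default_rng(0)
x = rng.normal(size=4)
W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)
w2, b2 = rng.normal(size=5), 0.0
print(forward_1hidden(x, W1, b1, w2, b2))
```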
Multilayer neural network
Consider a network with L hidden layers.
− Layer pre-activation for k = 1 … L (with h^(0)(x) = x): a^(k)(x) = b^(k) + W^(k) h^(k−1)(x)
− Hidden layer activation: h^(k)(x) = g(a^(k)(x))
− Output layer activation: f(x) = h^(L+1)(x) = o(a^(L+1)(x))
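The same idea for L layers, as a sketch (the generic output non-linearity o(·) is left as an identity placeholder):

```python
import numpy as np

def forward(x, weights, biases, g=np.tanh, o=lambda a: a):
    """h^(0) = x;  a^(k) = b^(k) + W^(k) h^(k-1);  h^(k) = g(a^(k)).
    The final layer applies the output non-linearity o(.) instead of g."""
    h = x
    for k, (W, b) in enumerate(zip(weights, biases)):
        a = b + W @ h
        h = g(a) if k < len(weights) - 1 else o(a)
    return h

rng = np.random.default_rng(1)
Ws = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
bs = [np.zeros(4), np.zeros(2)]
print(forward(rng.normal(size=3), Ws, bs))
```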
Empirical risk minimization
Training a network is an optimization problem over its parameters θ:
arg min_θ (1/T) Σ_t ℓ(f(x^(t); θ), y^(t)) + λ Ω(θ)
where ℓ is the loss function and Ω(θ) is a regularizer that penalizes certain values of θ.
Slide credit: MIT 6.S191 | Intro to Deep Learning | IAP 2017
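A sketch of this objective with squared error as ℓ and an L2 penalty as Ω, both illustrative choices (the slides leave them generic):

```python
import numpy as np

def empirical_risk(f, params, X, Y, lam=1e-3):
    """(1/T) sum_t loss(f(x_t; theta), y_t) + lam * Omega(theta),
    with squared error as the loss and an L2 regularizer Omega."""
    T = len(X)
    data_term = sum((f(x, params) - y) ** 2 for x, y in zip(X, Y)) / T
    reg_term = lam * sum(np.sum(p ** 2) for p in params)
    return data_term + reg_term

f = lambda x, params: params[0] @ x          # a linear model as f(x; theta)
X = [np.array([1.0, 2.0]), np.array([0.0, 1.0])]
Y = [1.0, -1.0]
print(empirical_risk(f, [np.array([0.5, -0.5])], X, Y))
```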
Gradient descent
(Figure: loss landscape over two weights; the animation frames are omitted.)
1. Initialize the weights randomly.
2. Compute the gradient of the loss with respect to the weights.
3. Move in the direction opposite to the gradient.
4. Repeat until convergence.
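A sketch of this loop on a toy quadratic loss whose gradient is known in closed form (the function and learning rate are our own choices):

```python
import numpy as np

# Toy quadratic loss J(theta) = ||theta - target||^2 with known gradient.
target = np.array([3.0, -2.0])
grad_J = lambda theta: 2.0 * (theta - target)

theta = np.random.default_rng(0).normal(size=2)   # 1. initialize randomly
eta = 0.1                                         # learning rate
for _ in range(100):
    g = grad_J(theta)                             # 2. compute the gradient
    theta = theta - eta * g                       # 3. step opposite the gradient
print(theta)                                      # 4. repeat -> close to target
```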
Stochastic gradient descent (SGD)
− Initialize θ randomly.
− For each training example (x, y):
  − Compute the loss gradient ∇_θ L
  − Update θ with the rule: θ^(t+1) = θ^(t) − η_t ∇_θ L
The per-example gradient is only an estimate of the true gradient!
Mini-batches offer advantages:
⎯ Smoother convergence
⎯ Allows for larger learning rates
⎯ Can parallelize computation and achieve significant speed increases on GPUs
Mini-batch gradient descent
− Initialize θ randomly.
− For each training batch {(x0, y0), …, (xB, yB)}:
  − Compute the loss gradient averaged over the batch
  − Update θ with the rule: θ^(t+1) = θ^(t) − η_t ∇_θ L
A mini-batch gives a more accurate estimate of the true gradient than a single example.
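A sketch of mini-batch SGD under these assumptions (the least-squares gradient used in the demo is an illustrative choice):

```python
import numpy as np

def minibatch_sgd(grad_loss, theta, X, Y, eta=0.01, batch_size=32, epochs=10):
    """theta <- theta - eta * (mean gradient over each mini-batch);
    grad_loss(theta, x, y) returns the gradient for a single example."""
    rng = np.random.default_rng(0)
    n = len(X)
    for _ in range(epochs):                 # one epoch = pass over all examples
        order = rng.permutation(n)          # visit examples in random order
        for i in range(0, n, batch_size):
            idx = order[i:i + batch_size]
            g = np.mean([grad_loss(theta, X[j], Y[j]) for j in idx], axis=0)
            theta = theta - eta * g         # the update rule above
    return theta

grad = lambda th, x, y: 2.0 * (th @ x - y) * x    # least-squares gradient
X = np.random.default_rng(1).normal(size=(200, 3))
Y = X @ np.array([1.0, -2.0, 0.5])
print(minibatch_sgd(grad, np.zeros(3), X, Y, eta=0.05, epochs=20))
```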
Training epoch = one iteration over all training examples.
Deep learning = learning (hierarchical) functions
A deep network is a composition of functions:
a_L(x; θ_{1,…,L}) = h_L(h_{L−1}(… h_1(x, θ_1) …, θ_{L−1}), θ_L)
Training seeks the parameters that minimize the total loss:
θ* ← arg min_θ Σ_{(x,y)∈(X,Y)} ℓ(y, a_L(x; θ_{1,…,L}))
(Figure: feedforward architecture — Input → h1(x; θ) → h2(x; θ) → h3(x; θ) → h4(x; θ) → h5(x; θ) → Loss; forward connections only.)
(Figure: directed acyclic graph architecture (DAGNN) — interweaved connections between modules h1 … h5 leading to the Loss.)
But how do we compute the gradients of a function that encloses other functions, like a_L(…)?
a_L(x; θ_{1,…,L}) = h_L(h_{L−1}(… h_1(x, θ_1) …, θ_{L−1}), θ_L)
θ* ← arg min_θ Σ_{(x,y)∈(X,Y)} ℓ(y, a_L(x; θ_{1,…,L}))
Gradient descent update: θ^(t+1) = θ^(t) − η_t ∂L/∂θ^(t)
So we need the gradients ∂L/∂θ_l for every layer l = 1, …, L.
The chain rule
− Single path: dz/dx = (dz/dy) · (dy/dx)
− Multiple paths: dz/dx_i = Σ_j (dz/dy_j) · (dy_j/dx_i)
We must sum the gradients coming from all possible paths.
(Figure: a small computational graph with inputs y1, y2, y3, intermediate nodes z1, z2, and loss L at the top.)
In the graph: ∂L/∂y_j = Σ_k (∂L/∂z_k) · (∂z_k/∂y_j), for example
∂L/∂y1 = (∂L/∂z1)·(∂z1/∂y1) + (∂L/∂z2)·(∂z2/∂y1)
∂L/∂y3 = (∂L/∂z1)·(∂z1/∂y3) + (∂L/∂z2)·(∂z2/∂y3)
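A toy instance of the multiple-path rule; the functions z1, z2 and L below are our own choices, with a finite-difference check:

```python
# Toy graph matching the figure: inputs y1..y3, intermediates z1, z2, loss L.
# Choose z1 = y1 + y2, z2 = y1 * y3, L = z1 * z2. Then y1 reaches L through
# both z1 and z2, so its gradient sums two paths:
y1, y2, y3 = 2.0, 3.0, 4.0
z1, z2 = y1 + y2, y1 * y3
dL_dz1, dL_dz2 = z2, z1                       # since L = z1 * z2
dL_dy1 = dL_dz1 * 1.0 + dL_dz2 * y3           # dz1/dy1 = 1, dz2/dy1 = y3
print(dL_dy1)                                 # 8 + 20 = 28

# Finite-difference check:
L = lambda v: (v + y2) * (v * y3)
eps = 1e-6
print((L(y1 + eps) - L(y1 - eps)) / (2 * eps))   # ~28
```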
Backpropagation
Given the loss L(y, a_L) with a_L(x; θ_{1,…,L}) = h_L(h_{L−1}(… h_1(x, θ_1) …, θ_{L−1}), θ_L), the chain rule gives
∂L/∂θ_l = (∂L/∂a_L) · (∂a_L/∂a_{L−1}) · (∂a_{L−1}/∂a_{L−2}) · … · (∂a_l/∂θ_l)
For a single module this factors as
∂L/∂θ_l = (∂a_l/∂θ_l)ᵀ · ∂L/∂a_l
i.e. the gradient of the module w.r.t. its parameters times the gradient of the loss w.r.t. the module output.
The gradients flow backwards through a recursive rule:
∂L/∂θ_l = (∂a_l/∂θ_l)ᵀ · ∂L/∂a_l
∂L/∂a_l = (∂a_{l+1}/∂a_l)ᵀ · ∂L/∂a_{l+1}
Since each module's input is the previous module's output, a_l = h_l(x_l; θ_l), a_{l+1} = h_{l+1}(x_{l+1}; θ_{l+1}), x_{l+1} = a_l, this is
∂L/∂a_l = (∂a_{l+1}/∂x_{l+1})ᵀ · ∂L/∂a_{l+1}
i.e. the gradient of a module w.r.t. its input times the gradient of the loss w.r.t. the module output.
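A hand-coded sketch of this backward recursion for a tiny two-layer tanh network with squared-error loss (architecture and loss are illustrative choices):

```python
import numpy as np

def forward_backward(x, y, W1, b1, w2, b2):
    """Two-layer tanh network, squared-error loss:
    a1 = W1 x + b1, h1 = tanh(a1), a2 = w2.h1 + b2, L = (a2 - y)^2."""
    a1 = W1 @ x + b1
    h1 = np.tanh(a1)
    a2 = w2 @ h1 + b2
    L = (a2 - y) ** 2
    # Backward pass: apply the recursion from the last module to the first.
    dL_da2 = 2.0 * (a2 - y)                    # gradient at the output
    dL_dw2 = dL_da2 * h1                       # (da2/dw2)^T . dL/da2
    dL_db2 = dL_da2
    dL_dh1 = dL_da2 * w2                       # module gradient w.r.t. its input
    dL_da1 = dL_dh1 * (1.0 - h1 ** 2)          # back through tanh
    dL_dW1 = np.outer(dL_da1, x)               # (da1/dW1)^T . dL/da1
    dL_db1 = dL_da1
    return L, (dL_dW1, dL_db1, dL_dw2, dL_db2)

rng = np.random.default_rng(0)
W1, b1, w2, b2 = rng.normal(size=(3, 2)), np.zeros(3), rng.normal(size=3), 0.0
print(forward_backward(np.array([1.0, -0.5]), 0.2, W1, b1, w2, b2)[0])
```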
slide by Dhruv Batra
Traditional pattern recognition: fixed, hand-crafted features + a learned classifier
− VISION: image → hand-crafted features (SIFT/HOG, fixed) → your favorite classifier (learned) → "car"
− SPEECH: audio → hand-crafted features (MFCC, fixed) → your favorite classifier (learned) → \ˈd ē p\
− NLP: "This burrito place is yummy and fun!" → hand-crafted features (Bag-of-words, fixed) → your favorite classifier (learned) → "+"
slide by Marc'Aurelio Ranzato, Yann LeCun
The first learning machine: the Perceptron
A linear classifier built on top of a simple, hand-designed feature extractor:
y = sign( Σ_{i=1}^N W_i F_i(X) + b )
Designing the feature extractor required considerable effort by experts.
slide by Marc’Aurelio Ranzato, Yann LeCun
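As a sketch, the perceptron's decision rule on top of hypothetical hand-crafted features:

```python
import numpy as np

def perceptron_predict(features, W, b):
    """y = sign(sum_i W_i * F_i(X) + b) on top of a fixed feature extractor."""
    return np.sign(np.dot(W, features) + b)

F_x = np.array([0.2, 1.5, -0.3])     # hypothetical hand-crafted features F(X)
W = np.array([0.5, -0.1, 0.8])
print(perceptron_predict(F_x, W, b=0.1))   # +1 or -1
```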
Features form a hierarchy in each domain:
− VISION: pixels → edge → texton → motif → part
− SPEECH: sample → spectral band → formant → motif → phone → word
− NLP: character → word → NP/VP/… → clause → sentence → story
slide by Marc’Aurelio Ranzato, Yann LeCun
Given a library of simple functions, compose them into a complicated function.
− Idea 1: linear combinations, f(x) = Σ_i α_i g_i(x)
− Idea 2: compositions, f(x) = g1(g2(… gn(x) …))
Deep learning takes the composition route (sketched below).
slide by Marc'Aurelio Ranzato, Yann LeCun
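A toy illustration of the two ideas; the compose helper and the particular functions are our own:

```python
import math

def compose(*funcs):
    """compose(f, g, h)(x) == f(g(h(x)))."""
    def composed(x):
        for f in reversed(funcs):
            x = f(x)
        return x
    return composed

relu = lambda x: max(0.0, x)
affine = lambda x: 2.0 * x - 1.0
linear_combo = lambda x: 0.5 * math.sin(x) + 0.5 * math.cos(x)   # Idea 1

deep = compose(math.tanh, relu, affine)   # Idea 2: g1(g2(g3(x)))
print(linear_combo(0.8), deep(0.8))
```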
Deep learning = the entire pipeline is trainable:
image → Low-Level Features → Mid-Level Features → High-Level Features → Trainable Classifier → "car"
(Figure: feature visualizations of a convolutional net trained on ImageNet, from [Zeiler & Fergus 2013].)
slide by Marc'Aurelio Ranzato, Yann LeCun
(Figure: features learned by Sparse DBNs [Lee et al., ICML 2009]. Figure courtesy: Quoc Le)
slide by Dhruv Batra
Moving toward learned features: fixed features + unsupervised mid-level features + supervised classifier
− SPEECH: audio → MFCC (fixed) → Mixture of Gaussians (unsupervised) → classifier (supervised) → \ˈd ē p\
− VISION: image → SIFT/HOG (fixed) → K-Means/pooling (unsupervised) → classifier (supervised) → "car"
− NLP: "This burrito place is yummy and fun!" → Parse Tree Syntactic (fixed) → n-grams (unsupervised) → classifier (supervised) → "+"
Here "Learned" applies only to the middle and final stages.
slide by Marc'Aurelio Ranzato, Yann LeCun
Deep learning = a hierarchy of trainable feature transforms; each module transforms its input representation into a higher-level one.
(Diagram: Trainable Feature-Transform / Classifier → Trainable Feature-Transform / Classifier → Trainable Feature-Transform / Classifier, with learned internal representations — in contrast to a hand-crafted feature extractor (fixed) followed by a "simple" trainable classifier (learned).)
slide by Marc'Aurelio Ranzato, Yann LeCun
slide by Dhruv Batra
Local vs. distributed representations
In a distributed representation, each concept is represented by many neurons, and each neuron participates in the representation of many concepts.
Image credit: Moontae Lee
slide by Geoff Hinton
(Figure: scene classification example — e.g., bedroom vs. mountain.)
slide by Bolei Zhou
slide by Yisong Yue
81
82
slide by Yisong Yue
83
slide by Yisong Yue
84
slide by Yisong Yue