by Ilya Kuzovkin ilya.kuzovkin@gmail.com
Machine Learning Estonia 2016
Deep Learning
THEORY, HISTORY, STATE OF THE ART & PRACTICAL TOOLS
http://neuro.cs.ut.ee
Outline: Where it has started · How it learns · How it evolved · What is the state now · How can you use it
Where it has started
Artificial Neuron
McCulloch and Pitts, “A Logical Calculus of the Ideas Immanent in Nervous Activity”, 1943
Perceptron
Frank Rosenblatt, 1957
“[The Perceptron is] the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence.” THE NEW YORK TIMES
Sigmoid & Backpropagation
Rumelhart, Hinton, Williams, 1986 (Werbos, 1974)
“Learning representations by back-propagating errors” (Nature)
Measure how small changes in weights affect the output
Can apply NNs to regression
Multilayer neural networks, etc.
Why didn’t the DL revolution happen in 1986?
(labeled datasets were too small)
FROM A TALK BY GEOFFREY HINTON
How it learns
Backpropagation
http://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example
[Figure: a small network with two inputs, two hidden neurons and two output neurons; each neuron computes NET (the weighted sum of its inputs plus a bias) and OUT (the sigmoid of NET)]
Given inputs 0.05 and 0.10, we want the neural network to output 0.01 and 0.99.
Forward pass: compute NET and OUT for h1 = 0.593, then repeat for h2 = 0.596, o1 = 0.751, o2 = 0.773.
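A minimal sketch of this forward pass in Python. The weight and bias values (w1..w8, b1, b2) are the ones used in the linked tutorial's example network; only the inputs and outputs above appear on the slide itself.

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Inputs and weights from the tutorial's example (2 inputs, 2 hidden, 2 outputs)
i1, i2 = 0.05, 0.10
w1, w2, w3, w4, b1 = 0.15, 0.20, 0.25, 0.30, 0.35   # input -> hidden
w5, w6, w7, w8, b2 = 0.40, 0.45, 0.50, 0.55, 0.60   # hidden -> output

# Hidden layer: NET is the weighted sum, OUT is the sigmoid of NET
net_h1 = w1 * i1 + w2 * i2 + b1
out_h1 = sigmoid(net_h1)          # ~0.593
net_h2 = w3 * i1 + w4 * i2 + b1
out_h2 = sigmoid(net_h2)          # ~0.596

# Output layer
out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)   # ~0.751
out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)   # ~0.773
print(out_h1, out_h2, out_o1, out_o2)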
We have o1 and o2: compare them to the targets 0.01 and 0.99 and compute the total error.
Propagate the error backwards: use the chain rule to work out how much each weight contributed to the error.
Gradient descent update rule: move each weight against its gradient, scaled by the learning rate.
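Continuing the sketch above, the backward pass and update for one output-layer weight (w5), following the tutorial's setup: squared error, sigmoid units, learning rate 0.5.

target_o1, target_o2 = 0.01, 0.99
eta = 0.5   # learning rate used in the tutorial

# Total squared error: E = sum of 0.5 * (target - out)^2 over the outputs
E = 0.5 * (target_o1 - out_o1) ** 2 + 0.5 * (target_o2 - out_o2) ** 2   # ~0.2984

# Chain rule for one output weight, w5 (hidden h1 -> output o1):
# dE/dw5 = dE/dout_o1 * dout_o1/dnet_o1 * dnet_o1/dw5
dE_dout_o1 = out_o1 - target_o1           # derivative of 0.5*(t - o)^2
dout_dnet_o1 = out_o1 * (1 - out_o1)      # sigmoid derivative
dnet_dw5 = out_h1
dE_dw5 = dE_dout_o1 * dout_dnet_o1 * dnet_dw5

# Gradient descent update rule
w5 = w5 - eta * dE_dw5   # ~0.3589, as in the tutorial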
Apply the update to every weight. The total error was 0.298371109; after one round of updates it is 0.291027924. Repeat 10,000 times and it falls to 0.000035085.
Optimization methods
Alec Radford, “Introduction to Deep Learning with Python”
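The animations from this slide are not preserved here. As a hedged illustration of what an “optimization method” changes, a sketch of SGD with momentum, one common refinement of the plain update rule above (the coefficients are conventional choices, not from the slide):

import numpy as np

def sgd_momentum(grad_fn, w, lr=0.01, mu=0.9, steps=100):
    """Gradient descent with momentum: a velocity term accumulates past gradients."""
    v = np.zeros_like(w)
    for _ in range(steps):
        v = mu * v - lr * grad_fn(w)   # momentum update
        w = w + v
    return w

# Toy usage: minimize f(w) = ||w||^2, whose gradient is 2w
w_final = sgd_momentum(lambda w: 2 * w, np.array([3.0, -2.0]))
print(w_final)   # close to [0, 0]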
How it evolved
1-layer NN
INPUT → OUTPUT, no hidden layers: 92.5% on the MNIST test set
Alec Radford, “Introduction to Deep Learning with Python”
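As a sketch, the same kind of one-layer (softmax) network written in Keras, the library introduced at the end of this deck; the optimizer and loss are illustrative assumptions, not the slide's:

from keras.models import Sequential
from keras.layers import Dense

# 1-layer network: 784 pixel inputs straight to a 10-way softmax
model = Sequential()
model.add(Dense(10, input_dim=784, activation='softmax'))
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(X_train, y_train, ...) on MNIST reaches roughly the accuracy quoted above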
One hidden layer
98.2% on the MNIST test set
[Figure: activity of 100 hidden neurons (out of 625)]
Alec Radford, “Introduction to Deep Learning with Python”
Overfitting
Dropout
Randomly “drop” units during training so the network cannot rely on any single neuron: a simple way to prevent overfitting.
Srivastava, Hinton, Krizhevsky, Sutskever, Salakhutdinov, “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”, 2014
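A minimal sketch of the mechanics, assuming the usual “inverted dropout” formulation with drop probability 0.5 (frameworks implement this for you):

import numpy as np

def dropout(activations, p_drop=0.5, training=True):
    """Inverted dropout: zero out units with probability p_drop, rescale the
    survivors so the expected activation is unchanged at test time."""
    if not training:
        return activations
    mask = (np.random.rand(*activations.shape) >= p_drop)
    return activations * mask / (1.0 - p_drop)

h = np.array([0.2, 0.9, 0.5, 0.7])
print(dropout(h))   # about half the units zeroed, survivors scaled by 2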
ReLU
Rectified Linear Unit: f(x) = max(0, x). Cheap to compute, and its gradient does not vanish for positive inputs the way the sigmoid’s does.
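A tiny numeric illustration (not from the slides) of why this matters: the sigmoid’s gradient shrinks toward zero away from the origin, while ReLU’s gradient is 1 wherever the unit is active.

import numpy as np

x = np.array([-5.0, 0.5, 5.0])
sig = 1 / (1 + np.exp(-x))
print(sig * (1 - sig))          # sigmoid gradient: [0.0066, 0.235, 0.0066], vanishes
print((x > 0).astype(float))    # ReLU gradient:    [0, 1, 1], stays 1 when active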
“Modern” ANN
99.0% on the MNIST test set
Convolution
Prewitt edge detector: a 3×3 kernel that is slid over the image:

    +1 +1 +1
     0  0  0
    -1 -1 -1

Applied to an image region where rows of 40s sit above rows of 10s:

    40 40 40
    40 40 40
    40 40 40
    10 10 10
    10 10 10
    10 10 10

the response is (40+40+40) - (10+10+10) = 90 on the rows straddling the horizontal edge, and 0 inside the flat regions.
An edge detector is a handcrafted feature detector.
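The same computation sketched in Python with explicit loops to show the mechanics (the 6×6 image is an illustrative stand-in for the slide’s example):

import numpy as np

kernel = np.array([[ 1,  1,  1],
                   [ 0,  0,  0],
                   [-1, -1, -1]])           # Prewitt horizontal edge detector

# Image: three rows of 40s above three rows of 10s -> one horizontal edge
image = np.array([[40] * 6] * 3 + [[10] * 6] * 3)

out = np.zeros((image.shape[0] - 2, image.shape[1] - 2))
for i in range(out.shape[0]):
    for j in range(out.shape[1]):
        # Correlate the kernel with the 3x3 patch under it
        out[i, j] = np.sum(kernel * image[i:i+3, j:j+3])

print(out)   # 90 on the rows straddling the edge, 0 in the flat regions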
The idea of a convolutional layer is to learn the feature detectors instead of using handcrafted ones.
99.50% on the MNIST test set
CURRENT BEST: 99.77% by a committee of 35 conv. nets
http://yann.lecun.com/exdb/mnist/
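A hedged sketch of what a small convolutional network looks like in Keras (current API; the architecture and settings are illustrative, not the 99.77% committee model):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
# Learn 32 feature detectors (3x3 kernels) instead of handcrafting them
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])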
More layers
ILSVRC 2015 winner — 152 (!) layers
Kaiming He et al., “Deep Residual Learning for Image Recognition”, 2015
Hyperparameters
Constants that the network does not learn; you have to choose them yourself (learning rate, number of layers, layer sizes, ...).
How to choose them:
Grid search :(
Random search :/
Bayesian optimization :)
Informal parameter search :)
Snoek, Larochelle, Adams, “Practical Bayesian Optimization of Machine Learning Algorithms”
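For illustration, a sketch of the simplest practical option above, random search; train_and_score is a hypothetical stand-in for training a model and returning its validation accuracy:

import random

def random_search(train_and_score, n_trials=20):
    """Sample hyperparameters at random, keep the best-scoring setting."""
    best_score, best_params = float('-inf'), None
    for _ in range(n_trials):
        params = {
            'learning_rate': 10 ** random.uniform(-4, -1),  # log-uniform 1e-4..1e-1
            'hidden_units': random.choice([64, 128, 256, 512]),
        }
        score = train_and_score(**params)   # e.g. validation accuracy
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score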
Major Types of ANNs
feedforward, convolutional, recurrent, autoencoder
What is the state now
Computer vision
http://cs.stanford.edu/people/karpathy/deepimagesent/
Kaiming He et al., “Deep Residual Learning for Image Recognition”, 2015
Natural Language Processing
Speech recognition + translation; Facebook bAbI dataset: question answering
http://smerity.com/articles/2015/keras_qa.html
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
AI
DeepMind’s DQN
Sukhbaatar et al., “MazeBase: A Sandbox for Learning from Games”, 2015
Neuroscience
Güçlü and van Gerven, “Deep Neural Networks Reveal a Gradient in the Complexity of Neural Representations across the Ventral Stream”, 2015
How can you use it
Pre-trained models
Take a network that somebody else has already trained and apply it (or fine-tune it) for your own task.
Zoo of Frameworks
Keras
http://keras.io
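The code shown on these slides is not preserved in this transcript. A minimal Keras sketch in their spirit, training the “Modern ANN” from earlier (hidden layer + ReLU + dropout) on MNIST; written against the current Keras API, with illustrative settings:

import numpy as np
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.utils import to_categorical

# Load MNIST and flatten the 28x28 images into 784-dim vectors
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(-1, 784).astype('float32') / 255
X_test = X_test.reshape(-1, 784).astype('float32') / 255
y_train, y_test = to_categorical(y_train, 10), to_categorical(y_test, 10)

# Hidden layer of 625 units (as in the earlier slide), ReLU, dropout, softmax
model = Sequential()
model.add(Dense(625, input_dim=784, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])

model.fit(X_train, y_train, epochs=10, batch_size=128, validation_split=0.1)
print(model.evaluate(X_test, y_test))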
Further reading:
http://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example
http://neuralnetworksanddeeplearning.com
http://cs231n.stanford.edu/