Introduction to Artificial Neural Networks
Ahmed Guessoum
Natural Language Processing and Machine Learning Research Group
Laboratory for Research in Artificial Intelligence
Université des Sciences et de la Technologie Houari Boumediene
24/06/2018 AMLSS
– Nonlinear transfer functions
– Multi-layer networks of nonlinear units (sigmoid, hyperbolic tangent)
(https://en.wikipedia.org/wiki/List_of_animals_by_number_of_neurons)
– Linear Threshold Unit (LTU) or Linear Threshold Gate (LTG)
– Net input to unit: defined as a linear combination of the inputs:
  net = Σ_{i=0..n} w_i x_i = w · x  (with x_0 = 1, so the bias weight w_0 is folded into the sum)
– Output of unit: threshold (activation) function on net input (threshold = −w_0)
– Neuron is modeled as a unit connected by weighted links w_i to other units
– Multi-Layer Perceptron (MLP)

[Diagram: inputs x_1, x_2, …, x_n with weights w_1, w_2, …, w_n, plus bias input x_0 = 1 with weight w_0, feeding a threshold unit]

o(x_1, …, x_n) = 1 if Σ_{i=0..n} w_i x_i ≥ 0, −1 otherwise

Vector notation: o(x) = sgn(w · x) = 1 if w · x ≥ 0, −1 otherwise
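As a concrete sketch, the LTU above can be written in a few lines of NumPy; the weight values used here are illustrative, not taken from the slides:

```python
import numpy as np

def ltu_output(w, x):
    """Linear threshold unit: o(x) = sgn(w . x), with the bias weight w0
    handled by prepending a constant input x0 = 1."""
    x = np.concatenate(([1.0], x))      # x0 = 1
    net = np.dot(w, x)                  # net = sum_{i=0..n} w_i x_i
    return 1 if net >= 0 else -1

# Illustrative weights: w0 = -0.5 (i.e., threshold 0.5), w1 = w2 = 1.0
w = np.array([-0.5, 1.0, 1.0])
print(ltu_output(w, np.array([1.0, 0.0])))   # net = 0.5  -> 1
print(ltu_output(w, np.array([0.0, 0.0])))   # net = -0.5 -> -1
```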
– LTU emulation of logic gates (McCulloch and Pitts, 1943)
– e.g., what weights represent g(x1, x2) = AND(x1, x2)? OR(x1, x2)? NOT(x)?
  With output sgn(w0 + w1·x1 + w2·x2): AND: w0 = −0.8, w1 = w2 = 0.5; OR: w0 = −0.3, w1 = w2 = 0.5
– Some functions cannot be represented by a single LTU, e.g., functions that are not linearly separable
– Solution: use networks of perceptrons (LTUs)
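The gate weights above can be checked directly. In this sketch the AND and OR weights come from the slide; the NOT weights are an assumed answer to the slide's open question, using a single input:

```python
def ltu_gate(w0, w1, w2, x1, x2):
    """McCulloch-Pitts style gate: outputs 1 iff w0 + w1*x1 + w2*x2 >= 0."""
    return 1 if w0 + w1 * x1 + w2 * x2 >= 0 else 0

def AND(x1, x2): return ltu_gate(-0.8, 0.5, 0.5, x1, x2)   # weights from the slide
def OR(x1, x2):  return ltu_gate(-0.3, 0.5, 0.5, x1, x2)   # weights from the slide
def NOT(x):      return ltu_gate(0.5, -1.0, 0.0, x, 0)     # assumed: w0 = 0.5, w1 = -1

assert [AND(a, b) for a, b in [(0,0), (0,1), (1,0), (1,1)]] == [0, 0, 0, 1]
assert [OR(a, b)  for a, b in [(0,0), (0,1), (1,0), (1,1)]] == [0, 1, 1, 1]
assert [NOT(x) for x in (0, 1)] == [1, 0]
```

XOR, by contrast, admits no such weights: it is exactly the not-linearly-separable case that motivates networks of LTUs.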
[Figures: Example A, a data set in the (x1, x2) plane that is linearly separable; Example B, one that is not]
Perceptron learning rule:
Δw_i = r (t − o) x_i
w_i ← w_i + Δw_i
where t = t(x) is the target value, o is the perceptron output, and r is a small constant (the learning rate)
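A minimal NumPy version of this training loop; the data set (the OR function) and the learning rate are illustrative choices:

```python
import numpy as np

def train_perceptron(X, t, r=0.1, epochs=100):
    """Perceptron training rule: w_i <- w_i + r (t - o) x_i,
    with targets t in {-1, +1} and a thresholded output o."""
    X = np.hstack([np.ones((len(X), 1)), X])     # prepend x0 = 1 for the bias w0
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, t):
            o = 1 if np.dot(w, x) >= 0 else -1   # current perceptron output
            w += r * (target - o) * x            # no change when o == target
    return w

# Learn OR (linearly separable, so the rule converges)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([-1, 1, 1, 1])
w = train_perceptron(X, t)
```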
[Figure: a linearly separable (LS) data set in the (x1, x2) plane]
– Consider the simpler, unthresholded linear unit: o(x) = Σ_{i=0..n} w_i x_i = w · x
– Objective: find the “best fit” to D
– Quantitative objective: minimize error over training data set D
– Error function: sum squared error (SSE): E(w) = (1/2) Σ_{x∈D} (t(x) − o(x))²
– Simple optimization: move in the direction of steepest gradient in weight–error space
∇E(w) = [∂E/∂w_0, ∂E/∂w_1, …, ∂E/∂w_n]
Δw = −r ∇E(w),  i.e.  Δw_i = −r ∂E/∂w_i

∂E/∂w_i = ∂/∂w_i (1/2) Σ_{x∈D} (t(x) − o(x))²
        = (1/2) Σ_{x∈D} ∂/∂w_i (t(x) − o(x))²
        = Σ_{x∈D} (t(x) − o(x)) ∂/∂w_i (t(x) − w · x)

So: ∂E/∂w_i = Σ_{x∈D} (t(x) − o(x)) (−x_i)
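The derivation above gives a complete batch training procedure for the linear unit. A short sketch, with illustrative data generated from t(x) = 1 + 2x and an illustrative learning rate:

```python
import numpy as np

def gradient_descent_linear(X, t, r=0.05, epochs=500):
    """Batch gradient descent on the SSE of an unthresholded linear unit,
    using dE/dw_i = sum_{x in D} (t(x) - o(x)) (-x_i) and Dw = -r grad E."""
    X = np.hstack([np.ones((len(X), 1)), X])   # x0 = 1 folds the bias into w
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        o = X @ w                               # o(x) = w . x for every example
        grad = -(t - o) @ X                     # the gradient derived above
        w -= r * grad                           # steepest-descent step
    return w

# Illustrative data: targets generated by t(x) = 1 + 2x
X = np.array([[0.0], [1.0], [2.0], [3.0]])
t = 1 + 2 * X[:, 0]
w = gradient_descent_linear(X, t)
# w approaches [1, 2] (bias, slope)
```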
[Figures: data sets Example A, Example B, and Example C in the (x1, x2) plane]
Batch mode gradient descent:
– UNTIL the termination condition is met, DO
  Compute the gradient ∇E_D(w)
  w ← w − r ∇E_D(w)
– RETURN final w

Incremental (stochastic) mode gradient descent:
– UNTIL the termination condition is met, DO
  FOR each <x, t(x)> in D, DO
    Compute the gradient ∇E_d(w)
    w ← w − r ∇E_d(w)
– RETURN final w

– Incremental gradient descent can approximate batch gradient descent arbitrarily closely if r is made small enough

where
E_D(w) = (1/2) Σ_{x∈D} (t(x) − o(x))²
E_d(w) = (1/2) (t(x) − o(x))²  for a single example d = <x, t(x)>
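Both modes can be sketched side by side for the linear unit; the data set and learning rate below are illustrative:

```python
import numpy as np

def epoch_batch(w, X, t, r):
    """One batch step: descend the gradient of E_D(w) = 1/2 sum (t(x) - o(x))^2."""
    o = X @ w
    return w + r * (t - o) @ X               # w <- w - r * grad E_D(w)

def epoch_incremental(w, X, t, r):
    """One incremental (stochastic) pass: a step on E_d for each example in turn."""
    for x, target in zip(X, t):
        o = np.dot(w, x)
        w = w + r * (target - o) * x         # w <- w - r * grad E_d(w)
    return w

# Fit t(x) = 1 + 2x with both modes (x0 = 1 column carries the bias)
Xraw = np.array([[0.0], [1.0], [2.0], [3.0]])
X = np.hstack([np.ones((4, 1)), Xraw])
t = 1 + 2 * Xraw[:, 0]
wb, wi = np.zeros(2), np.zeros(2)
for _ in range(2000):
    wb = epoch_batch(wb, X, t, r=0.01)
    wi = epoch_incremental(wi, X, t, r=0.01)
# with small r both approach w = [1, 2]
```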
[Figure: a two-layer feedforward network: inputs x1, x2, x3 (input layer), hidden units h1–h4 with input-to-hidden weights u (e.g., u11), and an output layer with hidden-to-output weights v (e.g., v42)]
o(x) = σ(net), where net = Σ_{i=0..n} w_i x_i = w · x
– Linear threshold gate activation function: sgn(w · x)
– Nonlinear activation (aka transfer, squashing) function: a generalization of sgn
– σ is the sigmoid function: σ(net) = 1 / (1 + e^(−net))
– Can derive gradient rules to train
– Another choice is the hyperbolic tangent: tanh(net) = sinh(net) / cosh(net) = (e^net − e^(−net)) / (e^net + e^(−net))

[Diagram: sigmoid unit with inputs x_1 … x_n, weights w_1 … w_n, and bias input x_0 = 1 with weight w_0]
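Both squashing functions are one-liners, and the sigmoid's convenient derivative identity, σ′(net) = σ(net)(1 − σ(net)), can be verified numerically:

```python
import numpy as np

def sigmoid(net):
    """Logistic sigmoid: squashes the net input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-net))

def tanh_unit(net):
    """Hyperbolic tangent: sinh(net)/cosh(net), squashes into (-1, 1)."""
    return np.sinh(net) / np.cosh(net)        # equivalent to np.tanh(net)

# Identity used when deriving the training rule: d(sigma)/d(net) = sigma (1 - sigma)
s = sigmoid(0.3)
eps = 1e-6
numeric = (sigmoid(0.3 + eps) - sigmoid(0.3 - eps)) / (2 * eps)
assert abs(numeric - s * (1 - s)) < 1e-8
assert abs(sigmoid(0.0) - 0.5) < 1e-12 and abs(tanh_unit(0.0)) < 1e-12
```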
∇E(w) = [∂E/∂w_0, ∂E/∂w_1, …, ∂E/∂w_n]

∂E/∂w_i = ∂/∂w_i (1/2) Σ_{<x,t(x)>∈D} (t(x) − o(x))²
        = (1/2) Σ_{<x,t(x)>∈D} ∂/∂w_i (t(x) − o(x))²
        = Σ_{<x,t(x)>∈D} (t(x) − o(x)) ∂/∂w_i (t(x) − o(x))
        = Σ_{<x,t(x)>∈D} (t(x) − o(x)) (−∂o(x)/∂w_i)
        = −Σ_{<x,t(x)>∈D} (t(x) − o(x)) (∂o(x)/∂net(x)) (∂net(x)/∂w_i)
∂o(x)/∂net(x) = ∂σ(net)/∂net(x) = o(x) (1 − o(x))
∂net(x)/∂w_i = ∂(w · x)/∂w_i = x_i

So: ∂E/∂w_i = −Σ_{<x,t(x)>∈D} (t(x) − o(x)) o(x) (1 − o(x)) x_i
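The closed-form gradient of the sigmoid unit can be checked against a numerical gradient; the data here is randomly generated and purely illustrative:

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def sse(w, X, t):
    """E(w) = 1/2 sum (t(x) - o(x))^2 with o(x) = sigmoid(w . x)."""
    return 0.5 * np.sum((t - sigmoid(X @ w)) ** 2)

def grad_sse(w, X, t):
    """Closed form from the derivation: dE/dw_i = -sum (t - o) o (1 - o) x_i."""
    o = sigmoid(X @ w)
    return -((t - o) * o * (1 - o)) @ X

# Verify against a central-difference numerical gradient on made-up data
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
t = rng.uniform(size=5)
w = rng.normal(size=3)
g = grad_sse(w, X, t)
for i in range(3):
    dw = np.zeros(3); dw[i] = 1e-6
    num = (sse(w + dw, X, t) - sse(w - dw, X, t)) / 2e-6
    assert abs(num - g[i]) < 1e-6
```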
– Each training example is a pair of the form <x, t(x)>, where x is the input vector, t(x) the target vector, and r the learning rate
– Initialize all weights to (small) random values
– UNTIL the termination condition is met, DO
  FOR each <x, t(x)> in D, DO
    Input the instance x to the network and compute the outputs o_k(x) = σ(net_k(x))
    FOR each output unit k, DO (calculate its error term)
      δ_k ← o_k (1 − o_k) (t_k − o_k)
    FOR each hidden unit j, DO
      δ_j ← h_j (1 − h_j) Σ_k v_j,k δ_k
    Update each weight w_start-layer, end-layer (i.e., w = u_i,j with input a = x_i, or w = v_j,k with input a = h_j):
      w_start-layer, end-layer ← w_start-layer, end-layer + Δw_start-layer, end-layer
      Δw_start-layer, end-layer = r δ_end-layer a_start-layer
– RETURN final u, v

[Figure: the two-layer network with inputs x1, x2, x3, hidden units h1–h4 (weights u), and output layer (weights v, e.g., v42)]
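One full backprop update can be sketched in NumPy for a 3-4-2 network of the shape in the figure; bias terms are omitted to keep the sketch short, and the training pattern and learning rate are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, t, u, v, r):
    """One stochastic-backprop update for a small sigmoid network
    (u: input-to-hidden weights, v: hidden-to-output weights, r: learning rate)."""
    h = sigmoid(u @ x)                        # hidden activations h_j
    o = sigmoid(v @ h)                        # output activations o_k
    delta_o = o * (1 - o) * (t - o)           # delta_k = o_k(1-o_k)(t_k - o_k)
    delta_h = h * (1 - h) * (v.T @ delta_o)   # delta_j = h_j(1-h_j) sum_k v_jk delta_k
    v = v + r * np.outer(delta_o, h)          # dv_jk = r * delta_k * h_j
    u = u + r * np.outer(delta_h, x)          # du_ij = r * delta_j * x_i
    return u, v

# Train on one illustrative pattern for a 3-4-2 network
rng = np.random.default_rng(1)
u = rng.normal(scale=0.1, size=(4, 3))
v = rng.normal(scale=0.1, size=(2, 4))
x = np.array([1.0, 0.0, 1.0])
t = np.array([0.9, 0.1])

def error(u, v):
    return np.sum((t - sigmoid(v @ sigmoid(u @ x))) ** 2)

e0 = error(u, v)
for _ in range(200):
    u, v = backprop_step(x, t, u, v, r=0.5)
assert error(u, v) < e0   # the error falls as backprop runs
```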
– e.g., raw sensor input
– Conversion of symbolic data to quantitative (numerical) representations possible
– e.g., low-level control policy for a robot actuator
– Similar qualitative/quantitative (symbolic/numerical) conversions may apply
– Performance measured purely in terms of accuracy and efficiency
– Readability: ability to explain inferences made using the model; similar criteria
– Speech phoneme recognition
– Image classification
– Financial prediction
– http://www.cs.cmu.edu/afs/cs/project/alv/member/www/projects/ALVINN.html
– Drives at 70 mph on highways

[Figure: hidden-to-output unit weight map and input-to-hidden unit weight map (for one hidden unit)]
– Training procedure produces hidden unit representations that minimize error E
– Sometimes backprop will define new hidden features that are not explicit in the input representation x, but which capture properties of the input instances that are most relevant to learning the target function t(x)
– Hidden units express newly constructed features
– Change of representation: to a linearly separable D′
– ANNs can discover useful representations at the hidden layers
Input          Hidden Values          Output
10000000  →  .89  .04  .08  →  10000000
01000000  →  .01  .11  .88  →  01000000
00100000  →  .01  .97  .27  →  00100000
00010000  →  .99  .97  .71  →  00010000
00001000  →  .03  .05  .02  →  00001000
00000100  →  .22  .99  .99  →  00000100
00000010  →  .80  .01  .98  →  00000010
00000001  →  .60  .94  .01  →  00000001
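The 8-3-8 identity network behind the table above can be trained with the backprop rule from the earlier slides. A compact sketch (learning rate, epoch count, and initialization are illustrative; bias inputs are included since the hidden code must use all corners of the unit cube):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def with_bias(a):
    return np.concatenate((a, [1.0]))      # append constant-1 bias input

rng = np.random.default_rng(0)
X = np.eye(8)                              # the eight one-hot training patterns
u = rng.normal(scale=0.3, size=(3, 9))     # input-to-hidden weights (+ bias column)
v = rng.normal(scale=0.3, size=(8, 4))     # hidden-to-output weights (+ bias column)
r = 0.3

def forward(x):
    h = sigmoid(u @ with_bias(x))
    return h, sigmoid(v @ with_bias(h))

err_before = sum(np.sum((x - forward(x)[1]) ** 2) for x in X)
for _ in range(5000):
    for x in X:
        h, o = forward(x)
        d_o = o * (1 - o) * (x - o)                # output deltas (target = input)
        d_h = h * (1 - h) * (v[:, :3].T @ d_o)     # hidden deltas (skip bias column)
        v += r * np.outer(d_o, with_bias(h))
        u += r * np.outer(d_h, with_bias(x))
err_after = sum(np.sum((x - forward(x)[1]) ** 2) for x in X)
```

With enough epochs the three hidden activations typically settle into a distinct, roughly binary code per input, much like the values in the table.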
– Compare: perceptron convergence (to the best h ∈ H, provided the target is in H, i.e., the data is LS)
– Gradient descent converges to some local error minimum (perhaps not global)
– Possible improvements on Backprop (BP) (see later)
– Improvements on feedforward networks
[Figures: error versus epochs (Example 1 and Example 2)]
[Figure: 30 x 32 inputs; four output units: Left, Straight, Right, Up; hidden layer weights after 1 epoch and after 25 epochs; output layer weights (including w0) after 1 epoch]
– Note: Instead of 0 and 1 values, 0.1 and 0.9 are used (sigmoid units cannot output 0 and 1 given finite weights)
– Learning to convert text to speech
– Good performance after training on a vocabulary of ~1000 words
– Input: 7-letter window; determines the phoneme for the center letter and its context
– Output: units for articulatory modifiers (e.g., “voiced”), stress, and the closest phoneme; distributed representation
– 40 hidden units; 10000 weights total
– Vocabulary: trained on 1024 of 1463 words (informal corpus) and 1000 of 20000 (dictionary)
– Accuracy: 78% on the informal corpus, ~60% on the dictionary
– H⁻¹ is the inverse of the Hessian matrix (second derivatives) of the error function
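The fragment above appears to describe a second-order (Newton-style) update, w ← w − H⁻¹ ∇E(w). For the SSE of a linear unit the Hessian is constant (H = XᵀX), so a single Newton step lands on the minimum; the data below is illustrative:

```python
import numpy as np

# Newton-style update: w <- w - H^{-1} grad E(w).
# For E(w) = 1/2 ||t - X w||^2: grad E = -X.T (t - X w) and H = X.T X (constant).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])   # x0 = 1 bias column
t = np.array([1.0, 3.0, 5.0, 7.0])                               # t(x) = 1 + 2x
w = np.zeros(2)
grad = -X.T @ (t - X @ w)
H = X.T @ X
w = w - np.linalg.solve(H, grad)     # apply H^{-1} via a linear solve
# one step reaches the least-squares minimum w = [1, 2]
```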
– Closest to pure concept learning and classification – Some ANNs can be post-processed to produce probabilistic diagnoses
– aka prognosis (sometimes forecasting) – Predict a continuation of (typically numerical) data
– aka recommender systems – Provide assistance to human “subject matter” experts in making decisions
– Mobile robots – Autonomic sensors and actuators
Andrew Ng, Machine Learning Yearning: Technical Strategy for AI Engineers in the Era of Deep Learning, Draft Version, 2018, deeplearning.ai