Department of Computer Science University of Bristol
COMSM0045 – Applied Deep Learning 2020/21
comsm0045-applied-deep-learning.github.io – Lecture 01
BASICS OF ARTIFICIAL NEURAL NETWORKS
Tilo Burghardt | tilo@cs.bris.ac.uk
Camillo Golgi (image source: www.the-scientist.com)
Computation in biological neural networks emerges from the co-operation of individual computational components, namely neuron cells.
Neuron anatomy: nucleus, cell body, dendrites, axon, myelin sheath, axon terminals, synapse. Main flow of information: feed-forward.
image source: www.psysci.co
An environment can condition the behaviour of biological neural networks, leading to the incorporation of new information.
image source: www.cognifit.com
Example of structural sprouting (temporal evolution of the system).
image source: csis.pace.edu
Perceptron unit (flow of information: feed-forward): the inputs x1, x2, x3, ... are multiplied with the weights w1, w2, w3, ..., summed together with a bias b, and passed through the sign activation function to produce the output y:

$$y \;\stackrel{\text{def}}{=}\; \operatorname{sign}\Big(\sum_i w_i x_i + b\Big), \qquad \operatorname{sign}(v) = \begin{cases} +1 & \text{if } v \ge 0 \\ -1 & \text{otherwise} \end{cases}$$
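A minimal sketch of this unit in Python/NumPy (variable names and the example values are my own, not from the slides):

```python
import numpy as np

def perceptron(x, w, b):
    """Perceptron unit: weighted sum of the inputs plus bias, passed through sign."""
    v = np.dot(w, x) + b                 # summation
    return 1.0 if v >= 0 else -1.0       # sign activation with outputs in {+1, -1}

# example: three inputs, three weights and a bias
y = perceptron(np.array([1.0, 0.0, 1.0]), np.array([0.5, -0.2, 0.3]), b=-0.4)
```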
General unit with the bias incorporated: the input is extended by a constant bias unit, x = [1, x1, x2, ...]^T, and the parameters (also written θ) become w = [w0, w1, w2, ...]^T. The summation s = w^T x is passed through the activation function g to give the output y; the unit function y = f(x) is shorthand for f(x; w):

$$s = \mathbf{w}^{T}\mathbf{x}, \qquad y = g(s) = f(\mathbf{x}; \mathbf{w})$$
CONVENTION: the bias is incorporated in the parameter vector (see the short sketch after these notes). Various different variable names are used for parameters; most often we will use w.
NOTATION: a lower-case letter in non-italic font refers to a vector; a capital letter in non-italic font refers to a matrix or vector set.
NOTATION: italic font refers to scalars.
NOTATION: a semicolon separates the input (left) from the parameters (right).
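A small sketch of this convention (illustrative values are my own): the bias w0 becomes the first entry of the parameter vector and a constant 1 is prepended to the input:

```python
import numpy as np

x = np.array([0.5, -1.2])            # raw input [x1, x2]
w = np.array([0.1, 0.7, -0.3])       # parameters [w0, w1, w2], with w0 acting as the bias

x_bias = np.concatenate(([1.0], x))  # bias unit: x = [1, x1, x2]
s = w @ x_bias                       # summation s = w^T x
y = 1.0 if s >= 0 else -1.0          # sign activation as defined above, y = f(x; w)
```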
The hyperplane defined by the parameters w acts as the decision boundary: points x with w^T x > 0 lie in the positive sign area, points with w^T x < 0 lie in the negative sign area, and w^T x = 0 holds on the boundary itself. The weights (w1, w2) form the normal vector of the hyperplane in the (x1, x2) plane, and its axis intercepts are determined by the ratios w0/w1 and w0/w2.
Perceptron learning loop: compute the output, compare output and ground truth, adjust the weights, consider the next (training) input pair, and repeat. The weights are only changed when the actual output f(x) differs from the ground truth f*(x):

$$\text{if } f(\mathbf{x}) \neq f^{*}(\mathbf{x}): \quad w_i \leftarrow w_i + f^{*}(\mathbf{x})\, x_i$$

(equivalently, $\Delta w_i \propto \big(f^{*}(\mathbf{x}) - f(\mathbf{x})\big)\, x_i$, since the update vanishes whenever output and ground truth agree).
Worked example: training the perceptron on a Boolean target function given as a truth table over (x1, x2) with labels f* ∈ {+1, −1}. Starting from parameters w = (0, 0, 0), each table row shows the sampled training input (x0, x1, x2) (with bias unit x0 = 1), the current parameters w, the actual output f, the ground truth f*, and the resulting update Δw, with non-zero updates such as (−1, 1, 0), (−1, 0, 1) and (1, 1, 1) early on. As learning progresses (sampling further pairs (x, f*)), the parameters settle (here at w = (1, 1, 1)), all further updates are (0, 0, 0), and the process has produced a solution.
The encoding could be changed to the traditional value 0 by adjusting the output of the sign function to 0 instead of −1; the training algorithm remains valid.
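A minimal sketch of this training procedure in Python/NumPy (the logical-OR target with ±1 labels is my own illustrative choice, not necessarily the function used in the worked example):

```python
import numpy as np

# training pairs (x, f*) for logical OR, with bias unit x0 = 1 and +/-1 labels
X = np.array([[1, 0, 0],
              [1, 0, 1],
              [1, 1, 0],
              [1, 1, 1]], dtype=float)
f_star = np.array([-1, 1, 1, 1], dtype=float)

w = np.zeros(3)                              # parameters (w0, w1, w2) start at zero
for epoch in range(10):
    for x, t in zip(X, f_star):
        f = 1.0 if w @ x >= 0 else -1.0      # compute output
        if f != t:                           # compare output and ground truth
            w += t * x                       # adjust weights: w_i <- w_i + f*(x) x_i
print(w)                                     # converged weights separate the two classes
```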
The learned weights define a hyperplane (decision boundary) in the (x1, x2) plane with axis intercepts at 1, given by the ratios of w0 to w1 and of w0 to w2. Inputs with w^T x > 0 fall into the positive sign area and receive class label +1; inputs with w^T x < 0 fall into the negative sign area and receive class label −1.
image source: datasciencelab.wordpress.com
Learning as optimisation: training pairs (x, f*(x)) are sampled from a data distribution p, and a per-example loss function measures how far the unit's output f(x; w) is from the target f*(x). The overall loss sums the per-example losses over the training set X, e.g. for the squared error:

$$J(\mathbf{w}) = \sum_{\mathbf{x} \in X} \ell\big(\mathbf{x}, f^{*}(\mathbf{x})\big), \qquad \text{e.g. } \ell = \big(f^{*}(\mathbf{x}) - f(\mathbf{x}; \mathbf{w})\big)^{2}$$
The cost function J forms a surface over the parameter dimensions of w.
Gradient descent over the parameter dimensions of w: the gradient of the cost function points in the direction of steepest ascent, so the new parameters are obtained by taking a step of size η (the learning rate) against it:

$$\mathbf{w}_{t+1} = \mathbf{w}_{t} - \eta\, \nabla_{\mathbf{w}} J(\mathbf{w}_{t})$$
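A tiny sketch of this update rule (the quadratic example cost and its minimum at (2, −1) are my own assumptions, chosen only to show the iteration):

```python
import numpy as np

def J(w):                          # example cost: a quadratic bowl with minimum at (2, -1)
    return (w[0] - 2.0) ** 2 + (w[1] + 1.0) ** 2

def grad_J(w):                     # its gradient with respect to w
    return np.array([2.0 * (w[0] - 2.0), 2.0 * (w[1] + 1.0)])

w = np.array([0.0, 0.0])           # initial parameters
eta = 0.1                          # learning rate
for t in range(100):
    w = w - eta * grad_J(w)        # w_{t+1} = w_t - eta * grad J(w_t)
print(w)                           # approaches the minimum (2, -1)
```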
For an MSE-type cost function with the identity function as activation, the weight-vector change is modelled as a move along the direction of steepest descent. For a single weight w_k:

$$J(\mathbf{w}) = \sum_{\mathbf{x} \in X} \big(f^{*}(\mathbf{x}) - \mathbf{w}^{T}\mathbf{x}\big)^{2}, \qquad \frac{\partial J}{\partial w_k} = -2 \sum_{\mathbf{x} \in X} \big(f^{*}(\mathbf{x}) - \mathbf{w}^{T}\mathbf{x}\big)\, x_k$$

$$\Delta w_k = -\eta\, \frac{\partial J}{\partial w_k} \;\propto\; \sum_{\mathbf{x} \in X} \big(f^{*}(\mathbf{x}) - \mathbf{w}^{T}\mathbf{x}\big)\, x_k$$

where $\big(f^{*}(\mathbf{x}) - \mathbf{w}^{T}\mathbf{x}\big)$ is the error. This term looks similar to the Perceptron learning rule; it is also known as the Delta Rule (Widrow & Hoff, 1960).
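A short sketch of the Delta Rule on a linear unit (identity activation); the toy data and the "true" weights are my own assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.hstack([np.ones((50, 1)), rng.uniform(-1, 1, size=(50, 2))])  # inputs with bias unit
w_true = np.array([0.5, 2.0, -1.0])
f_star = X @ w_true                              # targets from an assumed "true" linear unit

w = np.zeros(3)
eta = 0.01                                       # learning rate
for epoch in range(200):
    error = f_star - X @ w                       # f*(x) - w^T x for every training example
    w += eta * X.T @ error                       # Delta Rule: dw_k proportional to sum_x error * x_k
print(w)                                         # approaches w_true
```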
Will the learning process ever produce a solution?
Worked example: the same training procedure applied to a target function f* over (x1, x2) that is not linearly separable. Again, each table row lists the sampled input (x0, x1, x2), the current parameters, the actual output f, the ground truth f*, and the update Δw. Unlike before, the updates never all become (0, 0, 0): as further pairs (x, f*) are sampled, the same non-zero updates such as (−1, 1, 0), (−1, 0, 1) and (1, −1, −1) recur, the parameters keep cycling, and the learning process never produces a solution.
NO hyperplane separates the two classes: however the weights w are chosen, the positive sign area (w^T x > 0) and the negative sign area (w^T x < 0) in the (x1, x2) plane cannot match the class labels. Single-Layer Perceptrons (SLPs) can only learn linearly separable problems.
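For illustration, running the earlier training loop on an assumed non-separable target (XOR with ±1 labels, my own choice) shows the weights cycling instead of converging:

```python
import numpy as np

X = np.array([[1, 0, 0],
              [1, 0, 1],
              [1, 1, 0],
              [1, 1, 1]], dtype=float)
f_star = np.array([-1, 1, 1, -1], dtype=float)   # XOR with +/-1 labels: not linearly separable

w = np.zeros(3)
for epoch in range(5):
    for x, t in zip(X, f_star):
        f = 1.0 if w @ x >= 0 else -1.0
        if f != t:
            w += t * x
    print(epoch, w)       # the weights keep changing; no epoch finishes without an update
```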
Example of a hypersurface that separates the two classes: a non-linear decision function f(x) over (x1, x2) produces curved positive and negative sign areas that can assign the correct labels.
Multi-Layer Perceptron (MLP): units are stacked into layers, with the input layer f^0 = x, hidden layers f^1, f^2, ..., and the output layer f^N. The network width d is the number of units in a layer (e.g. the first hidden layer), the network depth is N, and the weight w^l_{ij} connects the i-th neuron in layer l−1 to the j-th neuron in layer l. Collecting all layer parameters as $\boldsymbol{\mathcal{W}} = \{\mathbf{W}^{1}, \mathbf{W}^{2}, \dots, \mathbf{W}^{N}\}$, the network computes

$$f(\mathbf{x}; \boldsymbol{\mathcal{W}}) = f^{N}\big(\dots f^{2}(f^{1}(\mathbf{x}; \mathbf{W}^{1}); \mathbf{W}^{2}) \dots ; \mathbf{W}^{N}\big) = g^{N}\Big((\mathbf{W}^{N})^{T} \big(\dots\, g^{2}\big((\mathbf{W}^{2})^{T}\, g^{1}((\mathbf{W}^{1})^{T}\mathbf{x})\big)\dots\big)\Big)$$

NOTATION: a superscript usually refers to the layer number, a subscript to the position within the layer.
NOTATION: bold math-script (e.g. $\boldsymbol{\mathcal{W}}$) is used for tensors of parameters (basically 3D arrays or higher).
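A compact sketch of such a feed-forward pass in Python/NumPy (the layer sizes, the tanh activation and the random weights are my own illustrative assumptions; bias terms are omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [3, 4, 4, 1]                        # input width d=3, two hidden layers, one output
# one weight matrix W^l per layer; W[l][i, j] connects neuron i in layer l-1 to neuron j in layer l
W = [rng.normal(scale=0.5, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]

def forward(x, W, g=np.tanh):
    """Feed-forward pass through all layers: f^l = g(f^{l-1} W^l)."""
    f = x                                   # f^0 = input layer
    for Wl in W:
        f = g(f @ Wl)                       # apply layer l: summation then activation
    return f                                # f^N = network output

y = forward(np.array([0.2, -0.5, 1.0]), W)
```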
Outlook:
Universal approximation: already a network with one hidden layer can approximate any continuous function [Cybenko 1989; Hornik et al. 1989] (such networks may have to be very wide, requiring an exponential number of hidden neurons compared to the input), and with two hidden layers essentially any mathematical function [Cybenko 1988]. This fed long-standing optimism about the potential of neural networks to model learning and intelligent systems. The question arises: why use more than two hidden layers – why is `deep' advantageous at all? (see Lecture 4)
source: Ian Goodfellow, www.deeplearningbook.org
source: Ian Goodfellow, www.deeplearningbook.org
“It is only after much hesitation that the writer has reconciled himself to the addition of the term "neurodynamics" to the list of such recent linguistic artifacts as "cybernetics", "bionics", "autonomics", "biomimesis", "synnoetics", "intelectronics", and "robotics". It is hoped that by selecting a term which more clearly delimits our realm of interest and indicates its relationship to traditional academic disciplines, the underlying motivation [...] successfully communicated.”
from Frank Rosenblatt, “Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms”, Spartan Books, 1962
source: www.lmtech.info