Neural Networks Module 2: Learning with Gradient Descent

SLIDE 1

Neural Networks

SLIDE 2

Module 2: Learning with Gradient Descent

  • formulate the problem via a model and its parameters
  • formulate the error as a mathematical objective
  • numerically optimize the parameters for the given objective (see the gradient-descent sketch after this list)
  • usually an algebraic setup
  • involves matrices and calculus
  • a probabilistic setup (likelihoods) follows in the next module
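
As a concrete illustration of the model/objective/optimize pipeline, here is a minimal gradient-descent sketch for least-squares linear regression (the data, learning rate, and iteration count are illustrative toy choices, not from the slides):

    import numpy as np

    # toy supervised data: y = 2x + 1 plus noise
    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(100, 1))
    y = 2 * X[:, 0] + 1 + 0.1 * rng.standard_normal(100)

    # model: y_hat = X @ w + b; objective: mean squared error
    w, b = np.zeros(1), 0.0
    lr = 0.1
    for step in range(500):
        err = X @ w + b - y
        w -= lr * 2 * (X.T @ err) / len(y)   # d(MSE)/dw
        b -= lr * 2 * err.mean()             # d(MSE)/db

    print(w, b)  # should approach [2.] and 1.0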

[Course-map diagram: RAW DATA (housing data, spam data) with LABELS and FEATURES feeding SUPERVISED LEARNING and CLUSTERING; pipeline stages DATA → REPRESENTATION → PROBLEM → LEARNING → PERFORMANCE, with supporting steps (data processing, dimensions, analysis, selection, tuning); this module's focus is numerical optimization (Logistic Regression, Perceptron, Neural Network), evaluated via train/test error, accuracy, cross validation, and ROC.]

SLIDE 3

Module 2 Objectives/Neural Networks

  • perceptron rules
  • the neural network idea, philosophy, construction
  • NN weights
  • Backpropagation: training NN weights using gradient descent
  • NN models, autoencoders
  • run an NN autoencoder on a simple problem
SLIDE 4

The perceptron

SLIDE 5

The perceptron

  • (as with regression) we are looking for a linear classifier
  • the error is different than in regression: a weighted sum over the set M of misclassified points (see the criterion below)
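
The criterion's formula did not survive extraction; the perceptron error referred to here is standardly written (assuming labels y_i ∈ {−1, +1}) as a sum over the misclassified set M:

    E(w) = - \sum_{i \in M} y_i \, w^\top x_i

Each term is positive, since a misclassified point has y_i \, w^\top x_i < 0, and E(w) reaches 0 exactly when M is empty.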
SLIDE 6

Perceptron - geometry

  • the perceptron is a linear (hyperplane) separator
  • for simplicity, we will transform data points with y = −1 (left) to y = +1 (right) by reversing their sign

SLIDE 7

The perceptron

  • to optimize the perceptron error, use gradient descent
  • with a per-point update rule, or a batch update over all misclassified points (both reconstructed below)
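
The update formulas themselves are missing from the extracted text; differentiating the criterion above gives the standard rules, with learning rate η. For a single misclassified point i:

    w \leftarrow w + \eta \, y_i x_i

and, as a batch update, summing over all currently misclassified points:

    w \leftarrow w + \eta \sum_{i \in M} y_i x_i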
SLIDE 8

perceptron update - intuition

  • perceptron update: the plane (dotted red), with normal w (red arrow), moves in the direction of the misclassified point p1 until p1 is on the correct side (see the sketch below)
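
A minimal NumPy sketch of this update loop (assuming labels in {−1, +1} and a bias folded in as a constant-1 input column; the data and names are illustrative):

    import numpy as np

    def perceptron(X, y, lr=1.0, max_epochs=100):
        # X rows have a leading constant-1 bias column; y entries are +/- 1
        w = np.zeros(X.shape[1])
        for _ in range(max_epochs):
            mistakes = 0
            for xi, yi in zip(X, y):
                if yi * (w @ xi) <= 0:   # misclassified (or exactly on the plane)
                    w += lr * yi * xi    # move the normal w toward the point
                    mistakes += 1
            if mistakes == 0:            # all points on the correct side
                break
        return w

    # linearly separable toy data, bias column prepended
    X = np.array([[1, 2, 2], [1, 1, 3], [1, -2, -1], [1, -1, -2]], dtype=float)
    y = np.array([1, 1, -1, -1])
    print(perceptron(X, y))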
SLIDE 9

Perceptron proof of convergence

  • if the data is indeed linearly separable, the perceptron will find a separator (the standard convergence bound is stated below)
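
The standard statement behind this proof (Novikoff's theorem): if every point satisfies \|x_i\| \le R and some unit-norm w^* separates the data with margin y_i \, (w^*)^\top x_i \ge \gamma > 0, then the perceptron makes at most

    (R / \gamma)^2

updates before finding a separator.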

SLIDE 10

Multilayer perceptrons

SLIDE 11

Checkpoint: XOR perceptron

  • build/explain a 3-layer perceptron that gives the same classification as the logical XOR function (a classic reference construction is sketched below)
  • your answer is required! Submit via Dropbox.
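
For reference, one classic textbook construction (do still write up your own explanation for the checkpoint) uses two hidden threshold units computing OR and NAND, and an output unit computing their AND; the weights below are one well-known choice, verified here in NumPy:

    import numpy as np

    def step(z):
        return float(z > 0)   # threshold activation

    def xor_net(x1, x2):
        x = np.array([1.0, x1, x2])                   # bias, inputs
        h_or   = step(np.array([-0.5, 1, 1]) @ x)     # x1 OR x2
        h_nand = step(np.array([1.5, -1, -1]) @ x)    # NOT (x1 AND x2)
        h = np.array([1.0, h_or, h_nand])             # bias, hidden outputs
        return step(np.array([-1.5, 1, 1]) @ h)       # h_or AND h_nand

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, xor_net(a, b))   # prints 0.0, 1.0, 1.0, 0.0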
SLIDE 12

Neural Networks

  • an NN is a stack of connected perceptrons
  • bottom up: input layer, hidden layer, output layer
  • multilayer NNs are very powerful, in that they can approximate almost any function given enough training data

SLIDE 13

Neural Networks

  • each unit first performs a linear combination of its inputs
  • then applies a nonlinear (e.g. logistic) function "f" before outputting a value
  • the three-layer NN output can be expressed mathematically as shown below
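
The formula itself is missing from the extracted text; with inputs x_i, input-to-hidden weights w^{(1)}, hidden-to-output weights w^{(2)}, and nonlinearity f, a plausible reconstruction is:

    o(x) = f\Big( \sum_j w_j^{(2)} \, f\Big( \sum_i w_{ij}^{(1)} x_i \Big) \Big)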

SLIDE 14

Training the NN weights (w)

  • consider a single datapoint
  • start with the top set of weights (close to the output):
  • we obtain the hidden-to-output weight update rule (sketched below)
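
The rule is not shown in the extracted text; in the Mitchell-style notation the deck cites later (logistic units, squared error, output o_k, target t_k, hidden activation o_j, learning rate η), it is usually written:

    \delta_k = o_k (1 - o_k)(t_k - o_k), \qquad \Delta w_{kj} = \eta \, \delta_k \, o_j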
SLIDE 15

Training the NN weights (w)

  • then the first set of weights (close to the input), whose update rule is sketched below
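
In the same notation, with x_i the input feeding hidden unit h, the corresponding input-to-hidden rule backpropagates the output deltas:

    \delta_h = o_h (1 - o_h) \sum_{k \in outputs} w_{kh} \, \delta_k, \qquad \Delta w_{hi} = \eta \, \delta_h \, x_i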
SLIDE 16

NN training
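
A minimal end-to-end training loop combining the two delta rules above, in NumPy (logistic units, squared error; the layer sizes, seed, learning rate, and epoch count are illustrative, and small nets like this can occasionally get stuck in a poor local minimum):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def add_bias(A):
        return np.hstack([A, np.ones((A.shape[0], 1))])  # constant-1 bias unit

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    T = np.array([[0], [1], [1], [0]], dtype=float)      # XOR targets

    rng = np.random.default_rng(0)
    W1 = rng.normal(scale=0.5, size=(3, 3))   # (2 inputs + bias) -> 3 hidden
    W2 = rng.normal(scale=0.5, size=(4, 1))   # (3 hidden + bias) -> 1 output

    eta = 0.5
    Xb = add_bias(X)
    for epoch in range(20000):
        H = sigmoid(Xb @ W1)                             # forward: hidden layer
        Hb = add_bias(H)
        O = sigmoid(Hb @ W2)                             # forward: output layer
        delta_o = O * (1 - O) * (T - O)                  # output deltas
        delta_h = H * (1 - H) * (delta_o @ W2[:-1].T)    # hidden deltas (bias row dropped)
        W2 += eta * Hb.T @ delta_o                       # Delta w = eta * delta * input
        W1 += eta * Xb.T @ delta_h

    print(np.round(O, 2))   # typically close to [[0], [1], [1], [0]]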

SLIDE 17

Autoencoders

  • the network is “rotated”
  • from left to right: input, hidden, output
  • input and output are the same values
  • the hidden layer encodes the input and decodes it back to itself (see the sketch below)
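
A sketch of this on the classic 8-3-8 encoder problem (eight one-hot inputs squeezed through a three-unit hidden layer and reconstructed; the same delta rules as before, with illustrative sizes and hyperparameters):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    X = np.eye(8)                             # one-hot inputs double as targets
    rng = np.random.default_rng(1)
    W1 = rng.normal(scale=0.3, size=(9, 3))   # (8 inputs + bias) -> 3 hidden
    W2 = rng.normal(scale=0.3, size=(4, 8))   # (3 hidden + bias) -> 8 outputs

    eta = 1.0
    Xb = np.hstack([X, np.ones((8, 1))])
    for epoch in range(30000):
        H = sigmoid(Xb @ W1)                             # encode
        Hb = np.hstack([H, np.ones((8, 1))])
        O = sigmoid(Hb @ W2)                             # decode
        delta_o = O * (1 - O) * (X - O)                  # targets are the inputs
        delta_h = H * (1 - H) * (delta_o @ W2[:-1].T)
        W2 += eta * Hb.T @ delta_o
        W1 += eta * Xb.T @ delta_h

    print(np.round(H, 1))   # hidden codes: usually near-binary, distinct per input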
SLIDE 18

BackPropagation (Tom Mitchell book)
