9.54 class 4: Supervised learning (Shimon Ullman + Tomaso Poggio) - PowerPoint PPT Presentation


SLIDE 1

9.54 class 4

Supervised learning

Shimon Ullman + Tomaso Poggio

Danny Harari + Daniel Zysman + Darren Seibert

9.54, fall semester 2014

SLIDE 2

Intro

SLIDE 3

An old and simple model of supervised learning

Associate b to a and store the convolution:

  • $\phi_{b,a}(x) = (b * a)(x) = \int b(\xi)\, a(x - \xi)\, d\xi$

Retrieve output b from input a by correlation:

  • $(a \star \phi_{b,a})(x) = \int a(\tau)\, \phi_{b,a}(\tau + x)\, d\tau \approx b(x)$, if $a \star a \approx \delta$

SLIDE 4

An old and simple model of supervised learning

  • Store a superposition of pairs: $\phi(x) = \sum_i (b_i * a_i)(x)$

Retrieve output $b_j$ from input $a_j$, when

  • $a_j \star \phi \approx b_j$ if $a_j \star a_i \approx \delta_{i,j}$
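A minimal numpy sketch of this store-by-convolution, retrieve-by-correlation scheme (the helper names and the use of circular convolution in place of the integrals are my own assumptions, not from the slides). Keys are built with a flat power spectrum so that $a \star a = \delta$ holds exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1024

def flat_spectrum_key(n, rng):
    # A real signal with unit power at every frequency, so its circular
    # autocorrelation is exactly a delta function (a ⋆ a = δ).
    g = np.fft.fft(rng.standard_normal(n))
    return np.fft.ifft(g / np.abs(g)).real

def conv(u, v):
    # Circular convolution via the FFT (stands in for the integral b ∗ a).
    return np.fft.ifft(np.fft.fft(u) * np.fft.fft(v)).real

def corr(u, v):
    # Circular correlation via the FFT (the retrieval operation).
    return np.fft.ifft(np.conj(np.fft.fft(u)) * np.fft.fft(v)).real

a1, a2 = flat_spectrum_key(n, rng), flat_spectrum_key(n, rng)  # input keys
b1, b2 = rng.standard_normal(n), rng.standard_normal(n)        # stored outputs

phi = conv(b1, a1) + conv(b2, a2)   # store: phi = sum_i b_i ∗ a_i
b1_hat = corr(a1, phi)              # retrieve with key a1

# b1 is recovered up to cross-talk from the second pair (correlation ≈ 0.7).
print(np.corrcoef(b1, b1_hat)[0, 1])
```

With a single stored pair the recovery here is exact; each additional pair adds cross-talk, which is the sense in which the condition $a_j \star a_i \approx \delta_{i,j}$ matters.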

It is a special case…

SLIDE 5

Linear

SLIDE 6

“Linear” learning

  • Suppose $x_i \in \mathbb{R}^n$ and $y_i \in \mathbb{R}^m$, $i = 1, \cdots, N$
  • Define $X = (x_1, \cdots, x_N)$ and $Y = (y_1, \cdots, y_N)$
  • Find a linear operator (e.g. a matrix) $M$ such that $MX = Y$

SLIDE 7

“Linear” learning

  • $MX = Y$

If $X^{-1}$ exists, then

  • $MX = Y \implies M = Y X^{-1}$

If $X^{-1}$ does not exist, then

  • $MX = Y \implies M = Y X^{\dagger}$

where the pseudoinverse is the solution of $\min_M \| M X - Y \|_F$, with $\|A\|_F = \sqrt{\sum_{i,j} |a_{i,j}|^2}$, and if $X$ has full column rank,

  • $X^{\dagger} = (X^T X)^{-1} X^T$
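As a concrete sketch (the variable names and random data are illustrative, not from the slides), the whole "learning" step is a single pseudoinverse call in numpy:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, N = 5, 3, 50                    # input dim, output dim, no. of examples

X = rng.standard_normal((n, N))       # columns are the inputs x_i
M_true = rng.standard_normal((m, n))
Y = M_true @ X                        # columns are the targets y_i

# M = Y X† minimizes ||MX - Y||_F; np.linalg.pinv is the Moore-Penrose
# pseudoinverse. With N > n and X full rank, the fit here is exact.
M = Y @ np.linalg.pinv(X)

print(np.linalg.norm(M @ X - Y))      # ≈ 0
print(np.allclose(M, M_true))         # True: the generating map is recovered
```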

SLIDE 8

“Linear” learning is linear regression

  • If $m = 1$, e.g. the output $y$ is scalar, then

$Mx = y \implies y = m^T x = \sum_i m_i x_i$,

with $M = Y X^{\dagger}$

SLIDE 9

Nonlinear

SLIDE 10

Nonlinear learning

  • Suppose $x_i \in \mathbb{R}^n$ and $y_i \in \mathbb{R}^m$, $i = 1, \cdots, N$
  • Define $X = (x_1, \cdots, x_N)$ and $Y = (y_1, \cdots, y_N)$
  • Find an operator $N$ such that $N(X) = Y$

In general impossible, but… assume $N$ is in the class of polynomial mappings of degree $k$ in the vector space $V$ (over the real field), e.g. $N$ has a convergent Taylor series expansion. The Weierstrass theorem ensures approximation of any continuous function.

SLIDE 11

Nonlinear learning

  • $Y = L_0 + L_1(X) + L_2(X, X) + \cdots + L_k(X, \ldots, X)$

$f(x)$ is a polynomial with all monomials, as in this 2D example:

  • $y = a_1 x_1 + a_2 x_2 + b_1 x_1^2 + b_{12} x_1 x_2 + \cdots$
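A small numpy sketch of this 2D example (the helper name and target coefficients are illustrative assumptions): once all monomials are written out as features, fitting the nonlinear polynomial reduces to linear least squares in the coefficients.

```python
import numpy as np

def monomials_2d(x1, x2):
    # All monomials up to degree 2: [1, x1, x2, x1^2, x1*x2, x2^2]
    return np.stack([np.ones_like(x1), x1, x2, x1**2, x1 * x2, x2**2], axis=1)

rng = np.random.default_rng(0)
x1, x2 = rng.standard_normal(200), rng.standard_normal(200)
y = 1.0 * x1 + 2.0 * x2 + 3.0 * x1**2 + 4.0 * x1 * x2   # polynomial target

Phi = monomials_2d(x1, x2)                      # 200 x 6 design matrix
coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # linear least squares
print(np.round(coef, 3))                        # ≈ [0, 1, 2, 3, 4, 0]
```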

SLIDE 12

Classification and Regression

SLIDE 13

SLIDE 14

SLIDE 15

$y = \mathrm{sign}(Mx)$
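For concreteness, a minimal sketch of such a classifier (the data are my illustrative choice; $M$ is fitted with the pseudoinverse solution from the earlier slides):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))           # rows are inputs
y = np.sign(X[:, 0] + 0.5 * X[:, 1])        # linearly separable ±1 labels

m, *_ = np.linalg.lstsq(X, y, rcond=None)   # m = X† y, the regression fit
print(np.mean(np.sign(X @ m) == y))         # ≈ 1.0 on this separable data
```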

SLIDE 16

SLIDE 17

SLIDE 18

In our language: is $L_1$ enough?

SLIDE 19

  • $y = \mathrm{sign}(L_1 x + L_2(x, x)) = \mathrm{sign}(a_1 u_1 + a_2 u_2 + b\, u_1 u_2) = \mathrm{sign}(u_1 u_2)$

For the XOR function, $L_2$ is in fact enough. This corresponds to a universal, one-hidden-layer network: the input variables feed a hidden layer of all monomials, which feeds the output layer.
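A short numerical check of this claim (assuming the ±1 coding implied by the slide's $y = \mathrm{sign}(u_1 u_2)$): the single quadratic monomial $u_1 u_2$ classifies XOR perfectly, while the best purely linear fit is identically zero.

```python
import numpy as np

# XOR in ±1 coding (up to overall sign): y = sign(u1*u2), i.e. +1 when
# the two inputs agree and -1 when they differ.
U = np.array([[+1, +1], [+1, -1], [-1, +1], [-1, -1]])
y = np.array([+1, -1, -1, +1])

print(np.all(np.sign(U[:, 0] * U[:, 1]) == y))   # True: the L2 term suffices

# The best linear fit min ||U a - y|| is a = 0, so no sign(a1*u1 + a2*u2)
# can reproduce XOR: L1 alone is not enough.
a, *_ = np.linalg.lstsq(U, y, rcond=None)
print(np.round(a, 6))                            # [0. 0.]
```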
SLIDE 20

SLIDE 21

A few non-standard remarks

  • Regression is king; Gauss knew everything…
  • Perhaps there is no need for multiple layers… are 2 layers universal?
  • An interesting junction here: MLPs and RBFs

SLIDE 22

Radial Basis Functions

SLIDE 23

Nonlinear learning

  • $e^{\|\hat{x}_k - x\|^2} = \sum_{n=0}^{\infty} \frac{\|\hat{x}_k - x\|^{2n}}{n!}$

Later we will see that RBF expansions are a good approximation of functions in high dimensions:

  • RBF can be written as a 1-hidden-layer network
  • RBF is a rewriting of our polynomial (infinite radius of convergence):

$\sum_{k=1}^{N} c_k\, e^{-\|x_k - x\|^2}$
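A short numpy sketch of such an expansion (the centers, target function, and helper name are illustrative assumptions): placing one Gaussian unit on each training point, the coefficients $c_k$ follow from a linear system.

```python
import numpy as np

X = np.linspace(-3, 3, 10)[:, None]    # training inputs = the RBF centers
y = np.sin(X[:, 0])                    # values to associate with them

def gauss_gram(A, B):
    # G[i, j] = exp(-||A_i - B_j||^2): one Gaussian unit per center
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2)

# The coefficients c_k solve the linear system G c = y.
c = np.linalg.solve(gauss_gram(X, X), y)

x_new = np.linspace(-2.5, 2.5, 5)[:, None]      # points between the centers
print(np.round(gauss_gram(x_new, X) @ c - np.sin(x_new[:, 0]), 3))  # ≈ 0
```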

SLIDE 24

SLIDE 25

SLIDE 26

SLIDE 27

Memory-based computation

$f(x) = \sum_i c_i\, G(x, x_i) = \sum_i c_i\, e^{-\|x - x_i\|^2 / 2\sigma^2}$

The training set is $(x_1, \cdots, x_N) = X$ and $(y_1, \cdots, y_N) = Y$.

  • Suppose now that $e^{-\|x - x_i\|^2 / 2\sigma^2} \to \delta(x - x_i)$: then

$f(x) = \begin{cases} y_i, & \text{if } x = x_i \\ 0, & \text{if } x \neq x_i \end{cases}$

and it is a memory, a lookup table.
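A small numerical sketch of this limit (the stored values are arbitrary illustrations): shrinking $\sigma$ turns the Gaussian units into approximate delta functions and the network into a lookup table.

```python
import numpy as np

# Following the slide, take c_i = y_i in f(x) = sum_i c_i exp(-(x-x_i)^2/2s^2)
# and shrink sigma: each unit approaches a delta function, so f returns the
# stored y_i at x = x_i and ≈ 0 everywhere else: a lookup table.
X = np.array([0.0, 1.0, 2.0])          # stored inputs
y = np.array([5.0, -3.0, 7.0])         # stored outputs

def f(x, sigma):
    return np.sum(y * np.exp(-(x - X) ** 2 / (2 * sigma ** 2)))

for sigma in (1.0, 0.1, 0.01):
    print(sigma, round(f(1.0, sigma), 4), round(f(1.5, sigma), 4))
# sigma = 1.0:  the units overlap and blur the stored values together
# sigma = 0.01: f(1.0) = -3.0 (pure recall), f(1.5) = 0.0 (no interpolation)
```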

SLIDE 28

Memory-based computation

Of course learning is much more than memory, but in this model the difference is between a Gaussian and a delta function.

SLIDE 29

From Learning-from-Examples to View-based Networks for Object Recognition

[Figure: view-based network summing (Σ) example-view units; axis labeled VIEW ANGLE]

Poggio, Edelman. Nature, 1990.

$f(x) = \sum_i c_i\, G(x, x_i) = \sum_i c_i\, e^{-\|x - x_i\|^2 / 2\sigma^2}$

SLIDE 30

Recording Sites in Anterior IT

Logothetis, Pauls, and Poggio, 1995

SLIDE 31

Garfield

SLIDE 32

Image Analysis

⇒ Bear (0° view)

⇒ Bear (45° view)

SLIDE 33

Image Synthesis

UNCONVENTIONAL GRAPHICS

Θ = 0° view ⇒ Θ = 45° view ⇒

SLIDE 34

SLIDE 35

HyperBF

SLIDE 36

SLIDE 37

Cartoon male

SLIDE 38

A toy problem: Gender Classification

SLIDE 39

Brunelli, Poggio ’91 (IRST, MIT)

SLIDE 40

An example: HyperBF and gender classification

Some of the geometrical features (white) used in the gender classification experiments

SLIDE 41

HyperBF and gender classification

Typical stimuli used in the (informal!) psychophysical experiments of gender classification (about 90% correct)

SLIDE 42

Figure 3: Feature weights for gender classification as computed by the HyperBF networks

SLIDE 43

SLIDE 44

SLIDE 45

SLIDE 46


Radial Basis Functions and MLPs

SLIDE 47

Sigmoidal units are radial basis functions (for normalized inputs)

  • Consider the MLP units

$\sigma(x \cdot w - \theta) = \frac{1}{1 + e^{-(x \cdot w - \theta)}}$

  • If $\|x\| = 1$, then since

$\|x - w\|^2 = \|x\|^2 + \|w\|^2 - 2(x \cdot w)$,

we have

$(x \cdot w) = \frac{1 + \|w\|^2 - \|x - w\|^2}{2}$,

and thus $\sigma(x \cdot w - \theta)$ is a radial function.

SLIDE 48

Sigmoidal units are radial basis functions (for normalized inputs)

  • The corresponding radial function is, substituting the identity above,

$\sigma(x \cdot w - \theta) = \frac{1}{1 + e^{-\left( \frac{1 + \|w\|^2 - \|x - w\|^2}{2} - \theta \right)}}$
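A numerical check of this identity (the dimensions, weights, and threshold are arbitrary illustrations): for normalized inputs, computing the unit from the dot product and from the distance $\|x - w\|$ gives the same value.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(5)           # MLP unit weights
theta = 0.3                          # threshold

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unit_via_dot(x):
    return sigmoid(x @ w - theta)

def unit_via_distance(x):
    # Uses x.w = (1 + ||w||^2 - ||x - w||^2) / 2, valid when ||x|| = 1,
    # so the unit depends on x only through the distance ||x - w||.
    d2 = np.sum((x - w) ** 2)
    return sigmoid((1 + w @ w - d2) / 2 - theta)

x = rng.standard_normal(5)
x /= np.linalg.norm(x)               # normalize the input: ||x|| = 1
print(unit_via_dot(x), unit_via_distance(x))   # identical values
```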
SLIDE 49

Sigmoidal units are radial basis functions (for normalized inputs)

SLIDE 50

SLIDE 51

SLIDE 52

SLIDE 53