9.54 class 4: Supervised learning (Shimon Ullman + Tomaso Poggio) - PowerPoint PPT Presentation


SLIDE 1

9.54 class 4

Supervised learning

Shimon Ullman + Tomaso Poggio

Danny Harari + Daniel Zysman + Darren Seibert

9.54, fall semester 2014

SLIDE 2

Intro

SLIDE 3

An old and simple model of supervised learning

Associate b to a and store the convolution:

  • $\phi_{b,a}(x) = (b * a)(x) = \int b(\xi)\, a(x - \xi)\, d\xi$

Retrieve output b from input a by correlation:

  • $(a \star \phi_{b,a})(x) = \int a(\tau)\, \phi_{b,a}(\tau + x)\, d\tau \approx b(x)$, if $a \star a \approx \delta$

SLIDE 4

An old and simple model of supervised learning

  • Store a superposition of pairs: $\phi(x) = \sum_i (b_i * a_i)(x)$

Retrieve output $b_j$ from input $a_j$, when

  • $a_j \star \phi \approx b_j$ if $a_j \star a_i \approx \delta_{i,j}$
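A minimal numpy sketch of this store-by-convolution, retrieve-by-correlation scheme (the helper names and the use of circular convolution in place of the integrals are my own assumptions, not from the slides). Keys are built with a flat power spectrum so that $a \star a = \delta$ holds exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1024

def flat_spectrum_key(n, rng):
    # A real signal with unit power at every frequency, so its circular
    # autocorrelation is exactly a delta function (a ⋆ a = δ).
    g = np.fft.fft(rng.standard_normal(n))
    return np.fft.ifft(g / np.abs(g)).real

def conv(u, v):
    # Circular convolution via the FFT (stands in for the integral b ∗ a).
    return np.fft.ifft(np.fft.fft(u) * np.fft.fft(v)).real

def corr(u, v):
    # Circular correlation via the FFT (the retrieval operation).
    return np.fft.ifft(np.conj(np.fft.fft(u)) * np.fft.fft(v)).real

a1, a2 = flat_spectrum_key(n, rng), flat_spectrum_key(n, rng)  # input keys
b1, b2 = rng.standard_normal(n), rng.standard_normal(n)        # stored outputs

phi = conv(b1, a1) + conv(b2, a2)   # store: phi = sum_i b_i ∗ a_i
b1_hat = corr(a1, phi)              # retrieve with key a1

# b1 is recovered up to cross-talk from the second pair (correlation ≈ 0.7).
print(np.corrcoef(b1, b1_hat)[0, 1])
```

With a single stored pair the recovery here is exact; each additional pair adds cross-talk, which is the sense in which the condition $a_j \star a_i \approx \delta_{i,j}$ matters.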

It is a special case…

SLIDE 5

Linear

SLIDE 6

“Linear” learning

  • Suppose $x_i \in \mathbb{R}^n$ and $y_i \in \mathbb{R}^m$, $i = 1, \cdots, N$
  • Define $X = (x_1, \cdots, x_N)$ and $Y = (y_1, \cdots, y_N)$
  • Find a linear operator (e.g. a matrix) $M$ such that $MX = Y$

SLIDE 7

“Linear” learning

  • $MX = Y$

If $X^{-1}$ exists, then

  • $MX = Y \implies M = Y X^{-1}$

If $X^{-1}$ does not exist, then

  • $MX = Y \implies M = Y X^{\dagger}$

where the pseudoinverse is the solution of $\min_M \| M X - Y \|_F$, with $\|A\|_F = \sqrt{\sum_{i,j} |a_{i,j}|^2}$, and if $X$ has full column rank,

  • $X^{\dagger} = (X^T X)^{-1} X^T$
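As a concrete sketch (the variable names and random data are illustrative, not from the slides), the whole "learning" step is a single pseudoinverse call in numpy:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, N = 5, 3, 50                    # input dim, output dim, no. of examples

X = rng.standard_normal((n, N))       # columns are the inputs x_i
M_true = rng.standard_normal((m, n))
Y = M_true @ X                        # columns are the targets y_i

# M = Y X† minimizes ||MX - Y||_F; np.linalg.pinv is the Moore-Penrose
# pseudoinverse. With N > n and X full rank, the fit here is exact.
M = Y @ np.linalg.pinv(X)

print(np.linalg.norm(M @ X - Y))      # ≈ 0
print(np.allclose(M, M_true))         # True: the generating map is recovered
```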

SLIDE 8

“Linear” learning is linear regression

  • If $m = 1$, e.g. the output $y$ is scalar, then

$Mx = y \implies y = m^T x = \sum_i m_i x_i$,

with $M = Y X^{\dagger}$

SLIDE 9

Nonlinear

SLIDE 10

Nonlinear learning

  • Suppose $x_i \in \mathbb{R}^n$ and $y_i \in \mathbb{R}^m$, $i = 1, \cdots, N$
  • Define $X = (x_1, \cdots, x_N)$ and $Y = (y_1, \cdots, y_N)$
  • Find an operator $N$ such that $N(X) = Y$

In general impossible, but… assume $N$ is in the class of polynomial mappings of degree $k$ in the vector space $V$ (over the real field), e.g. $N$ has a convergent Taylor series expansion. The Weierstrass theorem ensures approximation of any continuous function.

SLIDE 11

Nonlinear learning

  • $Y = L_0 + L_1(X) + L_2(X, X) + \cdots + L_k(X, \ldots, X)$

$f(x)$ is a polynomial with all monomials, as in this 2D example:

  • $y = a_1 x_1 + a_2 x_2 + b_1 x_1^2 + b_{12} x_1 x_2 + \cdots$
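A small numpy sketch of this 2D example (the helper name and target coefficients are illustrative assumptions): once all monomials are written out as features, fitting the nonlinear polynomial reduces to linear least squares in the coefficients.

```python
import numpy as np

def monomials_2d(x1, x2):
    # All monomials up to degree 2: [1, x1, x2, x1^2, x1*x2, x2^2]
    return np.stack([np.ones_like(x1), x1, x2, x1**2, x1 * x2, x2**2], axis=1)

rng = np.random.default_rng(0)
x1, x2 = rng.standard_normal(200), rng.standard_normal(200)
y = 1.0 * x1 + 2.0 * x2 + 3.0 * x1**2 + 4.0 * x1 * x2   # polynomial target

Phi = monomials_2d(x1, x2)                      # 200 x 6 design matrix
coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # linear least squares
print(np.round(coef, 3))                        # ≈ [0, 1, 2, 3, 4, 0]
```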

SLIDE 12

Classification and Regression

SLIDE 13

SLIDE 14

SLIDE 15

$y = \mathrm{sign}(Mx)$
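For concreteness, a minimal sketch of such a classifier (the data are my illustrative choice; $M$ is fitted with the pseudoinverse solution from the earlier slides):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))           # rows are inputs
y = np.sign(X[:, 0] + 0.5 * X[:, 1])        # linearly separable ±1 labels

m, *_ = np.linalg.lstsq(X, y, rcond=None)   # m = X† y, the regression fit
print(np.mean(np.sign(X @ m) == y))         # ≈ 1.0 on this separable data
```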

SLIDE 16

SLIDE 17

SLIDE 18

In our language: is $L_1$ enough?

SLIDE 19

  • $y = \mathrm{sign}(L_1 x + L_2(x, x)) = \mathrm{sign}(a_1 u_1 + a_2 u_2 + b\, u_1 u_2) = \mathrm{sign}(u_1 u_2)$

For the XOR function, $L_2$ is in fact enough. This corresponds to a universal, one-hidden-layer network: the input variables feed a hidden layer of all monomials, which feeds the output layer.
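A short numerical check of this claim (assuming the ±1 coding implied by the slide's $y = \mathrm{sign}(u_1 u_2)$): the single quadratic monomial $u_1 u_2$ classifies XOR perfectly, while the best purely linear fit is identically zero.

```python
import numpy as np

# XOR in ±1 coding (up to overall sign): y = sign(u1*u2), i.e. +1 when
# the two inputs agree and -1 when they differ.
U = np.array([[+1, +1], [+1, -1], [-1, +1], [-1, -1]])
y = np.array([+1, -1, -1, +1])

print(np.all(np.sign(U[:, 0] * U[:, 1]) == y))   # True: the L2 term suffices

# The best linear fit min ||U a - y|| is a = 0, so no sign(a1*u1 + a2*u2)
# can reproduce XOR: L1 alone is not enough.
a, *_ = np.linalg.lstsq(U, y, rcond=None)
print(np.round(a, 6))                            # [0. 0.]
```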
SLIDE 20

SLIDE 21

A few non-standard remarks

  • Regression is king; Gauss knew everything…
  • Perhaps there is no need for multiple layers… are 2 layers universal?
  • An interesting junction here: MLPs and RBFs

SLIDE 22

Radial Basis Functions

SLIDE 23

Nonlinear learning

  • $e^{\|\hat{x}_k - x\|^2} = \sum_{n=0}^{\infty} \frac{\|\hat{x}_k - x\|^{2n}}{n!}$

Later we will see that RBF expansions are a good approximation of functions in high dimensions:

  • RBF can be written as a 1-hidden-layer network
  • RBF is a rewriting of our polynomial (infinite radius of convergence):

$\sum_{k=1}^{N} c_k\, e^{-\|x_k - x\|^2}$
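A short numpy sketch of such an expansion (the centers, target function, and helper name are illustrative assumptions): placing one Gaussian unit on each training point, the coefficients $c_k$ follow from a linear system.

```python
import numpy as np

X = np.linspace(-3, 3, 10)[:, None]    # training inputs = the RBF centers
y = np.sin(X[:, 0])                    # values to associate with them

def gauss_gram(A, B):
    # G[i, j] = exp(-||A_i - B_j||^2): one Gaussian unit per center
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2)

# The coefficients c_k solve the linear system G c = y.
c = np.linalg.solve(gauss_gram(X, X), y)

x_new = np.linspace(-2.5, 2.5, 5)[:, None]      # points between the centers
print(np.round(gauss_gram(x_new, X) @ c - np.sin(x_new[:, 0]), 3))  # ≈ 0
```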

SLIDE 24

SLIDE 25

SLIDE 26

SLIDE 27

Memory-based computation

$f(x) = \sum_i c_i\, G(x, x_i) = \sum_i c_i\, e^{-\|x - x_i\|^2 / 2\sigma^2}$

The training set is $(x_1, \cdots, x_N) = X$ and $(y_1, \cdots, y_N) = Y$.

  • Suppose now that $e^{-\|x - x_i\|^2 / 2\sigma^2} \to \delta(x - x_i)$: then

$f(x) = \begin{cases} y_i, & \text{if } x = x_i \\ 0, & \text{if } x \neq x_i \end{cases}$

and it is a memory, a lookup table.
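A small numerical sketch of this limit (the stored values are arbitrary illustrations): shrinking $\sigma$ turns the Gaussian units into approximate delta functions and the network into a lookup table.

```python
import numpy as np

# Following the slide, take c_i = y_i in f(x) = sum_i c_i exp(-(x-x_i)^2/2s^2)
# and shrink sigma: each unit approaches a delta function, so f returns the
# stored y_i at x = x_i and ≈ 0 everywhere else: a lookup table.
X = np.array([0.0, 1.0, 2.0])          # stored inputs
y = np.array([5.0, -3.0, 7.0])         # stored outputs

def f(x, sigma):
    return np.sum(y * np.exp(-(x - X) ** 2 / (2 * sigma ** 2)))

for sigma in (1.0, 0.1, 0.01):
    print(sigma, round(f(1.0, sigma), 4), round(f(1.5, sigma), 4))
# sigma = 1.0:  the units overlap and blur the stored values together
# sigma = 0.01: f(1.0) = -3.0 (pure recall), f(1.5) = 0.0 (no interpolation)
```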

SLIDE 28

Memory-based computation

Of course learning is much more than memory, but in this model the difference is between a Gaussian and a delta function.

SLIDE 29

From Learning-from-Examples to View-based Networks for Object Recognition

[Figure: view-based network summing (Σ) example-view units; axis labeled VIEW ANGLE]

Poggio, Edelman. Nature, 1990.

$f(x) = \sum_i c_i\, G(x, x_i) = \sum_i c_i\, e^{-\|x - x_i\|^2 / 2\sigma^2}$

SLIDE 30

Recording Sites in Anterior IT

Logothetis, Pauls, and Poggio, 1995

SLIDE 31

Garfield

SLIDE 32

Image Analysis

⇒ Bear (0° view)

⇒ Bear (45° view)

SLIDE 33

Image Synthesis

UNCONVENTIONAL GRAPHICS

Θ = 0° view ⇒ Θ = 45° view ⇒

SLIDE 34

SLIDE 35

HyperBF

SLIDE 36

SLIDE 37

Cartoon male

SLIDE 38

A toy problem: Gender Classification

SLIDE 39

Brunelli, Poggio ’91 (IRST, MIT)

SLIDE 40

An example: HyperBF and gender classification

Some of the geometrical features (white) used in the gender classification experiments

SLIDE 41

HyperBF and gender classification

Typical stimuli used in the (informal!) psychophysical experiments of gender classification (about 90% correct)

SLIDE 42

Figure 3: Feature weights for gender classification as computed by the HyperBF networks

SLIDE 43

SLIDE 44

SLIDE 45

SLIDE 46


Radial Basis Functions and MLPs

SLIDE 47

Sigmoidal units are radial basis functions (for normalized inputs)

  • Consider the MLP units

$\sigma(x \cdot w - \theta) = \frac{1}{1 + e^{-(x \cdot w - \theta)}}$

  • If $\|x\| = 1$, then since

$\|x - w\|^2 = \|x\|^2 + \|w\|^2 - 2(x \cdot w)$,

we have

$(x \cdot w) = \frac{1 + \|w\|^2 - \|x - w\|^2}{2}$,

and thus $\sigma(x \cdot w - \theta)$ is a radial function.

SLIDE 48

Sigmoidal units are radial basis functions (for normalized inputs)

  • The corresponding radial function is, substituting the identity above,

$\sigma(x \cdot w - \theta) = \frac{1}{1 + e^{-\left( \frac{1 + \|w\|^2 - \|x - w\|^2}{2} - \theta \right)}}$
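A numerical check of this identity (the dimensions, weights, and threshold are arbitrary illustrations): for normalized inputs, computing the unit from the dot product and from the distance $\|x - w\|$ gives the same value.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(5)           # MLP unit weights
theta = 0.3                          # threshold

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unit_via_dot(x):
    return sigmoid(x @ w - theta)

def unit_via_distance(x):
    # Uses x.w = (1 + ||w||^2 - ||x - w||^2) / 2, valid when ||x|| = 1,
    # so the unit depends on x only through the distance ||x - w||.
    d2 = np.sum((x - w) ** 2)
    return sigmoid((1 + w @ w - d2) / 2 - theta)

x = rng.standard_normal(5)
x /= np.linalg.norm(x)               # normalize the input: ||x|| = 1
print(unit_via_dot(x), unit_via_distance(x))   # identical values
```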
SLIDE 49

Sigmoidal units are radial basis functions (for normalized inputs)

SLIDE 50

SLIDE 51

SLIDE 52

SLIDE 53