What is modeling? NEU 466M, Instructor: Professor Ila R. Fiete (PowerPoint presentation)

SLIDE 1

What is modeling?

NEU 466M
Instructor: Professor Ila R. Fiete
Spring 2016

SLIDE 2

NEURAL NETWORKS FOR PATTERN RECOGNITION, CHRISTOPHER BISHOP

Reference:

http://cs.du.edu/~mitchell/mario_books/Neural_Networks_for_Pattern_Recognition_-_Christopher_Bishop.pdf

SLIDE 3

What does modeling mean?

[Figure: example of ‘a’; example of ‘b’]
Pixels $x_i$ with values 1 or 0 (black or white).

SLIDE 4

What does modeling mean?

[Figure: example of ‘a’; example of ‘b’]
What is ‘a’-ness, versus ‘b’-ness?

SLIDE 5

Equivalent problem encountered by electrophysiologists

Categorize a recorded spike as coming from neuron a or b

[Figure: recorded spike waveforms → ‘a’ or ‘b’; figure from Quian Quiroga]

SLIDE 6

What does modeling mean?

[Figure: example of ‘a’; example of ‘b’]
What is ‘a’-ness, versus ‘b’-ness?

SLIDE 7

Model: relationship between data and its category

256 × 256 pixels: N = 65536

Store every image with its letter label?

$\{x_1, x_2, \ldots, x_N\} \to$ ‘a’
$\{x'_1, x'_2, \ldots, x'_N\} \to$ ‘b’

SLIDE 8

Model: store every possible image with its corresponding letter label?

256 × 256 pixels: N = 65536

[Figure: images → ‘a’, ‘b’]

Number of 256 × 256 black-and-white images: $2^{65536} \approx 10^{19728}$ (roughly $10^{20000}$)

Atoms in the universe: $\sim 10^{80}$

Houston, we have a problem.
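The arithmetic on this slide can be checked in a few lines of Python. This is a minimal sketch assuming only that every pixel is independently black or white, so there are $2^N$ distinct images; it works with $\log_{10}$ rather than printing the 19,729-digit integer itself.

```python
import math

N = 256 * 256                      # pixels per image: 65536
log10_images = N * math.log10(2)   # log10 of 2**N, the number of binary images
log10_atoms = 80                   # order-of-magnitude estimate quoted on the slide

print(N)                           # 65536
print(round(log10_images, 1))      # 19728.3, i.e. 2**N is about 10**19728
print(log10_images > log10_atoms)  # True: far more images than atoms
```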

SLIDE 9

Storing each (data, category) pair

Need too many examples/data to fill the grid between inputs and categories! “Curse of dimensionality”

Too much data to store!
→ Compactness

Not predictive: what to do with a new example?
→ Generalizability

SLIDE 10

What we want from a model: compactness and generalizability.

SLIDE 11

One solution: feature selection

Look at a much smaller set of characteristic features that define the classes.

How to choose these?

- by “hand”
- by some “automatic” technique

(This sounds magical, but it is the goal of much of statistics and machine learning; in this class we will consider how to find features automatically.)

SLIDE 12

Features

$\tilde{x}_1$: height-to-width ratio of object
$\tilde{x}_2$: some other feature
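A hand-chosen feature such as $\tilde{x}_1$ is just a function of the pixel values. The sketch below is one possible way to compute it, assuming "object" means the bounding box of the black (value 1) pixels; the function name and the 5 × 5 test image are hypothetical.

```python
def height_to_width_ratio(image):
    """Feature x~1: height / width of the bounding box around black pixels.

    `image` is a list of rows of 0/1 values, with 1 meaning black.
    """
    rows = [r for r, row in enumerate(image) if any(row)]
    cols = [c for c in range(len(image[0])) if any(row[c] for row in image)]
    if not rows:
        raise ValueError("image has no black pixels")
    height = rows[-1] - rows[0] + 1
    width = cols[-1] - cols[0] + 1
    return height / width

# Hypothetical 5 x 5 "letter": a tall, narrow stroke with a small foot.
tall = [
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
print(height_to_width_ratio(tall))  # 2.0 (height 4, width 2)
```

The whole 25-pixel image is reduced to a single number, which is the compactness the previous slides asked for.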

SLIDE 13

Features

$\tilde{x}_1$: height-to-width ratio of object
$\tilde{x}_2$: some other feature

[Figure: examples of ‘a’ and ‘b’ (×) plotted by their feature values]

SLIDE 14

Features

[Figure: examples of ‘a’ and ‘b’ (×) plotted by their feature values]

Using $\tilde{x}_1$ alone would lead to poor categorization. More features can be helpful:
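A small numerical illustration of this point, under loud assumptions: the six feature vectors below are invented, and the classifier is a simple nearest-class-mean rule chosen only for the sketch, not a method from the slides. On $\tilde{x}_1$ alone the two classes overlap; adding $\tilde{x}_2$ separates them.

```python
import math

# Made-up feature vectors (x~1, x~2) for a few 'a' and 'b' examples.
data = [
    ((0.90, 0.20), "a"), ((1.10, 0.30), "a"), ((1.00, 0.25), "a"),
    ((1.00, 0.80), "b"), ((0.95, 0.90), "b"), ((1.05, 0.85), "b"),
]

def accuracy(n_features):
    """Nearest-class-mean classifier using only the first n_features features."""
    feats = [(x[:n_features], label) for x, label in data]
    means = {}
    for label in ("a", "b"):
        pts = [x for x, lab in feats if lab == label]
        means[label] = [sum(col) / len(pts) for col in zip(*pts)]
    correct = 0
    for x, label in feats:
        guess = min(means, key=lambda lab: math.dist(x, means[lab]))
        correct += guess == label
    return correct / len(feats)

print(accuracy(1))  # x~1 alone: the classes overlap, so some examples are misclassified
print(accuracy(2))  # 1.0 with both features
```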

SLIDE 15

Features ¡

If adding features improves performance, keep adding independent features?

Will this continue to improve performance?

At some point, NO! Performance will get worse. WHY?

SLIDE 16

A more familiar example: regression

Instead of discrete categories (‘a’, ‘b’), each datapoint (or data vector) maps to some value of a continuous variable ($y$).

$\{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$

SLIDE 17

$\{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$

$x_1$: independent variable
$y_1$: response, or dependent variable

SLIDE 18

Modeling as regression

$\{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$

What does it mean to model this data?

- Want to write $y$ as some function of $x$
- Want to fit a function through the $(x, y)$ points
- Given $x$, want to predict $y$

SLIDE 19

Regression: curve-fitting

$\{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$

Free parameters: $(w_0, w_1, \ldots, w_M)$

$$\tilde{y}(x) = w_0 + w_1 x + \cdots + w_M x^M = \sum_{j=0}^{M} w_j x^j$$
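The model $\tilde{y}(x)$ is a direct transcription of the sum above. In this sketch the function name `poly_predict` and the example weights are hypothetical; the weights are stored in the order $w_0, w_1, \ldots, w_M$ used on the slide.

```python
def poly_predict(x, w):
    """Evaluate y~(x) = sum_{j=0}^{M} w[j] * x**j, with M = len(w) - 1."""
    return sum(w_j * x**j for j, w_j in enumerate(w))

w = [1.0, -2.0, 0.5]         # arbitrary example weights w0, w1, w2 (M = 2)
print(poly_predict(2.0, w))  # 1 - 4 + 2 = -1.0
```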

SLIDE 20

Polynomial ¡regression ¡

The larger M, the higher the degree of the polynomial
→ more complex model/more features.

Expect the fit to get better with increasing M.

When M = N (indeed already when M = N − 1), the curve fits all N datapoints exactly (b/c an Mth-order polynomial has M + 1 parameters, and M + 1 ≥ N parameters can match N values at distinct $x_n$).

So are the more complex models better?
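The claim about exact fits can be checked numerically. This sketch assumes NumPy and uses `np.polyfit` (the NumPy counterpart of the Matlab `polyfit` mentioned on the next slide); the dataset, 10 noisy samples of $\sin(2\pi x)$, is a hypothetical toy example, not data from the lecture. The error on the fitted points keeps falling as M grows and is essentially zero at M = N − 1, which is why that error alone cannot tell us whether the curve predicts a new $x$ well.

```python
import numpy as np

# Hypothetical data: N = 10 noisy samples of sin(2*pi*x) on [0, 1].
rng = np.random.default_rng(0)
N = 10
x = np.linspace(0.0, 1.0, N)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=N)

def train_error(M):
    """Least-squares fit of a degree-M polynomial; return 1/2 * sum of squared residuals."""
    w = np.polyfit(x, y, M)          # coefficients, highest degree first
    residuals = np.polyval(w, x) - y
    return 0.5 * np.sum(residuals ** 2)

errors = [train_error(M) for M in range(N)]
for M, err in enumerate(errors):
    print(M, err)
# At M = N - 1 = 9 the N coefficients let the curve pass through every point.
```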

SLIDE 21

Parameters chosen to minimize fit error

Common error function: sum-of-squares:

(How to implement? Matlab: polyfit. Theory: we’ll get to it.) (Is this the only choice? No. Best choice? Interesting question: we’ll get to it.)

$$w^* = \arg\min_{w} \frac{1}{2} \sum_{n=1}^{N} \left[\tilde{y}(x_n; w) - y_n\right]^2$$

$$E = \frac{1}{2} \sum_{n=1}^{N} \left[\tilde{y}(x_n; w) - y_n\right]^2$$
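A sketch of $E$ and of finding $w^*$, assuming NumPy: `np.polyfit` returns the least-squares coefficients, which minimize this sum-of-squares error. Note that NumPy orders coefficients from the highest degree down, the reverse of $(w_0, \ldots, w_M)$ on the slides. The five data points are hypothetical; for them the best line works out to $w_0 = 0.08$, $w_1 = 0.84$, with $E = 0.0095$.

```python
import numpy as np

def sum_of_squares_error(w, x, y):
    """E(w) = 1/2 * sum_n [y~(x_n; w) - y_n]**2 for a polynomial y~.

    w is ordered highest degree first, the np.polyfit/np.polyval convention.
    """
    return 0.5 * np.sum((np.polyval(w, x) - y) ** 2)

# Hypothetical data, for illustration only.
x = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
y = np.array([0.1, 0.3, 0.4, 0.8, 0.9])

w_star = np.polyfit(x, y, 1)        # minimizer w* for M = 1: approx. [0.84, 0.08]
E_star = sum_of_squares_error(w_star, x, y)
print(w_star, E_star)               # E* is approx. 0.0095

# Nudging w* in any direction should not lower E.
for dw in ([1e-3, 0.0], [0.0, 1e-3], [-1e-3, 1e-3]):
    print(sum_of_squares_error(w_star + np.array(dw), x, y) >= E_star)
```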

SLIDE 22

[Figure: Degree 1, squared error = 0.45126]