slide-1
SLIDE 1: Topics Since the Midterm

• Instance-based learning: NN regression and classification, kernel regression, kernel trick
• Online learning: margin-based approaches, SVM
• K-means, PCA
• Unsupervised learning: GMM, EM for GMM
• Bayes optimal classifier, Naive Bayes
• Structural models: Bayes' nets, deep learning

KNN: Predict using avg. of K nearest neighbors.

Possible distance functions:
• Euclidean
• Manhattan
• Other
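As a concrete illustration of the KNN prediction rule above, here is a minimal sketch (my own, not from the notes) that averages the targets of the K nearest neighbors; NumPy and the function name are my choices, and the metric argument lets Euclidean or Manhattan distance be swapped in:

```python
import numpy as np

def knn_regress(X_train, y_train, x_query, k=5, metric="euclidean"):
    """Predict by averaging the targets of the K nearest training points."""
    diffs = X_train - x_query
    if metric == "euclidean":
        dists = np.sqrt((diffs ** 2).sum(axis=1))
    elif metric == "manhattan":
        dists = np.abs(diffs).sum(axis=1)
    else:
        raise ValueError("unknown metric")
    nearest = np.argsort(dists)[:k]   # indices of the K closest points
    return y_train[nearest].mean()    # avg. of their targets
```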
slide-2
SLIDE 2: Bias/Variance tradeoff as a function of K? Note: KNN regression has discontinuities.

Kernel regression: "weighted" nearest neighbors. Given kernel function $K(x_i, x_q)$,

$\hat{y}_q = \frac{\sum_i K(x_i, x_q)\, y_i}{\sum_i K(x_i, x_q)}$

Kernel bandwidth $\lambda$ controls bias/variance. Gaussian kernel: $K_\lambda(x_i, x_q) = \exp\!\left(-\frac{\|x_i - x_q\|^2}{\lambda}\right)$. Boxcar kernel, others.
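A small sketch (my own, not from the notes) of the kernel regression estimate above with the Gaussian kernel; the argument `lam` plays the role of the bandwidth $\lambda$:

```python
import numpy as np

def gaussian_kernel(X_train, x_query, lam=1.0):
    """K_lambda(x_i, x_q) = exp(-||x_i - x_q||^2 / lambda)."""
    sq_dists = ((X_train - x_query) ** 2).sum(axis=1)
    return np.exp(-sq_dists / lam)

def kernel_regress(X_train, y_train, x_query, lam=1.0):
    """Weighted average of all training targets, weights given by the kernel."""
    w = gaussian_kernel(X_train, x_query, lam)
    return (w * y_train).sum() / w.sum()
```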
slide-3
SLIDE 3: Perceptron

[figure: linear decision boundary]

Training: $w^{(0)} = 0$. At each $t$: $\hat{y}_t = \mathrm{sign}(w^{(t)} \cdot x_t)$. If $\hat{y}_t = y_t$, do nothing. Else $w^{(t+1)} = w^{(t)} + y_t x_t$.

Perceptron loss function:

$L(x, y, w) = \begin{cases} 0 & \text{if } y\,(w \cdot x) \ge 0 \\ -y\,(w \cdot x) & \text{otherwise} \end{cases}$

Loss is based on distance from the boundary.
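A minimal sketch of the perceptron training rule above, assuming labels in {-1, +1}; the epoch count is an arbitrary choice of mine:

```python
import numpy as np

def perceptron_train(X, y, epochs=10):
    """Online perceptron: update w only when an example is misclassified."""
    w = np.zeros(X.shape[1])              # w^(0) = 0
    for _ in range(epochs):
        for x_t, y_t in zip(X, y):        # y_t in {-1, +1}
            y_hat = np.sign(w @ x_t)
            if y_hat != y_t:              # mistake: w <- w + y_t x_t
                w = w + y_t * x_t
    return w
```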
slide-4
SLIDE 4: Minimize: $\min_w \frac{1}{N} \sum_i L(w, x_i)$

Gradient of the perceptron loss:

$\nabla_w L(x, y, w) = \begin{cases} -y\,x & \text{if } y\,(w \cdot x) < 0 \\ 0 & \text{if } y\,(w \cdot x) > 0 \end{cases}$

(undefined at $y\,(w \cdot x) = 0$)

Batch hinge minimization: $w^{(t+1)} = w^{(t)} + \eta \sum_i \mathbb{1}\!\left[\, y_i\,(w^{(t)} \cdot x_i) \le 0 \,\right] y_i x_i$

($\eta$ = step size for the hinge loss)
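A sketch of one batch update under the reconstruction above; the threshold of 0 and the step size value are assumptions carried over from the perceptron loss on the previous slide:

```python
import numpy as np

def batch_hinge_step(w, X, y, eta=0.1):
    """One batch update: sum y_i x_i over currently misclassified examples."""
    margins = y * (X @ w)                   # y_i (w . x_i) for every example
    mistakes = margins <= 0                 # indicator [ y_i (w . x_i) <= 0 ]
    grad_sum = (y[mistakes][:, None] * X[mistakes]).sum(axis=0)
    return w + eta * grad_sum
```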
slide-5
SLIDE 5: Kernel perceptron

Data not linearly separable: compute $\phi(x)$.

$\mathrm{sign}(w^{(t)} \cdot x) = \mathrm{sign}\!\left( \sum_i \alpha_i y_i \,(x_i \cdot x) \right)$

Replace $x$ with $\phi(x)$:

$\mathrm{sign}(w^{(t)} \cdot \phi(x)) = \mathrm{sign}\!\left( \sum_i \alpha_i y_i \,(\phi(x_i) \cdot \phi(x)) \right) = \mathrm{sign}\!\left( \sum_i \alpha_i y_i \, K(x_i, x) \right)$

No computing $\phi$. This is the "kernel trick". Instead of keeping track of $w^{(t)}$, just remember $\alpha^{(t)}$. Many kernel functions...
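A minimal sketch of a kernel perceptron that stores per-example mistake counts $\alpha$ instead of $w$; the RBF kernel and its bandwidth here are illustrative assumptions, since the notes only say "many kernel functions":

```python
import numpy as np

def rbf_kernel(a, b, lam=1.0):
    return np.exp(-np.sum((a - b) ** 2) / lam)

def kernel_perceptron_train(X, y, kernel=rbf_kernel, epochs=10):
    """Keep per-example mistake counts alpha; never form w or phi(x) explicitly."""
    n = len(X)
    alpha = np.zeros(n)
    for _ in range(epochs):
        for t in range(n):
            # decision value: sum_i alpha_i y_i K(x_i, x_t)
            f = sum(alpha[i] * y[i] * kernel(X[i], X[t]) for i in range(n))
            if np.sign(f) != y[t]:
                alpha[t] += 1            # record a mistake on example t
    return alpha
```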
slide-6
SLIDE 6: SVM: max margin

[figure: max-margin separator with margin lines $w \cdot x_i + w_0 = +1$ and $w \cdot x_i + w_0 = -1$]

SVM objective: $\min_{w, w_0} \frac{\|w\|^2}{2}$  s.t.  $y_i (w \cdot x_i + w_0) \ge 1 \quad \forall i \in 1, \ldots, N$

Data not linearly separable: violation $= \begin{cases} 0 & \text{if } 1 - y_i (w \cdot x_i + w_0) < 0 \\ 1 - y_i (w \cdot x_i + w_0) & \text{otherwise} \end{cases}$

Trade off margin violations vs. $\|w\|^2$:

$\min \frac{\|w\|^2}{2} + C \sum_i \left( 1 - y_i (w \cdot x_i + w_0) \right)_+$   (objective when the data are not linearly separable)

SVM minimizes regularized hinge loss (similar to perceptron). Can do SGD, kernel trick; see slides for details.
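A small sketch that evaluates the soft-margin objective above (regularized hinge loss); the value of C and the variable names are illustrative choices of mine:

```python
import numpy as np

def svm_objective(w, w0, X, y, C=1.0):
    """||w||^2 / 2 + C * sum_i max(0, 1 - y_i (w . x_i + w0))."""
    margins = y * (X @ w + w0)
    hinge = np.maximum(0.0, 1.0 - margins)   # per-example margin violations
    return 0.5 * w @ w + C * hinge.sum()
```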
slide-7
SLIDE 7: K-means

Minimize: $\sum_{j=1}^{K} \sum_{i \,:\, z_i = j} \|\mu_j - x_i\|^2$

Choosing K: plot distortion, look for a "kink".

[figure: distortion vs. K]

PCA: Given data $x_i$, select a basis $(u_1, \ldots, u_K)$, $K < D$, that gives the "best" lower-dimensional projection of the data.

Projection of $x_i$ is $z_i$ (assume $z_i = (z_i[1], \ldots, z_i[K])$), with $z_i[j] = x_i \cdot u_j$; data are mean-centered.

Reconstruct $x_i$ from $z_i$ as $\hat{x}_i = \sum_{j=1}^{K} z_i[j]\, u_j$.

"Best" projection: the one that minimizes $\sum_i \|x_i - \hat{x}_i\|^2$.
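A compact sketch of Lloyd's algorithm for the K-means objective above, returning the distortion one would plot against K to look for the kink; the initialization and fixed iteration count are simplistic choices of mine:

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Alternate assignments and centroid updates; return centroids, labels, distortion."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)]   # K data points as initial centroids
    for _ in range(iters):
        d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # squared distances to centroids
        z = d.argmin(axis=1)                                      # assign each point to nearest centroid
        mu = np.array([X[z == j].mean(axis=0) if np.any(z == j) else mu[j]
                       for j in range(k)])
    # final assignments and distortion for the last centroids
    d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    z = d.argmin(axis=1)
    distortion = ((X - mu[z]) ** 2).sum()    # sum_j sum_{i: z_i=j} ||mu_j - x_i||^2
    return mu, z, distortion
```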
slide-8
SLIDE 8: Big result: The vectors $u_1, \ldots, u_K$ that give the best reconstruction are the K eigenvectors of $\sum_i x_i x_i^T$ with the largest eigenvalues. The eigenvalues are proportional to the variance of the data projected onto each $u_i$.
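A short sketch of PCA as stated in the result above: take the top-K eigenvectors of the scatter matrix of the mean-centered data; `numpy.linalg.eigh` is used here because the matrix is symmetric:

```python
import numpy as np

def pca(X, k):
    """Top-k eigenvectors of sum_i x_i x_i^T (data mean-centered), plus projections z_i."""
    Xc = X - X.mean(axis=0)                 # mean-center the data
    scatter = Xc.T @ Xc                     # sum_i x_i x_i^T
    eigvals, eigvecs = np.linalg.eigh(scatter)
    order = np.argsort(eigvals)[::-1][:k]   # indices of the k largest eigenvalues
    U = eigvecs[:, order]                   # basis u_1, ..., u_k (columns)
    Z = Xc @ U                              # z_i[j] = x_i . u_j
    return U, Z
```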
GMMs

Generative process: sample a class $k \sim \mathrm{Cat}(\pi)$, sample data $x_i \sim N(\mu_k, \Sigma_k)$.

Likelihood: $P(x_i; \theta) = \sum_{k=1}^{K} \pi_k \, N(x_i; \mu_k, \Sigma_k)$

Why GMM: lets you represent complex, multimodal distributions.

EM: Motivation: GMM has observed $x_i$, unobserved $z_i$, parameters $\theta = \{\pi_j, \mu_j, \Sigma_j\}_{j=1}^{K}$; want to learn $\theta$, take best guess at $z_i$.
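A minimal sketch of the GMM generative process above (sample a class from $\pi$, then a point from that class's Gaussian); the seeding is only for reproducibility:

```python
import numpy as np

def sample_gmm(n, pis, mus, Sigmas, seed=0):
    """Sample n points: k ~ Cat(pi), then x ~ N(mu_k, Sigma_k)."""
    rng = np.random.default_rng(seed)
    ks = rng.choice(len(pis), size=n, p=pis)   # sample a class for each point
    X = np.array([rng.multivariate_normal(mus[k], Sigmas[k]) for k in ks])
    return X, ks
```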
slide-9
SLIDE 9: Intuition: Alternate between updating $\theta$ using a good guess at $Z$, and updating our guess at $Z$ using our current $\theta$.

Instead of making "hard" assignments $z_i$, make "soft" assignments $r_{ij}$. See slides for more details.

Pathologies of EM: A mixture component can collapse onto a single point.

[figure: cluster 1, cluster 2]
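A sketch of computing the "soft" assignments $r_{ij}$ mentioned above, i.e. the E-step responsibilities $r_{ij} = \pi_j N(x_i; \mu_j, \Sigma_j) / \sum_k \pi_k N(x_i; \mu_k, \Sigma_k)$; the use of SciPy's Gaussian density is an assumption about available tooling:

```python
import numpy as np
from scipy.stats import multivariate_normal

def soft_assignments(X, pis, mus, Sigmas):
    """r_ij = pi_j N(x_i; mu_j, Sigma_j) / sum_k pi_k N(x_i; mu_k, Sigma_k)."""
    weighted = np.column_stack([
        pi * multivariate_normal.pdf(X, mean=mu, cov=Sigma)
        for pi, mu, Sigma in zip(pis, mus, Sigmas)
    ])                                      # shape (N, K)
    return weighted / weighted.sum(axis=1, keepdims=True)
```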

Naive Bayes: Features $X$, output class $Y$.

Naive Bayes assumption: $\forall\, i, j,\; X[i] \perp X[j] \mid Y$. Thus the distribution factorizes according to:

$P(X[1], \ldots, X[D], Y) = P(Y) \prod_{d} P(X[d] \mid Y)$

To predict $y$: $\hat{y} = \underset{y}{\arg\max}\; P(y) \prod_{d} P(X[d] \mid y)$

slide-10
SLIDE 10: MLE for Naive Bayes: just counts.

$P(X[j] = x[j] \mid Y = y) = \frac{\mathrm{count}(X[j] = x[j],\; Y = y)}{\mathrm{count}(Y = y)}$

NB is unrealistic, but often performs well in practice.
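A compact sketch of count-based MLE for a discrete Naive Bayes model and the argmax prediction rule from the previous slide; no smoothing is applied, matching the plain counts above:

```python
from collections import Counter, defaultdict

def nb_fit(X, y):
    """Estimate P(y) and P(X[j]=v | y) by counting."""
    class_counts = Counter(y)
    n = len(y)
    prior = {c: class_counts[c] / n for c in class_counts}
    cond = defaultdict(lambda: defaultdict(Counter))   # cond[c][j][v] = count(X[j]=v, Y=c)
    for xi, c in zip(X, y):
        for j, v in enumerate(xi):
            cond[c][j][v] += 1
    return prior, cond, class_counts

def nb_predict(x, prior, cond, class_counts):
    """y_hat = argmax_y P(y) * prod_d P(X[d]=x[d] | y)."""
    best, best_score = None, -1.0
    for c, p in prior.items():
        score = p
        for j, v in enumerate(x):
            score *= cond[c][j][v] / class_counts[c]
        if score > best_score:
            best, best_score = c, score
    return best
```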

Bayesian networks

Represent joint distribution as a directed acyclic graph (DAG).
Vertices: random variables.
Edges: conditional dependencies.
One CPT for each variable: P(variable | parents).

[figure: DAG with A and B both parents of C, and C the parent of D]

$P(X) = \prod_i P(X_i \mid \mathrm{parents}(X_i))$

$P(A, B, C, D) = P(A)\, P(B)\, P(C \mid A, B)\, P(D \mid C)$
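A small sketch that evaluates the factorized joint $P(A)\,P(B)\,P(C \mid A,B)\,P(D \mid C)$ from the example above; the CPT numbers are invented placeholders, not values from the notes:

```python
# CPTs for the example DAG: A and B are parents of C; C is the parent of D.
# All numbers below are illustrative placeholders.
P_A = {True: 0.3, False: 0.7}
P_B = {True: 0.6, False: 0.4}
P_C_given_AB = {(True, True): 0.9, (True, False): 0.7,
                (False, True): 0.5, (False, False): 0.1}   # P(C=True | A, B)
P_D_given_C = {True: 0.8, False: 0.2}                       # P(D=True | C)

def joint(a, b, c, d):
    """P(A=a, B=b, C=c, D=d) = P(a) P(b) P(c | a, b) P(d | c)."""
    pc = P_C_given_AB[(a, b)] if c else 1 - P_C_given_AB[(a, b)]
    pd = P_D_given_C[c] if d else 1 - P_D_given_C[c]
    return P_A[a] * P_B[b] * pc * pd
```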
slide-11
SLIDE 11

[figure: DAG with Flu and Allergy as parents of Sinus; Sinus is the parent of Headache and Nose]

Assume each variable is binary.

# parameters needed to specify the joint: 10 (check yourself)

# parameters without knowing graph structure: $2^5 - 1 = 31$.

Local Markov Assumption: X is independent of non-descendants given parents.

Headache ⊥ Flu, Allergy, Nose | Sinus

Explaining away: Flu, Allergy are marginally independent. However, given Sinus, they become dependent (think about why).
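A short numeric sketch of explaining away in the Flu → Sinus ← Allergy fragment above; the CPT values are invented for illustration, and the point is only that $P(\mathrm{Flu} \mid \mathrm{Sinus})$ changes once Allergy is also observed:

```python
# Invented CPTs for the Flu -> Sinus <- Allergy fragment (illustrative only).
P_flu = {True: 0.1, False: 0.9}
P_allergy = {True: 0.2, False: 0.8}
P_sinus_given = {(True, True): 0.95, (True, False): 0.8,
                 (False, True): 0.7, (False, False): 0.05}   # P(Sinus=True | Flu, Allergy)

def joint_fas(f, a, s):
    """P(Flu=f, Allergy=a, Sinus=s)."""
    ps = P_sinus_given[(f, a)] if s else 1 - P_sinus_given[(f, a)]
    return P_flu[f] * P_allergy[a] * ps

def cond_flu(s, a=None):
    """P(Flu=True | Sinus=s [, Allergy=a]) by summing out the unobserved variables."""
    num = den = 0.0
    for f in (True, False):
        for a_val in (True, False):
            if a is not None and a_val != a:
                continue
            p = joint_fas(f, a_val, s)
            den += p
            if f:
                num += p
    return num / den

print(cond_flu(s=True))           # P(Flu | Sinus=True)
print(cond_flu(s=True, a=True))   # P(Flu | Sinus=True, Allergy=True): lower, allergy "explains away" flu
```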