slide-1
SLIDE 1: Topics Since the Midterm

• Instance-based learning: NN regression and classification, kernel regression, kernel trick
• Online learning: margin-based approaches, SVM
• K-means, PCA
• Unsupervised learning: GMM, EM for GMM
• Bayes optimal classifier, Naive Bayes
• Structural models: Bayes' nets, deep learning

KNN: Predict using avg. of K nearest neighbors.

Possible distance functions:
• Euclidean
• Manhattan
• Other
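As a concrete illustration of the KNN prediction rule above, here is a minimal sketch (my own, not from the notes) that averages the targets of the K nearest neighbors; NumPy and the function name are my choices, and the metric argument lets Euclidean or Manhattan distance be swapped in:

```python
import numpy as np

def knn_regress(X_train, y_train, x_query, k=5, metric="euclidean"):
    """Predict by averaging the targets of the K nearest training points."""
    diffs = X_train - x_query
    if metric == "euclidean":
        dists = np.sqrt((diffs ** 2).sum(axis=1))
    elif metric == "manhattan":
        dists = np.abs(diffs).sum(axis=1)
    else:
        raise ValueError("unknown metric")
    nearest = np.argsort(dists)[:k]   # indices of the K closest points
    return y_train[nearest].mean()    # avg. of their targets
```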
slide-2
SLIDE 2: Bias/Variance tradeoff as a function of K? Note: KNN regression has discontinuities.

Kernel regression: "weighted" nearest neighbors. Given kernel function $K(x_i, x_q)$,

$\hat{y}_q = \frac{\sum_i K(x_i, x_q)\, y_i}{\sum_i K(x_i, x_q)}$

Kernel bandwidth $\lambda$ controls bias/variance. Gaussian kernel: $K_\lambda(x_i, x_q) = \exp\!\left(-\frac{\|x_i - x_q\|^2}{\lambda}\right)$. Boxcar kernel, others.
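A small sketch (my own, not from the notes) of the kernel regression estimate above with the Gaussian kernel; the argument `lam` plays the role of the bandwidth $\lambda$:

```python
import numpy as np

def gaussian_kernel(X_train, x_query, lam=1.0):
    """K_lambda(x_i, x_q) = exp(-||x_i - x_q||^2 / lambda)."""
    sq_dists = ((X_train - x_query) ** 2).sum(axis=1)
    return np.exp(-sq_dists / lam)

def kernel_regress(X_train, y_train, x_query, lam=1.0):
    """Weighted average of all training targets, weights given by the kernel."""
    w = gaussian_kernel(X_train, x_query, lam)
    return (w * y_train).sum() / w.sum()
```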
slide-3
SLIDE 3: Perceptron

[figure: linear decision boundary]

Training: $w^{(0)} = 0$. At each $t$: $\hat{y}_t = \mathrm{sign}(w^{(t)} \cdot x_t)$. If $\hat{y}_t = y_t$, do nothing. Else $w^{(t+1)} = w^{(t)} + y_t x_t$.

Perceptron loss function:

$L(x, y, w) = \begin{cases} 0 & \text{if } y\,(w \cdot x) \ge 0 \\ -y\,(w \cdot x) & \text{otherwise} \end{cases}$

Loss is based on distance from the boundary.
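A minimal sketch of the perceptron training rule above, assuming labels in {-1, +1}; the epoch count is an arbitrary choice of mine:

```python
import numpy as np

def perceptron_train(X, y, epochs=10):
    """Online perceptron: update w only when an example is misclassified."""
    w = np.zeros(X.shape[1])              # w^(0) = 0
    for _ in range(epochs):
        for x_t, y_t in zip(X, y):        # y_t in {-1, +1}
            y_hat = np.sign(w @ x_t)
            if y_hat != y_t:              # mistake: w <- w + y_t x_t
                w = w + y_t * x_t
    return w
```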
slide-4
SLIDE 4: Minimize: $\min_w \frac{1}{N} \sum_i L(w, x_i)$

Gradient of the perceptron loss:

$\nabla_w L(x, y, w) = \begin{cases} -y\,x & \text{if } y\,(w \cdot x) < 0 \\ 0 & \text{if } y\,(w \cdot x) > 0 \end{cases}$

(undefined at $y\,(w \cdot x) = 0$)

Batch hinge minimization: $w^{(t+1)} = w^{(t)} + \eta \sum_i \mathbb{1}\!\left[\, y_i\,(w^{(t)} \cdot x_i) \le 0 \,\right] y_i x_i$

($\eta$ = step size for the hinge loss)
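A sketch of one batch update under the reconstruction above; the threshold of 0 and the step size value are assumptions carried over from the perceptron loss on the previous slide:

```python
import numpy as np

def batch_hinge_step(w, X, y, eta=0.1):
    """One batch update: sum y_i x_i over currently misclassified examples."""
    margins = y * (X @ w)                   # y_i (w . x_i) for every example
    mistakes = margins <= 0                 # indicator [ y_i (w . x_i) <= 0 ]
    grad_sum = (y[mistakes][:, None] * X[mistakes]).sum(axis=0)
    return w + eta * grad_sum
```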
slide-5
SLIDE 5: Kernel perceptron

Data not linearly separable: compute $\phi(x)$.

$\mathrm{sign}(w^{(t)} \cdot x) = \mathrm{sign}\!\left( \sum_i \alpha_i y_i \,(x_i \cdot x) \right)$

Replace $x$ with $\phi(x)$:

$\mathrm{sign}(w^{(t)} \cdot \phi(x)) = \mathrm{sign}\!\left( \sum_i \alpha_i y_i \,(\phi(x_i) \cdot \phi(x)) \right) = \mathrm{sign}\!\left( \sum_i \alpha_i y_i \, K(x_i, x) \right)$

No computing $\phi$. This is the "kernel trick". Instead of keeping track of $w^{(t)}$, just remember $\alpha^{(t)}$. Many kernel functions...
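A minimal sketch of a kernel perceptron that stores per-example mistake counts $\alpha$ instead of $w$; the RBF kernel and its bandwidth here are illustrative assumptions, since the notes only say "many kernel functions":

```python
import numpy as np

def rbf_kernel(a, b, lam=1.0):
    return np.exp(-np.sum((a - b) ** 2) / lam)

def kernel_perceptron_train(X, y, kernel=rbf_kernel, epochs=10):
    """Keep per-example mistake counts alpha; never form w or phi(x) explicitly."""
    n = len(X)
    alpha = np.zeros(n)
    for _ in range(epochs):
        for t in range(n):
            # decision value: sum_i alpha_i y_i K(x_i, x_t)
            f = sum(alpha[i] * y[i] * kernel(X[i], X[t]) for i in range(n))
            if np.sign(f) != y[t]:
                alpha[t] += 1            # record a mistake on example t
    return alpha
```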
slide-6
SLIDE 6: SVM: max margin

[figure: max-margin separator with margin lines $w \cdot x_i + w_0 = +1$ and $w \cdot x_i + w_0 = -1$]

SVM objective: $\min_{w, w_0} \frac{\|w\|^2}{2}$  s.t.  $y_i (w \cdot x_i + w_0) \ge 1 \quad \forall i \in 1, \ldots, N$

Data not linearly separable: violation $= \begin{cases} 0 & \text{if } 1 - y_i (w \cdot x_i + w_0) < 0 \\ 1 - y_i (w \cdot x_i + w_0) & \text{otherwise} \end{cases}$

Trade off margin violations vs. $\|w\|^2$:

$\min \frac{\|w\|^2}{2} + C \sum_i \left( 1 - y_i (w \cdot x_i + w_0) \right)_+$   (objective when the data are not linearly separable)

SVM minimizes regularized hinge loss (similar to perceptron). Can do SGD, kernel trick; see slides for details.
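A small sketch that evaluates the soft-margin objective above (regularized hinge loss); the value of C and the variable names are illustrative choices of mine:

```python
import numpy as np

def svm_objective(w, w0, X, y, C=1.0):
    """||w||^2 / 2 + C * sum_i max(0, 1 - y_i (w . x_i + w0))."""
    margins = y * (X @ w + w0)
    hinge = np.maximum(0.0, 1.0 - margins)   # per-example margin violations
    return 0.5 * w @ w + C * hinge.sum()
```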
slide-7
SLIDE 7: K-means

Minimize: $\sum_{j=1}^{K} \sum_{i \,:\, z_i = j} \|\mu_j - x_i\|^2$

Choosing K: plot distortion, look for a "kink".

[figure: distortion vs. K]

PCA: Given data $x_i$, select a basis $(u_1, \ldots, u_K)$, $K < D$, that gives the "best" lower-dimensional projection of the data.

Projection of $x_i$ is $z_i$ (assume $z_i = (z_i[1], \ldots, z_i[K])$), with $z_i[j] = x_i \cdot u_j$; data are mean-centered.

Reconstruct $x_i$ from $z_i$ as $\hat{x}_i = \sum_{j=1}^{K} z_i[j]\, u_j$.

"Best" projection: the one that minimizes $\sum_i \|x_i - \hat{x}_i\|^2$.
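A compact sketch of Lloyd's algorithm for the K-means objective above, returning the distortion one would plot against K to look for the kink; the initialization and fixed iteration count are simplistic choices of mine:

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Alternate assignments and centroid updates; return centroids, labels, distortion."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)]   # K data points as initial centroids
    for _ in range(iters):
        d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # squared distances to centroids
        z = d.argmin(axis=1)                                      # assign each point to nearest centroid
        mu = np.array([X[z == j].mean(axis=0) if np.any(z == j) else mu[j]
                       for j in range(k)])
    # final assignments and distortion for the last centroids
    d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    z = d.argmin(axis=1)
    distortion = ((X - mu[z]) ** 2).sum()    # sum_j sum_{i: z_i=j} ||mu_j - x_i||^2
    return mu, z, distortion
```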
slide-8
SLIDE 8: Big result: The vectors $u_1, \ldots, u_K$ that give the best reconstruction are the K eigenvectors of $\sum_i x_i x_i^T$ with the largest eigenvalues. The eigenvalues are proportional to the variance of the data projected onto each $u_i$.
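A short sketch of PCA as stated in the result above: take the top-K eigenvectors of the scatter matrix of the mean-centered data; `numpy.linalg.eigh` is used here because the matrix is symmetric:

```python
import numpy as np

def pca(X, k):
    """Top-k eigenvectors of sum_i x_i x_i^T (data mean-centered), plus projections z_i."""
    Xc = X - X.mean(axis=0)                 # mean-center the data
    scatter = Xc.T @ Xc                     # sum_i x_i x_i^T
    eigvals, eigvecs = np.linalg.eigh(scatter)
    order = np.argsort(eigvals)[::-1][:k]   # indices of the k largest eigenvalues
    U = eigvecs[:, order]                   # basis u_1, ..., u_k (columns)
    Z = Xc @ U                              # z_i[j] = x_i . u_j
    return U, Z
```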
GMMs

Generative process: sample a class $k \sim \mathrm{Cat}(\pi)$, sample data $x_i \sim N(\mu_k, \Sigma_k)$.

Likelihood: $P(x_i; \theta) = \sum_{k=1}^{K} \pi_k \, N(x_i; \mu_k, \Sigma_k)$

Why GMM: lets you represent complex, multimodal distributions.

EM: Motivation: GMM has observed $x_i$, unobserved $z_i$, parameters $\theta = \{\pi_j, \mu_j, \Sigma_j\}_{j=1}^{K}$; want to learn $\theta$, take best guess at $z_i$.
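A minimal sketch of the GMM generative process above (sample a class from $\pi$, then a point from that class's Gaussian); the seeding is only for reproducibility:

```python
import numpy as np

def sample_gmm(n, pis, mus, Sigmas, seed=0):
    """Sample n points: k ~ Cat(pi), then x ~ N(mu_k, Sigma_k)."""
    rng = np.random.default_rng(seed)
    ks = rng.choice(len(pis), size=n, p=pis)   # sample a class for each point
    X = np.array([rng.multivariate_normal(mus[k], Sigmas[k]) for k in ks])
    return X, ks
```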
slide-9
SLIDE 9: Intuition: Alternate between updating $\theta$ using a good guess at $Z$, and updating our guess at $Z$ using our current $\theta$.

Instead of making "hard" assignments $z_i$, make "soft" assignments $r_{ij}$. See slides for more details.

Pathologies of EM: A mixture component can collapse onto a single point.

[figure: cluster 1, cluster 2]
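A sketch of computing the "soft" assignments $r_{ij}$ mentioned above, i.e. the E-step responsibilities $r_{ij} = \pi_j N(x_i; \mu_j, \Sigma_j) / \sum_k \pi_k N(x_i; \mu_k, \Sigma_k)$; the use of SciPy's Gaussian density is an assumption about available tooling:

```python
import numpy as np
from scipy.stats import multivariate_normal

def soft_assignments(X, pis, mus, Sigmas):
    """r_ij = pi_j N(x_i; mu_j, Sigma_j) / sum_k pi_k N(x_i; mu_k, Sigma_k)."""
    weighted = np.column_stack([
        pi * multivariate_normal.pdf(X, mean=mu, cov=Sigma)
        for pi, mu, Sigma in zip(pis, mus, Sigmas)
    ])                                      # shape (N, K)
    return weighted / weighted.sum(axis=1, keepdims=True)
```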

Naive Bayes: Features $X$, output class $Y$.

Naive Bayes assumption: $\forall\, i, j,\; X[i] \perp X[j] \mid Y$. Thus the distribution factorizes according to:

$P(X[1], \ldots, X[D], Y) = P(Y) \prod_{d} P(X[d] \mid Y)$

To predict $y$: $\hat{y} = \underset{y}{\arg\max}\; P(y) \prod_{d} P(X[d] \mid y)$

slide-10
SLIDE 10: MLE for Naive Bayes: just counts.

$P(X[j] = x[j] \mid Y = y) = \frac{\mathrm{count}(X[j] = x[j],\; Y = y)}{\mathrm{count}(Y = y)}$

NB is unrealistic, but often performs well in practice.
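A compact sketch of count-based MLE for a discrete Naive Bayes model and the argmax prediction rule from the previous slide; no smoothing is applied, matching the plain counts above:

```python
from collections import Counter, defaultdict

def nb_fit(X, y):
    """Estimate P(y) and P(X[j]=v | y) by counting."""
    class_counts = Counter(y)
    n = len(y)
    prior = {c: class_counts[c] / n for c in class_counts}
    cond = defaultdict(lambda: defaultdict(Counter))   # cond[c][j][v] = count(X[j]=v, Y=c)
    for xi, c in zip(X, y):
        for j, v in enumerate(xi):
            cond[c][j][v] += 1
    return prior, cond, class_counts

def nb_predict(x, prior, cond, class_counts):
    """y_hat = argmax_y P(y) * prod_d P(X[d]=x[d] | y)."""
    best, best_score = None, -1.0
    for c, p in prior.items():
        score = p
        for j, v in enumerate(x):
            score *= cond[c][j][v] / class_counts[c]
        if score > best_score:
            best, best_score = c, score
    return best
```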

Bayesian networks

Represent joint distribution as a directed acyclic graph (DAG).
Vertices: random variables.
Edges: conditional dependencies.
One CPT for each variable: P(variable | parents).

[figure: DAG with A and B both parents of C, and C the parent of D]

$P(X) = \prod_i P(X_i \mid \mathrm{parents}(X_i))$

$P(A, B, C, D) = P(A)\, P(B)\, P(C \mid A, B)\, P(D \mid C)$
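A small sketch that evaluates the factorized joint $P(A)\,P(B)\,P(C \mid A,B)\,P(D \mid C)$ from the example above; the CPT numbers are invented placeholders, not values from the notes:

```python
# CPTs for the example DAG: A and B are parents of C; C is the parent of D.
# All numbers below are illustrative placeholders.
P_A = {True: 0.3, False: 0.7}
P_B = {True: 0.6, False: 0.4}
P_C_given_AB = {(True, True): 0.9, (True, False): 0.7,
                (False, True): 0.5, (False, False): 0.1}   # P(C=True | A, B)
P_D_given_C = {True: 0.8, False: 0.2}                       # P(D=True | C)

def joint(a, b, c, d):
    """P(A=a, B=b, C=c, D=d) = P(a) P(b) P(c | a, b) P(d | c)."""
    pc = P_C_given_AB[(a, b)] if c else 1 - P_C_given_AB[(a, b)]
    pd = P_D_given_C[c] if d else 1 - P_D_given_C[c]
    return P_A[a] * P_B[b] * pc * pd
```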
slide-11
SLIDE 11

[figure: DAG with Flu and Allergy as parents of Sinus; Sinus is the parent of Headache and Nose]

Assume each variable is binary.

# parameters needed to specify the joint: 10 (check yourself)

# parameters without knowing graph structure: $2^5 - 1 = 31$.

Local Markov Assumption: X is independent of non-descendants given parents.

Headache ⊥ Flu, Allergy, Nose | Sinus

Explaining away: Flu, Allergy are marginally independent. However, given Sinus, they become dependent (think about why).
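A short numeric sketch of explaining away in the Flu → Sinus ← Allergy fragment above; the CPT values are invented for illustration, and the point is only that $P(\mathrm{Flu} \mid \mathrm{Sinus})$ changes once Allergy is also observed:

```python
# Invented CPTs for the Flu -> Sinus <- Allergy fragment (illustrative only).
P_flu = {True: 0.1, False: 0.9}
P_allergy = {True: 0.2, False: 0.8}
P_sinus_given = {(True, True): 0.95, (True, False): 0.8,
                 (False, True): 0.7, (False, False): 0.05}   # P(Sinus=True | Flu, Allergy)

def joint_fas(f, a, s):
    """P(Flu=f, Allergy=a, Sinus=s)."""
    ps = P_sinus_given[(f, a)] if s else 1 - P_sinus_given[(f, a)]
    return P_flu[f] * P_allergy[a] * ps

def cond_flu(s, a=None):
    """P(Flu=True | Sinus=s [, Allergy=a]) by summing out the unobserved variables."""
    num = den = 0.0
    for f in (True, False):
        for a_val in (True, False):
            if a is not None and a_val != a:
                continue
            p = joint_fas(f, a_val, s)
            den += p
            if f:
                num += p
    return num / den

print(cond_flu(s=True))           # P(Flu | Sinus=True)
print(cond_flu(s=True, a=True))   # P(Flu | Sinus=True, Allergy=True): lower, allergy "explains away" flu
```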