Instance-based Learning
CE-717: Machine Learning
Sharif University of Technology
M. Soleymani
Fall 2018

Outline
} Non-parametric approach
} Unsupervised: non-parametric density estimation
¨ Parzen windows
¨ $k_n$-nearest neighbor density estimation
} Classification
¨ kNN classification
¨ Weighted (or kernel) kNN
} Regression
¨ kNN regression
¨ Locally weighted linear regression
} Parametric density functions cannot usually fit the densities we encounter in practical problems.
¨ e.g., parametric densities are typically unimodal.
} Non-parametric methods don't assume that the form (model) of the underlying densities is known in advance.
} For classification, we can either:
¨ Estimate $p(\mathbf{x}|\mathcal{C}_i)$ from $\mathcal{D}_i$ using non-parametric density estimation
¨ Estimate $P(\mathcal{C}_i|\mathbf{x})$ directly from the data (e.g., kNN classifier)
} Parametric learning: find the parameters from the data; afterwards the training examples can be discarded.
} Instance-based (non-parametric) learning:
¨ Training examples are explicitly used at prediction time.
¨ A training phase is not required.
} Histogram: partition the input space into bins $b_j$ of width $h$.
} $k_n(b_j)$: the number of samples (among the $n$ ones) that lie in bin $b_j$
} Density estimate: $\hat{p}(x) = \frac{k_n(b_j)}{n\,h}$ for $x \in b_j$
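} A minimal numpy sketch of this histogram estimator; the bin layout anchored at the sample minimum and the Gaussian toy data are illustrative assumptions:

```python
import numpy as np

def histogram_density(x, samples, h):
    """Histogram estimate: count of samples in x's bin / (n * h)."""
    left = samples.min()                         # bins start at the smallest sample
    j = np.floor((x - left) / h)                 # index of the bin containing x
    in_bin = np.floor((samples - left) / h) == j
    return in_bin.sum() / (len(samples) * h)

samples = np.random.randn(1000)                  # 1000 draws from N(0, 1)
print(histogram_density(0.0, samples, h=0.5))    # roughly 1/sqrt(2*pi) ≈ 0.40
```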
} $P = \int_{\mathcal{R}} p(\mathbf{x})\,d\mathbf{x}$: the probability that a sample falls in the region $\mathcal{R}$
} $E[k] = nP$: the expected number of the $n$ samples that fall in $\mathcal{R}$
} $k \approx nP \Rightarrow \frac{k}{n}$ as an estimate for $P$
} More accurate for larger $n$
[Figure: a region $\mathcal{R}$ of the sample space containing $k$ of the $n$ samples]
} If $p(\mathbf{x})$ is roughly constant over the region, $P \approx p(\mathbf{x})\,V$, giving the estimate $\hat{p}_n(\mathbf{x}) = \frac{k_n/n}{V_n}$
} $V_n$: the volume of the region around $\mathbf{x}$
} $k_n$: the number of samples falling in the region
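} A small sketch of this region-based estimate, assuming a Euclidean ball of radius $r$ as the region (the radius and toy data are illustrative choices):

```python
import numpy as np
from scipy.special import gamma

def region_density(x, samples, r):
    """Estimate p(x) = (k/n) / V using a ball of radius r around x."""
    n, d = samples.shape
    k = (np.linalg.norm(samples - x, axis=1) <= r).sum()
    V = np.pi**(d / 2) / gamma(d / 2 + 1) * r**d   # volume of a d-ball
    return (k / n) / V

samples = np.random.randn(5000, 2)                  # 2-D standard normal
print(region_density(np.zeros(2), samples, r=0.3))  # true p(0) = 1/(2*pi) ≈ 0.159
```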
} Two ways to make this estimate converge:
} $k_n$-nearest neighbor: fix $k_n$; the volume grows until it contains the $k_n$ neighbors of $\mathbf{x}$
¨ converges to the true probability density in the limit $n \to \infty$ when $k_n \to \infty$ and $k_n/n \to 0$ (e.g., $k_n = \sqrt{n}$)
} Parzen window: fix the volume $V_n$; the number of points falling inside it can vary from point to point
¨ converges to the true probability density in the limit $n \to \infty$ when $V_n = V_1/\sqrt{n}$
} Hyper-cubes with side length $h$ (i.e., volume $V_n = h^d$) are centered at the query point $\mathbf{x}$
} $k_n = \sum_{i=1}^{n} \varphi\!\left(\frac{\mathbf{x} - \mathbf{x}^{(i)}}{h}\right)$, with the window function $\varphi(\mathbf{u}) = \begin{cases} 1 & |u_j| \le 1/2,\ j = 1, \dots, d \\ 0 & \text{otherwise} \end{cases}$
} $\hat{p}_n(\mathbf{x}) = \frac{k_n/n}{V_n} = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h^d}\, \varphi\!\left(\frac{\mathbf{x} - \mathbf{x}^{(i)}}{h}\right)$
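} A direct sketch of this hypercube estimator (the toy data and window width are illustrative):

```python
import numpy as np

def parzen_density(x, samples, h):
    """Parzen-window estimate with a hypercube kernel of side h."""
    n, d = samples.shape
    # phi(u) = 1 iff every coordinate of u lies in [-1/2, 1/2]
    inside = np.all(np.abs((samples - x) / h) <= 0.5, axis=1)
    return inside.sum() / (n * h**d)

samples = np.random.randn(5000, 2)
print(parzen_density(np.zeros(2), samples, h=0.5))  # true p(0) ≈ 0.159
```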
} Choosing the window width $h$:
¨ Too large: low resolution (the estimate is over-smoothed)
¨ Too small: much variability (the estimate is noisy)
} For a large enough number of samples, the smaller $h$ is, the better the accuracy of the estimate.
} In practice, $h$ can be set using techniques like cross-validation, where the density estimate is scored on held-out data (e.g., by log-likelihood).
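} A hedged sketch of such a selection loop, scoring held-out log-likelihood; it uses a Gaussian-kernel variant of the Parzen window, which is an assumed smooth choice rather than the hypercube above:

```python
import numpy as np

def gaussian_parzen(x, samples, h):
    """Parzen estimate with an isotropic Gaussian kernel of width h."""
    n, d = samples.shape
    sq = ((samples - x) ** 2).sum(axis=1) / h**2
    return np.exp(-0.5 * sq).sum() / (n * (h * np.sqrt(2 * np.pi)) ** d)

data = np.random.randn(600, 1)
train, held_out = data[:400], data[400:]
for h in [0.05, 0.2, 0.5, 1.0, 2.0]:
    ll = sum(np.log(gaussian_parzen(x, train, h)) for x in held_out)
    print(f"h={h}: held-out log-likelihood {ll:.1f}")   # pick the best h
```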
} If $n$ equidistant points are required to densely fill a one-dimensional interval, $n^d$ points are needed to fill the corresponding $d$-dimensional hypercube at the same density.
¨ We need an exponentially large quantity of training data to ensure that the cells are not empty; e.g., filling $[0,1]$ at spacing $0.01$ takes $100$ points in one dimension but $100^{10} = 10^{20}$ in ten dimensions.
} $k_n$-nearest neighbor estimation: fix $k$ and let the volume vary: $\hat{p}_n(\mathbf{x}) = \frac{k}{n\,V_n(\mathbf{x})}$, where $V_n$ is the volume of the smallest region around $\mathbf{x}$ containing $k$ samples; thus $V_n$ is a function of $\mathbf{x}$.
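} A sketch of this estimator, assuming the region is the smallest Euclidean ball around $\mathbf{x}$ that contains $k$ samples:

```python
import numpy as np
from scipy.special import gamma

def knn_density(x, samples, k):
    """kNN estimate p(x) = k / (n * V_k(x)), with V_k(x) the volume of the
    smallest ball around x holding k samples."""
    n, d = samples.shape
    r = np.sort(np.linalg.norm(samples - x, axis=1))[k - 1]  # k-th NN distance
    V = np.pi**(d / 2) / gamma(d / 2 + 1) * r**d
    return k / (n * V)

samples = np.random.randn(5000, 2)
print(knn_density(np.zeros(2), samples, k=50))  # true p(0) ≈ 0.159
```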
} With enough samples, convergence to an arbitrarily complicated target density can be obtained.
} However, the number of samples needed grows exponentially with the dimensionality of the feature space.
} Density estimates from the labeled training data ($V$: the volume around $\mathbf{x}$ containing $k$ samples, $k_i$ of them labeled $\mathcal{C}_i$):
$\hat{p}(\mathbf{x}|\mathcal{C}_i) = \frac{k_i}{n_i V} \qquad \hat{P}(\mathcal{C}_i) = \frac{n_i}{n} \qquad \hat{p}(\mathbf{x}) = \frac{k}{n V}$
} $\mathcal{D}_i$: the set of training samples labeled as $\mathcal{C}_i$, with $n_i = |\mathcal{D}_i|$
} Bayes rule then gives $\hat{P}(\mathcal{C}_i|\mathbf{x}) = \frac{\hat{p}(\mathbf{x}|\mathcal{C}_i)\,\hat{P}(\mathcal{C}_i)}{\hat{p}(\mathbf{x})} = \frac{k_i}{k}$
} kNN classifier: $\hat{P}(\mathcal{C}_i|\mathbf{x}) \approx \frac{k_i}{k}$, where $k_i$ is the number of training samples, among the $k$ nearest neighbors of $\mathbf{x}$, that are labeled $\mathcal{C}_i$
} Assign $\mathbf{x}$ to the class with the largest $k_i$
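} A minimal sketch of the resulting rule, assigning the majority label among the $k$ nearest neighbors (the toy blobs are illustrative):

```python
import numpy as np
from collections import Counter

def knn_classify(x, X_train, y_train, k):
    """Return the majority class among the k nearest training samples,
    i.e., argmax_i k_i / k."""
    dist = np.linalg.norm(X_train - x, axis=1)
    neighbors = y_train[np.argsort(dist)[:k]]
    return Counter(neighbors).most_common(1)[0][0]

# Toy data: two Gaussian blobs labeled 0 and 1.
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 3])
y = np.array([0] * 50 + [1] * 50)
print(knn_classify(np.array([2.5, 2.5]), X, y, k=5))  # likely 1
```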
} 1-NN decision regions (Voronoi tessellation): all points in a cell are labeled by the category of the corresponding training sample.
} Decision rule: $\hat{y} = \operatorname{argmax}_{i = 1, \dots, C}\ k_i$
} Asymptotic error rate of the nearest-neighbor rule: $P_{NN} = \lim_{n \to \infty} P_n(e)$; by the Cover-Hart bound it satisfies $P^* \le P_{NN} \le 2P^*$, where $P^*$ is the Bayes error
} Distance measures for finding the neighbors:
} Euclidean distance: $d(\mathbf{x}, \mathbf{x}') = \sqrt{(x_1 - x_1')^2 + \cdots + (x_d - x_d')^2}$
} Weighted Euclidean distance: $d_{\mathbf{w}}(\mathbf{x}, \mathbf{x}') = \sqrt{w_1 (x_1 - x_1')^2 + \cdots + w_d (x_d - x_d')^2}$
} Minkowski distance: $d_p(\mathbf{x}, \mathbf{x}') = \left( \sum_{i=1}^{d} |x_i - x_i'|^p \right)^{1/p}$
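} A short sketch of these measures (the weight vector and points are illustrative):

```python
import numpy as np

def minkowski(x, xp, p):
    """Minkowski distance; p=2 gives Euclidean, p=1 Manhattan."""
    return (np.abs(x - xp) ** p).sum() ** (1 / p)

def weighted_euclidean(x, xp, w):
    """Euclidean distance with per-feature weights w_i."""
    return np.sqrt((w * (x - xp) ** 2).sum())

x, xp = np.array([1.0, 2.0]), np.array([4.0, 6.0])
print(minkowski(x, xp, p=2))                           # 5.0
print(weighted_euclidean(x, xp, np.array([1.0, 3.0]))) # sqrt(9 + 48) ≈ 7.55
```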
} Example: changing the feature weights changes which points count as neighbors:
$d(\mathbf{x}, \mathbf{x}') = \sqrt{(x_1 - x_1')^2 + (x_2 - x_2')^2}$ vs. $d(\mathbf{x}, \mathbf{x}') = \sqrt{(x_1 - x_1')^2 + 3\,(x_2 - x_2')^2}$
} Weighted kNN classification: weight each neighbor's vote by its closeness to $\mathbf{x}$:
$\hat{y} = \operatorname{argmax}_{c = 1, \dots, C} \sum_{i \in N_k(\mathbf{x})} w_i(\mathbf{x}) \times I(c = y^{(i)})$
} where $w_i(\mathbf{x})$ is larger for training samples closer to $\mathbf{x}$, e.g., $w_i(\mathbf{x}) = \frac{1}{d(\mathbf{x}, \mathbf{x}^{(i)})^2}$
} Kernel (all-sample) variant: use every training sample instead of only the $k$ nearest:
$\hat{y} = \operatorname{argmax}_{c = 1, \dots, C} \sum_{i=1}^{n} w_i(\mathbf{x}) \times I(c = y^{(i)})$
} e.g., $w_i(\mathbf{x}) = e^{-d(\mathbf{x}, \mathbf{x}^{(i)})}$
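} A sketch of the weighted vote, assuming inverse squared-distance weights; the small epsilon guard is an implementation choice, not from the slides:

```python
import numpy as np

def weighted_knn_classify(x, X_train, y_train, k, classes):
    """Weighted kNN: each of the k neighbors votes with weight 1/d^2."""
    dist = np.linalg.norm(X_train - x, axis=1)
    nn = np.argsort(dist)[:k]
    w = 1.0 / (dist[nn] ** 2 + 1e-12)        # guard against zero distance
    scores = {c: w[y_train[nn] == c].sum() for c in classes}
    return max(scores, key=scores.get)

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 3])
y = np.array([0] * 50 + [1] * 50)
print(weighted_knn_classify(np.array([1.5, 1.5]), X, y, k=7, classes=[0, 1]))
```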
} Problem with plain kNN regression: the prediction is a piecewise-constant average, which in particular flattens out at the ends of the data range.
} Solution: weighted (or kernel) regression
} Weighted (kernel) kNN regression:
$\hat{f}(\mathbf{x}) = \frac{\sum_{i \in N_k(\mathbf{x})} w_i(\mathbf{x})\, y^{(i)}}{\sum_{i \in N_k(\mathbf{x})} w_i(\mathbf{x})}$
} where $w_i(\mathbf{x})$ is a kernel weight that decays with distance, e.g., $w_i(\mathbf{x}) = \exp\!\left(-\frac{d(\mathbf{x}, \mathbf{x}^{(i)})^2}{K_w^2}\right)$ with kernel width $K_w$
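} A sketch of this kernel-weighted prediction, here taken over all training points with the Gaussian-style weight above (the sine toy data are illustrative):

```python
import numpy as np

def kernel_regression(x, X_train, y_train, kw):
    """Kernel regression: sum(w_i * y_i) / sum(w_i), w_i = exp(-d_i^2 / kw^2)."""
    d2 = ((X_train - x) ** 2).sum(axis=1)
    w = np.exp(-d2 / kw**2)
    return (w * y_train).sum() / w.sum()

X = np.linspace(0, 10, 100).reshape(-1, 1)
y = np.sin(X[:, 0]) + 0.1 * np.random.randn(100)
print(kernel_regression(np.array([5.0]), X, y, kw=0.5))  # ≈ sin(5) ≈ -0.96
```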
[Figures: kernel regression fits with kernel width equal to 1/32, 1/32, and 1/16 of the x-axis width]
} Locally linear regression: fit a linear model to the $k$ nearest neighbors of the query:
$\min_{\mathbf{w}} \sum_{i \in N_k(\mathbf{x})} \left( y^{(i)} - \mathbf{w}^T \mathbf{x}^{(i)} \right)^2$
} Locally weighted linear regression: weight each squared error by a kernel:
$\min_{\mathbf{w}} \sum_{i=1}^{n} w_i(\mathbf{x}) \left( y^{(i)} - \mathbf{w}^T \mathbf{x}^{(i)} \right)^2$
} Prediction for the query: $\hat{f}(\mathbf{x}) = \hat{\mathbf{w}}^T \mathbf{x}$ (the fit is redone for every query point)
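} A sketch of locally weighted linear regression via weighted least squares; the added bias column and the kernel width are illustrative implementation choices:

```python
import numpy as np

def locally_weighted_lr(x, X_train, y_train, kw):
    """Solve a kernel-weighted least-squares problem centered on the query x,
    then predict w^T x."""
    Xb = np.hstack([X_train, np.ones((len(X_train), 1))])   # add bias term
    xb = np.append(x, 1.0)
    d2 = ((X_train - x) ** 2).sum(axis=1)
    W = np.diag(np.exp(-d2 / kw**2))                        # kernel weights
    w = np.linalg.solve(Xb.T @ W @ Xb, Xb.T @ W @ y_train)  # normal equations
    return xb @ w

X = np.linspace(0, 10, 100).reshape(-1, 1)
y = np.sin(X[:, 0]) + 0.1 * np.random.randn(100)
print(locally_weighted_lr(np.array([5.0]), X, y, kw=0.8))   # ≈ sin(5)
```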
[Figures: locally weighted linear regression fits with kernel width equal to 1/16, 1/32, and 1/8 of the x-axis width]
} Fit a parametric model (e.g., a linear function) to the neighbors of $\mathbf{x}$ (or to a kernel-weighted version of all the samples).
} Implicit assumption: the target function is reasonably smooth.
} Need to keep around support vectors (possibly all of the training data).
} Prediction on a new data point is based on the training data themselves.
} kd-trees, Locality-Sensitive Hashing (LSH), and other approximate kNN techniques can make the neighbor search efficient.
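} For example, scipy's kd-tree can serve as the index (the toy data are illustrative):

```python
import numpy as np
from scipy.spatial import cKDTree

# Build the tree once over the training set; each query then costs roughly
# O(log n) instead of the O(n) brute-force scan.
X_train = np.random.randn(10000, 3)
tree = cKDTree(X_train)

x = np.zeros(3)
dist, idx = tree.query(x, k=5)   # distances and indices of the 5 nearest neighbors
print(idx, dist)
```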