Summary
◮ Linearly separable classification problems.
◮ Logistic loss ℓlog and (empirical) risk R̂log.
◮ Gradient descent.
(Slide from last time) Classification
For now, let's consider binary classification: Y = {−1, +1}.
Given examples ((x_i, y_i))_{i=1}^n and a predictor w ∈ R^d, the empirical logistic risk is

R̂log(w) = (1/n) ∑_{i=1}^n ℓlog(y_i wᵀx_i).
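As a concrete sketch of the empirical logistic risk (the function name and the two-point dataset in the check below are illustrative, not from the slides):

```python
import numpy as np

def logistic_risk(w, X, y):
    """Empirical logistic risk (1/n) sum_i ln(1 + exp(-y_i w^T x_i)).

    X: (n, d) array of examples; y: (n,) array of labels in {-1, +1}.
    np.logaddexp(0, -m) computes ln(1 + exp(-m)) stably.
    """
    margins = y * (X @ w)
    return np.mean(np.logaddexp(0.0, -margins))
```

At w = 0 every margin is 0, so the risk is ln 2 regardless of the data.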
[Figure: contour plot of the empirical logistic risk over [−2, 2]², with level curves 0.000–1.200.]
[Figure: contour plot of the empirical logistic risk over [−2, 2]², with level curves 0.2–1.2.]
[Figure: contour plot of the empirical logistic risk over [−10, 10]², with level curves 2.000–14.000.]
ℓ′log(z) = −1/(1 + exp(z)); use the chain rule (hw1!).
R̂log(w) = (1/n) ∑_{i=1}^n ln(1 + exp(−y_i wᵀx_i)).
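A minimal gradient-descent sketch on this risk (the step size, iteration count, and tiny dataset in the check are arbitrary choices, not from the slides); the gradient follows from ℓ′log(z) = −1/(1 + exp(z)) and the chain rule:

```python
import numpy as np

def logistic_grad(w, X, y):
    # Gradient of (1/n) sum_i ln(1 + exp(-y_i w^T x_i)):
    # -(1/n) sum_i y_i x_i / (1 + exp(y_i w^T x_i)).
    margins = y * (X @ w)
    coeffs = -y / (1.0 + np.exp(margins))
    return (X.T @ coeffs) / len(y)

def gradient_descent(X, y, step=0.1, iters=500):
    # Plain constant-step gradient descent from the origin.
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        w = w - step * logistic_grad(w, X, y)
    return w
```

On linearly separable data the iterates drive every margin y_i wᵀx_i positive.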
Pr[Y = 1 | X = x] = 1/(1 + exp(−xᵀw)) = σ(xᵀw).

[Figure: plot of the sigmoid z ↦ 1/(1 + exp(−z)).]
With labels y_i ∈ {0, 1} and the model Pr[Y = 1 | X = x] = σ(wᵀx), the negative log-likelihood is

(1/n) ∑_{i=1}^n [ y_i ln(1 + exp(−wᵀx_i)) + (1 − y_i) ln(1 + exp(wᵀx_i)) ]
= (1/n) ∑_{i=1}^n ln(1 + exp(−s_i wᵀx_i)),

where s_i := 2y_i − 1 ∈ {−1, +1}; maximum likelihood thus recovers the empirical logistic risk.
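A quick numerical check of this identity (the random data below is purely illustrative): the Bernoulli negative log-likelihood with {0, 1} labels matches the logistic risk with signed labels.

```python
import numpy as np

def nll_bernoulli(w, X, y01):
    # Negative log-likelihood with labels in {0,1} under Pr[Y=1|x] = sigmoid(w^T x).
    z = X @ w
    return np.mean(y01 * np.logaddexp(0.0, -z) + (1 - y01) * np.logaddexp(0.0, z))

def logistic_risk(w, X, s):
    # Empirical logistic risk with signed labels s in {-1,+1}.
    return np.mean(np.logaddexp(0.0, -s * (X @ w)))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y01 = rng.integers(0, 2, size=5).astype(float)
w = rng.normal(size=3)
```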
A linear predictor is x ↦ xᵀβ.¹

¹Some authors allow affine functions; we can get this using affine expansion.
[Figure: two class-conditional densities with equal priors P(ω1) = P(ω2) = 0.5 and the induced decision regions R1 and R2.]
36 / 68
2 A−1(x−µ1)2 2
2 A−1(x−µ0)2 2
2 + 1
2
2 − A−1µ02 2)
T(AA T)−1x
36 / 68
2 A−1(x−µ1)2 2
2 A−1(x−µ0)2 2
2 + 1
2
2 − A−1µ02 2)
T(AA T)−1x
36 / 68
2 A−1(x−µ1)2 2
2 A−1(x−µ0)2 2
2 + 1
2
2 − A−1µ02 2)
T(AA T)−1x
36 / 68
2 A−1(x−µ1)2 2
2 A−1(x−µ0)2 2
2 + 1
2
2 − A−1µ02 2)
T(AA T)−1x
36 / 68
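The affine form can be checked numerically; the random A, µ0, µ1 below are illustrative stand-ins, and AAᵀ plays the role of the shared covariance:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
A = rng.normal(size=(d, d)) + 3 * np.eye(d)   # invertible square-root factor
mu0, mu1 = rng.normal(size=d), rng.normal(size=d)
Sigma_inv = np.linalg.inv(A @ A.T)            # (AA^T)^{-1}

def log_ratio(x):
    # -1/2 ||A^{-1}(x - mu1)||^2 + 1/2 ||A^{-1}(x - mu0)||^2,
    # using ||A^{-1}v||^2 = v^T (AA^T)^{-1} v.
    return 0.5 * ((x - mu0) @ Sigma_inv @ (x - mu0)
                  - (x - mu1) @ Sigma_inv @ (x - mu1))

def affine_form(x):
    # (mu1 - mu0)^T (AA^T)^{-1} x - 1/2 (||A^{-1}mu1||^2 - ||A^{-1}mu0||^2).
    return (mu1 - mu0) @ Sigma_inv @ x - 0.5 * (mu1 @ Sigma_inv @ mu1
                                                - mu0 @ Sigma_inv @ mu0)
```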
min_{W ∈ R^{d×k}} ‖AW − B‖²_F, with B ∈ R^{n×k};
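A sketch of solving this problem with NumPy (the dimensions below are made up; A is assumed n × d). np.linalg.lstsq handles all k columns of B at once:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 8, 3, 2
A = rng.normal(size=(n, d))                 # data matrix, one example per row
B = rng.normal(size=(n, k))                 # targets, e.g. one-hot label rows
W, *_ = np.linalg.lstsq(A, B, rcond=None)   # minimizes ||A W - B||_F^2
```

The minimizer satisfies the normal equations AᵀAW = AᵀB.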
The probability simplex is ∆k := {p ∈ R^k_{≥0} : ∑_i p_i = 1}. The softmax map takes scores f(x) ∈ R^k to the simplex via

softmax(f(x))_j = exp(f(x)_j) / ∑_{j'=1}^k exp(f(x)_{j'}),

and the cross-entropy loss ℓce(p, f(x)) compares a target distribution p ∈ ∆k against this softmax distribution.
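A small sketch of softmax and cross-entropy (stabilized by subtracting the max score; function names are illustrative):

```python
import numpy as np

def softmax(z):
    # Map scores z in R^k to the probability simplex Delta_k.
    e = np.exp(z - np.max(z))        # shift by max for numerical stability
    return e / e.sum()

def cross_entropy(p, z):
    # -sum_j p_j ln softmax(z)_j for a target distribution p in Delta_k.
    return -np.sum(p * np.log(softmax(z)))
```

With all-zero scores softmax is uniform, and cross-entropy against a one-hot target is ln k.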
Since ln ∑_j exp(z_j) ≈ max_j z_j, cross-entropy satisfies

ℓce(e_y, f(x)) = ln ∑_j exp(f(x)_j) − f(x)_y ≈ max_j f(x)_j − f(x)_y.
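The log-sum-exp ≈ max approximation can be checked directly (the test scores are arbitrary):

```python
import numpy as np

def logsumexp(z):
    # Stable ln sum_j exp(z_j); it upper-bounds max_j z_j and is close to it
    # when one score dominates the rest.
    m = np.max(z)
    return m + np.log(np.sum(np.exp(z - m)))
```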
In the binary case, the logistic loss is a special case of cross-entropy: ln(1 + exp(−y wᵀx)) = ℓce(e_y, Wᵀx) for a suitable W ∈ R^{d×2}.
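One way to see the reduction concretely (an assumed embedding, not necessarily the slides' choice of W): take W = [w, 0], so class +1 has score wᵀx and class −1 has score 0; then the two-class cross-entropy equals the logistic loss on the margin.

```python
import numpy as np

def cross_entropy(p, z):
    # -sum_j p_j ln softmax(z)_j, with a max-shift for stability.
    e = np.exp(z - np.max(z))
    return -np.sum(p * np.log(e / e.sum()))

def logistic_loss(margin):
    # ln(1 + exp(-margin)), computed stably.
    return np.logaddexp(0.0, -margin)

rng = np.random.default_rng(0)
w, x = rng.normal(size=3), rng.normal(size=3)
m = w @ x                                 # margin w^T x for label +1
```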