Midterm review
CS 446
1. Lecture review
(Lec1.) Basic setting: supervised learning
Training data: labeled examples $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
Nearest neighbors: given training data $((x_i, y_i))_{i=1}^n$, predict from the examples “closest” to $x$.
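The idea of predicting from the training examples "closest" to $x$ can be sketched as follows. This is a minimal illustration, assuming Euclidean distance and a majority vote among the $k$ closest points; the function name `knn_predict` and the toy data are hypothetical and not from the slides.

```python
import math
from collections import Counter

def knn_predict(train, x, k=1):
    """train: list of (point, label) pairs; x: query point (tuple of floats)."""
    # Sort training examples by Euclidean distance to the query (an assumed metric).
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], x))
    # Majority vote among the k closest labels.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two clusters with labels "a" and "b".
train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((1.0, 1.0), "b"), ((0.9, 1.1), "b")]
print(knn_predict(train, (0.2, 0.1), k=1))
print(knn_predict(train, (0.8, 0.9), k=3))
```

With $k = 1$ this reduces to copying the label of the single nearest training point.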
[Figure: decision tree built step by step on the iris data; axes: sepal length/width and petal length/width. Splits: $x_1 > 1.7$, then $x_2 > 2.8$; leaf predictions $\hat y = 1$, $\hat y = 2$, $\hat y = 3$.]
Least squares: linear predictor $x \mapsto w^T x$, chosen as
$$\arg\min_{w\in\mathbb{R}^d} \frac{1}{n}\sum_{i=1}^n (y_i - w^T x_i)^2.$$
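Empirical risk in matrix form: with $A\in\mathbb{R}^{n\times d}$ having rows $x_i^T/\sqrt{n}$ and $b\in\mathbb{R}^n$ having entries $y_i/\sqrt{n}$,
$$\frac{1}{n}\sum_{i=1}^n (y_i - x_i^T w)^2 = \|Aw - b\|^2.$$
Setting the gradient to zero gives the normal equations
$$(A^T A)w = A^T b.$$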
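As a small worked instance of the normal equations $(A^TA)w = A^Tb$, the sketch below fits a line $y \approx w_0 + w_1 x$ by forming the $2\times 2$ system explicitly and solving it with Cramer's rule. The function name `fit_line` and the data are hypothetical; the $1/\sqrt{n}$ scaling is omitted because it cancels on both sides of the equations.

```python
def fit_line(xs, ys):
    """Least squares fit of y ~ w0 + w1*x by solving (A^T A) w = A^T b."""
    n = len(xs)
    # Entries of the 2x2 matrix A^T A, where row i of A is (1, x_i).
    s1, sx, sxx = float(n), sum(xs), sum(x * x for x in xs)
    # Entries of the vector A^T b.
    sy, sxy = sum(ys), sum(x * y for x, y in zip(xs, ys))
    det = s1 * sxx - sx * sx  # zero exactly when all x_i are equal
    if det == 0:
        raise ValueError("A^T A is singular; the minimizer is not unique")
    # Cramer's rule for the 2x2 system.
    w0 = (sxx * sy - sx * sxy) / det
    w1 = (s1 * sxy - sx * sy) / det
    return w0, w1

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # hypothetical data lying exactly on y = 1 + 2x
w0, w1 = fit_line(xs, ys)
print(w0, w1)
```

When $A^TA$ is singular the normal equations still have solutions, but not a unique one; the pseudoinverse picks one of them.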
SVD: $M = \sum_{i=1}^r s_i u_i v_i^T$, where $r$ is the rank of $M$, and …
Pseudoinverse: $M^+ = \sum_{i=1}^r \frac{1}{s_i} v_i u_i^T$.
i=1 span t
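Claim: $\hat w = A^+ b$ satisfies the normal equations $A^TAw = A^Tb$. Writing $A = \sum_{i=1}^r s_i u_i v_i^T$,
$$A^TA\hat w = \Big(\sum_{i=1}^r s_i v_i u_i^T\Big)\Big(\sum_{i=1}^r s_i u_i v_i^T\Big)\Big(\sum_{i=1}^r \frac{1}{s_i} v_i u_i^T\Big) b = \Big(\sum_{i=1}^r s_i v_i u_i^T\Big)\Big(\sum_{i=1}^r u_i u_i^T\Big) b = A^Tb.$$
Note: in general $\sum_{i=1}^r u_i u_i^T \neq I$ (equality requires $r = n$); the last step only uses orthonormality of the $u_i$.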
Normal equations imply optimality: suppose $A^TAw = A^Ty$. For any $w'$,
$$\|Aw' - y\|^2 = \|A(w'-w)\|^2 + 2\big(A(w'-w)\big)^T(Aw - y) + \|Aw - y\|^2.$$
Since
$$\big(A(w'-w)\big)^T(Aw - y) = (w'-w)^T(A^TAw - A^Ty) = 0,$$
it follows that $\|Aw' - y\|^2 = \|A(w'-w)\|^2 + \|Aw - y\|^2 \ge \|Aw - y\|^2$.
With $A = \sum_{i=1}^r s_i u_i v_i^T$, we have $A^TA = \sum_{i=1}^r s_i^2 v_i v_i^T$.
Linear classifiers: decision boundary $\{x : x^T w = 0\}$; positive side $\{x : x^T w > 0\}$.
Tb + b2 = 2a · b − 2a Tb,
a
ab b
$\frac{1}{n}\sum_{i=1}^n \ell(y_i w^T x_i)$;
Can we take the gradient of $\sum_{i=1}^n \ln(1+\exp(-y_i w^T x_i))$ and set to 0 ???
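Setting the gradient of the logistic risk to zero has no closed-form solution in general, so an iterative method is the usual alternative. The sketch below runs plain gradient descent on $\frac{1}{n}\sum_i \ln(1+\exp(-y_i w^T x_i))$; the step size, iteration count, function names, and toy data are all assumptions made for illustration.

```python
import math

def logistic_risk_grad(w, data):
    """Gradient of (1/n) * sum_i ln(1 + exp(-y_i * w^T x_i))."""
    grad = [0.0] * len(w)
    for x, y in data:
        margin = y * sum(wj * xj for wj, xj in zip(w, x))
        # d/dw ln(1 + exp(-m)) = -y * x / (1 + exp(m)), with m = y * w^T x.
        coef = -y / (1.0 + math.exp(margin))
        for j, xj in enumerate(x):
            grad[j] += coef * xj / len(data)
    return grad

def gradient_descent(data, dim, step=0.5, iters=200):
    w = [0.0] * dim
    for _ in range(iters):
        g = logistic_risk_grad(w, data)
        w = [wj - step * gj for wj, gj in zip(w, g)]
    return w

# Hypothetical linearly separable data with labels in {-1, +1}.
data = [((1.0, 2.0), 1), ((2.0, 1.5), 1), ((-1.0, -1.0), -1), ((-2.0, 0.5), -1)]
w = gradient_descent(data, dim=2)
margins = [y * sum(wj * xj for wj, xj in zip(w, x)) for x, y in data]
print(w, all(m > 0 for m in margins))
```

On separable data like this the risk has no finite minimizer, which is one reason "set the gradient to 0" cannot work directly: the iterates keep growing in norm while the risk decreases toward 0.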
$\min_{W\in\mathbb{R}^{d\times k}} \|AW - B\|_F^2$ with $B\in\mathbb{R}^{n\times k}$;
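Probability simplex: $\Delta_k := \{p\in\mathbb{R}^k_{\ge 0} : \sum_i p_i = 1\}$,
softmax: $f(x) \mapsto \Big(\frac{\exp(f(x)_1)}{\sum_{j=1}^k \exp(f(x)_j)}, \ldots, \frac{\exp(f(x)_k)}{\sum_{j=1}^k \exp(f(x)_j)}\Big) \in \Delta_k$,
cross-entropy: $\ell_{\mathrm{ce}}(y, f(x)) = -\ln\frac{\exp(f(x)_y)}{\sum_{j=1}^k \exp(f(x)_j)} = -f(x)_y + \ln\sum_j \exp(f(x)_j)$.
Since $\ln\sum_j \exp(z_j) \approx \max_j z_j$, cross-entropy satisfies
$$\ell_{\mathrm{ce}}(y, f(x)) \approx -f(x)_y + \max_j f(x)_j.$$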
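The softmax and cross-entropy expressions, and the approximation $\ln\sum_j\exp(z_j)\approx\max_j z_j$, can be checked numerically. The sketch below is an illustration only; the helper names are hypothetical, and the max-shift inside `log_sum_exp` is a standard stabilization trick that the slides do not necessarily use.

```python
import math

def log_sum_exp(z):
    """Numerically stable ln(sum_j exp(z_j)): shift by max_j z_j first."""
    m = max(z)
    return m + math.log(sum(math.exp(zj - m) for zj in z))

def softmax(z):
    """Map a score vector z to a point on the probability simplex."""
    lse = log_sum_exp(z)
    return [math.exp(zj - lse) for zj in z]

def cross_entropy(z, y):
    """-ln softmax(z)_y = -z_y + ln sum_j exp(z_j), for class index y."""
    return -z[y] + log_sum_exp(z)

z = [2.0, -1.0, 0.5]  # hypothetical scores f(x) for k = 3 classes
p = softmax(z)
print(sum(p))
print(cross_entropy(z, 0), -math.log(p[0]))
print(log_sum_exp([100.0, 0.0, -3.0]), max([100.0, 0.0, -3.0]))
```

The last line illustrates the approximation: $\max_j z_j \le \ln\sum_j \exp(z_j) \le \max_j z_j + \ln k$, so the two agree closely when one score dominates.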
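$(W_i)_{i=1}^L$ with $W_i\in\mathbb{R}^{d_i\times d_{i-1}}$ are the weights, and $(b_i)_{i=1}^L$ are the biases.
$(\sigma_i)_{i=1}^L$ with $\sigma_i : \mathbb{R}^{d_i}\to\mathbb{R}^{d_i}$ are called nonlinearities, or activations, or …
Sigmoid: $z \mapsto \frac{1}{1+\exp(-z)}$.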
$((W_i, b_i))_{i=1}^L$, the weights and biases, are the parameters.
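To make the roles of the weights, biases, and activations concrete, here is a minimal forward pass through a layered network. It assumes every layer applies the sigmoid coordinate-wise after an affine map $v \mapsto W_i v + b_i$; the function names and the particular parameter values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Compute W v + b for W given as a list of rows."""
    return [sum(wij * vj for wij, vj in zip(row, v)) + bi for row, bi in zip(W, b)]

def forward(params, x, activation=sigmoid):
    """params: list of (W_i, b_i), with W_i of shape d_i x d_{i-1}."""
    v = x
    for W, b in params:
        # Affine map followed by a coordinate-wise nonlinearity.
        v = [activation(z) for z in affine(W, b, v)]
    return v

params = [
    ([[1.0, -1.0], [0.5, 0.5], [0.0, 2.0]], [0.0, 0.0, -1.0]),  # W_1 in R^{3x2}
    ([[1.0, 1.0, 1.0]], [0.0]),                                 # W_2 in R^{1x3}
]
out = forward(params, [0.0, 0.0])
print(out)
```

The shapes follow the convention $W_i\in\mathbb{R}^{d_i\times d_{i-1}}$, here with $d_0 = 2$, $d_1 = 3$, $d_2 = 1$.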
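ERM over the parameters $W = ((W_i, b_i))_{i=1}^L$, with $F(\cdot\,; W)$ denoting the network and $\ell$ a loss:
$$\arg\min_{W_1\in\mathbb{R}^{d_1\times d},\ b_1\in\mathbb{R}^{d_1},\ \ldots,\ W_L\in\mathbb{R}^{d_L\times d_{L-1}},\ b_L\in\mathbb{R}^{d_L}} \frac{1}{n}\sum_{i=1}^n \ell\big(F(x_i; W), y_i\big).$$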
$x\in[0,1]^d$
(Taken from https://github.com/vdumoulin/conv_arithmetic by Vincent Dumoulin, Francesco Visin.)
[Figure: sliding-window example on a 2-D grid of integer values, with output values 3.0 3.0 3.0 2.0 3.0 3.0 3.0 3.0 3.0. Taken from https://github.com/vdumoulin/conv_arithmetic by Vincent Dumoulin, Francesco Visin.]
Tx T,
T
LGL−1(W)
T
T
L,
T Gj−1(W) T,
T ,
T Gj−1(W) T.
Replace the full sample $((x_i, y_i))_{i=1}^n$ with a minibatch $((x'_i, y'_i))_{i=1}^b$?
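The minibatch question can be illustrated with a tiny stochastic gradient loop in which each step averages the gradient over $b$ sampled examples instead of all $n$. This is a sketch under stated assumptions: a scalar weight, squared loss, sampling without replacement, and hypothetical names and hyperparameters.

```python
import random

def minibatch_sgd(data, step=0.05, batch_size=2, iters=200, seed=0):
    """SGD on the squared loss (y - w*x)^2 for a scalar weight w."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(iters):
        # Draw a minibatch of b examples without replacement.
        batch = rng.sample(data, batch_size)
        # Average the gradient over the minibatch only, not over all n examples.
        grad = sum(-2.0 * x * (y - w * x) for x, y in batch) / batch_size
        w -= step * grad
    return w

data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]  # hypothetical, exactly y = 3x
w = minibatch_sgd(data)
print(w)
```

Each step costs $b$ gradient evaluations rather than $n$, and the minibatch gradient is an unbiased estimate of the full gradient when the batch is sampled uniformly.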
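Sets defined by linear inequalities $a_i^T x \le b_i$, $i = 1, 2, \ldots$
Convex hull of $S$: $\big\{\sum_{i=1}^k \alpha_i x_i : k\in\mathbb{N},\ x_i\in S,\ \alpha_i\ge 0,\ \sum_{i=1}^k \alpha_i = 1\big\}$.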
$\sum_{i=1}^d \exp(x_i)$
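First-order condition for differentiable convex $f$: $f(x) \ge f(x_0) + \nabla f(x_0)^T(x - x_0)$.
$\min_{x\in\mathbb{R}^d} f(x)$
Jensen's inequality: with $y := \mathbb{E}X$,
$$\mathbb{E} f(X) \ge \mathbb{E}\big[f(y) + \nabla f(y)^T(X - y)\big] = f(y) + \nabla f(y)^T\,\mathbb{E}(X - y) = f(y).$$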
Hard-margin SVM:
$$\min_{w\in\mathbb{R}^d} \frac{1}{2}\|w\|^2 \quad\text{subject to}\quad y_i x_i^T w \ge 1 \ \text{ for all } i.$$
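Lagrangian:
$$L(w,\alpha) = \frac{1}{2}\|w\|^2 + \sum_{i=1}^n \alpha_i\,(1 - y_i x_i^T w).$$
Primal:
$$\max_{\alpha\ge 0} L(w,\alpha) = \max_{\alpha\ge 0}\Big[\frac{1}{2}\|w\|^2 + \sum_{i=1}^n \alpha_i\,(1 - y_i x_i^T w)\Big].$$
Dual:
$$\min_{w\in\mathbb{R}^d} L(w,\alpha) = L\Big(\sum_{i=1}^n \alpha_i y_i x_i,\ \alpha\Big) = \sum_{i=1}^n \alpha_i - \frac{1}{2}\Big\|\sum_{i=1}^n \alpha_i y_i x_i\Big\|^2 = \sum_{i=1}^n \alpha_i - \frac{1}{2}\sum_{i=1}^n\sum_{j=1}^n \alpha_i\alpha_j y_i y_j\, x_i^T x_j.$$
The minimizing $w$ is $\sum_{i=1}^n \alpha_i y_i x_i$; at a dual optimum $\hat\alpha$, $\hat w = \sum_{i=1}^n \hat\alpha_i y_i x_i$.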
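Soft-margin SVM:
$$\min_{w\in\mathbb{R}^d} \frac{1}{2}\|w\|^2 + C\sum_{i=1}^n \big[1 - y_i x_i^T w\big]_+ ,$$
with dual
$$\max_{\alpha\in\mathbb{R}^n,\ 0\le\alpha_i\le C}\ \sum_{i=1}^n \alpha_i - \frac{1}{2}\Big\|\sum_{i=1}^n \alpha_i y_i x_i\Big\|^2 .$$
With a bias term the hinge term becomes $[1 - y_i(x_i^T w + b)]_+$; this adds the constraint $\sum_{i=1}^n y_i\alpha_i = 0$ in the dual.
Can replace $\frac{1}{2}$ and $C$ with $\frac{\lambda}{2}$ and $\frac{1}{n}$, respectively.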
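The dual depends on the data only through inner products $x_i^T x_j$:
$$\max_{\alpha_1,\alpha_2,\ldots,\alpha_n\ge 0}\ \sum_{i=1}^n \alpha_i - \frac{1}{2}\sum_{i,j=1}^n \alpha_i\alpha_j y_i y_j\, x_i^T x_j.$$
Replacing $x$ with a feature map $\phi(x)$:
$$\max_{\alpha_1,\alpha_2,\ldots,\alpha_n\ge 0}\ \sum_{i=1}^n \alpha_i - \frac{1}{2}\sum_{i,j=1}^n \alpha_i\alpha_j y_i y_j\, \phi(x_i)^T\phi(x_j).$$
Given a dual optimum $\hat\alpha$, $\hat w = \sum_{i=1}^n \hat\alpha_i y_i \phi(x_i)$, so the predictor is
$$x \mapsto \phi(x)^T\hat w = \sum_{i=1}^n \hat\alpha_i y_i\, \phi(x)^T\phi(x_i).$$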
Quadratic feature map: $\phi : \mathbb{R}^d \to \mathbb{R}^{1+2d+\binom{d}{2}}$, where
$$\phi(x) = \big(1,\ \sqrt{2}x_1, \ldots, \sqrt{2}x_d,\ x_1^2, \ldots, x_d^2,\ \sqrt{2}x_1x_2, \ldots, \sqrt{2}x_{d-1}x_d\big),$$
which satisfies $\phi(x)^T\phi(x') = (1 + x^Tx')^2$.
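The identity $\phi(x)^T\phi(x') = (1 + x^Tx')^2$ can be checked numerically with an explicit feature map. The sketch below assumes the standard quadratic map with $\sqrt{2}$ scalings on the linear and cross terms; the function names and test vectors are hypothetical.

```python
import math
from itertools import combinations

def phi(x):
    """Explicit feature map whose inner product equals (1 + x^T x')^2."""
    r2 = math.sqrt(2.0)
    return ([1.0]
            + [r2 * xi for xi in x]                              # linear terms
            + [xi * xi for xi in x]                              # squares
            + [r2 * xi * xj for xi, xj in combinations(x, 2)])   # cross terms, i < j

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

x, xp = [1.0, -2.0, 0.5], [0.3, 0.7, -1.0]
lhs = dot(phi(x), phi(xp))      # inner product in feature space
rhs = (1.0 + dot(x, xp)) ** 2   # kernel evaluated in input space
print(len(phi(x)), lhs, rhs)
```

The feature vector has $1 + 2d + \binom{d}{2}$ coordinates, while the kernel form needs only one inner product in $\mathbb{R}^d$, which is the point of the kernel trick.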
Gaussian (RBF) kernel: $\phi(x)^T\phi(x') = \exp\!\Big(-\frac{\|x - x'\|^2}{2\sigma^2}\Big)$.
$\phi(x)^T\phi(x') = K(x, x')$.
Kernel SVM dual:
$$\max_{\alpha_1,\alpha_2,\ldots,\alpha_n\ge 0}\ \sum_{i=1}^n \alpha_i - \frac{1}{2}\sum_{i,j=1}^n \alpha_i\alpha_j y_i y_j\, K(x_i, x_j).$$
Given a dual optimum $\hat\alpha$, the predictor is
$$x \mapsto \sum_{i=1}^n \hat\alpha_i y_i\, K(x_i, x).$$
Perceptron update: $w \leftarrow w + \mathbf{1}[y\,w^Tx \le 0]\,yx$.
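The update $w \leftarrow w + \mathbf{1}[y\,w^Tx \le 0]\,yx$ can be sketched as a loop over the data. The version below is a minimal illustration, assuming repeated passes in a fixed order and stopping after a pass with no mistakes; the function name, pass limit, and data are hypothetical.

```python
def perceptron(data, dim, max_passes=100):
    """Cycle through the data, applying w <- w + 1[y w^T x <= 0] y x."""
    w = [0.0] * dim
    for _ in range(max_passes):
        mistakes = 0
        for x, y in data:
            # The indicator 1[y w^T x <= 0]: update only on a non-positive margin.
            if y * sum(wj * xj for wj, xj in zip(w, x)) <= 0:
                w = [wj + y * xj for wj, xj in zip(w, x)]
                mistakes += 1
        if mistakes == 0:  # every example now has a positive margin
            break
    return w

# Hypothetical linearly separable data with labels in {-1, +1}.
data = [((1.0, 2.0), 1), ((2.0, 1.5), 1), ((-1.0, -1.0), -1), ((-2.0, 0.5), -1)]
w = perceptron(data, dim=2)
print(w)
```

If the data are not linearly separable, the loop simply stops after `max_passes` passes without reaching zero mistakes.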
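Cauchy-Schwarz: $|a^Tb| \le \|a\|\cdot\|b\|$.
Proof. If $a = 0$ or $b = 0$, both sides are 0. Otherwise, for any $r > 0$,
$$0 \le \|ra - b\|^2 = r^2\|a\|^2 - 2r\,a^Tb + \|b\|^2,$$
so $2r\,a^Tb \le r^2\|a\|^2 + \|b\|^2$. Choosing $r = \|b\|/\|a\|$ gives $a^Tb \le \|a\|\cdot\|b\|$. Applying this to $-b$,
$$-a^Tb = a^T(-b) \le \|a\|\cdot\|-b\| = \|a\|\cdot\|b\|,$$
and thus $|a^Tb| \le \|a\|\cdot\|b\|$.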
$M = \sum_{i=1}^r s_i u_i v_i^T$.