Adaptivity of deep ReLU network and its generalization error analysis
Taiji Suzuki†‡
†The University of Tokyo
Department of Mathematical Informatics
‡AIP-RIKEN
22nd/Feb/2019 The 2nd Korea-Japan Machine Learning Workshop
Shallow learning: the two-layer (one hidden layer) model

f(x) = Σ_{j=1}^m v_j η(w_j⊤ x + b_j),

where η is an activation function such as the sigmoid η(u) = 1/(1 + exp(−u)) or the ReLU η(u) = max{u, 0}.
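As a concrete illustration, here is a minimal numpy sketch of the two-layer model above (helper names are illustrative, not from the slides):

```python
import numpy as np

# Minimal sketch of the two-layer model f(x) = sum_j v_j * eta(w_j^T x + b_j).

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def relu(u):
    return np.maximum(u, 0.0)

def two_layer(x, W, b, v, eta=relu):
    # x: (d,) input, W: (m, d) inner weights, b: (m,) biases, v: (m,) outer weights
    return v @ eta(W @ x + b)

rng = np.random.default_rng(0)
d, m = 3, 5
W, b, v = rng.normal(size=(m, d)), rng.normal(size=m), rng.normal(size=m)
x = rng.normal(size=d)
print(two_layer(x, W, b, v), two_layer(x, W, b, v, eta=sigmoid))
```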
- Observations: D_n = {(X_i, Y_i)}_{i=1}^n.
- Linear estimator: f̂(x) = Σ_{i=1}^n Y_i φ_i(X_1, …, X_n; x), where the weights φ_i depend on the design points but not on the responses.
- Least-squares estimator: f̂ = argmin_{f∈F} (1/n) Σ_{i=1}^n (Y_i − f(X_i))².
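To make the class of linear estimators concrete: a kernel smoother is one instance, since its weights depend on the design points but not on the responses. A minimal sketch (the bandwidth and names are illustrative assumptions):

```python
import numpy as np

# A kernel smoother is one instance of a linear estimator
# f_hat(x) = sum_i Y_i * phi_i(X_1, ..., X_n; x): the weights phi_i
# depend on the design X but not on the responses Y. Sketch only.

def nadaraya_watson(X, Y, x, h=0.1):
    w = np.exp(-((x - X) / h) ** 2)   # Gaussian kernel weights
    return (w * Y).sum() / w.sum()    # weighted average of the Y_i

rng = np.random.default_rng(1)
X = rng.uniform(size=200)
Y = np.sin(2 * np.pi * X) + 0.1 * rng.normal(size=200)
print(nadaraya_watson(X, Y, 0.25))    # roughly sin(pi/2) = 1
```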
[Figure: nonparametric regression — the true model vs. an estimator fitted to noisy samples]
Minimax optimality: for an estimator f̂ built from n samples,

inf_{f̂} sup_{f^o∈F} E[∥f̂ − f^o∥²_{L2(P)}] ≍ n^{−?}

— which exponent is achievable over the class F?
Function classes on a domain Ω:
- Hölder space (C^s(Ω)): ∥f∥_{C^s} = max_{|α|≤m} sup_{x∈Ω} |∂^α f(x)| + max_{|α|=m} sup_{x≠y∈Ω} |∂^α f(x) − ∂^α f(y)| / ∥x − y∥^{s−m}.
- Sobolev space (W^k_p(Ω)): ∥f∥_{W^k_p} = (Σ_{|α|≤k} ∥∂^α f∥^p_{Lp(Ω)})^{1/p}.
- Besov space (B^s_{p,q}(Ω)) (0 < p, q ≤ ∞, 0 < s ≤ m): via the r-th modulus of smoothness w_{r,p}(f, t) = sup_{∥h∥≤t} ∥Δ^r_h f∥_{Lp(Ω)}, with Δ^r_h f(x) = Σ_{j=0}^r (r choose j)(−1)^{r−j} f(x + jh),
  ∥f∥_{B^s_{p,q}(Ω)} = ∥f∥_{Lp(Ω)} + (∫_0^∞ (t^{−s} w_{m,p}(f, t))^q dt/t)^{1/q}.
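The modulus of smoothness in the Besov definition can be approximated numerically; the following sketch (grid sizes and names are illustrative assumptions) checks that for a smooth function the second-order modulus decays like t²:

```python
import numpy as np
from math import comb

# Numerical sketch of the r-th modulus of smoothness
#   w_{r,p}(f, t) = sup_{|h| <= t} || Delta_h^r f ||_{Lp},
# with Delta_h^r f(x) = sum_{j=0}^r C(r,j) (-1)^(r-j) f(x + j h),
# approximated on a grid over [0, 1].

def modulus(f, r, t, p, n_h=50, n_x=2000):
    best = 0.0
    for h in np.linspace(1e-6, t, n_h):
        x = np.linspace(0, 1 - r * h, n_x)   # keep x + r*h inside [0, 1]
        diff = sum((-1) ** (r - j) * comb(r, j) * f(x + j * h)
                   for j in range(r + 1))
        best = max(best, np.mean(np.abs(diff) ** p) ** (1 / p))
    return best

# For a smooth function the second-order modulus scales like t^2,
# so this ratio stays bounded as t shrinks.
print(modulus(np.cos, r=2, t=0.1, p=2) / 0.1**2)
```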
Relations among the classes:
- B^s_{p,1} ↪ W^s_p ↪ B^s_{p,∞},
- B^m_{2,2} = W^m_2,
- for non-integer s, C^s = B^s_{∞,∞}.
- For s > d/p, B^s_{p,q}([0,1]^d) ↪ C^0 (continuous functions); for s ≤ d/p the class contains discontinuous functions.
- B^1_{1,1}([0, 1]) ⊂ {bounded total variation} ⊂ B^1_{1,∞}([0, 1]).
[Figure: example functions for p = 0.5, 1, 2 — smaller p admits spikier, spatially inhomogeneous profiles]

Basis characterization: expanding f = Σ_{k∈Z+} Σ_{j∈J(k)} α_{k,j} ψ_{k,j} in a basis indexed by resolution k and location j,

∥f∥_{B^s_{p,q}} ≃ (Σ_{k=0}^∞ (2^{sk} ∥(α_{k,j})_{j∈J(k)}∥_p)^q)^{1/q}

(for appropriately normalized ψ_{k,j}).
Approximation by deep ReLU networks: for f^o in the unit ball U(B^s_{p,q}([0,1]^d)), a sparse ReLU network with N nonzero units (width O(N) up to constants c(d, m) depending on d and the spline order m) attains approximation error ≲ N^{−s/d}.
[Figure: feedforward architecture with layer widths N0, N1, N2, N3]

Tensor-product B-spline basis: M_{k,j}(x_1, …, x_d) = Π_{i=1}^d M_{k,j_i}(x_i).
f ∈ B^s_{p,q} if and only if f can be decomposed into the B-spline basis,

f = Σ_{k∈Z+} Σ_{j∈J(k)} α_{k,j} M_{k,j},

with N(f) := (Σ_{k=0}^∞ (2^{sk} ∥(α_{k,j})_{j∈J(k)}∥_p)^q)^{1/q} < ∞, and ∥f∥_{B^s_{p,q}} ≃ N(f).

Key point: each basis function M_{k,j} can be approximated by a deep NN "efficiently".
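The cardinal B-spline basis underlying this decomposition is easy to evaluate via the standard recursion; a minimal sketch (function names are illustrative, not from the slides):

```python
import numpy as np

# Cardinal B-spline of order m (support [0, m+1]) via the standard
# recursion, and the scaled/shifted tensor-product basis
# M_{k,j}(x) = prod_i M(2^k x_i - j_i) used in the decomposition.

def cardinal_bspline(m, x):
    if m == 0:
        return np.where((0 <= x) & (x < 1), 1.0, 0.0)
    return (x * cardinal_bspline(m - 1, x)
            + (m + 1 - x) * cardinal_bspline(m - 1, x - 1)) / m

def tensor_bspline(m, k, j, x):
    # x: (d,) point, j: (d,) integer shifts, k: resolution level
    return np.prod([cardinal_bspline(m, 2.0**k * xi - ji)
                    for xi, ji in zip(x, j)])

x = np.linspace(0, 3, 301)
vals = cardinal_bspline(2, x)   # quadratic B-spline, peak 0.75 at x = 1.5
print(vals.max(), tensor_bspline(2, k=1, j=(0, 0), x=np.array([0.5, 0.5])))
```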
Theorem (approximation error, adaptive vs. non-adaptive):

sup_{f^o∈U(B^s_{p,q}([0,1]^d))} inf_{f̌∈F(L,W,S,B)} ∥f^o − f̌∥ ≲ N^{−s/d}

(recall B^s_{∞,∞}(Ω) = C^s(Ω)). In contrast, a non-adaptive N-term approximation Σ_{i=1}^N α_i ψ_i(x) with a fixed basis attains only roughly N^{−(s − d(1/p − 1/2)_+)/d}, which is strictly worse in the "rough" regime p < 2; deep NNs adapt to the spatial inhomogeneity of the target (rough vs. smooth regions).
[Figure: true model vs. estimator]

Least-squares (ERM) estimator over the deep network class:

f̂ = argmin_{f∈F(L,W,S,B)} (1/n) Σ_{i=1}^n (y_i − f(x_i))².
Bias–variance decomposition:

E[∥f̂ − f^o∥²_{L2(PX)}] ≲ inf_{f∈F(L,W,S,B)} ∥f − f^o∥²_{L2(PX)} + (complexity of F(L,W,S,B))/n.

Since f^o ∈ B^s_{p,q}(Ω), the approximation result above controls the first term.
Theorem (estimation error): let f̂ = argmin_{f∈F(L,W,S,B)} (1/n) Σ_{i=1}^n (y_i − f(x_i))². Assume ∥f^o∥_{B^s_{p,q}([0,1]^d)} ≤ 1 and ∥f^o∥_∞ ≤ R, and 0 < p, q ≤ ∞ with s > d(1/p − 1/2)_+. Then, by letting N ≍ n^{d/(2s+d)},

E[∥f̂ − f^o∥²_{L2(PX)}] ≲ n^{−2s/(2s+d)} log(n)³,

which is the minimax-optimal rate up to log factors.
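The choice N ≍ n^{d/(2s+d)} balances approximation error against model complexity; a small sketch of this balancing (helper names are illustrative assumptions):

```python
# Sketch of the scaling in the theorem: choosing N ~ n^(d/(2s+d))
# balances the approximation term N^(-2s/d) against the complexity
# term N/n, yielding the rate n^(-2s/(2s+d)) up to log factors.

def network_size(n, s, d):
    return n ** (d / (2 * s + d))

def rate_exponent(s, d):
    return 2 * s / (2 * s + d)

n, s, d = 10**6, 2.0, 4
N = network_size(n, s, d)
# At the balancing point the two error terms match:
print(N, N ** (-2 * s / d), N / n, rate_exponent(s, d))
```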
Comparison with linear estimators f̂(x) = Σ_{i=1}^n y_i φ_i(x_1, …, x_n; x) (illustrated for d = 1): linear methods attain only

n^{−(2s − 2(1/p − 1/2)_+)/(2s + 1 − 2(1/p − 1/2)_+)},

versus n^{−2s/(2s+1)} for the deep ReLU estimator; the gap opens in the "rough" regime p < 2 and closes for "smooth" targets (p ≥ 2).
Why linear methods fail: for any class F,

inf_{f̂: Linear} sup_{f^o∈F} E[∥f̂ − f^o∥²_{L2(P)}] = inf_{f̂: Linear} sup_{f^o∈conv(F)} E[∥f̂ − f^o∥²_{L2(P)}],

since the risk of a linear estimator is convex in f^o, so its supremum over conv(F) equals that over F. For p < 2 the convex hull is a much "fatter" class, which caps the linear rate.
[Slides: a constructed example class where the minimax lower bound is n^{−2α/(2α+1)} (log n)^{−4α²/(2α+1)} and the deep NN estimator attains the matching rate n^{−2α/(2α+1)} log(n)³ up to log factors, while linear estimators are polynomially worse]
The exponent in n^{−2s/(2s+d)} decays as d grows: the ordinary Besov rate suffers the curse of dimensionality. This motivates mixed smooth Besov spaces.
Mixed smooth Besov space: B^{r,mix}_{p,p} = B^{r1}_{p,p} ⊗ · · · ⊗ B^{rd}_{p,p}, considered in L2(P). Rate comparison (up to log factors):
- ordinary Besov: n^{−2β/(2β+d)} (exponent decays with d),
- mixed smooth Besov, deep NN: n^{−2β/(2β+1)} (d enters only through log factors),
- mixed smooth Besov, linear estimators: n^{−2β/(2β+1+log₂(e))}.
Definition (tensor product Besov space): B^{β,mix}_{p,p} = B^β_{p,p}(R) ⊗p · · · ⊗p B^β_{p,p}(R) (see, for example, Sickel and Ullrich (2009)). For a Banach space G, define B^β_{p,p} ⊗p G through the norm

∥f∥_{B^β_{p,p}⊗pG} := inf { (Σ_r ∥f^{(1)}_r∥^p_{B^β_{p,p}} ∥g^{(2)}_r∥^p_G)^{1/p} : f = Σ_r f^{(1)}_r · g^{(2)}_r, f^{(1)}_r ∈ B^β_{p,p} and g^{(2)}_r ∈ G },

and B^β_{p,p} ⊗p G is obtained by completion of the finite sums w.r.t. this norm. Iterating,

B^{β,mix}_{p,p} := B^β_{p,p} ⊗p (· · · B^β_{p,p} ⊗p (B^β_{p,p} ⊗p B^β_{p,p})).

Intuition (on R²): the mixed class controls the mixed derivatives as well — schematically, for the mixed Sobolev class W^{1,mix}_p(R²): f, ∂_{x1} f, ∂_{x2} f, ∂_{x1}∂_{x2} f ∈ Lp, versus the ordinary class W^1_p(R²): f, ∂_{x1} f, ∂_{x2} f ∈ Lp.
Example: if each univariate factor f_{r,k} ∈ B^β_{p,q}(R) is sufficiently smooth, then tensor-structured sums

f(x) = Σ_{r=1}^R Π_{k=1}^d f_{r,k}(x_k)

belong to the mixed smooth class.
Theorem (approximation, mixed smooth): for ∥f∥_{B^{β,mix}_{p,q}([0,1]^d)} ≤ 1 and N ≥ 1, there exists a ReLU-NN f̌_{d,N} approximating f at rate N^{−β} up to poly-log(N) factors of order d − 1, with depth bounded by ≲ d log(N) ∧ log(N)^{d−1}. Compare with the ordinary Besov class B^β_{p,q}([0,1]^d): N^{−β/d}.
Theorem (estimation, mixed smooth): let f̂ = argmin_{f∈F(L,W,S,B)} (1/n) Σ_{i=1}^n (y_i − f(x_i))². For ∥f^o∥_{B^{β,mix}_{p,q}([0,1]^d)} ≤ 1, by letting u = (1 − 1/q)_+ (p ≥ 2), (1/2 − 1/q)_+ (p < 2),

E[∥f̂ − f^o∥²_{L2(P)}] ≲ n^{−2β/(2β+1)} log(n)^{(2β+2u)(d−1)/(1+2β)} log(n)³,

while linear estimators are limited to n^{−2β/(2β+1+log₂(e))} log(n)³, and the ordinary Besov class B^β_{p,q}([0,1]^d) gives only Õ(n^{−2β/(2β+d)}).
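The exponents in this comparison are easy to tabulate; a small sketch (helper names are illustrative assumptions):

```python
import math

# Exponents reported for the mixed smooth Besov class: deep ReLU
# networks get n^(-2b/(2b+1)) up to log(n) factors (d enters only
# through the logs), linear estimators n^(-2b/(2b+1+log2(e))), and
# the ordinary Besov class gives n^(-2b/(2b+d)).

def deep_mixed(b):
    return 2 * b / (2 * b + 1)

def linear_mixed(b):
    return 2 * b / (2 * b + 1 + math.log2(math.e))

def ordinary_besov(b, d):
    return 2 * b / (2 * b + d)

b = 2.0
print(deep_mixed(b), linear_mixed(b), ordinary_besov(b, d=10))
```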
Extension: for targets combining tensor structure and composition — schematically h(x) = g(f_1(x), …, f_D(x)) with each f_j(x) = Σ_{r=1}^R Π_{k=1}^d f^{(j)}_{r,k}(x_k) of mixed smoothness s, and g ∈ B^γ_{p,q}(R^D) — the deep NN estimator attains

Õ(n^{−2s/(2s+1+log₂(e))} + n^{−2γ/(2γ+D)}).
[Slides: when g effectively depends on only k ≤ D coordinates, the exponent 2γ/(2γ+D) improves to 2γ/(2γ+k)]
Summary:
- Deep ReLU networks achieve the minimax rate Õ(n^{−2s/(2s+d)}) over Besov spaces; linear estimators are limited to ∥f̂ − f^o∥²_{L2(P)} = Õ(n^{−2(s − d(1/p−1/2))/(2s + d − 2d(1/p−1/2))}) in the rough regime p < 2.
- For mixed smooth Besov spaces, deep NNs attain Õ(n^{−2β/(2β+1)}) with log factor log(n)^{(2β+2u)(d−1)/(1+2β)}: the dimension d affects only the log factors.
- Adaptivity to spatial inhomogeneity and to dimensionality distinguishes deep NNs from linear methods.