SLIDE 1

Deep Neural Networks and Partial Differential Equations: Approximation Theory and Structural Properties

Philipp Christian Petersen

SLIDE 2

Joint work

Joint work with:

◮ Helmut Bölcskei (ETH Zürich)
◮ Philipp Grohs (University of Vienna)
◮ Joost Opschoor (ETH Zürich)
◮ Gitta Kutyniok (TU Berlin)
◮ Mones Raslan (TU Berlin)
◮ Christoph Schwab (ETH Zürich)
◮ Felix Voigtlaender (KU Eichstätt-Ingolstadt)

SLIDE 3

Today’s Goal

Goal of this talk: Discuss the suitability of neural networks as an ansatz system for the solution of PDEs. Two threads:

Approximation theory:
◮ universal approximation
◮ optimal approximation rates for all classical function spaces
◮ reduced curse of dimension

Structural properties:
◮ non-convex, non-closed ansatz spaces
◮ parametrization not stable
◮ very hard to optimize over

SLIDE 4

Outline

Neural networks
  Introduction to neural networks
  Approaches to solve PDEs
Approximation theory of neural networks
  Classical results
  Optimality
  High-dimensional approximation
Structural results
  Convexity
  Closedness
  Stable parametrization

SLIDE 5

Neural networks

We consider neural networks as a special kind of function:

◮ d = N0 ∈ N: input dimension,
◮ L: number of layers,
◮ ̺ : R → R: activation function,
◮ Tℓ : R^{Nℓ−1} → R^{Nℓ}, ℓ = 1, . . . , L: affine-linear maps.

Then Φ̺ : Rd → R^{NL} given by

Φ̺(x) = TL(̺(TL−1(̺(. . . ̺(T1(x)))))), x ∈ Rd,

is called a neural network (NN). The sequence (d, N1, . . . , NL) is called the architecture of Φ̺.
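To make the definition concrete, here is a minimal sketch (not from the slides) of evaluating Φ̺ for a given architecture; the helper names `make_network` and `forward` and the random initialisation are illustrative assumptions.

```python
# Sketch: Phi_rho(x) = T_L(rho(T_{L-1}(... rho(T_1(x))))) with architecture (d, N_1, ..., N_L).
import numpy as np

def make_network(layer_dims, rng):
    """Randomly initialise the affine maps T_l(x) = A_l x + b_l."""
    params = []
    for n_in, n_out in zip(layer_dims[:-1], layer_dims[1:]):
        A = rng.standard_normal((n_out, n_in)) / np.sqrt(n_in)
        b = np.zeros(n_out)
        params.append((A, b))
    return params

def forward(params, x, rho=lambda z: np.maximum(z, 0.0)):  # rho = ReLU by default
    """Evaluate Phi_rho(x): activation after every layer except the last."""
    for A, b in params[:-1]:
        x = rho(A @ x + b)
    A, b = params[-1]
    return A @ x + b                                       # last layer is affine only

rng = np.random.default_rng(0)
net = make_network((3, 16, 16, 1), rng)   # architecture (d, N1, N2, N3) = (3, 16, 16, 1)
print(forward(net, np.array([0.1, -0.2, 0.5])))
```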

SLIDE 6

Why are neural networks interesting? - I

Deep Learning: Deep learning describes a variety of techniques based on data-driven adaptation of the affine-linear maps in a neural network.

Overwhelming success:
◮ Image classification
◮ Text understanding
◮ Game intelligence

Hardware design of the future!

[Image credit: Ren, He, Girshick, Sun; 2015]

SLIDE 7

Why are neural networks interesting? - II

Expressibility: Neural networks constitute a very powerful architecture.

Theorem (Cybenko; 1989, Hornik; 1991, Pinkus; 1999)

Let d ∈ N, K ⊂ Rd compact, f : K → R continuous, and ̺ : R → R continuous and not a polynomial. Then for every ε > 0 there exists a two-layer NN Φ̺ with ‖f − Φ̺‖_∞ ≤ ε.

Efficient expressibility: R^M ∋ θ ↦ (T1, . . . , TL) ↦ Φ̺_θ yields a parametrized system of functions. In a sense this parametrization is optimally efficient. (More on this below.)
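A hedged sketch of the parametrization R^M ∋ θ ↦ (T1, . . . , TL) mentioned above: a flat parameter vector is reshaped into the affine maps of a prescribed architecture. The helper name `unflatten` and the concrete dimensions are illustrative assumptions, not part of the slides.

```python
import numpy as np

def unflatten(theta, layer_dims):
    """Turn a flat parameter vector theta in R^M into the affine maps (A_l, b_l)."""
    params, k = [], 0
    for n_in, n_out in zip(layer_dims[:-1], layer_dims[1:]):
        A = theta[k:k + n_out * n_in].reshape(n_out, n_in); k += n_out * n_in
        b = theta[k:k + n_out];                             k += n_out
        params.append((A, b))
    assert k == theta.size      # M = sum_l (N_{l-1} + 1) N_l
    return params

dims = (2, 8, 8, 1)
M = sum((a + 1) * b for a, b in zip(dims[:-1], dims[1:]))
theta = np.random.default_rng(0).standard_normal(M)
print(len(unflatten(theta, dims)), "affine maps from", M, "parameters")
```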

SLIDE 8

How can we apply NNs to solve PDEs?

PDE problem: For D ⊂ Rd, d ∈ N, find u such that

G(x, u(x), ∇u(x), ∇²u(x)) = 0 for all x ∈ D.

Approach of [Lagaris, Likas, Fotiadis; 1998]: Let (xi)i∈I ⊂ D and find a NN Φ̺_θ such that

G(xi, Φ̺_θ(xi), ∇Φ̺_θ(xi), ∇²Φ̺_θ(xi)) = 0 for all i ∈ I.

Standard methods can be used to find the parameters θ.
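A hedged sketch of this collocation idea for a 1D model problem −u''(x) = f(x): assemble the residual at collocation points and minimize its squared sum over the network parameters. The names `pde_residual`, `u`, `f`, `xs` are illustrative assumptions; a trained network Φ̺_θ would replace the stand-in `u`.

```python
import numpy as np

def pde_residual(u, xs, f, h=1e-4):
    """Residual of -u'' = f at the points xs, with u'' from central finite differences."""
    u0 = np.array([u(x) for x in xs])
    up = np.array([u(x + h) for x in xs])
    um = np.array([u(x - h) for x in xs])
    u_xx = (up - 2.0 * u0 + um) / h**2
    return -u_xx - f(xs)

# Example: right-hand side f(x) = pi^2 sin(pi x); u below is a stand-in for a NN Phi_theta.
f = lambda x: np.pi**2 * np.sin(np.pi * x)
u = lambda x: np.sin(np.pi * x)
xs = np.linspace(0.1, 0.9, 9)              # interior collocation points x_i
loss = np.sum(pde_residual(u, xs, f)**2)   # a standard optimizer would minimize this over theta
print(loss)
```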

SLIDE 9

Approaches to solve PDEs - Examples

General framework, Deep Ritz Method [E, Yu; 2017]: NNs as trial functions; SGD naturally replaces quadrature.

High-dimensional PDEs [Sirignano, Spiliopoulos; 2017]: Let D ⊂ Rd with d ≥ 100 and find u such that

∂u/∂t(t, x) + H(u)(t, x) = 0, (t, x) ∈ [0, T] × D, plus boundary and initial conditions.

As the number of parameters of the NN increases, the minimizer of the associated energy approaches the true solution. No mesh generation required!

[Berner, Grohs, Hornung, Jentzen, von Wurstemberger; 2017]: Phrasing the problem as empirical risk minimization provably avoids the curse of dimension, both in the approximation problem and in the number of samples.
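A hedged sketch of the Deep Ritz idea for a model Poisson problem (not the high-dimensional PDE above): the Dirichlet energy E(u) = ∫_D ½|∇u|² − f·u dx is estimated by Monte Carlo over random points in D, so stochastic gradient descent over the network parameters doubles as the quadrature rule. The names `mc_energy`, `u`, `f` and the domain D = (0,1)² are illustrative assumptions.

```python
import numpy as np

def mc_energy(u, f, n_samples=10_000, h=1e-4, rng=np.random.default_rng(0)):
    x = rng.uniform(0.0, 1.0, size=(n_samples, 2))             # uniform samples in D = (0,1)^2
    grad = np.stack([(u(x + h * e) - u(x - h * e)) / (2 * h)   # central-difference gradient
                     for e in np.eye(2)], axis=-1)
    integrand = 0.5 * np.sum(grad**2, axis=-1) - f(x) * u(x)
    return integrand.mean()                                    # |D| = 1, so the mean estimates the integral

u = lambda x: np.sin(np.pi * x[..., 0]) * np.sin(np.pi * x[..., 1])  # stand-in for a NN
f = lambda x: 2 * np.pi**2 * u(x)                                    # chosen so that -Laplace(u) = f
print(mc_energy(u, f))   # SGD would minimize this quantity over the network parameters
```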

SLIDE 10

How can we apply NNs to solve PDEs?

Deep learning and PDEs: Both approaches above are based on two ideas.

◮ Neural networks are highly efficient in representing solutions of PDEs, hence the complexity of the problem can be greatly reduced.
◮ There exist black-box methods from machine learning that solve the optimization problem.

This talk:

◮ We will show exactly how efficient the representations are.
◮ Raise doubt that the black box can produce reliable results in general.

SLIDE 11

Approximation theory of neural networks

SLIDE 12

Complexity of neural networks

Recall: Φ̺(x) = TL(̺(TL−1(̺(. . . ̺(T1(x)))))), x ∈ Rd.

Each affine-linear mapping Tℓ is defined by a matrix Aℓ ∈ R^{Nℓ×Nℓ−1} and a translation bℓ ∈ R^{Nℓ} via Tℓ(x) = Aℓ x + bℓ.

The number of weights W(Φ̺) and the number of neurons N(Φ̺) are

W(Φ̺) = Σ_{j≤L} (‖Aj‖_{ℓ0} + ‖bj‖_{ℓ0})   and   N(Φ̺) = Σ_{j=0}^{L} Nj.
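A small helper consistent with the definitions of W(Φ̺) and N(Φ̺) above. This is a sketch only; the list-of-(A, b)-pairs representation is the same assumption as in the earlier examples.

```python
import numpy as np

def complexity(params):
    """Count non-zero weights W(Phi) and neurons N(Phi) for a network given as affine maps (A_l, b_l)."""
    W = sum(np.count_nonzero(A) + np.count_nonzero(b) for A, b in params)   # l0-count of all entries
    N = params[0][0].shape[1] + sum(A.shape[0] for A, _ in params)          # N_0 + N_1 + ... + N_L
    return W, N

rng = np.random.default_rng(0)
params = [(rng.standard_normal((16, 3)), np.zeros(16)),
          (rng.standard_normal((16, 16)), np.zeros(16)),
          (rng.standard_normal((1, 16)), np.zeros(1))]
print(complexity(params))   # (W, N) = (320, 36): the zero bias vectors do not add to the l0-count
```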

SLIDE 14

Power of the architecture — Exemplary results

Given f from some class of functions, how many weights/neurons does an ε-approximating NN need to have? Not so many...

Theorem (Maiorov, Pinkus; 1999)

There exists an activation function ̺weird : R → R that

◮ is analytic and strictly increasing,
◮ satisfies limx→−∞ ̺weird(x) = 0 and limx→∞ ̺weird(x) = 1,

such that for any d ∈ N, any f ∈ C([0, 1]d), and any ε > 0, there is a 3-layer ̺weird-network Φ̺weird_ε with ‖f − Φ̺weird_ε‖_{L∞} ≤ ε and N(Φ̺weird_ε) = 9d + 3.

SLIDE 19

Power of the architecture — Exemplary results

◮ Barron; 1993: Approximation rate for functions with one finite Fourier moment, using shallow networks with activation function ̺ sigmoidal of order zero.

◮ Mhaskar; 1993: Let ̺ be a sigmoidal function of order k ≥ 2. For f ∈ C^s([0, 1]d), we have ‖f − Φ̺_n‖_{L∞} ≲ N(Φ̺_n)^{−s/d} and L(Φ̺_n) = L(d, s, k).

◮ Yarotsky; 2017: For f ∈ C^s([0, 1]d) and ̺(x) = x_+ (called ReLU), we have ‖f − Φ̺_n‖_{L∞} ≲ W(Φ̺_n)^{−s/d} and L(Φ̺_n) ≍ log(n).

◮ Shaham, Cloninger, Coifman; 2015: One can implement certain wavelets using 4-layer NNs.

◮ He, Li, Xu, Zheng; 2018, Opschoor, Schwab, P.; 2019: ReLU NNs reproduce approximation rates of h-, p- and hp-FEM.

SLIDE 20

Lower bounds

Optimal approximation rates: Lower bounds on the required network size only exist under additional assumptions (recall the networks based on ̺weird).

Options:
(A) Place restrictions on the activation function (e.g. only consider the ReLU), thereby excluding pathological examples like ̺weird. (⇒ VC-dimension bounds)
(B) Place restrictions on the weights. (⇒ Information-theoretic bounds, entropy arguments)
(C) Use still other concepts like continuous N-widths.

SLIDE 22

Asymptotic min-max rate distortion

Encoders: Let C ⊂ L2(Rd) and ℓ ∈ N. Set

Eℓ := { E : C → {0, 1}ℓ },   Dℓ := { D : {0, 1}ℓ → L2(Rd) }.

Min-max code length:

L(ǫ, C) := min { ℓ ∈ N : ∃ D ∈ Dℓ, E ∈ Eℓ : sup_{f ∈ C} ‖D(E(f)) − f‖2 < ǫ }.

Optimal exponent:

γ∗(C) := inf { γ > 0 : L(ǫ, C) = O(ǫ^{−γ}) }.

SLIDE 23

Asymptotic min-max rate distortion

Theorem (Bölcskei, Grohs, Kutyniok, P.; 2017)

Let C ⊂ L2(Rd) and ̺ : R → R. Then for all ǫ > 0:

sup_{f ∈ C}  inf { W(Φ̺) : Φ̺ NN with quantised weights, ‖Φ̺ − f‖2 ≤ ǫ }  ≳ ǫ^{−γ∗(C)}.   (1)

Optimal approximation/parametrization: If for C ⊂ L2(Rd) the reverse estimate "≲" also holds in (1), then NNs approximate the function class optimally.

Versatility: It turns out that NNs achieve optimal approximation rates for many practically used function classes.

SLIDE 24

Some instances of optimal approximation

◮ Mhaskar; 1993: Let ̺ be a sigmoidal function of order k ≥ 2. For f ∈ C^s([0, 1]d), we have ‖f − Φ̺_n‖_{L∞} ≲ N(Φ̺_n)^{−s/d}. We have γ∗({f ∈ C^s([0, 1]d) : ‖f‖ ≤ 1}) = d/s.

◮ Shaham, Cloninger, Coifman; 2015: One can implement certain wavelets using 4-layer ReLU NNs. Optimal, when wavelets are optimal.

◮ Bölcskei, Grohs, Kutyniok, P.; 2017: Networks yield optimal rates if any affine system does. Example: shearlets for cartoon-like functions.

SLIDE 25

ReLU Approximation

Piecewise smooth functions: Eβ,d denotes the d-dimensional C^β-piecewise smooth functions on [0, 1]d with interfaces in C^β.

Theorem (P., Voigtlaender; 2018)

Let d ∈ N, β ≥ 0, and ̺ : R → R, ̺(x) = x_+. Then

sup_{f ∈ Eβ,d}  inf { W(Φ̺) : Φ̺ NN with quantised weights, ‖Φ̺ − f‖2 ≤ ǫ }  ∼ ǫ^{−γ∗(Eβ,d)} = ǫ^{−2(d−1)/β}.

The optimal depth of the networks is ∼ β/d.

SLIDE 26

High-dimensional approximation

Curse of dimension: To guarantee approximation with error ≤ ε of functions in Eβ,d, one requires networks with O(ε^{−2(d−1)/β}) weights.

Symmetries and invariances: Image classifiers are often:

◮ translation, dilation, and rotation invariant,
◮ invariant to small deformations,
◮ invariant to small changes in brightness, contrast, color.

SLIDE 28

Curse of dimension

Two-step setup: f = χ ◦ τ

◮ τ : R^D → R^d is a smooth, dimension-reducing feature map.
◮ χ ∈ Eβ,d performs classification on the low-dimensional space.

Theorem (P., Voigtlaender; 2017)

Let ̺(x) = x_+. There are constants c > 0, L ∈ N such that for any f = χ ◦ τ and any ε ∈ (0, 1/2), there is a NN Φ̺_ε with at most L layers and at most c · ε^{−2(d−1)/β} non-zero weights such that ‖Φ̺_ε − f‖_{L2} < ε.

The asymptotic approximation rate depends only on d, not on D.

SLIDE 29

Compositional functions

Compositional functions [Mhaskar, Poggio; 2016]: High-dimensional functions as a dyadic composition of 2-dimensional functions, e.g.

R^8 ∋ x ↦ h^3_1(h^2_1(h^1_1(x1, x2), h^1_2(x3, x4)), h^2_2(h^1_3(x5, x6), h^1_4(x7, x8))).

[Figure: binary composition tree over the inputs x1, . . . , x8]
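A hedged sketch of this compositional structure: an 8-dimensional function built as a binary tree of bivariate functions h. The concrete choice of h and the helper name `compose_tree` are arbitrary illustrations, not from [Mhaskar, Poggio; 2016].

```python
def compose_tree(h, x):
    """Reduce the inputs pairwise with the bivariate map h until one value remains."""
    while len(x) > 1:
        x = [h(x[i], x[i + 1]) for i in range(0, len(x), 2)]
    return x[0]

h = lambda a, b: (a * b) / (1.0 + a * a + b * b)         # any smooth function of two variables
print(compose_tree(h, [0.1 * i for i in range(1, 9)]))   # f(x1, ..., x8)
```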

SLIDE 30

Extensions

Approximation with respect to Sobolev norms: ReLU NNs Φ are Lipschitz continuous. Hence, for s ∈ [0, 1], p ≥ 1 and f ∈ W^{s,p}(Ω), we can measure ‖f − Φ‖_{W^{s,p}(Ω)}. ReLU networks achieve the same approximation rates as h-, p-, and hp-FEM [Opschoor, P., Schwab; 2019].

Convolutional neural networks: There is a direct correspondence between approximation by CNNs (without pooling) and approximation by fully-connected networks [P., Voigtlaender; 2018].

SLIDE 31

Optimal parametrization

Optimal parametrization:

◮ Neural networks yield optimal representations of many function classes relevant in PDE applications.
◮ Approximation is flexible, and quality is improved if low-dimensional structure is present.

PDE discretization:

◮ Problem complexity drastically reduced.
◮ No design of an ansatz system necessary, since NNs approximate almost every function class well.

Can neural networks really be this good?

SLIDE 32

The inconvenient structure of neural networks

SLIDE 33

Fixed architecture networks

Goal: Fix a space of networks with prescribed shape and understand the associated set of functions.

Fixed architecture networks: Let d, L ∈ N, N1, . . . , NL−1 ∈ N, and ̺ : R → R. Then we denote by NN ̺(d, N1, . . . , NL−1, 1) the set of NNs with architecture (d, N1, . . . , NL−1, 1).

[Figure: a network with d = 8, N1 = 12, N2 = 12, N3 = 12, N4 = 8]
SLIDE 34

Back to the basics

Topological properties: Is NN ̺(d, N1, . . . , NL−1, 1)

◮ star-shaped?
◮ convex? approximately convex?
◮ closed?

Is the map (T1, . . . , TL) ↦ Φ open?

Implications for optimization: If we do not have the properties above, then we can have

◮ terrible local minima,
◮ exploding weights,
◮ very slow convergence.

SLIDE 35

Star-shapedness

Star-shapedness: NN ̺(d, N1, . . . , NL−1, 1) is trivially star-shaped with center 0. ...but...

Proposition (P., Raslan, Voigtlaender; 2018)

Let d, L, N1, . . . , NL−1 ∈ N and let ̺ : R → R be locally Lipschitz continuous. Then the number of linearly independent centers of NN ̺(d, N1, . . . , NL−1, 1) is at most Σ_{ℓ=1}^{L} (Nℓ−1 + 1)Nℓ, where N0 = d.

SLIDE 36

Convexity?

Corollary (P., Raslan, Voigtlaender; 2018)

Let d, L, N1, . . . , NL−1 ∈ N, N0 = d, and let ̺ : R → R be locally Lipschitz continuous. If NN ̺(d, N1, . . . , NL−1, 1) contains more than Σ_{ℓ=1}^{L} (Nℓ−1 + 1)Nℓ linearly independent functions, then NN ̺(d, N1, . . . , NL−1, 1) is not convex.

From translation invariance: If NN ̺(d, N1, . . . , NL−1, 1) contains only finitely many linearly independent functions, then ̺ is a finite sum of complex exponentials multiplied with polynomials.

SLIDE 38

Weak Convexity?

Weak convexity: NN ̺(d, N1, . . . , NL−1, 1) is almost never convex, but what about NN ̺(d, N1, . . . , NL−1, 1) + B_ǫ^{‖·‖∞}(0) for a hopefully small ǫ > 0?

Theorem (P., Raslan, Voigtlaender; 2018)

Let d, L, N1, . . . , NL−1 ∈ N, N0 = d. For all commonly used activation functions there does not exist an ǫ > 0 such that NN ̺(d, N1, . . . , NL−1, 1) + B_ǫ^{‖·‖∞}(0) is convex.

As a corollary, we also get that NN ̺(d, N1, . . . , NL−1, 1) is usually nowhere dense.

SLIDE 39

Illustration

Illustration: The set NN ̺(d, N1, . . . , NL−1, 1) has very few centers, it is scaling invariant, not approximately convex, and nowhere dense.

SLIDE 41

Closedness in Lp

Compact weights: If the activation function ̺ is continuous, then a compactness argument shows that the set of networks with parameters from a compact parameter set is closed.

Theorem (P., Raslan, Voigtlaender; 2018)

Let d, L, N1, . . . , NL−1 ∈ N, N0 = d. If ̺ has one of the properties below, then NN ̺(d, N1, . . . , NL−1, 1) is not closed in Lp, p ∈ (0, ∞):

◮ analytic, bounded, not constant;
◮ C1 but not C∞;
◮ continuous, monotone, bounded, and ̺′(x0) exists and is non-zero in at least one point x0 ∈ R;
◮ continuous, monotone, continuously differentiable outside a compact set, and limx→∞ ̺′(x), limx→−∞ ̺′(x) exist and do not coincide.

SLIDE 42

Closedness in L∞

Theorem (P., Raslan, Voigtlaender; 2018)

Let d, L, N1, . . . , NL−1 ∈ N, N0 = d. If ̺ has one of the properties below, then NN ̺(d, N1, . . . , NL−1, 1) is not closed in L∞:

◮ analytic, bounded, not constant;
◮ C1 but not C∞;
◮ ̺ ∈ C^p and |̺(x) − x^p_+| bounded, for p ≥ 1.

ReLU: The set of two-layer ReLU NNs is closed in L∞!

SLIDE 43

Illustration

Illustration: For most activation functions ̺ (except the ReLU), the set NN ̺(d, N1, . . . , NL−1, 1) is star-shaped with center 0, not approximately convex, and not closed.

SLIDE 44

Stable parametrization

Continuous parametrization: It is not hard to see that if ̺ is continuous, then so is the map R̺ : (T1, . . . , TL) ↦ Φ.

Quotient map: We can also ask whether R̺ is a quotient map, i.e., if Φ1, Φ2 are NNs which are close (w.r.t. ‖·‖_sup), are there (T^1_1, . . . , T^1_L) and (T^2_1, . . . , T^2_L) which are close in some norm and satisfy R̺((T^1_1, . . . , T^1_L)) = Φ1 and R̺((T^2_1, . . . , T^2_L)) = Φ2?

Proposition (P., Raslan, Voigtlaender; 2018)

Let ̺ be Lipschitz continuous and not affine-linear. Then R̺ is not a quotient map.

SLIDE 45

Consequences

No convexity:

◮ We want to solve ∇J(Φ) = 0 for an energy J and a NN Φ.
◮ Not only J may be non-convex, but also the set we optimize over.
◮ Similar to N-term approximation by dictionaries.

No closedness:

◮ Exploding coefficients (if PNN(f) ∉ NN).
◮ No low-neuron approximation.

No inverse-stable parametrization:

◮ The error term can be very small while the parametrization is far from optimal.
◮ Potentially very slow convergence.

SLIDE 46

Where to go from here?

Different networks:

◮ Special types of networks could be more robust.
◮ Convolutional neural networks are probably still too large a class [P., Voigtlaender; 2018].

Stronger norms:

◮ Stronger norms naturally help with closedness and inverse stability.
◮ An example is Sobolev training [Czarnecki, Osindero, Jaderberg, Swirszcz, Pascanu; 2017]; see the sketch below.
◮ Many arguments of our results break down if the W^{1,∞} norm is used.
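A hedged sketch of the Sobolev-training idea referenced above: the training loss penalises the error in the function values and in the (here: finite-difference) first derivatives, i.e. an empirical W^{1,2}-type norm. The names `model`, `target`, and `sobolev_loss` are illustrative stand-ins, not the authors' implementation.

```python
import numpy as np

def sobolev_loss(model, target, xs, h=1e-4):
    """Mean squared error in values plus mean squared error in first derivatives."""
    val_err  = model(xs) - target(xs)
    d_model  = (model(xs + h) - model(xs - h)) / (2 * h)
    d_target = (target(xs + h) - target(xs - h)) / (2 * h)
    return np.mean(val_err**2) + np.mean((d_model - d_target)**2)

xs = np.linspace(0.0, 1.0, 200)
target = np.tanh                     # function to be learned
model  = lambda x: x - x**3 / 3.0    # stand-in for a neural network
print(sobolev_loss(model, target, xs))
```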

SLIDE 47

Conclusion

Approximation: NNs are a very powerful approximation tool:

◮ often optimally efficient parametrization,
◮ overcome curse of dimension,
◮ surprisingly efficient black-box optimization.

Topological structure: NNs form an impractical set:

◮ non-convex,
◮ non-closed,
◮ no inverse-stable parametrization.

SLIDE 48

References:

• H. Andrade-Loarca, G. Kutyniok, O. Öktem, P. Petersen, Extraction of digital wavefront sets using applied harmonic analysis and deep neural networks, arXiv:1901.01388.

• H. Bölcskei, P. Grohs, G. Kutyniok, P. Petersen, Optimal approximation with sparsely connected deep neural networks, arXiv:1705.01714.

• J. Opschoor, P. Petersen, Ch. Schwab, Deep ReLU networks and high-order finite element methods, SAM, ETH Zürich, 2019.

• P. Petersen, F. Voigtlaender, Optimal approximation of piecewise smooth functions using deep ReLU neural networks, Neural Networks, 2018.

• P. Petersen, M. Raslan, F. Voigtlaender, Topological properties of the set of functions generated by neural networks of fixed size, arXiv:1806.08459.

• P. Petersen, F. Voigtlaender, Equivalence of approximation by convolutional neural networks and fully-connected networks, arXiv:1809.00973.

SLIDE 49

Thank you for your attention!
