

slide-1
SLIDE 1

Scalability analysis of the distributed-memory implementation of the Aggregated unfitted Finite Element Method (AgFEM) Alberto F. Martín∗, Santiago Badia, Francesc Verdugo, Eric Neiva

MWNDEA 2020, Melbourne, Australia, 12/02/2020

slide-3
SLIDE 3

Embedded finite elements

Examples: CutFEM, Finite Cell Method, AgFEM, X-FEM, ...

[Figure: body-fitted mesh vs. unfitted mesh]

✔ Simplified mesh generation
✘ Dirichlet BCs?
✘ Numerical integration?
✘ Ill-conditioning? (this talk)

Alberto F. Martín (Monash University) MWNDEA2020

slide-6
SLIDE 6

Parallel distributed-memory simulation pipeline

  • 1. Unfitted (adaptive) Cartesian grids (p4est)
  • 2. Partition using space-filling curves (p4est)
  • 3. Unfitted FE discretization (AgFEM)
  • 4. AMG linear solver (PETSc)
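Step 2 of the pipeline can be illustrated with a minimal sketch. It assumes a uniform 2^level x 2^level grid and a Morton (Z-order) curve; it does not call p4est, and the function names are hypothetical. Cells are ordered along the curve and the curve is cut into contiguous chunks of nearly equal size, so no graph partitioner is needed.

```python
def morton_index(i: int, j: int, level: int) -> int:
    """Interleave the bits of (i, j) to obtain the Z-order (Morton) index."""
    code = 0
    for bit in range(level):
        code |= ((i >> bit) & 1) << (2 * bit)
        code |= ((j >> bit) & 1) << (2 * bit + 1)
    return code


def partition_cells(level: int, num_parts: int) -> dict:
    """Map each cell (i, j) of a 2^level x 2^level grid to a part id.

    Cells are sorted along the Morton curve, then the curve is split into
    num_parts contiguous chunks whose sizes differ by at most one cell.
    """
    n = 1 << level
    cells = sorted(
        ((i, j) for i in range(n) for j in range(n)),
        key=lambda c: morton_index(c[0], c[1], level),
    )
    owner = {}
    for pos, cell in enumerate(cells):
        owner[cell] = pos * num_parts // len(cells)
    return owner


if __name__ == "__main__":
    owner = partition_cells(level=2, num_parts=4)
    for part in range(4):
        print(part, sorted(c for c, p in owner.items() if p == part))
```

Because consecutive cells on the curve tend to be close in space, each chunk is a reasonably compact subdomain, and rebalancing after refinement only requires moving the cut points along the curve.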

slide-10
SLIDE 10

Unfitted methods at large scales: pros and cons

✔ Highly scalable mesh generation based on octrees (e.g. p4est)
✔ Highly scalable mesh partitioning with space-filling curves (ParMETIS not needed)
✔ Highly scalable adaptive mesh refinement + load balancing
✘ No guarantee that highly scalable linear solvers keep their optimal properties in the presence of cut cells

slide-11
SLIDE 11

PETSc CG + AMG preconditioner on unfitted meshes

Poisson equation (weak scaling test with 5 meshes)

[Figure: CG iterations vs. DOFs (10^3 to 10^8); two panels: AMG + AgFEM and AMG + naive unfitted FEM*]

* Nitsche BCs + modified integration in cut cells

slide-12
SLIDE 12

Why are linear solvers affected by cut cells?

Condition number estimates (Poisson eq.)

(a) Body-fitted case: $\kappa_2(A) \sim h^{-2}$

(b) Naive unfitted FEM: $\kappa_2(A) \sim |\eta|^{-(2p+1-2/d)}$ (the "small cut cell problem")
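To get a feeling for estimate (b), the short sketch below evaluates the exponent 2p + 1 - 2/d and the resulting growth factor for a few orders and dimensions. It assumes that η is a small positive number measuring the relative size of the smallest cut cell (the slide does not define it), and the function names are hypothetical.

```python
from fractions import Fraction


def cut_cell_exponent(p: int, d: int) -> Fraction:
    """Exponent e in kappa_2(A) ~ |eta|^(-e), with e = 2p + 1 - 2/d."""
    return 2 * p + 1 - Fraction(2, d)


def growth_factor(eta: float, p: int, d: int) -> float:
    """Factor |eta|^(-e) predicted by the estimate for a given eta."""
    return eta ** (-float(cut_cell_exponent(p, d)))


if __name__ == "__main__":
    for d in (2, 3):
        for p in (1, 2):
            e = cut_cell_exponent(p, d)
            factor = growth_factor(1e-6, p, d)
            print(f"d={d}, p={p}: exponent {e}, factor at eta=1e-6: {factor:.1e}")
```

The exponent grows with the polynomial order p, so higher-order naive unfitted discretizations are hit harder by small cuts, whereas the body-fitted estimate does not depend on η at all.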

slide-13
SLIDE 13

Possible remedies

Fix the linear solver
  • Tailor your parallel solver to deal with $\kappa_2(A) \sim |\eta|^{-(2p+1-2/d)}$
  • Example: [S. Badia, F. Verdugo. Robust and scalable domain decomposition solvers for unfitted finite element methods. Journal of Computational and Applied Mathematics (2018)].

Fix the linear system (this talk)
  • Enhance the unfitted FE method so that $\kappa_2(A) \sim h^{-2}$
  • Use a standard scalable solver
  • Examples: CutFEM, AgFEM

slide-14
SLIDE 14

Agenda

  • 1. The AgFEM method (serial case)
  • 2. Parallel implementation
  • 3. Performance of parallel AgFEM + AMG solvers


slide-16
SLIDE 16

AgFEM method for the Poisson eq.

$-\Delta u = f$ in $\Omega$, $u = u_D$ on $\partial\Omega$

slide-17
SLIDE 17

Weak imposition of Dirichlet BCs

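Nitsche's method: find $u_h \in V_h$ such that

$a_h(u_h, v_h) = l_h(v_h) \quad \forall v_h \in V_h$ ($v_h$ does not vanish on $\partial\Omega$!)

where

$a_h(u_h, v_h) := \sum_{K \in \mathcal{T}_h^{\mathrm{act}}} \int_{K \cap \Omega} \nabla u_h \cdot \nabla v_h - \sum_{F \in (\mathcal{T}_h^{\mathrm{act}} \cap \partial\Omega)} \int_F (\nabla u_h \cdot n)\, v_h - \sum_{F \in (\mathcal{T}_h^{\mathrm{act}} \cap \partial\Omega)} \int_F u_h\, (\nabla v_h \cdot n) + \sum_{F \in (\mathcal{T}_h^{\mathrm{act}} \cap \partial\Omega)} \beta h_F^{-1} \int_F u_h v_h$

$l_h(v_h) := \sum_{K \in \mathcal{T}_h^{\mathrm{act}}} \int_{K \cap \Omega} f v_h + \sum_{F \in (\mathcal{T}_h^{\mathrm{act}} \cap \partial\Omega)} \beta h_F^{-1} \int_F u_D v_h - \sum_{F \in (\mathcal{T}_h^{\mathrm{act}} \cap \partial\Omega)} \int_F u_D\, (\nabla v_h \cdot n)$

  • The key feature of AgFEM is the definition of the discrete space $V_h$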

slide-18
SLIDE 18

Starting point: "naive" FE space

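$V_h^{\mathrm{std}} := \{ u \in C^0(\Omega_{\mathrm{act}}) : u|_K \in Q_p(K) \ \forall K \in \mathcal{T}_h^{\mathrm{act}} \}$

[Figure: active mesh $\mathcal{T}_h^{\mathrm{act}}$, active domain $\Omega_{\mathrm{act}}$, and the DOFs of $V_h^{\mathrm{std}}$]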

slide-19
SLIDE 19

Aggregated FE space

Basic idea: improve conditioning by removing problematic DOFs

$V_h^{\mathrm{agg}} := \{ u \in V_h^{\mathrm{std}} : u_\times = \sum_{\bullet \in \mathrm{masters}(\times)} C_{\times\bullet}\, u_\bullet \ \forall \times \in P \}$

Legend:
  • well-posed DOFs
  × problematic DOFs (P)

slide-23
SLIDE 23

Definition of constraints via cell aggregates

  • 1. Generate cell aggregates (1 interior cell + several cut cells)
  • 2. Define the DOF-to-root-cell map root(×) via the aggregates
  • 3. Define constraints: $u_\times = \sum_{\bullet \in \mathrm{dofs}(\mathrm{root}(\times))} \phi_\bullet^{\mathrm{root}(\times)}(x_\times)\, u_\bullet$
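The three steps above can be tried out on a 1D toy problem. The sketch below is an illustration under stated assumptions, not the authors' implementation: linear elements on n cells of size h = 1/n, a homogeneous Dirichlet condition imposed strongly at node 0, and a last cell that is cut so that only a fraction eta of it lies inside the domain. The single problematic DOF (node n) is constrained by evaluating the shape functions of its root cell (the last interior cell) at x_n. For a symmetric positive definite matrix, the ratio of the largest to the smallest diagonal entry is a lower bound for the 2-norm condition number, which is used here as a cheap indicator. All function names are hypothetical.

```python
def assemble_naive(n: int, eta: float) -> list:
    """P1 stiffness matrix on n cells; row/column i corresponds to node i + 1.

    Only a fraction eta of the last cell is integrated (cut cell).
    Node 0 (homogeneous Dirichlet) is eliminated.
    """
    h = 1.0 / n
    full = [[0.0] * (n + 1) for _ in range(n + 1)]
    for k in range(n):  # cell k has nodes k and k + 1
        weight = eta if k == n - 1 else 1.0
        for a, b, sign in ((k, k, 1), (k, k + 1, -1), (k + 1, k, -1), (k + 1, k + 1, 1)):
            full[a][b] += sign * weight / h
    return [row[1:] for row in full[1:]]


def constraint_matrix(n: int) -> list:
    """C maps master DOFs (nodes 1..n-1) to all free DOFs (nodes 1..n), n >= 3."""
    h = 1.0 / n
    x_left, x_right, x_cut = (n - 2) * h, (n - 1) * h, n * h
    phi_left = (x_right - x_cut) / h   # root-cell shape function of node n-2 at x_n
    phi_right = (x_cut - x_left) / h   # root-cell shape function of node n-1 at x_n
    C = [[1.0 if i == j else 0.0 for j in range(n - 1)] for i in range(n)]
    C[n - 1][n - 3] = phi_left
    C[n - 1][n - 2] = phi_right
    return C


def triple_product(C: list, A: list) -> list:
    """Return C^T A C using plain nested lists."""
    rows, cols = len(C), len(C[0])
    AC = [[sum(A[i][k] * C[k][j] for k in range(rows)) for j in range(cols)]
          for i in range(rows)]
    return [[sum(C[k][i] * AC[k][j] for k in range(rows)) for j in range(cols)]
            for i in range(cols)]


def diagonal_ratio(A: list) -> float:
    """max(a_ii) / min(a_ii): a lower bound for kappa_2(A) when A is SPD."""
    diag = [A[i][i] for i in range(len(A))]
    return max(diag) / min(diag)


if __name__ == "__main__":
    n = 8
    for eta in (1e-2, 1e-4, 1e-6):
        A = assemble_naive(n, eta)
        A_agg = triple_product(constraint_matrix(n), A)
        print(f"eta={eta:.0e}: naive {diagonal_ratio(A):.2e}, aggregated {diagonal_ratio(A_agg):.2f}")
```

In this toy setting the indicator of the naive matrix grows like 1/eta, while that of the constrained matrix stays of order one. Since the ratio is only a lower bound, the second observation is consistent with a bounded condition number but does not prove it on its own.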

slide-24
SLIDE 24

Results for the unfitted aggregated FEM (Poisson eq.)¹

  • $\kappa(A) \le c_1 h^{-2}$ (condition number bound)
  • $\beta \le c_2 h^{-2}$ (Nitsche's penalty coefficient)
  • $\|u - u_h\|_{H^1(\Omega)} \le c_3 h^{p}$ (optimal convergence order)
  • $\|u - u_h\|_{L^2(\Omega)} \le c_4 h^{p+1}$ (optimal convergence order)
  • and others (inverse/trace inequalities, bound of aggregate size, bound of the extended solution, ...)

¹ [Badia, Verdugo, Martín. The aggregated unfitted finite element method for elliptic problems. Comput. Methods Appl. Mech. Eng. (2018).]

slide-25
SLIDE 25

[Figure: log10(condest(A)) (axis from 5 to 30) against a parameter ranging from 0.3 to 0.7; curves: p=1 standard, p=2 standard, p=1 aggregated, p=2 aggregated]

slide-26
SLIDE 26

[Figure: log10(condest(A)) (axis from 5 to 30) against a parameter ranging from 0.3 to 0.7; curves: p=1 standard, p=2 standard, p=1 aggregated, p=2 aggregated]

slide-27
SLIDE 27

Convergence test

[Figure: log10(error in energy norm) vs. log10(h); panels: (a) 2D, (b) 3D; curves: p=1 standard, p=2 standard, p=1 aggregated, p=2 aggregated; reference lines: slope 1, slope 2]

slide-28
SLIDE 28

Extension to the Stokes problem²

$-\Delta u + \nabla p = f$ in $\Omega$
$\nabla \cdot u = 0$ in $\Omega$
$u = 0$ on $\Gamma_D$
$(\nabla u - pI) \cdot n = g$ on $\Gamma_N$

[Figure: velocity magnitude |u|, color scale from 0.0 to 75.0]

² [Badia, Martín, Verdugo. Mixed aggregated finite element methods for the unfitted discretization of the Stokes problem. SIAM J. Sci. Comput., 40(6), 2018.]

slide-30
SLIDE 30

[Figure: log10(condest(A)) (axis from 5 to 40) vs. ℓ (0.3 to 0.7), two panels; curves: aggregated, standard]

slide-31
SLIDE 31

[Figure: three panels plotted against log10(DOF^{1/d}) (1.2 to 2.0); curves: aggregated, standard.
  Panel 1: log10 of the relative velocity error $\|u_h - u\|_{H^1} / \|u\|_{H^1}$, reference slope -2.
  Panel 2: log10 of the relative velocity error $\|u_h - u\|_{L^2} / \|u\|_{L^2}$, reference slope -3.
  Panel 3: log10 of the relative pressure error $\|p_h - p\|_{L^2} / \|p\|_{L^2}$, reference slope -2.]

slide-32
SLIDE 32

Agenda

  • 1. The AgFEM method (serial case)
  • 2. Parallel implementation
  • 3. Performance of parallel AgFEM + AMG solvers


slide-33
SLIDE 33

Parallel mesh distribution

[Figure: (a) partition D = {D1, D2}; (b) view from D1; (c) view from D2]

Main phases to be parallelized:

  • Cell aggregation
  • Imposition of constraints

slide-34
SLIDE 34

Cell aggregation (serial)

[Figure, shown over several animation steps: cells marked as touched, untouched, or aggregated]
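The touched/untouched bookkeeping in the figure can be sketched as follows. This is a simplified illustration, not the algorithm as implemented by the authors: it assumes a uniform 2D grid on the unit square, a disk as the physical domain, a corner-based in/out test to classify cells, face neighbors only, and a "closest root" rule to pick among candidate aggregates. All names are hypothetical.

```python
import math

INTERIOR, CUT, EXTERIOR = "interior", "cut", "exterior"


def classify_cells(n: int, radius: float = 0.4) -> dict:
    """Classify the cells of an n x n grid against a disk centered at (0.5, 0.5).

    Simplification: a cell is interior if its 4 corners are inside the disk,
    exterior if none is, and cut otherwise.
    """
    h = 1.0 / n
    status = {}
    for i in range(n):
        for j in range(n):
            inside = sum(
                math.hypot((i + di) * h - 0.5, (j + dj) * h - 0.5) < radius
                for di in (0, 1)
                for dj in (0, 1)
            )
            status[(i, j)] = INTERIOR if inside == 4 else (CUT if inside > 0 else EXTERIOR)
    return status


def aggregate_cells(status: dict) -> dict:
    """Return a map from every active (interior or cut) cell to its root cell."""
    # Interior cells start as touched; each one is the root of its own aggregate.
    root = {c: c for c, s in status.items() if s == INTERIOR}
    untouched = {c for c, s in status.items() if s == CUT}
    while untouched:
        newly_touched = {}
        for (i, j) in untouched:
            neighbors = ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
            candidates = [root[nb] for nb in neighbors if nb in root]
            if candidates:
                # Attach to the aggregate whose root cell is closest.
                newly_touched[(i, j)] = min(
                    candidates, key=lambda r: ((r[0] - i) ** 2 + (r[1] - j) ** 2, r)
                )
        if not newly_touched:
            raise ValueError("some cut cells cannot reach an interior cell")
        # Newly aggregated cells only become touched after the sweep.
        root.update(newly_touched)
        untouched.difference_update(newly_touched)
    return root


if __name__ == "__main__":
    status = classify_cells(16)
    root = aggregate_cells(status)
    sizes = {}
    for cell, r in root.items():
        sizes[r] = sizes.get(r, 0) + 1
    print("active cells:", len(root), "largest aggregate:", max(sizes.values()))
```

Updating the touched set only at the end of each sweep makes the result independent of the order in which cut cells are visited, which is convenient when the same sweep has to be reproduced across subdomains.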

slide-39
SLIDE 39

Aggregates in 3D


slide-43
SLIDE 43

Cell aggregation (parallel)

[Figure panels: (a) Step 1, (b) Step 2, (c) Communication, (d) Step 3]

✔ Standard nearest-neighbor communication to determine root cells

slide-44
SLIDE 44

Parallel imposition of constraints

[Figure: subdomains $D_s$, $D_{s'}$, $D_{s''}$]

✘ Subdomain-local constraints are not even possible in some cases
✔ In the end, only standard nearest-neighbor communication is required

slide-45
SLIDE 45

Agenda

  • 1. The AgFEM method (serial case)
  • 2. Parallel implementation
  • 3. Performance of parallel AgFEM + AMG solvers


slide-46
SLIDE 46

Weak scaling test setup

  • Poisson eq.
  • AgFEM vs. "naive" unfitted FEM
  • Linear solver: PCG from PETSc
  • Preconditioner: smoothed aggregation AMG from PETSc (GAMG)
  • Up to 16K cores and 1000M background cells
  • Computed at MareNostrum 4

slide-47
SLIDE 47

Number of PCG iterations (weak scaling)

[Figure: CG iterations vs. DOFs (10^3 to 10^8); panels: (a) Popcorn, (b) Spiral, (c) Swiss Cheese; curves: agg (load 1), agg (load 2), agg (load 3), std (load 1), std (load 2), std (load 3)]

slide-48
SLIDE 48

Computational time (s) of the AgFEM stages (weak scaling)

[Figure: wall clock time [s] vs. DOFs (10^5 to 10^8); panels: (a) Popcorn, (b) Spiral, (c) Swiss Cheese]

Legend (stages):
  • Cell aggregation (Alg. 2)
  • Path reconstruction (Algs. 3 and 4)
  • Import data from root cells (Alg. 5)
  • Setup constraints (Sect. 3.8)
  • Setup of local DOF ids (Sect. 3.3)
  • Setup of global DOF ids (Sect. 3.7)
  • FE integration + assembly (Table 2)

slide-49
SLIDE 49

Computational time (s) of the AMG solver (weak scaling)

[Figure: wall clock time [s] vs. DOFs; panels: (a) Popcorn, (b) Spiral, (c) Swiss Cheese; curves: linear solver setup, linear solver run]

  • Setup degradation (11.94x CPU time for a 4,448x larger problem)
  • Similar for standard FEM in a box (2.50x for 355.26x)
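One way to compare the two degradation figures quoted above is to convert each into a growth exponent. The sketch below assumes, purely for illustration, a power-law model in which the time ratio equals the problem-size ratio raised to alpha; in a weak scaling test the ideal value is alpha = 0. The model and the function name are assumptions, and only the four input numbers come from the slide.

```python
import math


def growth_exponent(time_ratio: float, size_ratio: float) -> float:
    """Return alpha such that time_ratio == size_ratio ** alpha."""
    return math.log(time_ratio) / math.log(size_ratio)


if __name__ == "__main__":
    cases = {
        "setup, unfitted case": (11.94, 4448.0),
        "standard FEM in a box": (2.50, 355.26),
    }
    for name, (time_ratio, size_ratio) in cases.items():
        print(f"{name}: alpha = {growth_exponent(time_ratio, size_ratio):.3f}")
```

Under this model the exponents come out at roughly 0.30 and 0.16, both far below the value of 1 that would mean time growing in proportion to problem size.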

slide-50
SLIDE 50

Conclusions

✔ Embedded FEM enables scalable octree-based meshes
✘ ... but can destroy the scalability of linear solvers
✔ AgFEM recovers the optimal scaling of the linear solver
✔ ... while keeping the optimal discretization order

slide-51
SLIDE 51

For more details, see the papers:

  • F. Verdugo, A.F. Martín, S. Badia. Distributed-memory parallelization of the aggregated unfitted finite element method. CMAME, 357, 2019.
  • S. Badia, A.F. Martín, F. Verdugo. Mixed aggregated finite element methods for the unfitted discretization of the Stokes problem. SIAM J. Sci. Comput., 40(6), 2018.
  • S. Badia, F. Verdugo, A.F. Martín. The aggregated unfitted finite element method for elliptic problems. CMAME, 336, 2018.

https://github.com/fempar/fempar