Scalability analysis of the distributed-memory implementation of the Aggregated unfitted Finite Element Method (AgFEM)
Alberto F. Martín (Monash University), Santiago Badia, Francesc Verdugo, Eric Neiva
MWNDEA 2020, Melbourne, Australia, 12/02/2020
Embedded Finite elements
CutFEM, Finite Cell Method, AgFEM, X-FEM, ...
(Figure: body-fitted mesh vs. unfitted mesh.)
✔ Simplified mesh generation
✘ Dirichlet BCs?
✘ Numerical integration?
✘ Ill-conditioning? (this talk)
Parallel distributed-memory simulation pipeline
- 1. Unfitted (adaptive) Cartesian grids (p4est)
- 2. Partition using space-filling curves (p4est)
- 3. Unfitted FE discretization (AgFEM)
- 4. AMG linear solver (PETSc)
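Steps 1 and 2 rely on the fact that a space-filling curve turns a Cartesian (or octree) mesh into a one-dimensional list of cells that can be cut into contiguous, equally loaded pieces. The sketch below illustrates this on a uniform 2D grid with a Morton (Z-order) curve, the kind of ordering p4est uses on its octrees. It is a toy under stated assumptions: a single uniform refinement level, no adaptivity, and hypothetical helper names (`morton_key`, `partition_cells`) that are not part of the p4est API.

```python
def morton_key(i: int, j: int, bits: int = 16) -> int:
    """Interleave the bits of cell coordinates (i, j) into a Z-order (Morton) key."""
    key = 0
    for b in range(bits):
        key |= ((i >> b) & 1) << (2 * b)      # bits of i go to even positions
        key |= ((j >> b) & 1) << (2 * b + 1)  # bits of j go to odd positions
    return key


def partition_cells(n: int, n_parts: int) -> list[list[tuple[int, int]]]:
    """Sort the n x n cells along the Morton curve and cut the curve into
    n_parts contiguous chunks whose sizes differ by at most one cell."""
    cells = sorted(((i, j) for i in range(n) for j in range(n)),
                   key=lambda c: morton_key(*c))
    base, extra = divmod(len(cells), n_parts)
    parts, start = [], 0
    for p in range(n_parts):
        size = base + (1 if p < extra else 0)
        parts.append(cells[start:start + size])
        start += size
    return parts


if __name__ == "__main__":
    for rank, owned in enumerate(partition_cells(4, 4)):
        print(rank, owned)
```

Because consecutive keys stay spatially close, each chunk is a compact block of cells (for a 4 x 4 grid and 4 parts, each part is one 2 x 2 quadrant), which keeps subdomain interfaces, and hence nearest-neighbor communication, small.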
Unfitted methods at large scales: pros and cons
✔ Highly scalable mesh generation based on octrees (e.g., p4est)
✔ Highly scalable mesh partitioning with space-filling curves (ParMETIS not needed)
✔ Highly scalable adaptive mesh refinement + load balancing
✘ No guarantee that highly scalable linear solvers keep their optimal properties in the presence of cut elements
PETSc CG + AMG preconditioner on unfitted meshes
Poisson equation (weak scaling test with 5 meshes)
(Figure, two panels: AMG + AgFEM; AMG + naive unfitted FEM*. CG iterations versus number of DOFs, from 10^3 to 10^8.)
* Nitsche BCs + modified integration in cut cells
Why are linear solvers affected by cut cells?
Condition number estimates (Poisson Eq.)
(a) Body-fitted case: $\kappa_2(A) \sim h^{-2}$
(b) Naive unfitted FEM: $\kappa_2(A) \sim |\eta|^{-(2p+1-2/d)}$ ("small cut cell problem")
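To get a feel for how different the two estimates are, the snippet below evaluates them with all hidden constants set to one, an assumption made only for illustration; only the exponents are meaningful. Here $\eta$ is the cut-cell quantity of estimate (b), which becomes small when a cell is barely intersected by the domain; the function names are hypothetical.

```python
from fractions import Fraction


def small_cut_exponent(p: int, d: int) -> Fraction:
    """Exponent q in the naive unfitted estimate kappa_2(A) ~ |eta|^(-q), q = 2p + 1 - 2/d."""
    return 2 * p + 1 - Fraction(2, d)


def kappa_body_fitted(h: float) -> float:
    """Body-fitted estimate kappa_2(A) ~ h^-2, hidden constant set to 1 (assumption)."""
    return h ** -2


def kappa_naive_unfitted(eta: float, p: int, d: int) -> float:
    """Naive unfitted estimate kappa_2(A) ~ |eta|^-(2p+1-2/d), hidden constant set to 1 (assumption)."""
    return abs(eta) ** -float(small_cut_exponent(p, d))


if __name__ == "__main__":
    print(f"body-fitted, h = 1e-2: ~{kappa_body_fitted(1e-2):.1e}")
    for p, d in [(1, 2), (1, 3), (2, 3)]:
        q = small_cut_exponent(p, d)
        print(f"p={p}, d={d}: exponent {q}, eta = 1e-3 -> ~{kappa_naive_unfitted(1e-3, p, d):.1e}")
```

For $p = 1$ in 3D the exponent is $2p + 1 - 2/d = 7/3$, so shrinking $|\eta|$ by a factor of 10 multiplies the naive estimate by about $10^{7/3} \approx 215$, and nothing prevents $|\eta|$ from being arbitrarily small on an unfitted mesh.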
Possible remedies
- Fix the linear solver: tailor your parallel solver to deal with $\kappa_2(A) \sim |\eta|^{-(2p+1-2/d)}$.
  Example: [S. Badia, F. Verdugo. Robust and scalable domain decomposition solvers for unfitted finite element methods. Journal of Computational and Applied Mathematics (2018)].
- Fix the linear system (this talk): enhance the unfitted FE method so that $\kappa_2(A) \sim h^{-2}$, then use a standard scalable solver.
  Examples: CutFEM, AgFEM.
Agenda
- 1. The AgFEM method (serial case)
- 2. Parallel implementation
- 3. Performance of parallel AgFEM + AMG solvers
1. The AgFEM method (serial case)
AgFEM method for the Poisson Eq.
$-\Delta u = f$ in $\Omega$, $\quad u = u_D$ on $\partial\Omega$
Weak imposition of Dirichlet BCs
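Nitsche's method: find $u_h \in V_h$ such that $a_h(u_h, v_h) = l_h(v_h)$ for all $v_h \in V_h$ (note that $v_h$ does not vanish on $\partial\Omega$!), where

$$a_h(u_h, v_h) := \sum_{K \in \mathcal{T}_h^{\mathrm{act}}} \int_{K \cap \Omega} \nabla u_h \cdot \nabla v_h \, dx - \sum_{F \in \mathcal{T}_h^{\mathrm{act}} \cap \partial\Omega} \int_F (\nabla u_h \cdot n)\, v_h \, ds - \sum_{F \in \mathcal{T}_h^{\mathrm{act}} \cap \partial\Omega} \int_F u_h\, (\nabla v_h \cdot n) \, ds + \sum_{F \in \mathcal{T}_h^{\mathrm{act}} \cap \partial\Omega} \beta h_F^{-1} \int_F u_h v_h \, ds,$$

$$l_h(v_h) := \sum_{K \in \mathcal{T}_h^{\mathrm{act}}} \int_{K \cap \Omega} f v_h \, dx + \sum_{F \in \mathcal{T}_h^{\mathrm{act}} \cap \partial\Omega} \beta h_F^{-1} \int_F u_D v_h \, ds - \sum_{F \in \mathcal{T}_h^{\mathrm{act}} \cap \partial\Omega} \int_F u_D\, (\nabla v_h \cdot n) \, ds.$$

The key feature of AgFEM is the definition of the discrete space $V_h$.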
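The sketch below assembles the symmetric Nitsche formulation above for a 1D model problem, $-u'' = f$ on $(0, 1)$ with $u = u_D$ imposed weakly at both end points, using linear elements on a uniform body-fitted mesh. The mesh, the penalty value $\beta = 10$, the 2-point Gauss rule and the dense solver are illustrative assumptions, not choices taken from the slides; in 1D the boundary "faces" $F$ are just the points $x = 0$ (with $n = -1$) and $x = 1$ (with $n = +1$).

```python
import math


def solve_dense(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= factor * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x


def nitsche_1d(n_el, f, u_D, beta=10.0):
    """Nodal values of the P1 Nitsche solution of -u'' = f on (0, 1), u = u_D weakly."""
    h = 1.0 / n_el
    n = n_el + 1
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    g = 0.5 / math.sqrt(3.0)
    quad = [(0.5 - g, 0.5), (0.5 + g, 0.5)]  # 2-point Gauss rule on the reference cell [0, 1]
    for e in range(n_el):
        dofs, dphi = (e, e + 1), (-1.0 / h, 1.0 / h)
        for i in range(2):
            for j in range(2):
                A[dofs[i]][dofs[j]] += dphi[i] * dphi[j] * h  # integral of u' v' over the cell
            for s, w in quad:
                phi_i = 1.0 - s if i == 0 else s
                b[dofs[i]] += f((e + s) * h) * phi_i * w * h  # integral of f v over the cell
    # Boundary points: (boundary node, interior neighbor node, outward normal n).
    for node, nbr, nrm in ((0, 1, -1.0), (n_el, n_el - 1, 1.0)):
        gD = u_D(node * h)
        dofs = (node, nbr)
        val = (1.0, 0.0)             # basis values at the boundary point
        dphi = (nrm / h, -nrm / h)   # basis derivatives on the boundary cell
        for i in range(2):
            dn_v = dphi[i] * nrm     # grad(v) . n
            for j in range(2):
                dn_u = dphi[j] * nrm  # grad(u) . n
                A[dofs[i]][dofs[j]] += (-dn_u * val[i]                 # -(grad u . n) v
                                        - val[j] * dn_v                # -u (grad v . n)
                                        + beta / h * val[j] * val[i])  # beta h^-1 u v
            b[dofs[i]] += beta / h * gD * val[i] - gD * dn_v  # Nitsche terms of l_h
    return solve_dense(A, b)
```

Because the formulation is consistent, a linear exact solution is reproduced up to round-off, and for a smooth solution the nodal error decreases under refinement, even though no DOF is ever fixed to the boundary value.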
Starting point: "naive" FE space
$V_h^{\mathrm{std}} := \{ u \in C^0(\Omega^{\mathrm{act}}) : u|_K \in \mathcal{Q}_p(K) \ \ \forall K \in \mathcal{T}_h^{\mathrm{act}} \}$
(Figure: $\mathcal{T}_h^{\mathrm{act}}$, $\Omega^{\mathrm{act}}$ and $V_h^{\mathrm{std}}$.)
Aggregated FE space
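Basic idea: improve conditioning by removing problematic DOFs
$V_h^{\mathrm{agg}} := \{ u \in V_h^{\mathrm{std}} : u_\times = \sum_{\bullet \in \mathrm{masters}(\times)} C_{\times\bullet}\, u_\bullet \ \ \forall \times \in \mathcal{P} \}$
(Figure legend: • well-posed DOFs; × problematic DOFs ($\mathcal{P}$).)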
Definition of constraints via cell aggregates
- 1. Generate cell aggregates (1 interior cell + several cut cells)
- 2. Define the DOF-to-root-cell map root(×) via the aggregates
- 3. Define the constraints:
$u_\times = \sum_{\bullet \in \mathrm{dofs}(\mathrm{root}(\times))} \phi^{\bullet}_{\mathrm{root}(\times)}(x_\times)\, u_\bullet$
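In step 3, the coefficient of a master DOF • is simply the root cell's shape function for • evaluated at the (possibly outside) position $x_\times$ of the constrained DOF, i.e., an extrapolation. A minimal sketch for bilinear ($\mathcal{Q}_1$) elements on axis-aligned square cells follows; the cell description `(x0, y0, h)`, the vertex ordering and the function names are assumptions made for illustration.

```python
def q1_shape_values(x, y, cell):
    """Bilinear (Q1) shape functions of the square cell (x0, y0, h) = [x0, x0+h] x [y0, y0+h]
    evaluated at (x, y). Points outside the cell give an extrapolation.
    Vertex order: (x0, y0), (x0+h, y0), (x0, y0+h), (x0+h, y0+h)."""
    x0, y0, h = cell
    s, t = (x - x0) / h, (y - y0) / h
    return [(1 - s) * (1 - t), s * (1 - t), (1 - s) * t, s * t]


def constraint_coefficients(x_dof, root_cell):
    """Coefficients phi^•_root(x_dof) tying a constrained DOF located at x_dof
    to the four vertex DOFs of its root cell."""
    return q1_shape_values(x_dof[0], x_dof[1], root_cell)


def constrained_value(x_dof, root_cell, root_values):
    """Value of the constrained DOF: sum over root DOFs of coefficient * u_root."""
    return sum(c * u for c, u in zip(constraint_coefficients(x_dof, root_cell), root_values))
```

For the root cell $[0,1]^2$ and a constrained DOF at $(2, 0)$, the coefficients are $(-1, 2, 0, 0)$: they sum to one and reproduce any bilinear function exactly, but their magnitude grows with the distance from the root cell, which is consistent with the role that bounding the aggregate size plays in the analysis.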
Results for the unfitted aggregated FEM (Poisson Eq.)¹
- $\kappa(A) \le c_1 h^{-2}$ (condition number bound)
- $\beta \le c_2 h^{-2}$ (Nitsche's penalty coefficient)
- $\|u - u_h\|_{H^1(\Omega)} \le c_3 h^p$ (optimal convergence order)
- $\|u - u_h\|_{L^2(\Omega)} \le c_4 h^{p+1}$ (optimal convergence order)
- and others (inverse/trace inequalities, bound of aggregate size, bound of the extended solution, ...)
¹ [Badia, Verdugo, Martín. The aggregated unfitted finite element method for elliptic problems. Comput. Methods Appl. Mech. Eng. (2018).]
(Figure: log10(condest(A)) versus ℓ for p = 1 and p = 2, standard vs. aggregated FE spaces.)
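The figure's qualitative behavior can be reproduced with a 1D toy problem. The sketch below assembles $\mathcal{P}_1$ stiffness matrices for $-u'' = f$ on $(0, L)$ whose last cell is cut, with only a fraction $\eta$ of it inside the domain, and compares the naive system with the one obtained after eliminating the ill-posed last DOF through the extrapolation constraint of step 3. Several simplifications are assumptions, not the setup behind the figure: 1D, $p = 1$, a strongly imposed $u(0) = 0$, a natural (Neumann) condition at the unfitted end instead of Nitsche, 8 cells, and condition numbers computed with a small dense Jacobi eigenvalue routine. All names are hypothetical.

```python
import math


def jacobi_eigenvalues(A, max_sweeps=50):
    """All eigenvalues (ascending) of a small symmetric matrix, cyclic Jacobi method."""
    n = len(A)
    a = [row[:] for row in A]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-22:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J: rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A: rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def cond(A):
    """Spectral condition number of an SPD matrix: largest / smallest eigenvalue."""
    ev = jacobi_eigenvalues(A)
    return ev[-1] / ev[0]


def stiffness(n_cells, eta):
    """P1 stiffness on (0, L), L = (n_cells - 1 + eta) h, h = 1 / n_cells.
    Node 0 is removed (strong u(0) = 0); unknowns are nodes 1..n_cells.
    Only a fraction eta of the last (cut) cell lies inside the domain."""
    h = 1.0 / n_cells
    K = [[0.0] * n_cells for _ in range(n_cells)]
    for e in range(n_cells):
        k = (eta if e == n_cells - 1 else 1.0) / h  # integral of (1/h)^2 over cell ∩ domain
        nodes, signs = (e, e + 1), (1.0, -1.0)
        for a in range(2):
            for b in range(2):
                if nodes[a] > 0 and nodes[b] > 0:
                    K[nodes[a] - 1][nodes[b] - 1] += k * signs[a] * signs[b]
    return K


def aggregate_last_dof(K):
    """Return C^T K C, where C imposes u_N = 2 u_{N-1} - u_{N-2}: the linear
    extrapolation from the root cell [x_{N-2}, x_{N-1}]. Needs at least 3 cells."""
    n, m = len(K), len(K) - 1
    C = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(n)]
    C[n - 1][m - 1], C[n - 1][m - 2] = 2.0, -1.0  # row of the constrained node N
    KC = [[sum(K[i][k] * C[k][j] for k in range(n)) for j in range(m)] for i in range(n)]
    return [[sum(C[k][i] * KC[k][j] for k in range(n)) for j in range(m)] for i in range(m)]


if __name__ == "__main__":
    for eta in (1e-1, 1e-3, 1e-6):
        K = stiffness(8, eta)
        print(f"eta = {eta:.0e}: naive {cond(K):.2e}, aggregated {cond(aggregate_last_dof(K)):.2e}")
```

For $d = 1$ and $p = 1$ the naive estimate $|\eta|^{-(2p+1-2/d)}$ reduces to $|\eta|^{-1}$, and in this toy the naive condition number indeed grows roughly like $1/\eta$, whereas the aggregated one stays bounded as $\eta \to 0$.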
Convergence test
(Figure: log10(error in energy norm) versus log10(h) for p = 1 and p = 2, standard vs. aggregated, with reference slopes 1 and 2; (a) 2D, (b) 3D.)
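The reference slopes in the plot are the optimal energy-norm rates $h^p$ ($p = 1, 2$) stated in the results for the aggregated method. The helpers below show one common way to extract an observed rate from (h, error) pairs: pairwise rates between consecutive refinements and a least-squares slope in log-log scale. The function names are hypothetical and the data in the usage note are synthetic, not taken from the slides.

```python
import math


def observed_orders(hs, errors):
    """Pairwise observed orders log(e_k / e_{k+1}) / log(h_k / h_{k+1})."""
    return [math.log(errors[k] / errors[k + 1]) / math.log(hs[k] / hs[k + 1])
            for k in range(len(hs) - 1)]


def loglog_slope(hs, errors):
    """Least-squares slope of log10(error) versus log10(h)."""
    xs = [math.log10(h) for h in hs]
    ys = [math.log10(e) for e in errors]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

For example, synthetic errors $e = 3h^2$ at $h = 0.1, 0.05, 0.025$ give pairwise rates of 2 and a least-squares slope of 2, matching the reference slope for $p = 2$.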
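Extension to the Stokes problem²
(Figure: velocity magnitude |u|, color scale from 0.0 to 75.0.)
$-\Delta u + \nabla p = f$ in $\Omega$, $\quad \nabla \cdot u = 0$ in $\Omega$, $\quad u = 0$ on $\Gamma_D$, $\quad (\nabla u - pI) \cdot n = g$ on $\Gamma_N$
² [Badia, Martín, Verdugo. Mixed aggregated finite element methods for the unfitted discretization of the Stokes problem. SIAM J. Sci. Comput., 40(6), 2018.]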
(Figure, two panels: log10(condest(A)) versus ℓ for the aggregated and standard FE spaces.)
(Figure, three panels versus log10(DOF^{1/d}), aggregated vs. standard: relative velocity error $\|u_h - u\|_{H^1}/\|u\|_{H^1}$ (reference slope −2); relative velocity error $\|u_h - u\|_{L^2}/\|u\|_{L^2}$ (reference slope −3); relative pressure error $\|p_h - p\|_{L^2}/\|p\|_{L^2}$ (reference slope −2).)
2. Parallel implementation
Parallel mesh distribution
(Figure: (a) partition $D = \{D_1, D_2\}$; (b) view from $D_1$; (c) view from $D_2$.)
Main phases to be parallelized:
- Cell Aggregation
- Imposition of constraints
Cell aggregation (serial)
(Figure legend: touched, untouched and aggregated cells.)
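A simplified serial version of the aggregation loop pictured above can be written as repeated sweeps: interior cells start as aggregated (they are their own roots), each sweep lets every still-pending cut cell that touches an aggregated face neighbor join that neighbor's aggregate ("touched"), and touched cells only count as aggregated from the next sweep on. The sketch assumes a 2D Cartesian grid given as a dictionary of cell states; the tie-break among several candidate roots (nearest root, then lexicographic order) and all names are assumptions, not the exact rule of the AgFEM papers.

```python
INSIDE, CUT, OUTSIDE = "inside", "cut", "outside"


def aggregate_cells(status):
    """Map each aggregated cell (i, j) to its root cell.

    status: dict (i, j) -> INSIDE | CUT | OUTSIDE on a 2D Cartesian grid.
    Interior cells are their own roots; cut cells join the aggregate of an
    already aggregated face neighbor, one layer per sweep.
    """
    root = {c: c for c, s in status.items() if s == INSIDE}
    pending = {c for c, s in status.items() if s == CUT}
    while pending:
        touched = {}
        for i, j in sorted(pending):
            nbrs = ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
            candidates = [root[nb] for nb in nbrs if nb in root]
            if candidates:
                # Tie-break (assumption): nearest root cell, then lexicographic order.
                touched[(i, j)] = min(
                    candidates,
                    key=lambda r: ((r[0] - i) ** 2 + (r[1] - j) ** 2, r))
        if not touched:
            break  # remaining cut cells cannot reach an interior cell
        root.update(touched)  # touched cells count as aggregated from the next sweep on
        pending.difference_update(touched)
    return root
```

Because aggregates grow by one layer per sweep, the number of sweeps, and with it the aggregate size, is bounded by how far cut cells lie from the nearest interior cell; cut cells with no path to an interior cell are left unaggregated.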
Aggregates in 3D
Cell aggregation (parallel)
(Figure panels: (a) Step 1; (b) Step 2; (c) communication; (d) Step 3.)
✔ Standard nearest-neighbor communication is used to determine root cells
Parallel imposition of constraints
(Figure: subdomains $D_s$, $D_{s'}$ and $D_{s''}$.)
✘ Subdomain-local constraints are not even possible in some cases
✔ In the end, only standard neighbor communication is required
3. Performance of parallel AgFEM + AMG solvers
Weak scaling test setup
- Poisson equation
- AgFEM vs. "naive" unfitted FEM
- Linear solver: PCG from PETSc
- Preconditioner: smoothed aggregation AMG from PETSc (GAMG)
- Up to 16K cores and 1000M background cells
Computed at MareNostrum 4
Number of PCG iterations (weak scaling)
(Figure, three panels: (a) Popcorn, (b) Spiral, (c) Swiss Cheese. CG iterations versus number of DOFs (10^3 to 10^8) for the aggregated (agg) and standard (std) FE spaces, each with loads 1, 2 and 3.)
Computational time (secs) AgFEM stages (weak scaling)
(Figure, three panels: (a) Popcorn, (b) Spiral, (c) Swiss Cheese. Wall-clock time [s] versus number of DOFs for each AgFEM stage: cell aggregation (Alg. 2); path reconstruction (Algs. 3 and 4); import data from root cells (Alg. 5); setup of constraints (Sect. 3.8); setup of local DOF ids (Sect. 3.3); setup of global DOF ids (Sect. 3.7); FE integration + assembly (Table 2).)
Computational time (secs) AMG solver (weak scaling)
(Figure, three panels: (a) Popcorn, (b) Spiral, (c) Swiss Cheese. Wall-clock time [s] versus number of DOFs for the linear solver setup and the linear solver run.)
- Setup time degrades (11.94x CPU time for a 4,448x larger problem)
- Similar degradation for standard FEM in a box (2.50x for a 355.26x larger problem)
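One way to put the two degradation figures on a common footing (an interpretation added here, not a metric from the slides) is to express each as an exponent $\alpha$ in $t \propto N^\alpha$, where $N$ is the problem size: $\alpha = 0$ is ideal weak scaling and $\alpha = 1$ means the time grows linearly with the problem size. The function name is hypothetical.

```python
import math


def growth_exponent(time_ratio: float, size_ratio: float) -> float:
    """Exponent alpha such that time_ratio = size_ratio ** alpha (i.e., t ~ N^alpha).
    alpha = 0 is ideal weak scaling; alpha = 1 is time linear in problem size."""
    return math.log(time_ratio) / math.log(size_ratio)


if __name__ == "__main__":
    print(f"AgFEM + AMG setup:     alpha = {growth_exponent(11.94, 4448):.3f}")
    print(f"standard FEM in a box: alpha = {growth_exponent(2.50, 355.26):.3f}")
```

This gives about 0.30 for the AgFEM + AMG setup and about 0.16 for standard FEM in a box; since the two experiments span very different size ranges, the comparison is only indicative.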
Conclusions
✔ Embedded FEM enables scalable octree-based meshes
✘ ... but can destroy the scalability of linear solvers
✔ AgFEM recovers the optimal scaling of the linear solver
✔ ... while keeping the optimal discretization order.
For more details, see:
- F. Verdugo, A. F. Martín, S. Badia. Distributed-memory parallelization of the aggregated unfitted finite element method. CMAME, 357, 2019.
- S. Badia, A. F. Martín, F. Verdugo. Mixed aggregated finite element methods for the unfitted discretization of the Stokes problem. SIAM J. Sci. Comput., 40(6), 2018.
- S. Badia, F