www.bsc.es
San José, California
Samuel Rodríguez Bernabeu samuel.rodriguez@bsc.es
New Approaches to the Direct Solution
- f Large-Scale Banded Linear Systems
New Approaches to the Direct Solution of Large-Scale Banded Linear - - PowerPoint PPT Presentation
www.bsc.es New Approaches to the Direct Solution of Large-Scale Banded Linear Systems Samuel Rodrguez Bernabeu samuel.rodriguez@bsc.es San Jos, California 2 Solving the linear system after reordering 10 Million degrees of freedom, double
San José, California
2
3
4
5
Azad, Ariful, et al. "The Reverse Cuthill-McKee Algorithm in Distributed-Memory." arXiv preprint arXiv:1610.08128 (2016)..
7
A parallel hybrid banded system solver: the SPIKE algorithm. Parallel computing, 32(2), 177-194. Polizzi, E., & Sameh, A. H. (2006). A1 A2 A3 A4 B1 C2 C3 C4 B3 B2 Illustration block matrix decomposition layout used by the SPIKE algorithm for a 4 partitions case (p = 4). fully asynchronous work prevents scalability
Algorithm 1: SPIKE solver pseudocode. Given A, P and x, solves the linear system
x = P T b for x and returns the Px Data: A,P,b Result: P T x
1 A := PAP T 2 b := Pb 3 p = ComputeOptimalNumerOfPartitions(A, nrhs) 4 for i in p do 5
Vi = A−1
i
0 Bi
Wi = A−1
i
Ci
Fi = A−1
i bi 8 end 9 ¯
x = SolveReducedSystem(V top,bottom
i
, W top,bottom
i
, F top,bottom
i
)
10 for i in p do 11
xi = A−1
i
✓ Fi − 0 Bi
xtop
i+1 −
Ci
xbottom
i−1
◆
12 end 13 x = P T x
8
A parallel hybrid banded system solver: the SPIKE algorithm. Parallel computing, 32(2), 177-194. Polizzi, E., & Sameh, A. H. (2006). A1 A2 A3 A4 B1 C2 C3 C4 B3 B2 Illustration block matrix decomposition layout used by the SPIKE algorithm for a 4 partitions case (p = 4). fully asynchronous work prevents scalability
Algorithm 1: SPIKE solver pseudocode. Given A, P and x, solves the linear system
x = P T b for x and returns the Px Data: A,P,b Result: P T x
1 A := PAP T 2 b := Pb 3 p = ComputeOptimalNumerOfPartitions(A, nrhs) 4 for i in p do 5
Vi = A−1
i
0 Bi
Wi = A−1
i
Ci
Fi = A−1
i bi 8 end 9 ¯
x = SolveReducedSystem(V top,bottom
i
, W top,bottom
i
, F top,bottom
i
)
10 for i in p do 11
xi = A−1
i
✓ Fi − 0 Bi
xtop
i+1 −
Ci
xbottom
i−1
◆
12 end 13 x = P T x
9
PARDISO
LM-SPIKE (p=5) LM-SPIKE (p=20)
11
12
13
A B C D
I CA−1 I A D − CA−1B I A−1B I
14
(JUST FOR DIAGONALLY DOMINANT CASES?)
16
x11 = A−1
11 b11 + A−1 12 b21 + A−1 13 b31
x = A−1b = A−1
11
A−1
12
A−1
13
A−1
21
A−1
22
A−1
23
A−1
31
A−1
32
A−1
33
b11 b21 b31
Algorithm 1: SPIKE solver pseudocode. Given A, P and x, solves the linear system
x = P T b for x and returns the Px Data: A,P,b Result: P T x
1 A := PAP T 2 b := Pb 3 p = ComputeOptimalNumerOfPartitions(A, nrhs) 4 for i in p do 5
Vi = A−1
i
0 Bi
Wi = A−1
i
Ci
Fi = A−1
i bi 8 end 9 ¯
x = SolveReducedSystem(V top,bottom
i
, W top,bottom
i
, F top,bottom
i
)
10 for i in p do 11
xi = A−1
i
✓ Fi − 0 Bi
xtop
i+1 −
Ci
xbottom
i−1
◆
12 end 13 x = P T x
Kuzmin, A., Luisier, M., & Schenk, O. (2013, August). Fast methods for computing selected elements of the Green’s function in massively parallel nanoelectronic device simulations. In European Conference on Parallel Processing (pp. 533-544). Springer Berlin Heidelberg.
17
x11 = A−1
11 b11 + A−1 12 b21 + A−1 13 b31
x11 ≈ A−1
11 b11
Diagonal Dominance
Algorithm 1: SPIKE solver pseudocode. Given A, P and x, solves the linear system
x = P T b for x and returns the Px Data: A,P,b Result: P T x
1 A := PAP T 2 b := Pb 3 p = ComputeOptimalNumerOfPartitions(A, nrhs) 4 for i in p do 5
Vi = A−1
i
0 Bi
Wi = A−1
i
Ci
Fi = A−1
i bi 8 end 9 ¯
x = SolveReducedSystem(V top,bottom
i
, W top,bottom
i
, F top,bottom
i
)
10 for i in p do 11
xi = A−1
i
✓ Fi − 0 Bi
xtop
i+1 −
Ci
xbottom
i−1
◆
12 end 13 x = P T x
x = A−1b = A−1
11
A−1
12
A−1
13
A−1
21
A−1
22
A−1
23
A−1
31
A−1
32
A−1
33
b11 b21 b31
Demko, S., Moss, W. F., & Smith, P. W. (1984). Decay rates for inverses of band matrices. Mathematics of computation, 43(168), 491-499.
18
x11 = A−1
11 b11 + A−1 12 b21 + A−1 13 b31
x11 ≈ A−1
11 b11
A−1 =
∞
X
k=0
(I A)k ( ) kI Ak∞< 1
via Neumann Series Diagonal Dominance
Algorithm 1: SPIKE solver pseudocode. Given A, P and x, solves the linear system
x = P T b for x and returns the Px Data: A,P,b Result: P T x
1 A := PAP T 2 b := Pb 3 p = ComputeOptimalNumerOfPartitions(A, nrhs) 4 for i in p do 5
Vi = A−1
i
0 Bi
Wi = A−1
i
Ci
Fi = A−1
i bi 8 end 9 ¯
x = SolveReducedSystem(V top,bottom
i
, W top,bottom
i
, F top,bottom
i
)
10 for i in p do 11
xi = A−1
i
✓ Fi − 0 Bi
xtop
i+1 −
Ci
xbottom
i−1
◆
12 end 13 x = P T x
x = A−1b = A−1
11
A−1
12
A−1
13
A−1
21
A−1
22
A−1
23
A−1
31
A−1
32
A−1
33
b11 b21 b31
Demko, S., Moss, W. F., & Smith, P. W. (1984). Decay rates for inverses of band matrices. Mathematics of computation, 43(168), 491-499.
19
(I − A)k (I − A)k−1 (I − A)
Truncated SPIKE Neumann Truncated SPIKE Memory O( n
2 × (ku + kl))
O((ku + kl)2) FLOPs O( n
2 × (ku + kl)2)
O((ku + kl)3) Relative Error 1.83 × 10−13 8.10 × 10−13
A−1 =
∞
X
k=0
(I A)k ( ) kI Ak∞< 1
Space and time complexity comparison between LU (diagonal boosting) and Neumann series approach applied to the factorization stage of the SPIKE al- gorithm (p = 2). Relative error is referred to the solution of the whole linear system using SPIKE.
Mikkelsen, C. C. K., & Manguoglu, M. (2008). Analysis of the truncated SPIKE algorithm. SIAM Journal on Matrix Analysis and Applications, 30(4), 1500-1519.
20
21
A−1 =
∞
X
k=0
(I A)k ( ) kI Ak∞< 1 1 c
−1 =
∞
X
k=0
k ( ) kI P −1 (I cA) k∞< 1 (I A)−1 =
∞
X
k=0
Ak ( ) kAk∞< 1 (I cA)−1 =
∞
X
k=0
ckAk ( ) kcAk∞< 1
Direct evaluation of A−1 : Indirect or Nested evaluation of A−1 :
A−1 = − (I − A)−1 h I − (I − A)−1i−1 =
∞
X
k=0
Ak +
∞
X
j=0
∞ X
k=0
Ak !j ⇐ ⇒ ?
22
23