New Approaches to the Direct Solution of Large-Scale Banded Linear - - PowerPoint PPT Presentation

new approaches to the direct solution of large scale
SMART_READER_LITE
LIVE PREVIEW

New Approaches to the Direct Solution of Large-Scale Banded Linear - - PowerPoint PPT Presentation

www.bsc.es New Approaches to the Direct Solution of Large-Scale Banded Linear Systems Samuel Rodrguez Bernabeu samuel.rodriguez@bsc.es San Jos, California 2 Solving the linear system after reordering 10 Million degrees of freedom, double


slide-1
SLIDE 1

www.bsc.es

San José, California

Samuel Rodríguez Bernabeu samuel.rodriguez@bsc.es

New Approaches to the Direct Solution

  • f Large-Scale Banded Linear Systems
slide-2
SLIDE 2

2

slide-3
SLIDE 3

3

Solving the linear system after reordering

10 Million degrees of freedom, double precision, complex linear system with a bandwidth of 2500 may require 1490 GB just to store LU matrix factors using LU factorization with partial pivoting.

It is possible to solve it with less than 60 GB

slide-4
SLIDE 4

4

Solving the linear system

slide-5
SLIDE 5

5

Solving the linear system after reordering

Azad, Ariful, et al. "The Reverse Cuthill-McKee Algorithm in Distributed-Memory." arXiv preprint arXiv:1610.08128 (2016)..

slide-6
SLIDE 6

BUILDING BLOCK: LM-SPIKE: MEMORY FRIENDLY DIRECT SOLVER

slide-7
SLIDE 7

7

SPIKE algorithm

A parallel hybrid banded system solver: the SPIKE algorithm. Parallel computing, 32(2), 177-194. Polizzi, E., & Sameh, A. H. (2006). A1 A2 A3 A4 B1 C2 C3 C4 B3 B2 Illustration block matrix decomposition layout used by the SPIKE algorithm for a 4 partitions case (p = 4). fully asynchronous work prevents scalability

Algorithm 1: SPIKE solver pseudocode. Given A, P and x, solves the linear system

  • PAP T

x = P T b for x and returns the Px Data: A,P,b Result: P T x

1 A := PAP T 2 b := Pb 3 p = ComputeOptimalNumerOfPartitions(A, nrhs) 4 for i in p do 5

Vi = A−1

i

 0 Bi

  • 6

Wi = A−1

i

Ci

  • 7

Fi = A−1

i bi 8 end 9 ¯

x = SolveReducedSystem(V top,bottom

i

, W top,bottom

i

, F top,bottom

i

)

10 for i in p do 11

xi = A−1

i

✓ Fi −  0 Bi

  • ¯

xtop

i+1 −

Ci

  • ¯

xbottom

i−1

12 end 13 x = P T x

slide-8
SLIDE 8

8

SPIKE algorithm

A parallel hybrid banded system solver: the SPIKE algorithm. Parallel computing, 32(2), 177-194. Polizzi, E., & Sameh, A. H. (2006). A1 A2 A3 A4 B1 C2 C3 C4 B3 B2 Illustration block matrix decomposition layout used by the SPIKE algorithm for a 4 partitions case (p = 4). fully asynchronous work prevents scalability

Algorithm 1: SPIKE solver pseudocode. Given A, P and x, solves the linear system

  • PAP T

x = P T b for x and returns the Px Data: A,P,b Result: P T x

1 A := PAP T 2 b := Pb 3 p = ComputeOptimalNumerOfPartitions(A, nrhs) 4 for i in p do 5

Vi = A−1

i

 0 Bi

  • 6

Wi = A−1

i

Ci

  • 7

Fi = A−1

i bi 8 end 9 ¯

x = SolveReducedSystem(V top,bottom

i

, W top,bottom

i

, F top,bottom

i

)

10 for i in p do 11

xi = A−1

i

✓ Fi −  0 Bi

  • ¯

xtop

i+1 −

Ci

  • ¯

xbottom

i−1

12 end 13 x = P T x

slide-9
SLIDE 9

9

Controlling memory consumption

PARDISO

LM-SPIKE (p=5) LM-SPIKE (p=20)

slide-10
SLIDE 10

FIRST IDEA: LOCAL MATRIX BANDWIDTH REDUCTION

slide-11
SLIDE 11

11

Could we reduce the bandwidth locally?

slide-12
SLIDE 12

12

Could we reduce the bandwidth locally?

slide-13
SLIDE 13

13

Could we reduce the bandwidth locally?

 A B C D

  • =

 I CA−1 I  A D − CA−1B  I A−1B I

slide-14
SLIDE 14

14

Could we reduce the bandwidth locally?

+20 Million degrees of freedom linear system arising from frequency domain discretization of Maxwell’s equations for CSEM. Notice that the inclusion of high-conductivity air layer makes the problem highly ill-conditioned.

slide-15
SLIDE 15

SECOND IDEA: COMPUTING A SUBSET OF THE INVERSE

(JUST FOR DIAGONALLY DOMINANT CASES?)

slide-16
SLIDE 16

16

x11 = A−1

11 b11 + A−1 12 b21 + A−1 13 b31

x = A−1b =   A−1

11

A−1

12

A−1

13

A−1

21

A−1

22

A−1

23

A−1

31

A−1

32

A−1

33

    b11 b21 b31  

Algorithm 1: SPIKE solver pseudocode. Given A, P and x, solves the linear system

  • PAP T

x = P T b for x and returns the Px Data: A,P,b Result: P T x

1 A := PAP T 2 b := Pb 3 p = ComputeOptimalNumerOfPartitions(A, nrhs) 4 for i in p do 5

Vi = A−1

i

 0 Bi

  • 6

Wi = A−1

i

Ci

  • 7

Fi = A−1

i bi 8 end 9 ¯

x = SolveReducedSystem(V top,bottom

i

, W top,bottom

i

, F top,bottom

i

)

10 for i in p do 11

xi = A−1

i

✓ Fi −  0 Bi

  • ¯

xtop

i+1 −

Ci

  • ¯

xbottom

i−1

12 end 13 x = P T x

Computing selected elements of the inverse

Kuzmin, A., Luisier, M., & Schenk, O. (2013, August). Fast methods for computing selected elements of the Green’s function in massively parallel nanoelectronic device simulations. In European Conference on Parallel Processing (pp. 533-544). Springer Berlin Heidelberg.

slide-17
SLIDE 17

17

x11 = A−1

11 b11 + A−1 12 b21 + A−1 13 b31

x11 ≈ A−1

11 b11

? y

Diagonal Dominance

Algorithm 1: SPIKE solver pseudocode. Given A, P and x, solves the linear system

  • PAP T

x = P T b for x and returns the Px Data: A,P,b Result: P T x

1 A := PAP T 2 b := Pb 3 p = ComputeOptimalNumerOfPartitions(A, nrhs) 4 for i in p do 5

Vi = A−1

i

 0 Bi

  • 6

Wi = A−1

i

Ci

  • 7

Fi = A−1

i bi 8 end 9 ¯

x = SolveReducedSystem(V top,bottom

i

, W top,bottom

i

, F top,bottom

i

)

10 for i in p do 11

xi = A−1

i

✓ Fi −  0 Bi

  • ¯

xtop

i+1 −

Ci

  • ¯

xbottom

i−1

12 end 13 x = P T x

x = A−1b =   A−1

11

A−1

12

A−1

13

A−1

21

A−1

22

A−1

23

A−1

31

A−1

32

A−1

33

    b11 b21 b31  

Demko, S., Moss, W. F., & Smith, P. W. (1984). Decay rates for inverses of band matrices. Mathematics of computation, 43(168), 491-499.

Computing selected elements of the inverse

slide-18
SLIDE 18

18

x11 = A−1

11 b11 + A−1 12 b21 + A−1 13 b31

x11 ≈ A−1

11 b11

? y

A−1 =

X

k=0

(I A)k ( ) kI Ak∞< 1

? y

via Neumann Series Diagonal Dominance

Algorithm 1: SPIKE solver pseudocode. Given A, P and x, solves the linear system

  • PAP T

x = P T b for x and returns the Px Data: A,P,b Result: P T x

1 A := PAP T 2 b := Pb 3 p = ComputeOptimalNumerOfPartitions(A, nrhs) 4 for i in p do 5

Vi = A−1

i

 0 Bi

  • 6

Wi = A−1

i

Ci

  • 7

Fi = A−1

i bi 8 end 9 ¯

x = SolveReducedSystem(V top,bottom

i

, W top,bottom

i

, F top,bottom

i

)

10 for i in p do 11

xi = A−1

i

✓ Fi −  0 Bi

  • ¯

xtop

i+1 −

Ci

  • ¯

xbottom

i−1

12 end 13 x = P T x

x = A−1b =   A−1

11

A−1

12

A−1

13

A−1

21

A−1

22

A−1

23

A−1

31

A−1

32

A−1

33

    b11 b21 b31  

Demko, S., Moss, W. F., & Smith, P. W. (1984). Decay rates for inverses of band matrices. Mathematics of computation, 43(168), 491-499.

Computing selected elements of the inverse

slide-19
SLIDE 19

19

· =

(I − A)k (I − A)k−1 (I − A)

Truncated SPIKE Neumann Truncated SPIKE Memory O( n

2 × (ku + kl))

O((ku + kl)2) FLOPs O( n

2 × (ku + kl)2)

O((ku + kl)3) Relative Error 1.83 × 10−13 8.10 × 10−13

A−1 =

X

k=0

(I A)k ( ) kI Ak∞< 1

Space and time complexity comparison between LU (diagonal boosting) and Neumann series approach applied to the factorization stage of the SPIKE al- gorithm (p = 2). Relative error is referred to the solution of the whole linear system using SPIKE.

Computing selected elements of the inverse

Mikkelsen, C. C. K., & Manguoglu, M. (2008). Analysis of the truncated SPIKE algorithm. SIAM Journal on Matrix Analysis and Applications, 30(4), 1500-1519.

slide-20
SLIDE 20

20

Computing selected elements of the inverse

250x 400x

slide-21
SLIDE 21

21

Could we extend this to more general cases?

A−1 =

X

k=0

(I A)k ( ) kI Ak∞< 1 1 c

  • P −1A

−1 =

X

k=0

  • I P −1 (I cA)

k ( ) kI P −1 (I cA) k∞< 1 (I A)−1 =

X

k=0

Ak ( ) kAk∞< 1 (I cA)−1 =

X

k=0

ckAk ( ) kcAk∞< 1

Direct evaluation of A−1 : Indirect or Nested evaluation of A−1 :

A−1 = − (I − A)−1 h I − (I − A)−1i−1 =

X

k=0

Ak +

X

j=0

∞ X

k=0

Ak !j ⇐ ⇒ ?

slide-22
SLIDE 22

22

Takeways

It is possible to use direct solvers for solving large 3D problems. We’re working on on-the-fly factorizations for diagonal blocks. Ninja skills are required, but not sufficient

slide-23
SLIDE 23

www.bsc.es

Thank you!

samuel.rodriguez@bsc.es

23