[PPT] - A compact Arnoldi algorithm for polynomial eigenvalue problems PowerPoint Presentation

SLIDE 1

A compact Arnoldi algorithm for polynomial eigenvalue problems

Yangfeng Su, Junyi Zhang, Zhaojun Bai

School of Mathematical Sciences Fudan University

January, 2008, Taiwan RANMEP2008

SLIDE 2

Goals

Polynomial Eigenvalue Problems (PEPs):

A0 + λA1 + ... + λdAd
x = 0

(λ, x) is called an eigenpair. d=1, SEP,GEP d=2, QEP ( Quadratic Eigenvalue Problem) Goals Solving large scale polynomial eigenvalue problems with Implicitly Restarted Arnoldi (IRA) algorithm with less memory.

SLIDE 3

Goals

Polynomial Eigenvalue Problems (PEPs):

A0 + λA1 + ... + λdAd
x = 0

(λ, x) is called an eigenpair. d=1, SEP,GEP d=2, QEP ( Quadratic Eigenvalue Problem) Goals Solving large scale polynomial eigenvalue problems with Implicitly Restarted Arnoldi (IRA) algorithm with less memory.

SLIDE 4

Outline

1 Background 2 Algorithm 3 Numerical Comparison

SLIDE 5

Outline

1 Background 2 Algorithm 3 Numerical Comparison

SLIDE 6

Linearization

Linearization ⇒ A larger GEP:           A0 I ... I      − λ      −A1 −A2 . . . −Ad I ... I                x λx . . . λd−1x      = 0 Sleijpen, van der Vorst, and van Gijzen. Quadratic eigenproblems are no problem, SIAM News, 1996 Can be solved with ARPACK Not true: structure is neither used nor preserved. size and therefore memory are d times! This work: explore the structure to save memory in ARPACK. It is a byproduct of SOAR [Bai & S. SIMAX 2005].

SLIDE 7

Linearization

Linearization ⇒ A larger GEP:           A0 I ... I      − λ      −A1 −A2 . . . −Ad I ... I                x λx . . . λd−1x      = 0 Sleijpen, van der Vorst, and van Gijzen. Quadratic eigenproblems are no problem, SIAM News, 1996 Can be solved with ARPACK Not true: structure is neither used nor preserved. size and therefore memory are d times! This work: explore the structure to save memory in ARPACK. It is a byproduct of SOAR [Bai & S. SIMAX 2005].

SLIDE 8

Linearization

Linearization ⇒ A larger GEP:           A0 I ... I      − λ      −A1 −A2 . . . −Ad I ... I                x λx . . . λd−1x      = 0 Sleijpen, van der Vorst, and van Gijzen. Quadratic eigenproblems are no problem, SIAM News, 1996 Can be solved with ARPACK Not true: structure is neither used nor preserved. size and therefore memory are d times! This work: explore the structure to save memory in ARPACK. It is a byproduct of SOAR [Bai & S. SIMAX 2005].

SLIDE 9

Linearization

Linearization ⇒ A larger GEP:           A0 I ... I      − λ      −A1 −A2 . . . −Ad I ... I                x λx . . . λd−1x      = 0 Sleijpen, van der Vorst, and van Gijzen. Quadratic eigenproblems are no problem, SIAM News, 1996 Can be solved with ARPACK Not true: structure is neither used nor preserved. size and therefore memory are d times! This work: explore the structure to save memory in ARPACK. It is a byproduct of SOAR [Bai & S. SIMAX 2005].

SLIDE 10

Arnoldi decomposition

An Arnoldi decomposition of order-j: OPQj = QjHj + hj+1,jqj+1eT

j

where OP: a matrix Qj = [q1, q2, . . . , qj]: orthonormal Hj: upper Hessenberg Arnoldi process is used to compute the Arnoldi decomposition with numerical stability

SLIDE 11

Implicitly Restarted Arnoldi (IRA) for SEP

Sorensen [SIMAX92] Given an Arnoldi decomposition of order-p, OPQp = QpHp + βpqp+1eT

p .

1 Extend Arnoldi decomposition from order p to order k:

AQk = QkHk + βkqk+1eT

k .

2 Divide the eigenvalues of Hk as “good” ones µ1, . . . , µp and

“bad” ones µp+1, . . . , µk.

3 For Hk do implicitly QR steps with shifts µk+1, . . . , µk, get

Hk = U ˜ HkUH

4 Take first p columns of

AQkU = QkU ˜ Hk + βkqk+1eT

k U

as a restarted Arnoldi decomposition of order p.

SLIDE 12

ARPACK

ARPACK is an implementation of IRA algorithm a well-coded, well-documented package produced by Lehoucq, Sorensen and Yang during 1992-1997 used in MATLAB as eigs and arpackc

SLIDE 13

IRA for QEP

For simplicity, we only discuss QEPs. For QEP: (λ2M + λD + K)x = 0

1 shift-and-invert: for shift σ, let λ = σ + 1/µ

(µ2I − µA − B)x = 0 where A = −(σ2M + σD + K)−1(2σM + D) B = −(σ2M + σD + K)−1M

2 linearize

A B I µx x

= µ

µx x

3 apply IRA

SLIDE 14

IRA for QEP

Easy use of ARPACK How to utilize the Frobenius structure to save memory?

SLIDE 15

Outline

1 Background 2 Algorithm 3 Numerical Comparison

SLIDE 16

Arnoldi decomposition for QEP

An Arnoldi decomposition with order-j OPQj = Qj+1 Hj For QEP: A B I Qj,1 Qj,2

=

Qj+1,1 Qj+1,2

Hj

Since Qj,1 = Qj+1,2 Hj we have Theorem rank ([Qj,1, Qj,2]) ≡ rj ≤ j + 1 Observed by many people, e.g. Meerbergen [SIMAX06], Bai & S. [SIMAX05]. The key is how to use it with numerical stability.

SLIDE 17

Arnoldi Decomposition for QEP

Theorem rank ([Qj,1, Qj,2]) ≡ rj ≤ j + 1 Let Vj ∈ C n×rj = orth[Qj,1, Qj,2] then Qj = Qj,1 Qj,2

=

VjRj,1 VjRj,2

=

Vj Vj Rj,1 Rj,2

Two levels of orthonormality:

Vj is orthonormal Rj,1 Rj,2

is orthonormal

SLIDE 18

Compact ARnoldi Decomposition (CARD)

Compact ARnoldi Decomposition (CARD) OP Vj Vj Rj,1 Rj,2

=

Vj+1 Vj+1 Rj+1,1 Rj+1,2

Hj

Vj ∈ C n×rj Rj ∈ C rj×j j ≤ rj+1 ≤ j + 1 Memory cost: Arnoldi: 2n(j + 1) ( for PEPs: dn(j + 1)) CARD: nrj+1 ≤ n(j + 2) (for PEPs: ≤ n(j + d + 1))

SLIDE 19

CARD process

CARD process is to compute the CARD with numerical stability! CARD of order j: OP VR1 VR2

=

VR1 VR2

H + βqe1 =

ˆ V ˆ R1 ˆ V ˆ R2

ˆ

H Expand it to a CARD of order j + 1 ⇒ next two pages

SLIDE 20

Expand CARD process

1 compute q1 = Aq1 + Bq2; 2 decompose q1 = ˆ

V x + vα with MGS x = ˆ V Tq1, v = q1 − ˆ V x, α = v, v = v/α

3 update

ˆ V =

ˆ

V , v

, rj+1 = rj + 1,

ˆ R1 = ˆ R1 x α

, ˆ

R2 = ˆ R2 ˆ R2 (:, j + 1)

SLIDE 21

Expand CARD process

ˆ R1 ˆ R2

:=

    ˆ R1 x α ˆ R2 ˆ R1 (:, j + 1)    

4 decompose with MGS:

ˆ R1 (:, j + 2) ˆ R2 (:, j + 2)

ld

= ˆ R1 (:, 1 : j + 1) ˆ R2 (:, 1 : j + 1)

H1:j+1,j+1

+ ˆ R1 (:, j + 2) ˆ R2 (:, j + 2)

new

Hj+2,j+1

SLIDE 22

5 update the current Arnoldi vector q:

q = q1 q2

q1 = ˆ

V ˆ Rj+1,1 [:, j + 2] q2 = ˆ V ˆ Rj+1,2 [:, j + 2] Only GMS (with re-orthogonalization), no inversion CARD is numerically stable!

SLIDE 23

IRA with CARD

Given a CARD of k-order: OP VR1 VR2

=

ˆ V ˆ R1 ˆ V ˆ R2 H βeT

k

IRA does (m − p) QR steps on H with shifts µp+1, ..., µm, i.e.

H = U HUH Then OP VR1 VR2

U =

ˆ V ˆ R1 ˆ V ˆ R2 U ˜ H βeT

k U

Denote

ˆ U = U 1

SLIDE 24

IRA with CARD

Then OP VR1U VR2U

=

ˆ V ˆ R1 ˆ U ˆ V ˆ R2 ˆ U ˜ H βeT

k U

Its first p columns , denoted by

OP VkR1,p VkR2,p

=

Vk+1R1,p+1 Vk+1R2,p+1 Hp ˜ βeT

p

still form an Arnoldi decomposition of order p

However, the Vk has rk (instead of rp) columns, it is not a CARD!

SLIDE 25

IRA with CARD

Since Vk+1R1,p+1 Vk+1R2,p+1

is the orthonormal factor of an Arnoldi decomposition, from

previous theorem, rank[Vk+1R1,p+1, Vk+1R2,p+1] = rank[R1,p+1, R2,p+1] = rp+1 ≤ p + 2, we have a compact SVD: [R1,p+1, R2,p+1] = Prk+1×rp+1Σrp+1×rp+1[G rp+1×(p+1)

1

, G rp+1×(p+1)

2

] ≡ P[R1, R2]

SLIDE 26

IRA with CARD

Therefore,

V n×rk+1

k+1

R1,p+1 Vk+1R2,p+1

=

(Vk+1P)n×rp+1R1 (Vk+1P)R2

The Arnoldi decomposition is expressed in CARD again!

This process can also be implemented by a compact “QR” decomposition, which is similar with the compact Arnoldi decomposition (CARD). Details omitted.

SLIDE 27

POLYAR

POLYAR: modified ARPARK for polynomial eigenvalue problems (not only QEPs)

1 znaitr p: compute CARD 2 znapps p: IRA with CARD;

use LAPACK routine zgesdd to compute SVD decomposition

3 znaupd p, znaupd2 p, zgetv0 p:

slightly revised (arguments, storage)

4 zgemip(added): compute inner product in compact form

SLIDE 28

Outline

1 Background 2 Algorithm 3 Numerical Comparison

SLIDE 29

Example 1: A random QEP

Problem: QEP (pdeg=2) Size: n=500 Environment: PC (EMS memory: 512M) Randomized Matrix M,D,K (each matrix have about 24,000 non-zero elements.) We choose shift σ = 1 and use shift-invert mode to compute eigenvalues close to 1 LU factorization of

Mσ2 + Dσ + K
, L and U contain about

120,000 non-zero elements. To compute 8 eigenvalues close to 1; Use 30 Arnoldi base vectors, says, 31 CARD base vectors.

SLIDE 30

A random QEP

Computed eigenvalues: ARPACK POLYAR Real Imag Real Imag 1

1.02817D+00 1.38768D−01 1.02817D+00 1.38768D-01

2

1.11582D+00 3.61818D-02 1.11582D+00 3.61818D-02

3

1.11582D+00

3.61818D-02

1.11582D+00

3.61818D-02

4

1.05613D+00

1.05380D-01

1.05613D+00

1.05380D-01

5

1.05613D+00 1.05380D-01 1.05613D+00 1.05380D-01

6

9.34692D-01

2.73028D-15

9.34692D-01 4.39496D-15

7

1.00023D+00 5.87804D-02 1.00023D+00 5.87804D-02

8

1.00023D+00

5.87804D-02

1.00023D+00

5.87804D-02

SLIDE 31

A random QEP

Storage comparison: ARPACK POLYAR V 500 × 2 × 30 500 × 31 workd 3 × 2 × 500 (2 + 2) × 500 Resid 500 × 2 500 × 2

SLIDE 32

A random QEP

Iteration and time comparison: ARPACK POLYAR update iteration 4 4 OP × x operations 86 86 reorthogonalization of V 84 84 reorthogonalization of R 84 user’s OP × x operations 1.375000 1.250000 naupd2 1.609375 1.515625 basic Arnoldi iteration loop 1.531250 1.437500 reorthogonalization phrase 0.093750 0.046875 Hessenberg eig subproblem 0.031250 0.031250 applying the shifts 0.046875 0.046875 calling gesdd 0.000000

SLIDE 33

Example 2: A QEP from SLAC

Problem: QEP (pdeg=2) Size: n=5384 Environment: PC (EMS memory: 512M) Genuine Matrix M,D,K (M,D,K have 61425,1183,61425 non-zero elements respectively.) We choose shift σ = −10i and use shift-invert mode to compute eigenvalues close to −10i LU factorization of

Mσ2 + Dσ + K
, L and U have 749610

and 780229 non-zero elements. To compute 8 eigenvalues close to −10i; Use 30 Arnoldi base vectors, says, 31 CARD base vectors.

SLIDE 34

A QEP from SLAC

ARPACK POLYAR Real Imag Real Imag 1

8.17408D-13
1.11248D+01

1.78859D-13

1.11248D+01

2

1.80351D-13
1.10381D+01

1.48935D-13

1.10381D+01

3

3.43204D-12

8.96999D+00

1.85159D-12

8.96999D+00

4

8.92409D-12

1.09810D+01

2.12829D-12

1.09810D+01

5

1.19634D-12

1.07600D+01

2.63738D-13

1.07600D+01

6

1.30082D-13
9.73674D+00
4.44236D-14
9.73674D+00

7

3.83685D-15

9.79517D+00

1.48931D-15

9.79517D+00

8

2.27178D-15

1.00606D+01
3.76020D-15
1.00606D+01

SLIDE 35

A QEP from SLAC

Storage comparison: ARPACK POLYAR V 5000 × 2 × 30 5000 × 31 workd 3 × 2 × 5000 (2 + 2) × 5000 Resid 5000 × 2 5000 × 2

SLIDE 36

A QEP from SLAC

Time comparison: ARPACK POLYAR update iteration 2 2 OP × x operations 48 48 reorthogonalization of V 44 44 reorthogonalization of R 47 user’s OP × x operations 4.078125 4.093750 naupd2 5.343750 5.171875 basic Arnoldi iteration loop 5.203125 5.062500 reorthogonalization phrase 0.593750 0.250000 Hessenberg eig subproblem 0.015625 0.015625 applying the shifts 0.109375 0.078125 calling gesdd 0.000000

SLIDE 37