SLIDE 1 A compact Arnoldi algorithm for polynomial eigenvalue problems
Yangfeng Su, Junyi Zhang, Zhaojun Bai
School of Mathematical Sciences Fudan University
January, 2008, Taiwan RANMEP2008
SLIDE 2 Goals
Polynomial Eigenvalue Problems (PEPs):
- A0 + λA1 + ... + λdAd
- x = 0
(λ, x) is called an eigenpair. d=1, SEP,GEP d=2, QEP ( Quadratic Eigenvalue Problem) Goals Solving large scale polynomial eigenvalue problems with Implicitly Restarted Arnoldi (IRA) algorithm with less memory.
SLIDE 3 Goals
Polynomial Eigenvalue Problems (PEPs):
- A0 + λA1 + ... + λdAd
- x = 0
(λ, x) is called an eigenpair. d=1, SEP,GEP d=2, QEP ( Quadratic Eigenvalue Problem) Goals Solving large scale polynomial eigenvalue problems with Implicitly Restarted Arnoldi (IRA) algorithm with less memory.
SLIDE 4 Outline
1 Background 2 Algorithm 3 Numerical Comparison
SLIDE 5 Outline
1 Background 2 Algorithm 3 Numerical Comparison
SLIDE 6
Linearization
Linearization ⇒ A larger GEP: A0 I ... I − λ −A1 −A2 . . . −Ad I ... I x λx . . . λd−1x = 0 Sleijpen, van der Vorst, and van Gijzen. Quadratic eigenproblems are no problem, SIAM News, 1996 Can be solved with ARPACK Not true: structure is neither used nor preserved. size and therefore memory are d times! This work: explore the structure to save memory in ARPACK. It is a byproduct of SOAR [Bai & S. SIMAX 2005].
SLIDE 7
Linearization
Linearization ⇒ A larger GEP: A0 I ... I − λ −A1 −A2 . . . −Ad I ... I x λx . . . λd−1x = 0 Sleijpen, van der Vorst, and van Gijzen. Quadratic eigenproblems are no problem, SIAM News, 1996 Can be solved with ARPACK Not true: structure is neither used nor preserved. size and therefore memory are d times! This work: explore the structure to save memory in ARPACK. It is a byproduct of SOAR [Bai & S. SIMAX 2005].
SLIDE 8
Linearization
Linearization ⇒ A larger GEP: A0 I ... I − λ −A1 −A2 . . . −Ad I ... I x λx . . . λd−1x = 0 Sleijpen, van der Vorst, and van Gijzen. Quadratic eigenproblems are no problem, SIAM News, 1996 Can be solved with ARPACK Not true: structure is neither used nor preserved. size and therefore memory are d times! This work: explore the structure to save memory in ARPACK. It is a byproduct of SOAR [Bai & S. SIMAX 2005].
SLIDE 9
Linearization
Linearization ⇒ A larger GEP: A0 I ... I − λ −A1 −A2 . . . −Ad I ... I x λx . . . λd−1x = 0 Sleijpen, van der Vorst, and van Gijzen. Quadratic eigenproblems are no problem, SIAM News, 1996 Can be solved with ARPACK Not true: structure is neither used nor preserved. size and therefore memory are d times! This work: explore the structure to save memory in ARPACK. It is a byproduct of SOAR [Bai & S. SIMAX 2005].
SLIDE 10 Arnoldi decomposition
An Arnoldi decomposition of order-j: OPQj = QjHj + hj+1,jqj+1eT
j
where OP: a matrix Qj = [q1, q2, . . . , qj]: orthonormal Hj: upper Hessenberg Arnoldi process is used to compute the Arnoldi decomposition with numerical stability
SLIDE 11 Implicitly Restarted Arnoldi (IRA) for SEP
Sorensen [SIMAX92] Given an Arnoldi decomposition of order-p, OPQp = QpHp + βpqp+1eT
p .
1 Extend Arnoldi decomposition from order p to order k:
AQk = QkHk + βkqk+1eT
k .
2 Divide the eigenvalues of Hk as “good” ones µ1, . . . , µp and
“bad” ones µp+1, . . . , µk.
3 For Hk do implicitly QR steps with shifts µk+1, . . . , µk, get
Hk = U ˜ HkUH
4 Take first p columns of
AQkU = QkU ˜ Hk + βkqk+1eT
k U
as a restarted Arnoldi decomposition of order p.
SLIDE 12
ARPACK
ARPACK is an implementation of IRA algorithm a well-coded, well-documented package produced by Lehoucq, Sorensen and Yang during 1992-1997 used in MATLAB as eigs and arpackc
SLIDE 13 IRA for QEP
For simplicity, we only discuss QEPs. For QEP: (λ2M + λD + K)x = 0
1 shift-and-invert: for shift σ, let λ = σ + 1/µ
(µ2I − µA − B)x = 0 where A = −(σ2M + σD + K)−1(2σM + D) B = −(σ2M + σD + K)−1M
2 linearize
A B I µx x
µx x
SLIDE 14
IRA for QEP
Easy use of ARPACK How to utilize the Frobenius structure to save memory?
SLIDE 15 Outline
1 Background 2 Algorithm 3 Numerical Comparison
SLIDE 16 Arnoldi decomposition for QEP
An Arnoldi decomposition with order-j OPQj = Qj+1 Hj For QEP: A B I Qj,1 Qj,2
Qj+1,1 Qj+1,2
Since Qj,1 = Qj+1,2 Hj we have Theorem rank ([Qj,1, Qj,2]) ≡ rj ≤ j + 1 Observed by many people, e.g. Meerbergen [SIMAX06], Bai & S. [SIMAX05]. The key is how to use it with numerical stability.
SLIDE 17 Arnoldi Decomposition for QEP
Theorem rank ([Qj,1, Qj,2]) ≡ rj ≤ j + 1 Let Vj ∈ C n×rj = orth[Qj,1, Qj,2] then Qj = Qj,1 Qj,2
VjRj,1 VjRj,2
Vj Vj Rj,1 Rj,2
- Two levels of orthonormality:
Vj is orthonormal Rj,1 Rj,2
SLIDE 18 Compact ARnoldi Decomposition (CARD)
Compact ARnoldi Decomposition (CARD) OP Vj Vj Rj,1 Rj,2
Vj+1 Vj+1 Rj+1,1 Rj+1,2
Vj ∈ C n×rj Rj ∈ C rj×j j ≤ rj+1 ≤ j + 1 Memory cost: Arnoldi: 2n(j + 1) ( for PEPs: dn(j + 1)) CARD: nrj+1 ≤ n(j + 2) (for PEPs: ≤ n(j + d + 1))
SLIDE 19 CARD process
CARD process is to compute the CARD with numerical stability! CARD of order j: OP VR1 VR2
VR1 VR2
ˆ V ˆ R1 ˆ V ˆ R2
H Expand it to a CARD of order j + 1 ⇒ next two pages
SLIDE 20 Expand CARD process
1 compute q1 = Aq1 + Bq2; 2 decompose q1 = ˆ
V x + vα with MGS x = ˆ V Tq1, v = q1 − ˆ V x, α = v, v = v/α
3 update
ˆ V =
V , v
ˆ R1 = ˆ R1 x α
R2 = ˆ R2 ˆ R2 (:, j + 1)
SLIDE 21 Expand CARD process
ˆ R1 ˆ R2
ˆ R1 x α ˆ R2 ˆ R1 (:, j + 1)
4 decompose with MGS:
ˆ R1 (:, j + 2) ˆ R2 (:, j + 2)
= ˆ R1 (:, 1 : j + 1) ˆ R2 (:, 1 : j + 1)
+ ˆ R1 (:, j + 2) ˆ R2 (:, j + 2)
Hj+2,j+1
SLIDE 22 5 update the current Arnoldi vector q:
q = q1 q2
V ˆ Rj+1,1 [:, j + 2] q2 = ˆ V ˆ Rj+1,2 [:, j + 2] Only GMS (with re-orthogonalization), no inversion CARD is numerically stable!
SLIDE 23 IRA with CARD
Given a CARD of k-order: OP VR1 VR2
ˆ V ˆ R1 ˆ V ˆ R2 H βeT
k
- IRA does (m − p) QR steps on H with shifts µp+1, ..., µm, i.e.
H = U HUH Then OP VR1 VR2
ˆ V ˆ R1 ˆ V ˆ R2 U ˜ H βeT
k U
ˆ U = U 1
SLIDE 24 IRA with CARD
Then OP VR1U VR2U
ˆ V ˆ R1 ˆ U ˆ V ˆ R2 ˆ U ˜ H βeT
k U
- Its first p columns , denoted by
OP VkR1,p VkR2,p
Vk+1R1,p+1 Vk+1R2,p+1 Hp ˜ βeT
p
- still form an Arnoldi decomposition of order p
However, the Vk has rk (instead of rp) columns, it is not a CARD!
SLIDE 25 IRA with CARD
Since Vk+1R1,p+1 Vk+1R2,p+1
- is the orthonormal factor of an Arnoldi decomposition, from
previous theorem, rank[Vk+1R1,p+1, Vk+1R2,p+1] = rank[R1,p+1, R2,p+1] = rp+1 ≤ p + 2, we have a compact SVD: [R1,p+1, R2,p+1] = Prk+1×rp+1Σrp+1×rp+1[G rp+1×(p+1)
1
, G rp+1×(p+1)
2
] ≡ P[R1, R2]
SLIDE 26 IRA with CARD
Therefore,
k+1
R1,p+1 Vk+1R2,p+1
(Vk+1P)n×rp+1R1 (Vk+1P)R2
- The Arnoldi decomposition is expressed in CARD again!
This process can also be implemented by a compact “QR” decomposition, which is similar with the compact Arnoldi decomposition (CARD). Details omitted.
SLIDE 27 POLYAR
POLYAR: modified ARPARK for polynomial eigenvalue problems (not only QEPs)
1 znaitr p: compute CARD 2 znapps p: IRA with CARD;
use LAPACK routine zgesdd to compute SVD decomposition
3 znaupd p, znaupd2 p, zgetv0 p:
slightly revised (arguments, storage)
4 zgemip(added): compute inner product in compact form
SLIDE 28 Outline
1 Background 2 Algorithm 3 Numerical Comparison
SLIDE 29 Example 1: A random QEP
Problem: QEP (pdeg=2) Size: n=500 Environment: PC (EMS memory: 512M) Randomized Matrix M,D,K (each matrix have about 24,000 non-zero elements.) We choose shift σ = 1 and use shift-invert mode to compute eigenvalues close to 1 LU factorization of
- Mσ2 + Dσ + K
- , L and U contain about
120,000 non-zero elements. To compute 8 eigenvalues close to 1; Use 30 Arnoldi base vectors, says, 31 CARD base vectors.
SLIDE 30 A random QEP
Computed eigenvalues: ARPACK POLYAR Real Imag Real Imag 1
1.02817D+00 1.38768D−01 1.02817D+00 1.38768D-01
2
1.11582D+00 3.61818D-02 1.11582D+00 3.61818D-02
3
1.11582D+00
1.11582D+00
4
1.05613D+00
1.05613D+00
5
1.05613D+00 1.05380D-01 1.05613D+00 1.05380D-01
6
9.34692D-01
9.34692D-01 4.39496D-15
7
1.00023D+00 5.87804D-02 1.00023D+00 5.87804D-02
8
1.00023D+00
1.00023D+00
SLIDE 31
A random QEP
Storage comparison: ARPACK POLYAR V 500 × 2 × 30 500 × 31 workd 3 × 2 × 500 (2 + 2) × 500 Resid 500 × 2 500 × 2
SLIDE 32
A random QEP
Iteration and time comparison: ARPACK POLYAR update iteration 4 4 OP × x operations 86 86 reorthogonalization of V 84 84 reorthogonalization of R 84 user’s OP × x operations 1.375000 1.250000 naupd2 1.609375 1.515625 basic Arnoldi iteration loop 1.531250 1.437500 reorthogonalization phrase 0.093750 0.046875 Hessenberg eig subproblem 0.031250 0.031250 applying the shifts 0.046875 0.046875 calling gesdd 0.000000
SLIDE 33 Example 2: A QEP from SLAC
Problem: QEP (pdeg=2) Size: n=5384 Environment: PC (EMS memory: 512M) Genuine Matrix M,D,K (M,D,K have 61425,1183,61425 non-zero elements respectively.) We choose shift σ = −10i and use shift-invert mode to compute eigenvalues close to −10i LU factorization of
- Mσ2 + Dσ + K
- , L and U have 749610
and 780229 non-zero elements. To compute 8 eigenvalues close to −10i; Use 30 Arnoldi base vectors, says, 31 CARD base vectors.
SLIDE 34 A QEP from SLAC
ARPACK POLYAR Real Imag Real Imag 1
1.78859D-13
2
1.48935D-13
3
3.43204D-12
1.85159D-12
4
8.92409D-12
2.12829D-12
5
1.19634D-12
2.63738D-13
6
- 1.30082D-13
- 9.73674D+00
- 4.44236D-14
- 9.73674D+00
7
3.83685D-15
1.48931D-15
8
2.27178D-15
- 1.00606D+01
- 3.76020D-15
- 1.00606D+01
SLIDE 35
A QEP from SLAC
Storage comparison: ARPACK POLYAR V 5000 × 2 × 30 5000 × 31 workd 3 × 2 × 5000 (2 + 2) × 5000 Resid 5000 × 2 5000 × 2
SLIDE 36
A QEP from SLAC
Time comparison: ARPACK POLYAR update iteration 2 2 OP × x operations 48 48 reorthogonalization of V 44 44 reorthogonalization of R 47 user’s OP × x operations 4.078125 4.093750 naupd2 5.343750 5.171875 basic Arnoldi iteration loop 5.203125 5.062500 reorthogonalization phrase 0.593750 0.250000 Hessenberg eig subproblem 0.015625 0.015625 applying the shifts 0.109375 0.078125 calling gesdd 0.000000
SLIDE 37
Thank you for your attention!