Computing Sparse Hessians with AD
Andrea Walther, Institute of Scientific Computing, TU Dresden
Hatfield, May 21, 2007
Overview
1. Motivation
2. Computing Sparsity Patterns
3. Evaluating Hessian-matrix Products
4. Numerical Results
5. Conclusion and Outlook
Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 1 / 17
Motivation
Efficient computation of the Hessian H(x) in three steps:
  (1) function f → sparsity pattern P
  (2) sparsity pattern P → seed matrix S
  (3) calculation of B = H(x)S → entries of H(x)
Remarks: Steps 1 and 2 are performed only once; Step 3 is evaluated for each x.
Topics of this talk: Step 1 (generation of the sparsity pattern) and Step 3 (evaluating Hessian-matrix products).
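Step 2 (pattern → seed matrix) is not treated further in this talk. As a minimal sketch of the idea, the following greedy coloring assigns columns that share a nonzero row different colors, so every nonzero of H can later be read off the compressed product B = H(x)S. This is only a distance-2-style heuristic; production tools use star or acyclic colorings for symmetric Hessians, and all names here are illustrative, not part of ADOL-C.

```python
def seed_from_pattern(P):
    """P[j]: set of row indices of the nonzeros in column j (symmetric pattern).
    Greedy coloring: two columns sharing a nonzero row get different colors."""
    n = len(P)
    colors = [-1] * n
    for j in range(n):
        used = {colors[k] for k in range(j) if P[j] & P[k]}
        c = 0
        while c in used:
            c += 1
        colors[j] = c
    p = max(colors) + 1
    # seed matrix S in {0,1}^(n x p): column j contributes to color colors[j]
    S = [[1.0 if colors[j] == c else 0.0 for c in range(p)] for j in range(n)]
    return colors, S
```

For a tridiagonal pattern three colors suffice, so B = H(x)S needs only p = 3 Hessian-matrix columns instead of n.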
Motivation
Assumptions: f : Rⁿ → R, y = f(x), twice continuously differentiable; the function evaluation consists of unary or binary operations.

Algorithm I (function evaluation):
  for i = 1, ..., n:  v_{i−n} = x_i
  for i = 1, ..., l:  v_i = ϕ_i(v_j)_{j≺i}
  y = v_l
with precedence relation j ≺ i, i.e. either ϕ_i(v_j)_{j≺i} = ϕ_i(v_j) or ϕ_i(v_j)_{j≺i} = ϕ_i(v_j, v_k) with j, k < i.
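Algorithm I can be sketched as a small interpreter over an evaluation trace. The representation below (a list of elemental operations referring back to earlier values by index, independents first) is an illustrative assumption, not the ADOL-C tape format.

```python
import math

def evaluate(tape, x):
    """Algorithm I: v_{i-n} = x_i for the independents, then each
    intermediate v_i = phi_i(v_j)_{j<i} with unary or binary phi_i."""
    v = list(x)                              # v_{1-n}, ..., v_0
    for op, preds in tape:                   # preds: precedence relation j < i
        if op == "add":
            v.append(v[preds[0]] + v[preds[1]])
        elif op == "mul":
            v.append(v[preds[0]] * v[preds[1]])
        elif op == "sin":
            v.append(math.sin(v[preds[0]]))
    return v[-1]                             # y = v_l

# f(x1, x2) = sin(x1) * x2 as a trace: v_1 = sin(v_{-1}), v_2 = v_1 * v_0
tape = [("sin", (0,)), ("mul", (2, 1))]
```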
Computing Sparsity Patterns
Determining the sparsity pattern
Index domains [Griewank 2000]: X_i ≡ {j ≤ n : j − n ≺* i} for i = 1 − n, ..., l.
One has: {j ≤ n : ∂v_i/∂x_j ≢ 0} ⊆ X_i.
For sparse Hessians one additionally needs nonlinear interaction domains N_i with
  {j ≤ n : ∂²y/(∂x_i ∂x_j) ≢ 0} ⊆ N_i  for i = 1, ..., n.
Computing Sparsity Patterns
Algorithm II: Computation of Xi and Ni for i = 1, . . . , n Xi−n ← {i}, Ni ← ∅ for i = 1, . . . , l Xi ←
j≺i Xj
(1) if ϕi nonlinear then if vi = ϕi(vj) then ∀k ∈ Xi : Nk ← Nk ∪ Xi (2) if vi = ϕi(vj, vˆ
) then
if vi linear in vj then ∀k ∈ Xj : Nk ← Nk ∪ Xˆ
(3) else ∀k ∈ Xj : Nk ← Nk ∪ Xi (4) if vi linear in vˆ
then ∀k ∈ Xˆ : Nk ← Nk ∪ Xj
(5) else ∀k ∈ Xˆ
: Nk ← Nk ∪ Xi
(6) Complexity ??
Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 5 / 17
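A minimal Python sketch of Algorithm II. It assumes each tape entry records its predecessors, per-argument linearity, and an overall nonlinearity flag; this encoding is my assumption for illustration (an AD tool derives it from the operation code), and indices are 0-based rather than the slide's 1-based numbering.

```python
def hessian_sparsity(n, tape):
    """Algorithm II: index domains X_i and nonlinear interaction domains N_i.
    Independents occupy slots 0..n-1; each tape entry is
    (preds, linear_in, nonlinear) for one elemental operation phi_i."""
    X = [{i} for i in range(n)]          # X_{i-n} = {i}
    N = [set() for _ in range(n)]        # N_i = empty
    for preds, linear_in, nonlinear in tape:
        Xi = set().union(*(X[j] for j in preds))              # (1)
        if nonlinear:
            if len(preds) == 1:                               # v_i = phi_i(v_j)
                for k in Xi:
                    N[k] |= Xi                                # (2)
            else:                                             # v_i = phi_i(v_j, v_jhat)
                j, jhat = preds
                for k in X[j]:
                    N[k] |= X[jhat] if linear_in[0] else Xi   # (3)/(4)
                for k in X[jhat]:
                    N[k] |= X[j] if linear_in[1] else Xi      # (5)/(6)
        X.append(Xi)
    return N

# f(x) = x1*x2 + sin(x3) + x4: a product is linear in each factor separately,
# sin is nonlinear in its argument, and the sums are linear.
tape = [((0, 1), (True, True), True),    # v_1 = x1 * x2
        ((2,), (False,), True),          # v_2 = sin(x3)
        ((4, 5), (True, True), False),   # v_3 = v_1 + v_2
        ((6, 3), (True, True), False)]   # v_4 = v_3 + x4
```

The result matches the nonzero structure of the Hessian: x1 and x2 interact through the product, x3 only with itself through sin, and x4 appears only linearly.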
Computing Sparsity Patterns
Theorem (complexity result for Algorithm II):
Let OPS(NID) denote the number of operations needed to generate all N_i, 1 ≤ i ≤ n, by Algorithm II. Then
  OPS(NID) ≤ 6 (1 + n̂) Σ_{i=1}^{l} n̄_i,
where l is the number of elemental functions evaluated to compute the function value, n̄_i = |X_i|, and n̂ = max_{1≤i≤n} |N_i|.

Proof (sketch): analyse the set operations:
  (1): OPS(X_i = ∪_{j≺i} X_j) ≤ 3 n̄_i
  (2): OPS(∀k ∈ X_i: N_k = N_k ∪ X_i) ≤ n̄_i (2n̂ + n̂) = 3 n̂ n̄_i
  (3)-(6): same as (2).
Details: see [Walther 2007].
Evaluating Hessian-matrix Products
Second-order adjoints computed with AD:
  y = f(x)  →(reverse diff)→  x̄ = ȳ⊤ f′(x)  →(forward diff)→  x̄̇ = ȳ⊤ f″(x) ẋ + ȳ̇⊤ f′(x)
Complexity [Griewank 2000]:
  TIME(H(x) ẋ) ≤ ω_soad TIME(f(x))   with ω_soad ∈ [7, 10]
  TIME(H(x) S) ≤ ω_soad p TIME(f(x))  with ω_soad ∈ [7, 10].
Reduction possible?
Evaluating Hessian-matrix Products
Computation of H(x) ẋ ∈ Rⁿ:
  for i = 1, ..., n:  v_{i−n} = x_i,  v̇_{i−n} = ẋ_i,  v̄_{i−n} = 0,  v̄̇_{i−n} = 0
  for i = 1, ..., l:  v_i = ϕ_i(v_j)_{j≺i},  v̇_i = Σ_{j≺i} (∂ϕ_i/∂v_j) v̇_j
  y = v_l,  ẏ = v̇_l,  v̄_l = ȳ
  for i = l, ..., 1:
    v̄_j += v̄_i ∂ϕ_i/∂v_j                                        for j ≺ i
    v̄̇_j += v̄̇_i ∂ϕ_i/∂v_j + v̄_i Σ_{k≺i} (∂²ϕ_i/∂v_j∂v_k) v̇_k   for j ≺ i
  for i = 1, ..., n:  x̄_i = v̄_{i−n},  x̄̇_i = v̄̇_{i−n}
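The two sweeps can be prototyped directly. The sketch below hardwires three elemental operations and returns the gradient x̄ and the second-order adjoint x̄̇ = H(x)ẋ for ȳ = 1, ȳ̇ = 0; the tape encoding is an illustrative assumption, not the ADOL-C format.

```python
import math

def hess_vec(tape, x, xdot):
    """One forward sweep carrying tangents vdot, one reverse sweep carrying
    adjoints vbar and adjoint tangents vbardot (ybar = 1, ybardot = 0)."""
    n = len(x)
    v, vd = list(x), list(xdot)
    for op, preds in tape:                       # forward sweep
        if op == "add":
            j, k = preds
            v.append(v[j] + v[k]); vd.append(vd[j] + vd[k])
        elif op == "mul":
            j, k = preds
            v.append(v[j] * v[k]); vd.append(vd[j] * v[k] + v[j] * vd[k])
        elif op == "sin":
            (j,) = preds
            v.append(math.sin(v[j])); vd.append(math.cos(v[j]) * vd[j])
    vb, vbd = [0.0] * len(v), [0.0] * len(v)
    vb[-1] = 1.0                                 # vbar_l = ybar = 1
    for idx in range(len(tape) - 1, -1, -1):     # reverse sweep
        op, preds = tape[idx]
        i = n + idx
        if op == "add":                          # second derivatives vanish
            for j in preds:
                vb[j] += vb[i]; vbd[j] += vbd[i]
        elif op == "mul":                        # d2(v_j*v_k)/dv_j dv_k = 1
            j, k = preds
            vb[j] += vb[i] * v[k]; vb[k] += vb[i] * v[j]
            vbd[j] += vbd[i] * v[k] + vb[i] * vd[k]
            vbd[k] += vbd[i] * v[j] + vb[i] * vd[j]
        elif op == "sin":                        # (sin)'' = -sin
            (j,) = preds
            vb[j] += vb[i] * math.cos(v[j])
            vbd[j] += vbd[i] * math.cos(v[j]) - vb[i] * math.sin(v[j]) * vd[j]
    return vb[:n], vbd[:n]                       # gradient, H(x) * xdot
```

For f(x1, x2) = sin(x1 x2) and ẋ = e1 this returns the first column of the Hessian; running one such sweep per seed column yields H(x)S.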
Evaluating Hessian-matrix Products
Computation of H(x) Ẋ ∈ R^{n×p} (p tangent directions propagated together):
  for i = 1, ..., n:  v_{i−n} = x_i,  V̇_{i−n} = Ẋ_i,  v̄_{i−n} = 0,  V̄̇_{i−n} = 0
  for i = 1, ..., l:  v_i = ϕ_i(v_j)_{j≺i},  V̇_i = Σ_{j≺i} (∂ϕ_i/∂v_j) V̇_j
  y = v_l,  Ẏ = V̇_l,  v̄_l = ȳ
  for i = l, ..., 1:
    v̄_j += v̄_i ∂ϕ_i/∂v_j                                        for j ≺ i
    V̄̇_j += V̄̇_i ∂ϕ_i/∂v_j + v̄_i Σ_{k≺i} (∂²ϕ_i/∂v_j∂v_k) V̇_k   for j ≺ i
  for i = 1, ..., n:  x̄_i = v̄_{i−n},  X̄̇_i = V̄̇_{i−n}
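Once B = H(x)Ẋ has been computed with a seed from a suitable symmetric coloring, the nonzeros of H can be read off directly. A sketch under stated assumptions: a tridiagonal Hessian with the cyclic 3-coloring j mod 3 (a proper coloring in which every row's nonzero columns get distinct colors, so direct recovery works); the helper names are illustrative.

```python
def recover(B, pattern, colors):
    """Direct recovery: with a suitable coloring, H[i][j] is the only
    nonzero of row i summed into column colors[j] of the compressed B."""
    n = len(B)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in pattern[i]:               # sparsity pattern from step 1
            H[i][j] = B[i][colors[j]]
    return H

# tridiagonal example: H[i][j] != 0 iff |i - j| <= 1
n, p = 6, 3
colors = [j % 3 for j in range(n)]
H = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(max(0, i - 1), min(n, i + 2)):
        H[i][j] = H[j][i] = float(i + j + 1)
pattern = [{j for j in range(n) if H[i][j] != 0.0} for i in range(n)]
# compressed product B = H * S, where S[j][c] = 1 iff colors[j] == c
B = [[sum(H[i][j] for j in range(n) if colors[j] == c) for c in range(p)]
     for i in range(n)]
```

Here only p = 3 Hessian-matrix columns are needed to determine all 3n − 2 nonzeros.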
Evaluating Hessian-matrix Products
Complexity?
  TIME(X̄̇) ≤ ω_soadp TIME(f(x))  with ω_soadp ∈ [4 + 3p, 4 + 6p].
Reduction of run-time complexity from [7p, 10p] to [4 + 3p, 4 + 6p] due to a reduced number of recalculations. For details see [Walther 2007].
Numerical Results
Implementation
New drivers for the AD tool ADOL-C:
  hess_pat(tag, n, x, P, option);
  generate_seed_hess(n, P, S, p);
  hess_mat(tag, n, p, x, S, B);
  sparse_hess(tag, n, repeat, x, nnz, r_ind, c_ind, H_val);
  sparse_jac(tag, n, repeat, x, nnz, r_ind, c_ind, J_val);
Performance of the new drivers? Tests with problems from the CUTE test set collection.
Numerical Results
Computation of Sparsity Patterns
  OPS(NID) ≤ 6 (1 + n̂) Σ_{i=1}^{l} n̄_i  ⇒  TIME(hess_pat(...)) = O((1 + n̂) TIME(f))
[Figure: run-time ratio versus dimension (1000 to 10000) for the test problems morebv, lminsurf, chainwoo, and broydnbd]
Numerical Results
Varying the number of nonzeros I
[Figure: run-time ratio versus dimension (2000 to 10000) for chainwoo with nnz = 4, 7, 9, 11]
Numerical Results
Varying the number of nonzeros II
[Figure: run-time ratio versus dimension (2000 to 10000) for morebv with nnz = 5, 7, 9, 11]
Numerical Results
Computation of Hessian-matrix products I
  TIME(H(x)S)/TIME(f(x)) ≤ ω_soad p ≤ 10p,   TIME(X̄̇)/TIME(f(x)) ≤ ω_soadp ≤ 4 + 6p
[Figure: run-time ratio versus number of vectors/columns (1 to 6) for dtoc2: Hessian-vector products versus Hessian-matrix products]
  slopes:   scalar soad   vector soad
  dtoc2:    28.6          13.2
Numerical Results
Computation of Hessian-matrix products II
  TIME(H(x)S)/TIME(f(x)) ≤ ω_soad p ≤ 10p,   TIME(X̄̇)/TIME(f(x)) ≤ ω_soadp ≤ 4 + 6p
[Figure: run-time ratio versus number of vectors/columns (10 to 50) for eigena2: Hessian-vector products versus Hessian-matrix products]
  slopes:   scalar soad   vector soad
  eigena2:  16.5          7.5
Numerical Results
Optimization of a power-distribution network (provided by Fabrice Zaoui at RTE)
  Number of variables x_j:                    22540
  Number of equality constraints h_j = 0:     11270
  Number of inequality constraints g_j ≥ 0:   15446
The Jacobians h′ and g′ are evaluated by compression; the numbers of differentiation variables are 28 and 10, respectively. The Hessian of the Lagrangian is also evaluated by compression.
Solution by an interior point method takes 1 minute and 15 seconds. The sparse linear algebra (MA27) is less costly than the derivative evaluations. In simple cases hand-coded derivatives are ten times faster.
Conclusion and Outlook
Efficient calculation of sparse Hessians using:
- sparsity patterns, available in O((1 + n̂) TIME(f))
- coloring algorithms
- Hessian-matrix products with significantly reduced complexity
Numerical tests using ADOL-C confirm the theoretical results.
Outlook: TIME(sparsity pattern) = O(n̂) using reverse mode? Substitution-based methods for generation of the seed matrix?