Computing Sparse Hessians with AD
Andrea Walther, Institute of Scientific Computing, TU Dresden
Hatfield, May 21, 2007
Overview
1. Motivation
2. Computing Sparsity Patterns
3. Evaluating Hessian-matrix Products
4. Numerical Results
5. Conclusion and Outlook
Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 1 / 17
Motivation
Efficient computation of the Hessian H(x) in three steps:
  (1) function f → sparsity pattern P
  (2) sparsity pattern P → seed matrix S
  (3) calculation of B = H(x)S → entries of H(x)
Remarks: Steps 1 and 2 are performed only once; Step 3 is evaluated for each x.
Topics of this talk: Step 1 (generation of the sparsity pattern) and Step 3 (evaluating Hessian-matrix products).
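Step 2 (pattern → seed matrix) is not treated further in this talk. As a minimal sketch of the idea, the following greedy coloring assigns columns that share a nonzero row different colors, so every nonzero of H can later be read off the compressed product B = H(x)S. This is only a distance-2-style heuristic; production tools use star or acyclic colorings for symmetric Hessians, and all names here are illustrative, not part of ADOL-C.

```python
def seed_from_pattern(P):
    """P[j]: set of row indices of the nonzeros in column j (symmetric pattern).
    Greedy coloring: two columns sharing a nonzero row get different colors."""
    n = len(P)
    colors = [-1] * n
    for j in range(n):
        used = {colors[k] for k in range(j) if P[j] & P[k]}
        c = 0
        while c in used:
            c += 1
        colors[j] = c
    p = max(colors) + 1
    # seed matrix S in {0,1}^(n x p): column j contributes to color colors[j]
    S = [[1.0 if colors[j] == c else 0.0 for c in range(p)] for j in range(n)]
    return colors, S
```

For a tridiagonal pattern three colors suffice, so B = H(x)S needs only p = 3 Hessian-matrix columns instead of n.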
Motivation
Assumptions: f : Rⁿ → R, y = f(x), twice continuously differentiable; the function evaluation consists of unary or binary operations.

Algorithm I (function evaluation):
  for i = 1, ..., n:  v_{i−n} = x_i
  for i = 1, ..., l:  v_i = ϕ_i(v_j)_{j≺i}
  y = v_l
with precedence relation j ≺ i, i.e. either ϕ_i(v_j)_{j≺i} = ϕ_i(v_j) or ϕ_i(v_j)_{j≺i} = ϕ_i(v_j, v_k) with j, k < i.
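Algorithm I can be sketched as a small interpreter over an evaluation trace. The representation below (a list of elemental operations referring back to earlier values by index, independents first) is an illustrative assumption, not the ADOL-C tape format.

```python
import math

def evaluate(tape, x):
    """Algorithm I: v_{i-n} = x_i for the independents, then each
    intermediate v_i = phi_i(v_j)_{j<i} with unary or binary phi_i."""
    v = list(x)                              # v_{1-n}, ..., v_0
    for op, preds in tape:                   # preds: precedence relation j < i
        if op == "add":
            v.append(v[preds[0]] + v[preds[1]])
        elif op == "mul":
            v.append(v[preds[0]] * v[preds[1]])
        elif op == "sin":
            v.append(math.sin(v[preds[0]]))
    return v[-1]                             # y = v_l

# f(x1, x2) = sin(x1) * x2 as a trace: v_1 = sin(v_{-1}), v_2 = v_1 * v_0
tape = [("sin", (0,)), ("mul", (2, 1))]
```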
Computing Sparsity Patterns
Determining the sparsity pattern
Index domains [Griewank 2000]: X_i ≡ {j ≤ n : j − n ≺* i} for i = 1 − n, ..., l.
One has: {j ≤ n : ∂v_i/∂x_j ≢ 0} ⊆ X_i.
For sparse Hessians one additionally needs nonlinear interaction domains N_i with
  {j ≤ n : ∂²y/(∂x_i ∂x_j) ≢ 0} ⊆ N_i  for i = 1, ..., n.
Computing Sparsity Patterns
Algorithm II: Computation of Xi and Ni for i = 1, . . . , n Xi−n ← {i}, Ni ← ∅ for i = 1, . . . , l Xi ←
j≺i Xj
(1) if ϕi nonlinear then if vi = ϕi(vj) then ∀k ∈ Xi : Nk ← Nk ∪ Xi (2) if vi = ϕi(vj, vˆ
) then
if vi linear in vj then ∀k ∈ Xj : Nk ← Nk ∪ Xˆ
(3) else ∀k ∈ Xj : Nk ← Nk ∪ Xi (4) if vi linear in vˆ
then ∀k ∈ Xˆ : Nk ← Nk ∪ Xj
(5) else ∀k ∈ Xˆ
: Nk ← Nk ∪ Xi
(6) Complexity ??
Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 5 / 17
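A minimal Python sketch of Algorithm II. It assumes each tape entry records its predecessors, per-argument linearity, and an overall nonlinearity flag; this encoding is my assumption for illustration (an AD tool derives it from the operation code), and indices are 0-based rather than the slide's 1-based numbering.

```python
def hessian_sparsity(n, tape):
    """Algorithm II: index domains X_i and nonlinear interaction domains N_i.
    Independents occupy slots 0..n-1; each tape entry is
    (preds, linear_in, nonlinear) for one elemental operation phi_i."""
    X = [{i} for i in range(n)]          # X_{i-n} = {i}
    N = [set() for _ in range(n)]        # N_i = empty
    for preds, linear_in, nonlinear in tape:
        Xi = set().union(*(X[j] for j in preds))              # (1)
        if nonlinear:
            if len(preds) == 1:                               # v_i = phi_i(v_j)
                for k in Xi:
                    N[k] |= Xi                                # (2)
            else:                                             # v_i = phi_i(v_j, v_jhat)
                j, jhat = preds
                for k in X[j]:
                    N[k] |= X[jhat] if linear_in[0] else Xi   # (3)/(4)
                for k in X[jhat]:
                    N[k] |= X[j] if linear_in[1] else Xi      # (5)/(6)
        X.append(Xi)
    return N

# f(x) = x1*x2 + sin(x3) + x4: a product is linear in each factor separately,
# sin is nonlinear in its argument, and the sums are linear.
tape = [((0, 1), (True, True), True),    # v_1 = x1 * x2
        ((2,), (False,), True),          # v_2 = sin(x3)
        ((4, 5), (True, True), False),   # v_3 = v_1 + v_2
        ((6, 3), (True, True), False)]   # v_4 = v_3 + x4
```

The result matches the nonzero structure of the Hessian: x1 and x2 interact through the product, x3 only with itself through sin, and x4 appears only linearly.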
Computing Sparsity Patterns
Theorem (complexity result for Algorithm II):
Let OPS(NID) denote the number of operations needed to generate all N_i, 1 ≤ i ≤ n, by Algorithm II. Then
  OPS(NID) ≤ 6 (1 + n̂) Σ_{i=1}^{l} n̄_i,
where l is the number of elemental functions evaluated to compute the function value, n̄_i = |X_i|, and n̂ = max_{1≤i≤n} |N_i|.

Proof (sketch): analyse the set operations:
  (1): OPS(X_i = ∪_{j≺i} X_j) ≤ 3 n̄_i
  (2): OPS(∀k ∈ X_i: N_k = N_k ∪ X_i) ≤ n̄_i (2n̂ + n̂) = 3 n̂ n̄_i
  (3)-(6): same as (2).
Details: see [Walther 2007].
Evaluating Hessian-matrix Products
Second-order adjoints computed with AD:
  y = f(x)  →(reverse diff)→  x̄ = ȳ⊤ f′(x)  →(forward diff)→  x̄̇ = ȳ⊤ f″(x) ẋ + ȳ̇⊤ f′(x)
Complexity [Griewank 2000]:
  TIME(H(x) ẋ) ≤ ω_soad TIME(f(x))   with ω_soad ∈ [7, 10]
  TIME(H(x) S) ≤ ω_soad p TIME(f(x))  with ω_soad ∈ [7, 10].
Reduction possible?
Evaluating Hessian-matrix Products
Computation of H(x) ẋ ∈ Rⁿ:
  for i = 1, ..., n:  v_{i−n} = x_i,  v̇_{i−n} = ẋ_i,  v̄_{i−n} = 0,  v̄̇_{i−n} = 0
  for i = 1, ..., l:  v_i = ϕ_i(v_j)_{j≺i},  v̇_i = Σ_{j≺i} (∂ϕ_i/∂v_j) v̇_j
  y = v_l,  ẏ = v̇_l,  v̄_l = ȳ
  for i = l, ..., 1:
    v̄_j += v̄_i ∂ϕ_i/∂v_j                                        for j ≺ i
    v̄̇_j += v̄̇_i ∂ϕ_i/∂v_j + v̄_i Σ_{k≺i} (∂²ϕ_i/∂v_j∂v_k) v̇_k   for j ≺ i
  for i = 1, ..., n:  x̄_i = v̄_{i−n},  x̄̇_i = v̄̇_{i−n}
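The two sweeps can be prototyped directly. The sketch below hardwires three elemental operations and returns the gradient x̄ and the second-order adjoint x̄̇ = H(x)ẋ for ȳ = 1, ȳ̇ = 0; the tape encoding is an illustrative assumption, not the ADOL-C format.

```python
import math

def hess_vec(tape, x, xdot):
    """One forward sweep carrying tangents vdot, one reverse sweep carrying
    adjoints vbar and adjoint tangents vbardot (ybar = 1, ybardot = 0)."""
    n = len(x)
    v, vd = list(x), list(xdot)
    for op, preds in tape:                       # forward sweep
        if op == "add":
            j, k = preds
            v.append(v[j] + v[k]); vd.append(vd[j] + vd[k])
        elif op == "mul":
            j, k = preds
            v.append(v[j] * v[k]); vd.append(vd[j] * v[k] + v[j] * vd[k])
        elif op == "sin":
            (j,) = preds
            v.append(math.sin(v[j])); vd.append(math.cos(v[j]) * vd[j])
    vb, vbd = [0.0] * len(v), [0.0] * len(v)
    vb[-1] = 1.0                                 # vbar_l = ybar = 1
    for idx in range(len(tape) - 1, -1, -1):     # reverse sweep
        op, preds = tape[idx]
        i = n + idx
        if op == "add":                          # second derivatives vanish
            for j in preds:
                vb[j] += vb[i]; vbd[j] += vbd[i]
        elif op == "mul":                        # d2(v_j*v_k)/dv_j dv_k = 1
            j, k = preds
            vb[j] += vb[i] * v[k]; vb[k] += vb[i] * v[j]
            vbd[j] += vbd[i] * v[k] + vb[i] * vd[k]
            vbd[k] += vbd[i] * v[j] + vb[i] * vd[j]
        elif op == "sin":                        # (sin)'' = -sin
            (j,) = preds
            vb[j] += vb[i] * math.cos(v[j])
            vbd[j] += vbd[i] * math.cos(v[j]) - vb[i] * math.sin(v[j]) * vd[j]
    return vb[:n], vbd[:n]                       # gradient, H(x) * xdot
```

For f(x1, x2) = sin(x1 x2) and ẋ = e1 this returns the first column of the Hessian; running one such sweep per seed column yields H(x)S.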
Evaluating Hessian-matrix Products
Computation of H(x) Ẋ ∈ R^{n×p} (p tangent directions propagated together):
  for i = 1, ..., n:  v_{i−n} = x_i,  V̇_{i−n} = Ẋ_i,  v̄_{i−n} = 0,  V̄̇_{i−n} = 0
  for i = 1, ..., l:  v_i = ϕ_i(v_j)_{j≺i},  V̇_i = Σ_{j≺i} (∂ϕ_i/∂v_j) V̇_j
  y = v_l,  Ẏ = V̇_l,  v̄_l = ȳ
  for i = l, ..., 1:
    v̄_j += v̄_i ∂ϕ_i/∂v_j                                        for j ≺ i
    V̄̇_j += V̄̇_i ∂ϕ_i/∂v_j + v̄_i Σ_{k≺i} (∂²ϕ_i/∂v_j∂v_k) V̇_k   for j ≺ i
  for i = 1, ..., n:  x̄_i = v̄_{i−n},  X̄̇_i = V̄̇_{i−n}
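Once B = H(x)Ẋ has been computed with a seed from a suitable symmetric coloring, the nonzeros of H can be read off directly. A sketch under stated assumptions: a tridiagonal Hessian with the cyclic 3-coloring j mod 3 (a proper coloring in which every row's nonzero columns get distinct colors, so direct recovery works); the helper names are illustrative.

```python
def recover(B, pattern, colors):
    """Direct recovery: with a suitable coloring, H[i][j] is the only
    nonzero of row i summed into column colors[j] of the compressed B."""
    n = len(B)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in pattern[i]:               # sparsity pattern from step 1
            H[i][j] = B[i][colors[j]]
    return H

# tridiagonal example: H[i][j] != 0 iff |i - j| <= 1
n, p = 6, 3
colors = [j % 3 for j in range(n)]
H = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(max(0, i - 1), min(n, i + 2)):
        H[i][j] = H[j][i] = float(i + j + 1)
pattern = [{j for j in range(n) if H[i][j] != 0.0} for i in range(n)]
# compressed product B = H * S, where S[j][c] = 1 iff colors[j] == c
B = [[sum(H[i][j] for j in range(n) if colors[j] == c) for c in range(p)]
     for i in range(n)]
```

Here only p = 3 Hessian-matrix columns are needed to determine all 3n − 2 nonzeros.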
Evaluating Hessian-matrix Products
Complexity?
  TIME(X̄̇) ≤ ω_soadp TIME(f(x))  with ω_soadp ∈ [4 + 3p, 4 + 6p].
Reduction of run-time complexity from [7p, 10p] to [4 + 3p, 4 + 6p] due to a reduced number of recalculations. For details see [Walther 2007].
Numerical Results
Implementation
New drivers for the AD tool ADOL-C:
  hess_pat(tag, n, x, P, option);
  generate_seed_hess(n, P, S, p);
  hess_mat(tag, n, p, x, S, B);
  sparse_hess(tag, n, repeat, x, nnz, r_ind, c_ind, H_val);
  sparse_jac(tag, n, repeat, x, nnz, r_ind, c_ind, J_val);
Performance of the new drivers? Tests with problems from the CUTE test set collection.
Numerical Results
Computation of Sparsity Patterns
  OPS(NID) ≤ 6 (1 + n̂) Σ_{i=1}^{l} n̄_i  ⇒  TIME(hess_pat(...)) = O((1 + n̂) TIME(f))
[Figure: run-time ratio versus dimension (1000 to 10000) for the test problems morebv, lminsurf, chainwoo, and broydnbd]
Numerical Results
Varying the number of nonzeros I
[Figure: run-time ratio versus dimension (2000 to 10000) for chainwoo with nnz = 4, 7, 9, 11]
Numerical Results
Varying the number of nonzeros II
[Figure: run-time ratio versus dimension (2000 to 10000) for morebv with nnz = 5, 7, 9, 11]
Numerical Results
Computation of Hessian-matrix products I
  TIME(H(x)S)/TIME(f(x)) ≤ ω_soad p ≤ 10p,   TIME(X̄̇)/TIME(f(x)) ≤ ω_soadp ≤ 4 + 6p
[Figure: run-time ratio versus number of vectors/columns (1 to 6) for dtoc2: Hessian-vector products versus Hessian-matrix products]
  slopes:   scalar soad   vector soad
  dtoc2:    28.6          13.2
Numerical Results
Computation of Hessian-matrix products II
  TIME(H(x)S)/TIME(f(x)) ≤ ω_soad p ≤ 10p,   TIME(X̄̇)/TIME(f(x)) ≤ ω_soadp ≤ 4 + 6p
[Figure: run-time ratio versus number of vectors/columns (10 to 50) for eigena2: Hessian-vector products versus Hessian-matrix products]
  slopes:   scalar soad   vector soad
  eigena2:  16.5          7.5
Numerical Results
Optimization of a power-distribution network (provided by Fabrice Zaoui at RTE)
  Number of variables x_j:                    22540
  Number of equality constraints h_j = 0:     11270
  Number of inequality constraints g_j ≥ 0:   15446
The Jacobians h′ and g′ are evaluated by compression; the numbers of differentiation variables are 28 and 10, respectively. The Hessian of the Lagrangian is also evaluated by compression.
Solution by an interior point method takes 1 minute and 15 seconds. The sparse linear algebra (MA27) is less costly than the derivative evaluations. In simple cases hand-coded derivatives are ten times faster.
Conclusion and Outlook
Efficient calculation of sparse Hessians using:
- sparsity patterns, available in O((1 + n̂) TIME(f))
- coloring algorithms
- Hessian-matrix products with significantly reduced complexity
Numerical tests using ADOL-C confirm the theoretical results.
Outlook: TIME(sparsity pattern) = O(n̂) using reverse mode? Substitution-based methods for generation of the seed matrix?