Analysis of the Tradeoffs between Energy and Run Time for Multilevel Checkpointing. Prasanna Balaprakash, Leonardo A. Bautista Gomez, Slim Bouguerra, Stefan M. Wild, Franck Cappello, and Paul D. Hovland (ANL). PMBS workshop @ SC’14.


slide-1
SLIDE 1

Analysis of the Tradeoffs between Energy and Run Time for Multilevel Checkpointing

Prasanna Balaprakash, Leonardo A. Bautista Gomez, Slim Bouguerra, Stefan M. Wild, Franck Cappello, and Paul D. Hovland

ANL

PMBS workshop @ SC’14

leobago@anl.gov (ANL) Tradeoffs between Energy and Run Time PMBS workshop @ SC’14 1

slide-2
SLIDE 2

Context and motivations

Context: The Need For Speed

Figure : From http://www.scidacreview.org/

slide-6
SLIDE 6

Context and motivations

Motivation: Failures

Sequoia: MTBF ≈ 1 day.
Blue Waters: 2 node failures per day.
Titan: MTBF < 1 day.
≈ 20% of the computation is wasted in recovery and re-execution (which implies wasted energy).
Exascale: the number of components for both memory and processors will increase by a factor of 100.
Shrinking circuit sizes and running at lower voltages increase the probability of silent data corruption (SDC).
At exascale, failures will occur more frequently; an optimistic MTBF is a couple of hours.

slide-7
SLIDE 7

Context and motivations

Motivation: Energy

Power draw under different loads:
  The power draw of the interconnect on Blue Gene/Q appears to be independent of load.
  CPU power varies by only some 20%.
  DRAM power changes by a factor of 2 or more.
Exascale (http://www.scidacreview.org/1001/html/hardware.html):
  Data movement and I/O will consume more than 70% of the total system power (most of the 20 MW will go just to power the 10 PB of total system memory).
Flops/Watt vs. Communication/Watt:
  Avoid checkpointing and data movement, and do more recomputation,
  vs.
  avoid recomputation by checkpointing more often.

slide-9
SLIDE 9

Context and motivations

Related Work

ECOFIT, Diouri et al.:
  Blocking checkpointing
  Message logging
  Conclusion: no big tradeoff observed.
Meneses et al.:
  Parallel recovery vs. global recovery
  Used the RAPL API (no communication or I/O is covered)
  Parallel recovery is better since it reduces the overall time
Aupy et al.:
  Blocking vs. non-blocking, single level.
  No experiments.
The missing episode: What about multilevel checkpointing?

slide-10
SLIDE 10

Problem formulation and notations

1 Context and motivations
2 Problem formulation and notations
    Multilevel checkpointing
    Energy model
    Multiobjective optimization
3 Simulation and experimentations
    Experimentations
    Tradeoff analysis
4 Conclusion and future work

slide-11
SLIDE 11

Problem formulation and notations Multilevel checkpointing

Multilevel Checkpointing

Multiple levels of storage (DRAM, NVM, PFS).
Coupled with data replication and erasure codes.
Low levels offer high performance and partial reliability.
High levels offer high reliability but impose large overhead.
Different checkpoint levels have different frequencies.
After a failure, the application restarts from the lowest available level.
If it is unable to recover, it tries the next checkpoint level (further in the past).
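The restart rule described on this slide (try the lowest level first, fall back to the next one if recovery fails) can be sketched as follows. This is a minimal illustration only: the `recover` function and the per-level restore callables are hypothetical names, not part of FTI or any other library.

```python
from typing import Callable, Optional, Sequence, Tuple


def recover(levels: Sequence[Callable[[], Optional[bytes]]]) -> Tuple[int, bytes]:
    """Try each checkpoint level in order, lowest (fastest) level first.

    Each entry of `levels` is a hypothetical restore function that returns
    the saved state, or None if that level's checkpoint is unavailable.
    """
    for level, try_restore in enumerate(levels, start=1):
        state = try_restore()
        if state is not None:
            # Lowest level that still holds a usable checkpoint.
            return level, state
    raise RuntimeError("no checkpoint level could be restored")


# Usage: levels 1 and 2 were lost in the failure, level 3 is still intact.
def lost() -> Optional[bytes]:
    return None


levels = [lost, lost, lambda: b"state-from-level-3", lambda: b"state-from-level-4"]
print(recover(levels))  # (3, b'state-from-level-3')
```

Higher levels are only consulted when every lower level has failed, which is why a failure that reaches level i also costs the rework back to an older checkpoint.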

slide-12
SLIDE 12

Problem formulation and notations Multilevel checkpointing

Wasted time model

$L$ levels of checkpoint (4 with FTI).
Checkpoint strategy: $\tau_i$, for $i = 1, \dots, L$.
$c_i$: checkpoint cost for level $i$.
$r_i$: time for a restart from level $i$.
$d_i$: downtime after a failure affecting level $i$.
$\mu_i$: rate of failures affecting level $i$.

slide-13
SLIDE 13

Problem formulation and notations Energy model

Wasted energy model

$P^c_i$: power for a level $i$ checkpoint (Watts).
$P^r_i$: power for a restart from level $i$ (Watts).
$P^a$: power for a failure-free computation without checkpointing (Watts).
$\mu_i$: rate of failures affecting level $i$.

slide-16
SLIDE 16

Problem formulation and notations Multiobjective optimization

Problem solving

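Checkpoint time:

$$W_{ch} = \sum_{i=1}^{L} \left( \frac{c_i}{\tau_i} + \mu_i \tau_i \sum_{j=1}^{i-1} \frac{c_j}{2\tau_j} \right)$$

Rework time:

$$W_{rew} = \sum_{i=1}^{L} \frac{\mu_i \tau_i}{2}$$

Downtime and restart time:

$$W_{down} = \sum_{i=1}^{L} \mu_i (r_i + d_i)$$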

slide-19
SLIDE 19

Problem formulation and notations Multiobjective optimization

Problem solving

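Checkpoint wasted energy:

$$E_{ch} = \sum_{i=1}^{L} \left( \frac{P^c_i c_i}{\tau_i} + \mu_i \tau_i \sum_{j=1}^{i-1} \frac{P^c_j c_j}{2\tau_j} \right)$$

Rework wasted energy:

$$E_{rew} = \sum_{i=1}^{L} P^a \frac{\mu_i \tau_i}{2}$$

Downtime and restart wasted energy:

$$E_{down} = \sum_{i=1}^{L} P^r_i \mu_i (r_i + d_i)$$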

slide-21
SLIDE 21

Problem formulation and notations Multiobjective optimization

Total wasted time

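$$W = \sum_{i=1}^{L} \left[ \frac{c_i}{\tau_i} + \frac{\mu_i \tau_i}{2} \left( 1 + \sum_{j=1}^{i-1} \frac{c_j}{\tau_j} \right) + \mu_i (r_i + d_i) \right] \qquad (1)$$

Total wasted energy

$$E = \sum_{i=1}^{L} \left[ \frac{P^c_i c_i}{\tau_i} + \mu_i \tau_i \left( \frac{P^a}{2} + \sum_{j=1}^{i-1} \frac{P^c_j c_j}{2\tau_j} \right) \right] + \sum_{i=1}^{L} P^r_i \mu_i (r_i + d_i) \qquad (2)$$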
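Equations (1) and (2) translate directly into code. The sketch below is one possible transcription, with levels indexed from 0 instead of 1; the function and argument names (`wasted_time`, `wasted_energy`, `p_ckpt`, `p_restart`, `p_app`) and all numeric values are illustrative assumptions, not taken from the slides.

```python
from typing import Sequence


def wasted_time(tau: Sequence[float], c: Sequence[float], mu: Sequence[float],
                r: Sequence[float], d: Sequence[float]) -> float:
    """Total wasted time W of Eq. (1); index 0 is checkpoint level 1."""
    total = 0.0
    for i in range(len(tau)):
        # Checkpoint overhead of all lower levels, sum over j < i of c_j / tau_j.
        lower = sum(c[j] / tau[j] for j in range(i))
        total += (c[i] / tau[i]
                  + 0.5 * mu[i] * tau[i] * (1.0 + lower)
                  + mu[i] * (r[i] + d[i]))
    return total


def wasted_energy(tau: Sequence[float], c: Sequence[float], mu: Sequence[float],
                  r: Sequence[float], d: Sequence[float],
                  p_ckpt: Sequence[float], p_restart: Sequence[float],
                  p_app: float) -> float:
    """Total wasted energy E of Eq. (2), same indexing as wasted_time."""
    total = 0.0
    for i in range(len(tau)):
        lower = sum(p_ckpt[j] * c[j] / tau[j] for j in range(i))
        total += (p_ckpt[i] * c[i] / tau[i]
                  + 0.5 * mu[i] * tau[i] * (p_app + lower)
                  + p_restart[i] * mu[i] * (r[i] + d[i]))
    return total


# Made-up two-level configuration (times in seconds, rates in 1/second).
tau, c = [100.0, 1000.0], [10.0, 50.0]
mu, r, d = [1e-3, 1e-4], [5.0, 20.0], [5.0, 5.0]
w = wasted_time(tau, c, mu, r, d)
e = wasted_energy(tau, c, mu, r, d, [1.0, 1.0], [1.0, 1.0], 1.0)
```

With every power set to 1, the two functions return the same value, which is a quick sanity check that (2) reduces to (1).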

slide-23
SLIDE 23

Problem formulation and notations Multiobjective optimization

First derivatives

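$$\frac{\partial W}{\partial \tau_i} = \frac{\mu_i}{2} \left( 1 + \sum_{j=1}^{i-1} \frac{c_j}{\tau_j} \right) - \frac{c_i}{\tau_i^2} \left( 1 + \sum_{j=i+1}^{L} \frac{\mu_j \tau_j}{2} \right) \qquad (3)$$

$$\frac{\partial E}{\partial \tau_i} = \frac{\mu_i}{2} \left( P^a + \sum_{j=1}^{i-1} \frac{P^c_j c_j}{\tau_j} \right) - \frac{P^c_i c_i}{\tau_i^2} \left( 1 + \sum_{j=i+1}^{L} \frac{\mu_j \tau_j}{2} \right) \qquad (4)$$

Solutions

$$\tau^W_i = \sqrt{ \frac{ c_i \left( 2 + \sum_{j=i+1}^{L} \mu_j \tau^W_j \right) }{ \mu_i \left( 1 + \sum_{j=1}^{i-1} \frac{c_j}{\tau^W_j} \right) } }$$

$$\tau^E_i = \sqrt{ \frac{ \rho_i c_i \left( 2 + \sum_{j=i+1}^{L} \mu_j \tau^E_j \right) }{ \mu_i \left( 1 + \sum_{j=1}^{i-1} \frac{\rho_j c_j}{\tau^E_j} \right) } }, \qquad \rho_i = P^c_i / P^a$$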
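The solutions for the time-optimal and energy-optimal intervals are coupled: each interval depends on the intervals of the other levels. One simple way to evaluate them numerically is a plain fixed-point iteration, sketched below. The iteration scheme is an assumption of this sketch (the slides do not say how the system is solved) and convergence is not guaranteed for arbitrary parameters; `optimal_intervals` and the sample values are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def optimal_intervals(c: Sequence[float], mu: Sequence[float],
                      rho: Optional[Sequence[float]] = None,
                      iters: int = 200) -> List[float]:
    """Iterate the coupled solution formulas; index 0 is level 1.

    rho=None (all ones) gives the time-optimal intervals; passing
    rho_i = P^c_i / P^a gives the energy-optimal ones.
    """
    n = len(c)
    rho = [1.0] * n if rho is None else list(rho)
    # Start from the uncoupled single-level formula sqrt(2 * rho * c / mu).
    tau = [math.sqrt(2.0 * rho[i] * c[i] / mu[i]) for i in range(n)]
    for _ in range(iters):
        new_tau = []
        for i in range(n):
            upper = sum(mu[j] * tau[j] for j in range(i + 1, n))
            lower = sum(rho[j] * c[j] / tau[j] for j in range(i))
            new_tau.append(math.sqrt(rho[i] * c[i] * (2.0 + upper)
                                     / (mu[i] * (1.0 + lower))))
        tau = new_tau
    return tau


# Made-up two-level example (seconds and 1/second).
tau_w = optimal_intervals([10.0, 50.0], [1e-3, 1e-4])
tau_e = optimal_intervals([10.0, 50.0], [1e-3, 1e-4], rho=[0.8, 1.2])
```

For a single level the loop reproduces the closed form sqrt(2c/µ), scaled by sqrt(ρ) in the energy case.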

slide-24
SLIDE 24

Problem formulation and notations Multiobjective optimization

Solutions for a single level:

$$\tau^W = \sqrt{2c/\mu} \quad \text{and} \quad \tau^E = \tau^W \sqrt{P^c / P^a}$$

Whenever $P^c \neq P^a$, we have $\tau^W \neq \tau^E$, and hence the two objectives are conflicting.
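A short worked example makes the single-level gap concrete. The numbers are illustrative assumptions, not measurements from the talk: a checkpoint cost of c = 60 s, a failure rate of one failure per day (µ = 1/86400 per second), and a power ratio of P^c/P^a = 0.8.

```latex
\begin{align*}
\tau^W &= \sqrt{\frac{2c}{\mu}} = \sqrt{2 \cdot 60 \cdot 86400}\ \text{s}
        = \sqrt{10\,368\,000}\ \text{s} \approx 3220\ \text{s} \approx 53.7\ \text{min}, \\
\tau^E &= \tau^W \sqrt{\frac{P^c}{P^a}} = \sqrt{10\,368\,000 \cdot 0.8}\ \text{s}
        = \sqrt{8\,294\,400}\ \text{s} = 2880\ \text{s} = 48\ \text{min}.
\end{align*}
```

Under these assumptions, checkpointing draws less power than computing, so the energy-optimal strategy checkpoints more often than the time-optimal one.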

slide-26
SLIDE 26

Problem formulation and notations Multiobjective optimization

Pareto front

Definition: $\tau^i$ is said to be Pareto-optimal if it is not dominated by any other $\tau^j$.

Convex combination: If the Pareto front is convex, any point on the front can be obtained by minimizing a linear combination of the objectives:

$$f_\lambda(\tau) = \lambda W(\tau) + (1 - \lambda) E(\tau), \quad \text{for } \lambda \in [0, 1].$$

Theorem: The Hessian $\nabla^2_{\tau\tau} W(\tau)$ is diagonally dominant, and thus $W$ is a convex function of $\tau$ over the domain (the same holds for $E(\tau)$).
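The dominance relation behind the definition can be made explicit with a small filter over candidate (time, energy) pairs. This sketch assumes the usual convention for minimization problems, which the slide does not spell out: a point dominates another if it is no worse in both objectives and strictly better in at least one. The names `dominates` and `pareto_front` and the sample points are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (wasted time, wasted energy)


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in both objectives and better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points: Sequence[Point]) -> List[Point]:
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up objective values for four candidate checkpoint strategies.
candidates = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)]
print(pareto_front(candidates))  # [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]
```

The strategy (3.0, 4.0) is dropped because (2.0, 3.0) is better in both time and energy.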

slide-28
SLIDE 28

Problem formulation and notations Multiobjective optimization

Pareto front

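$$\tau^*_i(\lambda) = \sqrt{ \frac{ c_i \left( \lambda + (1 - \lambda) P^c_i \right) \left( 2 + \sum_{j=i+1}^{L} \mu_j \tau^*_j \right) }{ \mu_i \left( \lambda + (1 - \lambda) P^a + \sum_{j=1}^{i-1} \left( \lambda + (1 - \lambda) P^c_j \right) \frac{c_j}{\tau^*_j} \right) } } \qquad (5)$$

Case one level ($L = 1$):

$$\tau^*(\lambda) = \tau^W \sqrt{ \frac{ \lambda + (1 - \lambda) P^c }{ \lambda + (1 - \lambda) P^a } } \qquad (6)$$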
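For the one-level case, Eq. (6) gives the whole Pareto front in closed form: sweeping λ from 0 to 1 moves the interval from the energy-optimal value to the time-optimal one. The sketch below traces that sweep. The function names, the sample parameters, and the choice to drop the restart and downtime terms (they do not depend on the interval) are assumptions of this illustration.

```python
import math
from typing import List, Tuple


def tau_star(lam: float, c: float, mu: float, p_ckpt: float, p_app: float) -> float:
    """Single-level optimal interval for weight lam, following Eq. (6)."""
    tau_w = math.sqrt(2.0 * c / mu)
    return tau_w * math.sqrt((lam + (1.0 - lam) * p_ckpt)
                             / (lam + (1.0 - lam) * p_app))


def sweep(c: float, mu: float, p_ckpt: float, p_app: float,
          n: int = 11) -> List[Tuple[float, float, float, float]]:
    """Return (lam, tau, W, E) for n evenly spaced weights in [0, 1]."""
    front = []
    for k in range(n):
        lam = k / (n - 1)
        tau = tau_star(lam, c, mu, p_ckpt, p_app)
        # Eqs. (1) and (2) for L = 1, without the constant restart/downtime term.
        w = c / tau + 0.5 * mu * tau
        e = p_ckpt * c / tau + 0.5 * p_app * mu * tau
        front.append((lam, tau, w, e))
    return front


# Made-up case where checkpointing draws twice the power of computation.
front = sweep(c=60.0, mu=1.0 / 86400.0, p_ckpt=2.0, p_app=1.0)
for lam, tau, w, e in front:
    print(f"lambda={lam:.1f}  tau={tau:8.1f} s  W={w:.5f}  E={e:.5f}")
```

As λ grows, the wasted time falls and the wasted energy rises, which is the tradeoff the Pareto front visualizes.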

slide-29
SLIDE 29

Problem formulation and notations Multiobjective optimization

Pareto Front

slide-32
SLIDE 32

Simulation and experimentations Experimentations

Platform

Mira: 10 PF IBM Blue Gene/Q (BG/Q).
  49,152 nodes organized in 48 racks.
  16 cores of 1.6 GHz PowerPC A2 and 16 GB of DDR3 memory per node.
  5-D torus network.
Vesta (developmental platform for Mira): same as Mira but with 2,048 nodes.
MonEQ for power measurement (resolution of 560 ms):
  Chip core
  DRAM
  Network
Power data is collected only at the node-card level (every 32 nodes).
The library cannot measure the I/O power consumption!
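Turning sampled power into energy is a numerical integration over time. The sketch below applies the trapezoidal rule to evenly spaced samples, with the 560 ms resolution quoted above as the default spacing. It does not use the MonEQ API; `energy_joules` and the sample trace are hypothetical, and the input is assumed to be a plain list of readings in Watts.

```python
from typing import Sequence


def energy_joules(power_watts: Sequence[float], dt_seconds: float = 0.56) -> float:
    """Trapezoidal integration of evenly spaced power samples."""
    if len(power_watts) < 2:
        return 0.0  # a single sample spans no time interval
    return sum(0.5 * (a + b) * dt_seconds
               for a, b in zip(power_watts, power_watts[1:]))


# Made-up node-card trace: compute phase, then a lower-power checkpoint phase.
trace = [1500.0, 1500.0, 1400.0, 1200.0, 1200.0]
print(energy_joules(trace))
```

Because readings are taken per node card (32 nodes), a per-node figure would need the result divided by 32, assuming the load is evenly spread.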

slide-33
SLIDE 33

Simulation and experimentations Experimentations

Applications

LAMMPS
  Production-level molecular dynamics application.
  Lennard-Jones simulation of 1.3 billion atoms.
  Checkpoint size per node ≈ 200 MB (≈ 100 GB application footprint).
  Checkpoint intervals: 4, 8, 16, and 32 minutes.
CORAL
  Qbox: first-principles molecular dynamics.
  AMG: a parallel algebraic multigrid solver for linear systems arising from problems on unstructured grids.
  LULESH: performs hydrodynamics stencil calculations.
  miniFE: a finite-element code.

slide-34
SLIDE 34

Simulation and experimentations Experimentations

LAMMPS: synchronous

Figure : Synchronous multilevel checkpointing

1.3 billion atoms Lennard-Jones simulation. 512 nodes running 64 MPI ranks per node (32,768 processes).

slide-35
SLIDE 35

Simulation and experimentations Experimentations

LAMMPS: asynchronous

Figure : Asynchronous multilevel checkpointing

1.3 billion atoms Lennard-Jones simulation. 512 nodes running 64 MPI ranks per node (32,768 processes).

slide-36
SLIDE 36

Simulation and experimentations Experimentations

CORAL benchmark

Figure panels: (a) LULESH, (b) miniFE, (c) AMG, (d) Qbox.

32 nodes. Each application is run with a configuration of 16 MPI ranks.

slide-37
SLIDE 37

Simulation and experimentations Tradeoff analysis

Pareto fronts: Level 4 power consumption

(e) Level 4 power consumption $P^c_4$

slide-38
SLIDE 38

Simulation and experimentations Tradeoff analysis

Pareto fronts: Computation power

(f) Computation power $P^a$

slide-39
SLIDE 39

Simulation and experimentations Tradeoff analysis

Pareto fronts: Power ratio $P^c / P^a$

(g) Power ratio $P^c / P^a$

slide-41
SLIDE 41

Conclusion and future work

Summary

Analytical models of performance and energy for multilevel checkpoint schemes.
The Pareto front is obtained using a convex combination of the two objectives.
Power measurement experiments with production-level scientific applications running on over 32,000 MPI processes:
  The relative energy overhead of using FTI is minor, and thus the tradeoffs are relatively small.
  Richer tradeoffs exist when the power consumption of checkpointing is greater than that of the computation.

slide-42
SLIDE 42

Conclusion and future work

What is Next?

Analyzing the power profile of different fault tolerance protocols, such as full/partial replication and message logging.
The viability of replication with respect to the power cap of future exascale platforms.

slide-43
SLIDE 43

Conclusion and future work

Questions ?

Thank You !!
