Star-Cap: Cluster Power Management Using Software-Only Models (PowerPoint PPT Presentation)



SLIDE 1

Star-Cap: Cluster Power Management Using Software-Only Models

John D. Davis Suzanne Rivoire (rivoire@sonoma.edu) Moisés Goldszmidt (Microsoft Research) ICPP Workshop on Power-aware Algorithms, Systems, and Architectures (PASA)

Sept. 10, 2014
SLIDE 2

Power capping motivation

  •  Reduce waste from overprovisioning
  •  Provision for actual maximum power instead of sum of nameplate power
  •  Have a mechanism to throttle power consumption
  •  Major server manufacturers offer this feature; Intel offers it at the chip level (RAPL)

[Femal ICAC ‘05, Ranganathan ISCA ‘06, Lefurgy ICAC ’07…]

SLIDE 3

The problem with vendor solutions

  •  Requires additional management hardware at additional cost, or is limited to the chip level
  •  Compare to the trend toward customized bare-bones servers…
  •  …and “wimpy nodes” for data-intensive workloads

Goal: eliminate the cost of hardware instrumentation

SLIDE 4

Outline

  •  Star-Cap overview
  •  Software-only power models
  •  Power capping schemes
  •  Evaluation


SLIDE 5

Two-level scheme

  •  Top level: determine node power budgets
  •  Node level: enforce and report


[Figure: top-level Power Management Policies feeding per-node Power Model and Power Control components on Machine 1 … Machine N]

SLIDE 6

Sensors and Actuators

  •  Sensors: OS-level, architecture-independent performance counters
  •  Actuators:
      -  For this work, DVFS states
      -  Nothing prevents other mechanisms from being used

SLIDE 7

Outline

  •  Star-Cap overview
  •  Software-only power models
  •  Power capping schemes
  •  Evaluation


SLIDE 8

OS-level counters

  •  Full-system, not a specific component
  •  OS-level, architecture-independent counters
  •  Piecewise quadratic model f(x), fit with MARS [Davis et al., IISWC ‘12]

[Figure: f(x) maps OS-level counters to node AC power]
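As a rough illustration of the model form, a MARS-fit piecewise quadratic model is a sum of hinge-function terms (products of hinges give the quadratic pieces). The counter names, knots, and coefficients below are invented for illustration; the real model is fit from training data.

```python
def hinge(x, knot):
    """MARS basis function: zero below the knot, linear above it."""
    return max(0.0, x - knot)

def predict_power(counters):
    """Toy piecewise quadratic power model (watts) over OS-level counters."""
    u = counters["cpu_utilization"]   # percent (hypothetical counter name)
    d = counters["disk_time_pct"]     # percent (hypothetical counter name)
    return (25.0                                        # idle power
            + 0.12 * hinge(u, 10.0)                     # linear piece above 10% CPU
            + 0.002 * hinge(u, 10.0) * hinge(u, 50.0)   # quadratic piece above 50% CPU
            + 0.05 * hinge(d, 20.0))                    # disk contribution
```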

SLIDE 9

Model training process

  1.  ETW (Event Tracing for Windows)
      -  Architecture counters: ~250
      -  Processor, physical and logical disk, network, memory, filesystem
  2.  Remove redundant counters: ~45
      -  Correlation matrix (|r| > 0.95)
      -  Performance counter definitions
  3.  Select features: ~10
      -  R glmpath with L1 regularization
      -  Stepwise refinement
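Step 2 can be sketched as a greedy pass over the counter correlation matrix; this is an illustrative reconstruction, not the authors' code, and the 0.95 threshold is the one stated above.

```python
import numpy as np

def prune_correlated(X, names, threshold=0.95):
    """X: samples-by-counters matrix. Keep a counter only if it is not
    highly correlated (|r| > threshold) with an already-kept counter."""
    corr = np.corrcoef(X, rowvar=False)   # counters x counters correlations
    keep = []
    for j in range(len(names)):
        if all(abs(corr[j, k]) <= threshold for k in keep):
            keep.append(j)
    return [names[j] for j in keep]
```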

SLIDE 10

Outline

  •  Star-Cap overview
  •  Software-only power models
  •  Power capping schemes
  •  Evaluation


SLIDE 11

Star-Cap Overview

  •  Inputs to all schemes
      -  Target node-level power consumption (set at top level)
      -  Current power (modeled or measured)
      -  List of available frequency states
  •  Outputs
      -  List of frequency states available to the OS
      -  Let the current OS policy select from the available states
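A minimal sketch of how this per-node state might be carried; the class and field names are ours, not the talk's, and the frequency fractions are the four states used in the evaluation.

```python
from dataclasses import dataclass, field

@dataclass
class NodeCapState:
    """Per-node capping state: target power in, allowed frequency states out."""
    power_cap_w: float                 # target set by the top-level policy
    current_power_w: float             # measured or model-predicted
    all_states: list = field(default_factory=lambda: [0.70, 0.82, 0.94, 1.00])
    available: list = field(default_factory=lambda: [0.70, 0.82, 0.94, 1.00])
```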

SLIDE 12

Threshold-based

  •  If Pcurrent < Plo: make the next-highest frequency state available
  •  If Pcurrent > Phi: remove the highest frequency state from the available list
  •  Our thresholds:
      -  Phi = 95% of cap
      -  Plo = 90% of cap
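The rule above can be sketched as follows, treating the available list as a prefix of the ascending state list; this is an assumed structure, not the paper's implementation.

```python
ALL_STATES = [0.70, 0.82, 0.94, 1.00]   # fraction of max frequency (from the evaluation)

def adjust_states(available, p_current, cap):
    """Grow or shrink the list of frequency states exposed to the OS."""
    p_hi, p_lo = 0.95 * cap, 0.90 * cap
    if p_current < p_lo and len(available) < len(ALL_STATES):
        return ALL_STATES[:len(available) + 1]   # headroom: expose next state
    if p_current > p_hi and len(available) > 1:
        return ALL_STATES[:len(available) - 1]   # over budget: withdraw top state
    return available
```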

SLIDE 13

Reactive Capping (ReCap)

  •  Adjust the frequency state based on Pcurrent
  •  After making a change, wait for it to settle before making another (reduces oscillations)
  •  Three versions:
      -  M-ReCap: Pcurrent is measured power
      -  L-ReCap: Pcurrent is predicted by a CPU-utilization-based linear model
      -  C-ReCap: Pcurrent is predicted by the quadratic power model from the previous section
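The settle-before-acting idea can be sketched as a cooldown over successive power readings; the structure and the settle window length are our assumptions, not the paper's code.

```python
def recap_decisions(readings, cap, settle=3):
    """For each power reading, decide 'down', 'up', or None,
    skipping further changes for `settle` samples after a change."""
    decisions, cooldown = [], 0
    for p in readings:
        action = None
        if cooldown == 0:
            if p > 0.95 * cap:
                action = "down"        # remove the highest frequency state
            elif p < 0.90 * cap:
                action = "up"          # expose the next frequency state
            if action:
                cooldown = settle      # wait for the change to take effect
        else:
            cooldown -= 1
        decisions.append(action)
    return decisions
```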

SLIDE 14

Proactive Capping (ProCap)

  •  Use the quadratic power model to predict Pcurrent
  •  Before changing the available frequencies, predict Pnext
      -  Using the next allowable frequency state
      -  Keeping all other counters constant (an oversimplification!)
  •  If Pnext would violate the threshold, don’t bother adjusting the available frequencies
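The look-ahead check reduces to one model evaluation at the candidate frequency; `model` here is any callable mapping (counters, frequency) to predicted watts, and the names are ours.

```python
def procap_should_raise(model, counters, freq_next, cap):
    """Raise the frequency only if predicted power at the next state
    stays under the 95%-of-cap threshold."""
    p_next = model(counters, freq_next)   # other counters held constant (oversimplification)
    return p_next <= 0.95 * cap
```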

SLIDE 15

Outline

  •  Star-Cap overview
  •  Software-only power models
  •  Power capping schemes
  •  Evaluation


SLIDE 16

Workloads

  •  Primes (CPU)
  •  Staticrank (Net)
  •  Sort (Disk, Net)
  •  Wordcount (Disk)
  •  All run across 5 homogeneous nodes

[Figure: timelines for Primes, Staticrank, Sort, and Wordcount]

SLIDE 17

Hardware Systems

  Cluster                Intel Core 2 Duo (laptop)       AMD Opteron (server)
  CPU                    Intel Core 2 Duo x2, 2.26 GHz   AMD Opteron 2x4, 2.0 GHz
  Storage                SSD                             HDD
  Idle power (W)         25                              135
  Dyn. power range (W)   20                              55
  OS                     Windows Server 2008 R2          Windows Server 2008 R2


4 frequency states: 100%, 94%, 82%, 70%

SLIDE 20

Power profiles

[Figure: per-node power profiles (Node-02 to Node-05, 20-50 W) for WordCount, Sort, PageRank, and Prime under no frequency cap and 94%, 82%, and 70% frequency caps]

If DVFS is the only actuator, some power budgets will be much easier to deal with than others.

SLIDE 21

Reactive capping: modeled vs. measured power

  •  Low power cap (38 W)
  •  Graph shows 1 node
  •  Blue: ReCap based on measured power
  •  Gray: ReCap based on model power

[Figure: one node’s power (25-50 W) over time under the 38 W cap, comparing M-ReCap, C-ReCap, and ProCap]

SLIDE 22

Reactive vs. proactive capping

  •  Same power cap
  •  Blue: ReCap based on measured power
  •  Purple: ProCap

[Figure: node power over time for WordCount, Sort, PageRank, and Prime; (A) M-ReCap vs. (B) ProCap]

SLIDE 23

Higher power cap

  •  42 W cap
  •  Left: M-ReCap; center: L-ReCap; right: ProCap
  •  Model accuracy matters!

[Figure 4: 42 W power cap examples for WordCount and Prime; server power (20-50 W) over time for M-ReCap (measured power), L-ReCap (linear model), and ProCap (cluster model), with the power cap threshold shown]

SLIDE 24

Conclusion

  •  Demonstrated the potential of high-accuracy, software-only models for server-level power capping
  •  Suitable for low-power, low-cost “wimpy nodes”
  •  Extensible to other power management hooks and policies

SLIDE 25

Backup slides


SLIDE 26

Dynamic Range Error

Report error as a percent of the dynamic range – idle power shouldn’t count.

[Figure: cluster power scale annotated with idle power, max power, and the dynamic range between them]
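The metric above reduces to normalizing prediction error by (max power - idle power) instead of by absolute power; the function name is ours.

```python
def dynamic_range_error(predicted, measured, idle_power, max_power):
    """Mean absolute error as a fraction of the dynamic power range,
    so a large idle power cannot inflate apparent accuracy."""
    dyn_range = max_power - idle_power
    errs = [abs(p - m) for p, m in zip(predicted, measured)]
    return sum(errs) / len(errs) / dyn_range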

SLIDE 27

Model Accuracy

[Figure: average DRE (0-16%) for linear, piecewise linear, quadratic, and switching linear modeling techniques across feature sets: CPU utilization; CPU utilization and MHz; cluster-specific; general]

Model Features

SLIDE 28

Model Features

  •  Automatically selected from over 200 OS counters
  •  Processor: utilization, frequency
  •  Memory: cache faults/sec; pool nonpaged allocations
  •  Disk: total disk time %
  •  Filesystem and virtual memory: file system pin reads/sec, peak page file bytes