Lin Li, Xiuyi Zhou, Jun Yang, Victor Puchkarev University of - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Lin Li, Xiuyi Zhou, Jun Yang, Victor Puchkarev University of Pittsburgh


slide-2
SLIDE 2

Outline

Introduction ThresHot Algorithm Experiment and Results Conclusion


slide-3
SLIDE 3


Thermal Management is Critical

Technology ↓ → Power density ↑ → Temperature ↑

  • Circuit performance ↓
  • Reliability ↓: $\mathrm{MTTF} = C \times e^{E_A / (kT)}$
  • Thermal runaway: T ↑ → P_leakage ↑ → P_total ↑
  • Packaging and cooling cost ↑
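The MTTF relation on this slide can be made concrete with a small sketch. It compares the MTTF at two temperatures, so the constant C cancels. The function name and the default activation energy of 0.7 eV are illustrative assumptions, not values from the slides.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant k in eV/K


def mttf_ratio(temp_cool_c, temp_hot_c, activation_energy_ev=0.7):
    """Return MTTF(cool) / MTTF(hot) for MTTF = C * exp(E_A / (k * T)).

    Temperatures are given in degrees Celsius and converted to kelvin.
    The default activation energy is an assumed, illustrative value.
    """
    t_cool = temp_cool_c + 273.15
    t_hot = temp_hot_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_cool - 1.0 / t_hot)
    return math.exp(exponent)


# A cooler chip has a longer expected lifetime, so this ratio exceeds 1.
print(mttf_ratio(75.0, 85.0))
```

Because the dependence is exponential in 1/T, even a few degrees of reduction changes the ratio noticeably, which is why the slide lists reliability among the costs of high temperature.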

slide-4
SLIDE 4


Task Scheduling Can Help

Conventionally

Performance throttling, e.g. DVFS

Our objective

Preserve performance – ↓DVFS

Rationale

Workloads stress the processor differently in space and time

Approach

Find a good schedule of workloads to keep temperature low

slide-5
SLIDE 5

Task Scheduling Trade‐offs

Pros:

  • No need to change hardware
  • Flexible: the scheduling algorithm can be changed in the OS

Cons:

  • Scheduling overhead
  • Lacks accurate hardware details

slide-6
SLIDE 6

Task Scheduling Algorithms

Objective: reduce thermal emergencies

  • Improve performance
  • Improve reliability

Naïve scheduling algorithms

  • Random
  • Round‐Robin
  • Power balancing
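The three naïve policies named above could look roughly like the sketch below. The slides do not define them precisely, so the details are assumptions: in particular, "power balancing" is read here as pairing the highest-power task with the coolest core, and the `step` parameter of the round-robin rotation is hypothetical.

```python
import random


def random_schedule(num_tasks, rng):
    """Assign tasks to cores by a random permutation (result[task] = core)."""
    cores = list(range(num_tasks))
    rng.shuffle(cores)
    return cores


def round_robin_schedule(previous, step=1):
    """Rotate every task to the next core, wrapping around at the last core."""
    n = len(previous)
    return [(core + step) % n for core in previous]


def power_balancing_schedule(task_power, core_temp):
    """Pair the highest-power task with the coolest core, and so on (assumed policy)."""
    tasks_hot_first = sorted(range(len(task_power)), key=lambda t: task_power[t], reverse=True)
    cores_cool_first = sorted(range(len(core_temp)), key=lambda c: core_temp[c])
    assignment = [None] * len(task_power)
    for task, core in zip(tasks_hot_first, cores_cool_first):
        assignment[task] = core
    return assignment


print(random_schedule(4, random.Random(0)))
print(round_robin_schedule([0, 1, 2, 3]))
print(power_balancing_schedule([10, 30, 20, 5], [70, 85, 80, 75]))
```

None of these policies looks at a temperature threshold, which is the gap the ThresHot algorithm in the next part of the talk addresses.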


slide-7
SLIDE 7

Outline

Introduction ThresHot Algorithm Experiment and Results Conclusion


slide-8
SLIDE 8

Temperature Slack

Temporal temperature slack in a single processor

  • Task scheduling can reduce thermal emergencies [Yang et al. ISPASS 2008]

Spatial and temporal temperature slack in CMP

How to schedule tasks to minimize total thermal emergencies?


slide-9
SLIDE 9

Thermal Model

  • Han, Koren, Krishna, “TILTS: A Fast Architectural‐Level Transient Thermal Simulation Method,” J. of Low Power Electronics, 2007.


T(n) = AT(n-1) + BP(n-1)
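As a minimal sketch of this recurrence, the step below multiplies the previous temperature vector by A and the previous power vector by B using plain nested lists. The 2-block matrices and input values are made up for illustration and are not taken from TILTS or the slides.

```python
def thermal_step(A, B, temps, powers):
    """One step of T(n) = A*T(n-1) + B*P(n-1) for list-of-lists matrices."""
    n = len(temps)
    return [
        sum(A[i][j] * temps[j] for j in range(n))
        + sum(B[i][j] * powers[j] for j in range(len(powers)))
        for i in range(n)
    ]


# Hypothetical 2-block example; A and B are illustrative values only.
A = [[0.9, 0.05], [0.05, 0.9]]
B = [[0.5, 0.0], [0.0, 0.5]]
temps = [10.0, 20.0]   # T(n-1), assumed here to be relative to ambient
powers = [4.0, 0.0]    # P(n-1), power injected during the previous step
next_temps = thermal_step(A, B, temps, powers)
print(next_temps)
```

The two sums in the function correspond to the two terms discussed on the following slides: the first gives the temperature with no power applied, the second the rise caused by injected power.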

slide-10
SLIDE 10

Understanding the Model 1

AT(n‐1) describes the temperature drop at time n if there is no power

This drop forms the available temperature slack

slide-11
SLIDE 11

Understanding the Model 2

BP(n‐1) describes the temperature increase due to the injected power of different tasks

Task scheduling amounts to finding a mapping between these increases and the available temperature slacks.

slide-12
SLIDE 12

Fast Temperature Calculation

The temperature rise caused by a task's power hardly changes from core to core

Calculate AT(n‐1) and BP(n‐1) once → T(n) for all possible schedules

[Figure: the AT(n-1) and BP(n-1) terms]
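The idea of computing both terms once and reusing them for every schedule can be sketched as follows. It assumes, as the slide suggests, that a task's temperature rise is the same on every core; the function and variable names are hypothetical.

```python
from itertools import permutations


def temperatures_for_all_schedules(decayed, task_rise):
    """Map each task-to-core permutation to the predicted core temperatures.

    decayed[c]   : the A*T(n-1) term for core c, computed once
    task_rise[j] : the B*P(n-1) term for task j, assumed identical on every core
    """
    n = len(decayed)
    results = {}
    for cores in permutations(range(n)):      # cores[j] = core that runs task j
        temps = list(decayed)
        for task, core in enumerate(cores):
            temps[core] += task_rise[task]    # only additions, no matrix products
        results[cores] = temps
    return results


all_temps = temperatures_for_all_schedules([70.0, 75.0], [5.0, 12.0])
for schedule, temps in sorted(all_temps.items()):
    print(schedule, temps)
```

Evaluating a candidate schedule then costs one addition per core, so comparing schedules avoids re-running the thermal model for each of them.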

slide-13
SLIDE 13

TSM: Temperature Slack Matrix

[Figure: temperature slack matrix with axes Core (1-4) and Tasks (1-4)]

slide-14
SLIDE 14

Scheduling Hot Hazard Tasks

Hot hazard jobs

  • Too hot even on the coolest core
  • Decision: map the job to the coolest core
  • Minimizes the DVFS penalty in the current scheduling cycle

[Figure: temperatures of cores c1-c4 over time t from n-1 to n, with the DVFS-on and DVFS-off temperature levels marked]

slide-15
SLIDE 15

Scheduling Mild Tasks

Mild jobs

  • A schedule can be found without DVFS
  • The goal is not to average the temperature
  • Rather, reserve cool core resources for future hot hazard tasks

[Figure: jobs J1-J4 placed on cores c1-c4 from n-1 to n without exceeding the threshold, reserving cool core resources; DVFS-on and DVFS-off temperature levels marked]
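The two rules (hot hazard jobs go to the coolest core, mild jobs leave cool cores free) can be combined into a greedy sketch. This is one possible reading of the slides, not the authors' implementation: the choice of "warmest core that stays under the threshold" for mild jobs, the hottest-first ordering, and all names are assumptions.

```python
def threshot_like_schedule(decayed, task_rise, threshold):
    """Greedy sketch that returns assignment[task] = core.

    decayed[c]   : predicted core temperature with no power (the A*T(n-1) term)
    task_rise[j] : temperature rise of task j (the B*P(n-1) term), assumed
                   to be the same on every core
    """
    free = sorted(range(len(decayed)), key=lambda c: decayed[c])  # coolest first
    assignment = [None] * len(task_rise)
    # Place the hottest tasks first so they get first pick of the cores.
    for task in sorted(range(len(task_rise)), key=lambda j: task_rise[j], reverse=True):
        safe = [c for c in free if decayed[c] + task_rise[task] <= threshold]
        if safe:
            core = safe[-1]   # mild task: warmest core that stays under the threshold
        else:
            core = free[0]    # hot hazard task: coolest core still available
        assignment[task] = core
        free.remove(core)
    return assignment


print(threshot_like_schedule([70, 80, 75, 78], [20, 5, 8, 10], 86.5))
```

In the example, the task with a rise of 20 exceeds the threshold even on the coolest core, so it is treated as a hot hazard job; the remaining tasks each take the warmest core that still keeps them under 86.5.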

slide-16
SLIDE 16

ThresHot Scheduling with TSM

Temperature slack matrix (axes: Core 1-4 and Tasks 1-4), with sums in the last row:

          1         2         3         4
1       0.415     8.973    -7.617    12.322
2       0.773     9.285    -7.259    12.635
3       0.524    10.158    -7.507    16.503
4       0.857     9.407    -7.175    12.975

Sum     2.569    37.823   -29.558    54.435
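The last row of the matrix on this slide appears to be the sum of each column, which a few lines of arithmetic confirm. The sketch reads the marker in front of the third-column values as a minus sign, which is an assumption about the slide; how ThresHot uses these sums is not stated in the text.

```python
# Values from the slide; the third column is read as negative (an assumption).
tsm = [
    [0.415, 8.973, -7.617, 12.322],
    [0.773, 9.285, -7.259, 12.635],
    [0.524, 10.158, -7.507, 16.503],
    [0.857, 9.407, -7.175, 12.975],
]

# zip(*tsm) iterates over columns; rounding removes floating-point noise.
column_sums = [round(sum(column), 3) for column in zip(*tsm)]
print(column_sums)
```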


slide-17
SLIDE 17

Outline

Introduction ThresHot Algorithm Experiment and Results Conclusion


slide-18
SLIDE 18

Experiment Methodology

Thermal model: HotSpot 4.0 + TILTS

Power trace

  • Running real SPEC2K benchmarks
  • Extracted from performance counters

Hardware DVFS:

  • Triggered on/off at 86.5/85.5
  • Frequency scaling: 0.7
  • Voltage scaling: 0.92
  • DTM triggering overhead: 30 µs
  • Schedule interval: 8 ms
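The DVFS settings above describe a trigger with hysteresis: throttling switches on at the higher temperature and off again only at the lower one. A minimal sketch follows. Whether the comparisons are inclusive is an assumption, and the power estimate uses the standard approximation that dynamic power is proportional to V² · f, which the slides do not state.

```python
DVFS_ON_TEMP = 86.5    # throttle when the temperature reaches this level
DVFS_OFF_TEMP = 85.5   # release the throttle when it falls back to this level
FREQ_SCALE = 0.7
VOLT_SCALE = 0.92


def dvfs_states(temps, active=False):
    """Return the DVFS on/off state after each temperature sample."""
    states = []
    for t in temps:
        if not active and t >= DVFS_ON_TEMP:
            active = True
        elif active and t <= DVFS_OFF_TEMP:
            active = False
        states.append(active)
    return states


def dynamic_power_scale(freq_scale=FREQ_SCALE, volt_scale=VOLT_SCALE):
    """Relative dynamic power under DVFS, assuming P_dyn ~ V^2 * f."""
    return volt_scale ** 2 * freq_scale


print(dvfs_states([85.0, 86.6, 86.0, 85.4, 86.0]))
print(dynamic_power_scale())
```

Under that approximation the throttled core draws about 59% of its dynamic power while running at 70% of its frequency, which is the performance penalty the scheduler tries to avoid.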


slide-19
SLIDE 19

Experiment Methodology

Quad-core floorplan based on P4 Northwood: 93 function units with shared L2 cache

slide-20
SLIDE 20

Performance Comparison

  • ThresHot minimizes thermal emergencies to mitigate the performance loss from DVFS
  • 13% and 6% reduction in performance penalty over “Base” and “Balancing”


slide-21
SLIDE 21

Reliability Comparison

  • Thermal cycling caused by significant temperature variations is lowest under ThresHot among the scheduling algorithms compared

Algorithm    <10°C   [10°C~15°C]   [15°C~20°C]   >20°C
Baseline     99.91       0.07          0.02       0.01
Random       97.45       1.55          0.68       0.32
Balancing    95.50       2.67          1.23       0.60
RR-1         95.83       2.60          1.05       0.52
RR-2         96.91       1.93          0.78       0.38
ThresHot     98.22       1.21          0.43       0.14

slide-22
SLIDE 22

Thermal Behavior Comparison

[Figure: thermal behavior under Baseline, RR, Balancing, and ThresHot]

slide-23
SLIDE 23

Conclusion

The ThresHot algorithm does better than RR and Balancing at reducing thermal emergencies and thermal cycling

ThresHot improves performance in the penalized time period by 13% and 6% compared to Baseline and Balancing

Function unit level thermal control
