AUDIT: Stress Testing the Automatic Way. Youngtaek Kim, Lizy Kurian John, et al. (PowerPoint presentation)


MICRO 2012 AUDIT: Stress Testing the Automatic Way Youngtaek Kim, Lizy Kurian John ECE, The University of Texas at Austin Sanjay Pant, Srilatha Manne, Michael Schulte, W. Lloyd Bircher, Madhu S. Sibi Govindan AMD Research and PPO Lab,


SLIDE 1

MICRO 2012

AUDIT: Stress Testing the Automatic Way

Youngtaek Kim, Lizy Kurian John

ECE, The University of Texas at Austin

Sanjay Pant, Srilatha Manne, Michael Schulte, W. Lloyd Bircher, Madhu S. Sibi Govindan

AMD Research and PPO Lab, Advanced Micro Devices, Inc.

Laboratory for Computer Architecture 12/04/2012

SLIDE 2

Outline

  • AUDIT: AUtomated DI/dT Stressmark Generation

– is an automation framework for stressmark generation

  • Target: Multi-core processor
  • Finding Max. Voltage Droop: Genetic Algorithm with hardware measurement

– generates effective di/dt stressmarks in a short time

  • Larger voltage droop than other benchmarks/stressmarks
  • Higher voltage failure points than other benchmarks/stressmarks

– works well with different configurations / architectures

  • Throttling off / on
  • Different processors


SLIDE 3

Introduction: Reliability Issue (di/dt = Inductive Noise)


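[Figure: physical structure of the power delivery network (PDN) across the motherboard (MB), package, and die, with its corresponding circuit representation: VRM; motherboard RMB, LMB, CMB; package Rpkg1, Lpkg1, Rpkg2, Lpkg2, Cpkg; die Rdie, Ldie, Cdie; driven by the die current idie(t), producing the supply-voltage variation ΔVdd(t).]

Inductive noise (di/dt noise):

V = Vdd – L·(di/dt) – R·I

[Figure: current and voltage vs. CPU cycles; a sharp change in current makes the voltage dip below the Vdd supply voltage margin.]

Insufficient voltage → increase in delay → timing violation!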
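To make the inductive-noise equation concrete, the sketch below applies V = Vdd – L·(di/dt) – R·I sample by sample to a current trace. It is a lumped approximation that ignores the PDN capacitances and therefore its resonance, and the parameter values (`VDD`, `L_EFF`, `R_EFF`, `DT`) are hypothetical, not taken from the slides.

```python
# Hypothetical lumped PDN parameters (illustrative values, not from the slides)
VDD = 1.0      # nominal supply voltage (V)
L_EFF = 1e-12  # effective inductance (H)
R_EFF = 1e-3   # effective resistance (ohm)
DT = 1e-9      # sample interval (s), one cycle at 1 GHz

def supply_voltage(current, vdd=VDD, l=L_EFF, r=R_EFF, dt=DT):
    """Apply V = Vdd - L*di/dt - R*I to each sample of a current trace (A)."""
    volts = []
    prev = current[0]
    for i in current:
        di_dt = (i - prev) / dt  # backward-difference estimate of di/dt
        volts.append(vdd - l * di_dt - r * i)
        prev = i
    return volts

# A 10 A -> 40 A current step: the L*di/dt term adds to the IR drop at the edge
trace = supply_voltage([10.0, 10.0, 40.0, 40.0])
```

With these values the step edge drops to about 0.93 V (0.03 V from L·di/dt plus 0.04 V of IR drop), then recovers to 0.96 V once di/dt returns to zero.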

SLIDE 4

Introduction: Supply Voltage Failure Symptoms

  • Unexpected Value / Wrong Result

– bit-flip: 0 → 1 or 1 → 0

  • OS Freezing / System Hang
  • Blue Screen
  • Sudden Shutdown


SLIDE 5

Background: Characteristics of di/dt Voltage Noise


SLIDE 6

Related Work & Motivation

  • To characterize di/dt voltage noise in a microprocessor

– Using standard benchmarks

  • SPEC benchmarks: ineffective for testing the voltage margin

– Generating and running di/dt stressmarks

  • Manual stressmark [Joseph, HPCA’03]

– Inefficient: a new manual stressmark must be written for each configuration

  • Instruction Scheduling using Integer Linear Programming (ILP) [Ketkar, MICRO’09]

– Difficult to express a complex system as a linear formulation

  • Genetic Algorithm [Joshi, HPCA’08] [Kim, ISLPED’11]

– Single-core, simulation only

Automatic di/dt Stressmark Generation using Genetic Algorithm with Post-Silicon Hardware Measurement


SLIDE 7

AUDIT: AUtomated DI/dT Stressmark Generation


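[Figure: AUDIT flow. Control parameters, an opcode list, and an initial seed form the POPULATION of entries; the genetic algorithm produces opcode sequences (no registers); code generation emits x86 assembly; each entry is evaluated either by a simulator (current trace fed to an HSPICE PDN model) or by measuring on hardware (HW), then scored by a cost function. If the exit condition is met, the flow ends; otherwise it loops back.]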
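The loop in this flow can be sketched as a plain genetic algorithm. This is a minimal sketch, not AUDIT's implementation: the population size, sequence length, truncation selection, single-point crossover, mutation rate, and the `evaluate` callback (a stand-in for the simulator or hardware measurement plus cost function) are all assumptions.

```python
import random

def run_ga(opcode_list, evaluate, seq_len=8, pop_size=20,
           max_generations=50, target=None, mutation_rate=0.1, seed=0):
    """Minimal GA loop over opcode sequences (illustrative sketch only).

    evaluate(seq) stands in for the HSPICE simulation or the hardware
    measurement plus cost function; larger scores mean more voltage droop.
    """
    rng = random.Random(seed)
    # Initial population: random opcode sequences (registers not yet assigned)
    population = [[rng.choice(opcode_list) for _ in range(seq_len)]
                  for _ in range(pop_size)]
    best = max(population, key=evaluate)
    for _ in range(max_generations):
        # Exit condition: stop once the target score is reached
        if target is not None and evaluate(best) >= target:
            break
        ranked = sorted(population, key=evaluate, reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection: keep top half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, seq_len)  # single-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < mutation_rate:  # point mutation
                child[rng.randrange(seq_len)] = rng.choice(opcode_list)
            children.append(child)
        population = parents + children
        candidate = max(population, key=evaluate)
        if evaluate(candidate) > evaluate(best):
            best = candidate
    return best
```

Because the top half survives each generation, the best score never decreases. With a real hardware measurement inside `evaluate`, caching scores per sequence would avoid re-running identical entries.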

SLIDE 8

AUDIT: AUtomated DI/dT Stressmark Generation


[Figure: AUDIT flow diagram, repeated from Slide 7]

SLIDE 9

AUDIT – Genetic Algorithm: Operational Concept


[Figure: voltage vs. time around the 1.0 V nominal level for the initial sequence, after crossover, and after mutation, with the max. vdroop marked]

SLIDE 10

AUDIT: AUtomated DI/dT Stressmark Generation


[Figure: AUDIT flow diagram, repeated from Slide 7]

SLIDE 11

AUDIT – Instruction Sequence Generation


SLIDE 12

AUDIT: AUtomated DI/dT Stressmark Generation


[Figure: AUDIT flow diagram, repeated from Slide 7]

SLIDE 13

AUDIT – Hardware Measurement

  • Hardware Measurement for Max. Voltage Droop and Power

[Figure: measurement setup: target board with a differential probe connected to an oscilloscope, a DAQ, and target and host monitors]

SLIDE 14

Step 1: Frequency Sweep

  • To find the 1st droop resonant frequency, perform a frequency sweep: increase the code length using simple instructions

– HP (high-power) length: 4, 8, 12, …, 4n
– LP (low-power) length: 4, 8, 12, …, 4n
– Total length: (4+4), (8+8), (12+12), …, (4n+4n)

[Figure: voltage droop (V) vs. loop length (8 to 56); higher values indicate larger droop]
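The sweep can be sketched as enumerating the (HP, LP) lengths and keeping the length with the largest measured droop. In this sketch `measure_droop` is a hypothetical hook that would build, run, and measure the loop; it is not part of any real tool. The frequency helper assumes the loop's cycle count per iteration is known, since the slide counts length in instructions.

```python
def sweep_lengths(n):
    """(HP, LP, total) loop lengths for the sweep: (4+4), (8+8), ..., (4n+4n)."""
    return [(k, k, 2 * k) for k in range(4, 4 * n + 1, 4)]

def find_resonant_length(measure_droop, n):
    """Return (total_length, droop) for the loop length with the largest droop.

    measure_droop(hp_len, lp_len) is a hypothetical hook returning the
    measured max. voltage droop (V) for one HP/LP loop configuration.
    """
    best = (None, float("-inf"))
    for hp, lp, total in sweep_lengths(n):
        droop = measure_droop(hp, lp)
        if droop > best[1]:
            best = (total, droop)
    return best

def excitation_frequency_hz(loop_cycles, f_clk_hz):
    """Frequency of a loop that takes loop_cycles clock cycles per iteration."""
    return f_clk_hz / loop_cycles
```

For example, `n = 7` covers total lengths 8 to 56, matching the plot's axis, and a loop of 30 cycles at 3 GHz excites the PDN at 100 MHz.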

SLIDE 15

Step 2: Using Sub-blocks for GA

  • Scaling & Replicating the Base part

1. Schedule “Base”
2. Replicate “Base” according to the resonant cycles
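A minimal sketch of the replication step, assuming the GA schedules a short “Base” block whose cycle count is known and that it is repeated to span a target cycle count derived from Step 1. Whether that target is the full resonant period or only its high-power half is an assumption the slide does not settle; `replicate_base` is a hypothetical helper.

```python
def replicate_base(base, base_cycles, target_cycles):
    """Repeat a GA-scheduled base block so the result spans ~target_cycles.

    base_cycles: assumed-known cycle count of one pass of `base`.
    target_cycles: cycle span derived from the resonant period (Step 1).
    Returns (replicated_block, number_of_copies).
    """
    if base_cycles <= 0:
        raise ValueError("base_cycles must be positive")
    copies = max(1, round(target_cycles / base_cycles))
    return list(base) * copies, copies
```

Keeping the GA search on the short base block keeps the search space small, while replication restores the length needed to hit the resonant frequency.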

SLIDE 16

Step 3: Code Generation for Multiple Threads

1. Prepare a core part: one high-low power pattern
2. Make multiple copies of <1> to increase the intensity of resonance
3. Add a header part that contains initialization code
4. Make multiple copies of <3> according to the number of threads

[Figure: generated code layout, annotated by step]

SLIDE 17

Step 3: Code Generation for Multiple Threads

  • Natural dithering: thread alignment shifts due to the OS

[Figure: droop amplitude changes as the OS shifts thread alignment, with the max. droop marked; 16 ms shown on the time axis]

SLIDE 18

Step 3: Code Generation for Multiple Threads

1. Prepare a core part: one high-low power pattern
2. Make multiple copies of <1> to increase the intensity of resonance
3. Add a header part that contains initialization code
4. Make multiple copies of <3> according to the number of threads
5. Attach dithering parts to each thread for alignment

[Figure: generated code layout annotated with steps 1 to 5; with dithering, the threads' droops become aligned]
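The five steps can be sketched as a small generator that emits NASM-style instruction strings per thread. This is a hypothetical layout, not AUDIT's code generator: `build_thread` and `build_all_threads` are illustrative names, and the dithering part is assumed to be a per-thread NOP pad of different length, so the threads' relative phase drifts and they periodically line up.

```python
def build_thread(core, copies, header, dither_nops, label):
    """One thread's code: header, then a loop of dither pad + replicated core."""
    lines = list(header)               # step 3: initialization header
    lines.append(f"{label}:")
    lines += ["nop"] * dither_nops     # step 5: dithering part (assumed: NOP pad)
    lines += list(core) * copies       # steps 1-2: replicated high-low pattern
    lines.append(f"jmp {label}")
    return lines

def build_all_threads(core, copies, header, n_threads, dither_step=1):
    """Step 4: one copy per thread, each with a different dither length."""
    return [build_thread(core, copies, header, t * dither_step, f"loop_t{t}")
            for t in range(n_threads)]
```

A real generator would also wrap each thread in the OS-specific startup code for the target (Windows 7 or Linux), which this sketch leaves out.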

SLIDE 19

Experimental Methodology - Benchmark

  • Benchmark

– Standard benchmark

  • SPEC CPU2006 (12 INT and 17 FP): multi-programmed
  • PARSEC: multi-threaded

– Stressmark

  • Manual: SM1 and SM2 (single + resonant) and SM-Res (resonant)
  • AUDIT: A-Ex (single) and A-Res (resonant)
  • Compiler: NASM, gcc 4.6.2
  • OS: Windows 7, Red Hat Enterprise Linux 6

SLIDE 20

Experimental Methodology - Thread Configuration

[Figure: thread configurations (1T, 2T, 4T, 8T) on the AMD Bulldozer processor. Cores are paired into modules (Cores 0-1, 2-3, 4-5, 6-7); the 2 cores in a module share the front-end, FPU, and L2 cache. Threads are placed either 1T per module or 2T per module.]
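The placements in the figure can be expressed as a small core-selection helper. This is a hypothetical sketch: `place_threads` and its parameters are illustrative, and it assumes cores 2m and 2m+1 form module m, as the figure's labels (Cores 0-1, 2-3, ...) suggest.

```python
def place_threads(n_threads, per_module, n_modules=4, cores_per_module=2):
    """Core IDs for a 1T/2T/4T/8T run using `per_module` cores per module.

    Assumes cores m*cores_per_module .. m*cores_per_module+cores_per_module-1
    form module m (e.g., cores 0-1 are module 0).
    """
    if not 1 <= per_module <= cores_per_module:
        raise ValueError("per_module must be between 1 and cores_per_module")
    slots = [m * cores_per_module + c
             for m in range(n_modules) for c in range(per_module)]
    if n_threads > len(slots):
        raise ValueError("not enough cores for this placement")
    return slots[:n_threads]
```

For instance, 4T at 1T per module uses cores 0, 2, 4, 6 (no shared FPU), while 4T at 2T per module packs cores 0 to 3 into two modules. On Linux, a process could then be pinned with `os.sched_setaffinity(pid, cores)`.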

SLIDE 21

Experimental Results - Max. Voltage Droop

[Figure: max. voltage droop of SPEC INT, SPEC FP, and PARSEC benchmarks relative to the manual stressmark (SM1); larger values indicate larger droop]

SLIDE 22

Experimental Results - Max. Voltage Droop

[Figure: max. voltage droop relative to the manual stressmark (SM1); larger values indicate larger droop]

  • Manual: SM1, SM2, SM-Res
  • AUDIT Single Droop: A-Ex
  • AUDIT Resonant: A-Res

SLIDE 23

Experimental Results - Max. Voltage Droop

[Figure: max. voltage droop relative to the manual stressmark (SM1); larger values indicate larger droop]

  • Manual: SM1, SM2, SM-Res
  • AUDIT Single Droop: A-Ex
  • AUDIT Resonant: A-Res, A-Res-8T

SLIDE 24

Experimental Results - Voltage at Failure

  • Lowering the operating voltage & finding the voltage at failure
  • Higher voltage at failure → more stressful benchmark

Voltage at failure, ordered from most to least stressful (VF = failure voltage of A-Res):

Benchmark    V at Fail
A-Res        VF
SM-Res       VF – 12.5 mV
SM1          VF – 62.5 mV
A-Ex         VF – 75.0 mV
SM2          VF – 87.5 mV
zeusmp       VF – 125 mV
swaptions    VF – 125 mV
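The failure-voltage search can be sketched as a step-down loop. In this sketch `runs_ok` is a hypothetical hook that sets the supply voltage, runs the benchmark, and checks for a correct result; the 12.5 mV step is an assumption, chosen only because every entry in the table is a multiple of 12.5 mV.

```python
def voltage_at_failure(runs_ok, v_start, v_step=0.0125, max_steps=80):
    """Lower the supply voltage step by step; return the first failing voltage.

    runs_ok(v) is a hypothetical hook: set the supply to v volts, run the
    benchmark, and report whether it completed correctly.
    """
    for k in range(max_steps + 1):
        v = v_start - k * v_step  # computed from k to avoid accumulated rounding
        if not runs_ok(v):
            return v
    return None  # no failure within the tested range
```

Running the same search for two benchmarks and subtracting the results gives table entries such as “VF – 62.5 mV”: the more stressful benchmark fails at the higher voltage.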

SLIDE 25

Experimental Results - Droop Probability

[Figure: three histograms of droop events (frequency of droop events vs. processor supply voltage), each marking Vnom and Vmin, with undershoot below Vnom and overshoot above it]

  • Histogram of Droop Events

– 8M samples are captured at max. voltage droop
– More frequent, larger droops → higher probability of failure
* Max. Droop = Vnom – Vmin
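A minimal sketch of how such a histogram and the max. droop (Vnom – Vmin) could be computed from captured voltage samples. The 5 mV bin width and the function name `droop_stats` are assumptions for illustration; the slides do not give the binning used.

```python
import math
from collections import Counter

def droop_stats(samples_v, v_nom, bin_mv=5.0):
    """Histogram of supply-voltage samples and the max. droop (Vnom - Vmin).

    Bins are keyed by their lower edge in mV relative to Vnom: negative
    keys are undershoot, positive keys are overshoot.
    """
    hist = Counter()
    for v in samples_v:
        hist[math.floor((v - v_nom) * 1000.0 / bin_mv) * bin_mv] += 1
    max_droop = v_nom - min(samples_v)
    return dict(hist), max_droop
```

A stressmark whose histogram carries more mass in the far-left (deep undershoot) bins produces larger droops more often, which is what raises the probability of failure.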

SLIDE 26

Experimental Results - FPU Throttling

  • Floating-Point Unit (FPU) Throttling

– FPU throttling adjusts the power ramp-up/down rates to mitigate the di/dt effect
– A-Res-Th is generated with FPU throttling enabled

[Figure: max. voltage droop relative to the manual stressmark (SM1) with FPU throttling; larger values indicate larger droop]
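To illustrate why limiting ramp rates helps, the sketch below applies a generic per-cycle slew-rate limit to a current-demand trace. This is a swapped-in illustrative model, not AMD's FPU throttling mechanism, which the slides do not describe; `throttle_ramp` and `max_step` are hypothetical.

```python
def throttle_ramp(demand, max_step):
    """Limit how much the current may change per cycle (generic slew limiter)."""
    out = [demand[0]]
    for d in demand[1:]:
        prev = out[-1]
        delta = max(-max_step, min(max_step, d - prev))  # clamp the change
        out.append(prev + delta)
    return out

def max_di(trace):
    """Largest per-cycle current change, a proxy for the worst di/dt."""
    return max(abs(b - a) for a, b in zip(trace, trace[1:]))
```

For a demand of [10, 40, 40, 10, 10] A with `max_step=10`, the throttled trace becomes [10, 20, 30, 20, 10], cutting the worst per-cycle change from 30 A to 10 A at the cost of delayed power delivery.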

SLIDE 27

Experimental Results - Different Processor

  • AMD Phenom II X4

– Different number of cores: 4 cores (vs. 8 cores for AMD Bulldozer)
– Different issue width: 3-wide (vs. 4-wide for AMD Bulldozer)

                        zeusmp        SM2    A-Res
Relative Droop (SM2=1)  0.82          1      1.10
Failure Point           VF – 50 mV    VF     VF

SLIDE 28

Conclusion

  • AUDIT: AUtomated DI/dT Stressmark Generation

– Targets multi-core processors
– Genetic algorithm with hardware measurement
– 40% more voltage droop than the manual stressmark (SM1)
– 62.5 mV higher voltage failure point than the manual stressmark (SM1)
– Generation time is less than 5 hours
– Works well with different configurations / architectures


SLIDE 29


Thank You! Any Questions?
