Branch Prediction vs Execution Time Prediction



SLIDE 1

Branch Prediction vs Execution Time Prediction

Jakob Engblom, PhD

Uppsala University & Virtutech Inc.

jakob.engblom@it.uu.se
jakob@virtutech.com

SLIDE 2

RTAS 2003 Branch Prediction & WCET 2

The Question

  • Branch prediction
      • Performance enhancing technique
      • Necessary with deep pipelines
      • Works well, on average
  • Execution time prediction
      • Determining the extremes
      • Especially the worst case
  • Does "BP" make "ETP" harder?

SLIDE 3

Execution Time Estimates

  • WCET = Worst case (main interest here)
  • BCET = Best case
  • ACET = Average case

[Figure: range of possible execution times between the actual BCET and the actual WCET, with safe BCET estimates and safe WCET estimates marked outside that range and arrows labeled "tighter".]
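
The relationship between the estimates and the actual extremes can be written as a chain of inequalities. This is only a sketch in notation that is not on the slide: $t$ stands for any possible execution time, and a hat marks an estimate.

```latex
\[
\widehat{\mathrm{BCET}} \;\le\; \mathrm{BCET} \;\le\; t \;\le\; \mathrm{WCET} \;\le\; \widehat{\mathrm{WCET}}
\]
```

An estimate is safe when its inequality holds, and tighter when the gap to the actual value is smaller.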

SLIDE 4

Branch Prediction

SLIDE 5

The Performance Problem

        cmp r7,5
        bne B
    A:  add r4,r5
        ...
    B:  bset r5,1

[Figure: five-stage pipeline IF, ID, EX, MEM, WB. Callouts: "You need to redirect instruction fetch here" and, at a later stage, "Result of branch is not known until here".]

  • Conditional branch: execution will continue at A or B
  • Wait to see where branch goes = stall

SLIDE 6

Static Techniques

  • Keep fetching ahead
      • Always assume not taken (NEC V850, ARM7)
      • Or introduce "branch delay slot" (original SPARC & MIPS)
  • BTFN
      • Backwards-taken
      • Forwards-not taken
      • Recognize branches in IF or ID
      • Make speculative decision early
      • About 70% correct
      • ARM10, + base case in more advanced predictors
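
The BTFN rule can be sketched in a few lines of C. This is a minimal illustration, not code from the talk: the `branch_event` record, the function names, and the choice to treat a branch to itself as backward are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One executed conditional branch (hypothetical trace record). */
typedef struct {
    uint32_t branch_addr;  /* address of the branch instruction */
    uint32_t target_addr;  /* address jumped to if the branch is taken */
    int taken;             /* actual outcome: 1 = taken, 0 = not taken */
} branch_event;

/* BTFN: predict taken for backward branches, not taken for forward ones.
 * Assumption: a branch to its own address counts as backward. */
int btfn_predict_taken(uint32_t branch_addr, uint32_t target_addr) {
    return target_addr <= branch_addr;
}

/* Count how many events in a trace BTFN predicts correctly. */
size_t btfn_correct(const branch_event *events, size_t n) {
    assert(events != NULL || n == 0);
    size_t correct = 0;
    for (size_t i = 0; i < n; i++) {
        int predicted = btfn_predict_taken(events[i].branch_addr,
                                           events[i].target_addr);
        if (predicted == (events[i].taken != 0)) {
            correct++;
        }
    }
    return correct;
}
```

The decision needs only the branch address and its target, which is why it can be made as soon as the branch is recognized in IF or ID.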

SLIDE 7

Dynamic Techniques

"History will repeat itself"

  • Use history of taken/not taken
  • One-level:
      • One counter per branch
      • Actually a state machine
      • Usually with hysteresis
  • Implementation:
      • Cache of counters
      • Indexed by branch address

[Figure: state machine with states 00, 01, 10, 11 and transitions labeled T (taken) and NT (not taken). One group of states is marked "Predict: not taken", the other "Predict: taken".]

Examples: Pentium 1, Alpha 21064, UltraSparc II
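
One common form of such a state machine is the two-bit saturating counter. The sketch below assumes that form (the slide's diagram may use a different transition scheme) and assumes that states 10 and 11 predict taken. All names are made up for the example.

```c
#include <assert.h>

/* Counter states 0..3 stand for 00, 01, 10, 11. */
typedef unsigned char bp_counter;

/* Assumption: states 10 and 11 predict taken; 00 and 01 predict not taken. */
int bp_predict_taken(bp_counter c) {
    assert(c <= 3);
    return c >= 2;
}

/* Move one step toward 11 on taken, toward 00 on not taken (saturating). */
bp_counter bp_update(bp_counter c, int taken) {
    assert(c <= 3);
    if (taken) {
        return (bp_counter)(c < 3 ? c + 1 : 3);
    }
    return (bp_counter)(c > 0 ? c - 1 : 0);
}

/*
 * Mispredictions for a loop-closing branch that is taken (iterations - 1)
 * times and then falls through, repeated for `runs` executions of the loop.
 */
unsigned bp_loop_mispredicts(bp_counter start, unsigned iterations,
                             unsigned runs) {
    assert(iterations >= 1);
    bp_counter c = start;
    unsigned misses = 0;
    for (unsigned r = 0; r < runs; r++) {
        for (unsigned i = 0; i < iterations; i++) {
            int taken = (i + 1 < iterations);
            if (bp_predict_taken(c) != taken) {
                misses++;
            }
            c = bp_update(c, taken);
        }
    }
    return misses;
}
```

The hysteresis shows up at loop exit: from state 11, the single not-taken outcome moves the counter to 10, which still predicts taken when the loop is entered again. Starting from 00, the counter needs two taken outcomes before it predicts taken.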

SLIDE 8

Two-Level Dynamic

"History has a pattern"

  • Use pattern of taken/not taken
      • "taken every other time", for example
  • History register tracks outcomes
      • History per branch or global
  • Table of counters
      • Combination of history and address
      • 2D table, XOR, ... lots of possibilities

[Figure: address bits (11100...) and history bits (01001...) are combined (+) to select an entry in a table of two-bit counters (00, 01, 10, 11).]

Examples: UltraSparc III, Athlon, Pentium 3, Pentium 4, PowerPC G3, G4
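
One of the "lots of possibilities" is to XOR the branch address with a history register and use the result to index a table of two-bit counters. The sketch below assumes that scheme, a 4-bit history, and counters that start in state 01. None of these parameters come from the slides or from any of the listed processors.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HISTORY_BITS 4u                    /* assumed history length */
#define TABLE_SIZE   (1u << HISTORY_BITS)

typedef struct {
    uint8_t  counters[TABLE_SIZE];  /* two-bit saturating counters, 0..3 */
    uint32_t history;               /* last HISTORY_BITS outcomes, 1 = taken */
} two_level_predictor;

void tlp_init(two_level_predictor *p) {
    assert(p != NULL);
    memset(p->counters, 1, sizeof p->counters);  /* start in state 01 */
    p->history = 0;
}

/* Combine address and history with XOR to pick a counter. */
static uint32_t tlp_index(const two_level_predictor *p, uint32_t branch_addr) {
    return (branch_addr ^ p->history) & (TABLE_SIZE - 1u);
}

int tlp_predict_taken(const two_level_predictor *p, uint32_t branch_addr) {
    return p->counters[tlp_index(p, branch_addr)] >= 2;
}

/* Train the selected counter, then shift the outcome into the history. */
void tlp_update(two_level_predictor *p, uint32_t branch_addr, int taken) {
    uint8_t *c = &p->counters[tlp_index(p, branch_addr)];
    if (taken) {
        if (*c < 3) { (*c)++; }
    } else {
        if (*c > 0) { (*c)--; }
    }
    p->history = ((p->history << 1) | (taken ? 1u : 0u)) & (TABLE_SIZE - 1u);
}

/* Mispredictions on a "taken every other time" branch after a warm-up. */
unsigned tlp_mispredicts_alternating(unsigned warmup, unsigned n) {
    two_level_predictor p;
    tlp_init(&p);
    unsigned misses = 0;
    for (unsigned i = 0; i < warmup + n; i++) {
        int taken = (i % 2 == 0);
        if (i >= warmup && tlp_predict_taken(&p, 0x40u) != taken) {
            misses++;
        }
        tlp_update(&p, 0x40u, taken);
    }
    return misses;
}
```

Once warmed up, the two histories that occur in the alternating pattern each select their own counter, so the pattern is predicted without misses. A single counter per branch cannot achieve that. Because the sketch keeps one history register for all branches, it corresponds to the global variant; a local variant would keep one history register per branch.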

SLIDE 9

The Experimental Setup

SLIDE 10

Experimental Setup

    for (k = 1; k < 32; k++) {
        starttimer();
        /* time this part */
        for (n = 0; n < 10000000; n++) {
            for (i = 0; i < k; i++) {
                __nop();
            }
        }
        stoptimer();
        recordtime();
    }

Assembly for the timed loops:

    outer:  ri=rk
    inner:  nop
            dec ri
            cmp ri,0
            bnz inner
            dec rn
            cmp rn,0
            bnz outer

SLIDE 11

Baseline Result

  • Static prediction: total time

[Figure: "V850E Time". x-axis: inner loop count k, 1 to 31; y-axis: time, 0 to 20.]

  • Monotone increase
  • Smooth straight line
  • Perfectly easy to predict

SLIDE 12

Baseline Result

  • Static prediction: time / inner count

[Figure: "V850E Time/Count". x-axis: inner loop count k, 1 to 31; y-axis: time per count, 1.00 to 1.70.]

  • Smooth monotone decrease
  • Cost of outer loop is spread across more & more iterations of inner loop

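
Both baseline curves follow from a simple cost model. The symbols below are assumptions, not taken from the slides: $N$ is the outer loop count, $k$ the inner loop count, $c_{\mathrm{in}}$ the cost of one inner iteration, and $c_{\mathrm{out}}$ the fixed cost of one outer iteration. The model assumes that neither cost depends on $k$, which is plausible under static prediction.

```latex
\[
T(k) = N\,\bigl(k\,c_{\mathrm{in}} + c_{\mathrm{out}}\bigr),
\qquad
\frac{T(k)}{k} = N\left(c_{\mathrm{in}} + \frac{c_{\mathrm{out}}}{k}\right)
\]
```

The first expression is a straight line in $k$. The second decreases monotonically toward $N\,c_{\mathrm{in}}$ as $k$ grows, because the outer loop cost is divided among more inner iterations.
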
SLIDE 13

The Results

SLIDE 14

One-Level Dynamic

[Figure: "UltraSparc II Time". x-axis: inner loop count k, 1 to 31; y-axis: time, 1 to 21.]

  • Monotone increase, but not exactly smooth
  • Takes some time for predictor to tune in
  • Analyzing or measuring max # iterations is safe

SLIDE 15

Two-Level Dynamic, Local

[Figure: "Pentium III Time". x-axis: inner loop count k, 1 to 31; y-axis: time, 1 to 17.]

  • Inversion: doing more iterations takes less time
  • Increases the search space for the worst case considerably

SLIDE 16

Inversions Explained

[Figure: two traces of the inner loop body (nop; dec ri; cmp ri,0; bnz inner), each followed by the outer loop branch (dec rn; cmp rn,0; bnz outer). One trace has n+1 inner iterations, the other has n.]

  • n+1 iterations takes T cycles
  • n iterations takes >T cycles
  • Cost of the mispredict is greater than the cost of executing an extra inner loop
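
The condition for an inversion can be worked out with two assumed symbols that are not on the slide: $c$ is the cost in cycles of one inner iteration and $P$ is the misprediction penalty. Suppose the predictor gets every branch right with $n+1$ iterations but mispredicts once with $n$ iterations.

```latex
\[
T_{n+1} = (n+1)\,c, \qquad T_{n} = n\,c + P
\]
\[
T_{n} > T_{n+1} \;\iff\; n\,c + P > (n+1)\,c \;\iff\; P > c
\]
```

So under these assumptions an inversion appears whenever a single mispredict costs more than one pass through the inner loop body.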

SLIDE 17

Two-Level Dynamic, Local

[Figure: "Pentium III Time/Count". x-axis: inner loop count k, 1 to 31; y-axis: time per count, 1.00 to 2.10.]

  • Wide variation in time per iteration
  • Four iterations, length of history register, is a local minimum

SLIDE 18

Two-Level Dynamic, Global

[Figure: "UltraSparc III Time". x-axis: inner loop count k, 1 to 31; y-axis: time, 1 to 10.]

  • Nice smooth curve, with a single bump
  • But still inversions
  • Looks better, but no easier than the local two-level predictor
  • Maybe inversions stop appearing
  • Generalizing from the early trend gives too low an execution time estimate

SLIDE 19

Two-Level Dynamic, Global

[Figure: "UltraSparc III Time/Count". x-axis: inner loop count k, 1 to 31; y-axis: time per count, 0.50 to 3.50.]

Callouts on the curve:

  • Drop here is effect of BTFN initial guess
  • Bump after the history register wraps around
  • This is the lowest point on the curve!

SLIDE 20

Pentium 4

[Figure: "Pentium 4 Time". x-axis: inner loop count k, 1 to 31; y-axis: time, 0 to 10.]

  • Inversions keep occurring
  • Very uneven curve, strangest machine in all experiments
  • Very difficult to predict the worst case

SLIDE 21

Pentium 4

[Figure: "Pentium 4 Time/Count". x-axis: inner loop count k, 1 to 31; y-axis: time per count, 0.50 to 3.50.]

  • Very impressive average-case performance
  • Worse after 16-17 iterations!
  • BTFN assumption is false for 1 iteration

SLIDE 22

Sneak peek at future work

SLIDE 23

Testing Size of Inner Loop

[Figure: "Pentium 4, total time". One axis runs from 1 to 31, a second axis from 1 to 16; time runs from 0 to 7000.]

  • Pack more NOPs into the inner loop
  • Curve gets smoother as size of inner loop increases
  • Inversions disappear with sufficiently large loop body?
  • Larger loop bodies can hide BP effects

SLIDE 24

Conclusions

SLIDE 25

Conclusions

  • Static branch prediction
      • No problem for WCET estimation
  • Dynamic branch prediction
      • Inversions: more work = less time
      • Execution time increases unevenly
      • Not easy to find the worst case
  • Overall: BP makes ETP harder
      • To minimize impact, maximize size of loop bodies

SLIDE 26

The End!

Questions?