Modeling Critical Sections in Amdahl’s Law and its Implications for Multicore Design




SLIDE 1

Modeling Critical Sections in Amdahl’s Law and its Implications for Multicore Design

Stijn Eyerman and Lieven Eeckhout

Ghent University, Belgium

ISCA, Saint-Malo, France, June 23, 2010

SLIDE 2

Amdahl’s Law

  • S. Eyerman & L. Eeckhout -- ISCA 2010 -- June 23, 2010

Speedup by parallelizing fraction $f$ across $n$ processors:

$$S = \frac{1}{(1 - f) + \frac{f}{n}}$$

Parallel performance is bounded by sequential part:

$$\lim_{n \to \infty} S = \frac{1}{1 - f}$$
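As a minimal sketch, the two expressions for Amdahl's Law can be evaluated directly. The function names `amdahl_speedup` and `amdahl_limit` are illustrative and do not come from the slides.

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Speedup when a fraction f of the work is parallelized across n processors."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be in [0, 1]")
    if n < 1:
        raise ValueError("n must be >= 1")
    return 1.0 / ((1.0 - f) + f / n)


def amdahl_limit(f: float) -> float:
    """Upper bound on the speedup as n grows without bound (requires f < 1)."""
    if not 0.0 <= f < 1.0:
        raise ValueError("f must be in [0, 1)")
    return 1.0 / (1.0 - f)


if __name__ == "__main__":
    # The speedup approaches, but never exceeds, the sequential bound.
    for n in (1, 2, 8, 64, 1024):
        print(n, round(amdahl_speedup(0.95, n), 2))
    print("limit", round(amdahl_limit(0.95), 2))
```

The loop illustrates how adding processors yields diminishing returns once the sequential part dominates.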

SLIDE 3

Amdahl’s software model

  • sequential fraction: $f_{seq}$
  • parallel fraction: $f_{par} = 1 - f_{seq}$

Can we model critical sections in Amdahl’s Law?

SLIDE 4

Extending Amdahl’s software model

$$f_{seq} + f_{par,cs} + f_{par,ncs} = 1$$

  • $f_{seq}$: sequential part
  • $f_{par,ncs}$: parallel part outside critical sections
  • $f_{par,cs}$: parallel part inside critical sections
  • $P_{ctn}$: probability for two critical sections to contend

SLIDE 5

Extending Amdahl’s software model

Assumptions

  • Each thread executes an equal share of the critical sections
  • Critical sections are entered at random times
  • Critical sections contend randomly

SLIDE 6

Compute parallel speedup in the presence of critical sections?

  • Case #1: Low contention: all threads execute equally long
    – total exec time ≅ avg per-thread exec time
  • Case #2: High contention
    – total exec time ≅ avg exec time of slowest thread

SLIDE 7

Case #1

Each thread executes a fraction $\frac{f_{par,cs}}{n}$ of critical sections

  • If no contention: exec time $= \frac{f_{par,cs}}{n}$
  • If contention with $j$ threads: exec time $= (j+1)\,\frac{f_{par,cs}}{n}$
  • Avg time spent in critical section:

$$= \sum_{j=0}^{n-1} \Pr[\text{contend with } j \text{ threads}] \cdot (j+1)\,\frac{f_{par,cs}}{n}$$

SLIDE 8

$$\sum_{j=0}^{n-1} \Pr[\text{contend with } j \text{ threads}] \cdot (j+1)\,\frac{f_{par,cs}}{n}$$

$$= \sum_{i=0}^{n-1} \Pr[i \text{ of } n-1 \text{ other threads in critical sections}] \cdot \sum_{j=0}^{i} \Pr[j \text{ of } i \text{ critical sections contend}] \cdot (j+1)\,\frac{f_{par,cs}}{n}$$

$$= \sum_{i=0}^{n-1} \binom{n-1}{i} P_{cs}^{\,i} (1 - P_{cs})^{n-1-i} \cdot \sum_{j=0}^{i} \binom{i}{j} P_{ctn}^{\,j} (1 - P_{ctn})^{i-j} \cdot (j+1)\,\frac{f_{par,cs}}{n}$$

with

$$P_{cs} = \frac{f_{par,cs}}{f_{par,cs} + f_{par,ncs}}$$

SLIDE 9

Avg time spent in critical section =

$$\sum_{i=0}^{n-1} \binom{n-1}{i} P_{cs}^{\,i} (1 - P_{cs})^{n-1-i} \cdot \sum_{j=0}^{i} \binom{i}{j} P_{ctn}^{\,j} (1 - P_{ctn})^{i-j} \cdot (j+1)\,\frac{f_{par,cs}}{n}$$

$$= f_{par,cs} \cdot \left( P_{cs} P_{ctn} + \frac{1 - P_{cs} P_{ctn}}{n} \right)$$

  • $f_{par,cs} \cdot P_{cs} P_{ctn}$: sequential part
  • $f_{par,cs} \cdot \frac{1 - P_{cs} P_{ctn}}{n}$: parallel part
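The closed form for the average time spent in critical sections can be checked numerically against the double binomial sum. This is only a sanity-check sketch; the function names are illustrative and the parameter values in the usage loop are arbitrary.

```python
from math import comb


def avg_cs_time_sum(f_par_cs: float, f_par_ncs: float, p_ctn: float, n: int) -> float:
    """Average critical-section time per thread, evaluated as the double binomial sum."""
    p_cs = f_par_cs / (f_par_cs + f_par_ncs)
    total = 0.0
    for i in range(n):  # i of the n-1 other threads are inside a critical section
        pr_i = comb(n - 1, i) * p_cs**i * (1.0 - p_cs) ** (n - 1 - i)
        inner = 0.0
        for j in range(i + 1):  # j of those i critical sections contend
            pr_j = comb(i, j) * p_ctn**j * (1.0 - p_ctn) ** (i - j)
            inner += pr_j * (j + 1) * f_par_cs / n
        total += pr_i * inner
    return total


def avg_cs_time_closed(f_par_cs: float, f_par_ncs: float, p_ctn: float, n: int) -> float:
    """Same quantity in closed form: a sequential part plus a parallel part."""
    p_cs = f_par_cs / (f_par_cs + f_par_ncs)
    return f_par_cs * (p_cs * p_ctn + (1.0 - p_cs * p_ctn) / n)


if __name__ == "__main__":
    for n in (1, 2, 4, 16):
        print(n, avg_cs_time_sum(0.3, 0.6, 0.4, n), avg_cs_time_closed(0.3, 0.6, 0.4, n))
```

The agreement follows from the mean of a binomial distribution: the inner sum reduces to $i \cdot P_{ctn} + 1$ and the outer sum to $(n-1) \cdot P_{cs} P_{ctn} + 1$.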

SLIDE 10

Back to Amdahl’s Law

$$S = \frac{1}{f_{seq} + f_{par,cs} \cdot P_{cs} P_{ctn} + \dfrac{f_{par,cs} \cdot (1 - P_{cs} P_{ctn}) + f_{par,ncs}}{n}}$$

Impact of critical sections can be modeled as a sequential plus a parallel part
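A short sketch of the case #1 speedup, splitting the critical-section work into its sequential and parallel parts. The function name and the guard for a workload without any parallel part are assumptions made for illustration.

```python
def speedup_low_contention(f_seq: float, f_par_cs: float, f_par_ncs: float,
                           p_ctn: float, n: int) -> float:
    """Case #1 speedup: critical sections add a sequential and a parallel term."""
    f_par = f_par_cs + f_par_ncs
    # Probability that a thread is inside a critical section (0 if nothing is parallel).
    p_cs = f_par_cs / f_par if f_par > 0.0 else 0.0
    sequential = f_seq + f_par_cs * p_cs * p_ctn
    parallel = f_par_cs * (1.0 - p_cs * p_ctn) + f_par_ncs
    return 1.0 / (sequential + parallel / n)


if __name__ == "__main__":
    # With p_ctn = 0 the expression reduces to the classic Amdahl's Law.
    print(speedup_low_contention(0.1, 0.2, 0.7, 0.0, 8))
    print(speedup_low_contention(0.1, 0.2, 0.7, 0.5, 8))
```

Setting `p_ctn` to zero removes the extra sequential term, which is a quick way to confirm the expression is a strict extension of the original law.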

SLIDE 11

Case #2

  • Exec time determined by chain of contending critical sections
  • Approx total exec time as the avg exec time of slowest thread

SLIDE 12

Avg exec time of slowest thread

  • Length of chain of contending critical sections:

$$f_{par,cs} P_{ctn}$$

  • Minimum execution time:

$$f_{seq} + f_{par,cs} P_{ctn}$$

  • Maximum execution time:

$$f_{seq} + f_{par,cs} P_{ctn} + \frac{f_{par,cs} (1 - P_{ctn}) + f_{par,ncs}}{n}$$

  • Average execution time:

$$f_{seq} + f_{par,cs} P_{ctn} + \frac{f_{par,cs} (1 - P_{ctn}) + f_{par,ncs}}{2 \cdot n}$$
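The three case #2 expressions can be written out as a small helper. This is a sketch with an illustrative function name; it simply evaluates the minimum, maximum, and average execution time of the slowest thread.

```python
def slowest_thread_time(f_seq: float, f_par_cs: float, f_par_ncs: float,
                        p_ctn: float, n: int) -> tuple[float, float, float]:
    """Return (minimum, maximum, average) execution time of the slowest thread."""
    chain = f_par_cs * p_ctn                      # serialized chain of contending critical sections
    rest = f_par_cs * (1.0 - p_ctn) + f_par_ncs   # remaining work, spread over n threads
    t_min = f_seq + chain                         # remaining work fully overlaps the chain
    t_max = f_seq + chain + rest / n              # remaining work does not overlap at all
    t_avg = f_seq + chain + rest / (2 * n)        # midpoint of the two extremes
    return t_min, t_max, t_avg


if __name__ == "__main__":
    print(slowest_thread_time(0.0, 0.5, 0.5, 0.5, 8))
```

The average is the midpoint between the minimum and maximum, which is where the factor $2 \cdot n$ comes from.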

SLIDE 13

Putting it together & validation

Q: Total exec time for parallel workload?
A: Max (case #1, case #2)

[Figure: normalized exec time vs. number of threads, comparing formula 1 (case #1), formula 2 (case #2), and synthetic simulation, for $f_{par,cs} = 0.5$, $f_{par,ncs} = 0.5$, $P_{ctn} = 0.5$]

Avg error of 3% compared to synthetic simulation
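A minimal sketch of the combined model, taking the maximum of the case #1 and case #2 execution times. The function names are illustrative, and this evaluates only the analytical formulas; it does not reproduce the synthetic simulation.

```python
def exec_time_case1(f_seq, f_par_cs, f_par_ncs, p_ctn, n):
    """Low contention: average per-thread execution time."""
    f_par = f_par_cs + f_par_ncs
    p_cs = f_par_cs / f_par if f_par > 0.0 else 0.0
    return f_seq + f_par_cs * p_cs * p_ctn + (f_par_cs * (1.0 - p_cs * p_ctn) + f_par_ncs) / n


def exec_time_case2(f_seq, f_par_cs, f_par_ncs, p_ctn, n):
    """High contention: average execution time of the slowest thread."""
    return f_seq + f_par_cs * p_ctn + (f_par_cs * (1.0 - p_ctn) + f_par_ncs) / (2 * n)


def exec_time(f_seq, f_par_cs, f_par_ncs, p_ctn, n):
    """Total normalized execution time: the larger of the two cases."""
    return max(exec_time_case1(f_seq, f_par_cs, f_par_ncs, p_ctn, n),
               exec_time_case2(f_seq, f_par_cs, f_par_ncs, p_ctn, n))


if __name__ == "__main__":
    # Parameters from the validation slide, with f_seq = 0.
    for n in (1, 2, 4, 8, 10):
        print(n, exec_time(0.0, 0.5, 0.5, 0.5, n))
```

With the parameters of the validation slide, the two expressions coincide at n = 4; below that case #1 is the larger term, above it case #2.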

SLIDE 14

Theoretical result:

Parallel performance is fundamentally limited by critical sections

$$\lim_{n \to \infty} S = \frac{1}{f_{seq} + f_{par,cs} \cdot P_{ctn}}$$

[Figure: limiting speedup $S$ as a function of $f_{par,cs}$ and $P_{ctn}$, for $f_{seq} = 0$]
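The limit can be evaluated for a few parameter values to see how quickly contended critical sections cap the speedup. This is a sketch with an illustrative function name; the sample values are chosen from the axis range of the figure.

```python
def speedup_limit(f_seq: float, f_par_cs: float, p_ctn: float) -> float:
    """Speedup bound for n -> infinity in the presence of critical sections."""
    denom = f_seq + f_par_cs * p_ctn
    if denom == 0.0:
        return float("inf")  # no sequential work and no contention: unbounded
    return 1.0 / denom


if __name__ == "__main__":
    # f_seq = 0, as in the figure on this slide.
    for f_par_cs in (0.01, 0.05, 0.1):
        for p_ctn in (0.01, 0.05, 0.1):
            print(f_par_cs, p_ctn, round(speedup_limit(0.0, f_par_cs, p_ctn)))
```

Even without any sequential fraction, spending 10% of the time in critical sections that contend with probability 0.1 bounds the speedup at about 100.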

SLIDE 15

What are the implications for multicore design?

SLIDE 16

Amdahl’s Law suggests wimpy small cores in asymmetric multicore

$$S = \frac{1}{\frac{1 - f}{p} + \frac{f}{n + p}}$$

[M. Hill and M. Marty, IEEE Computer, 2008]

  • linear speedup w/ increasing no. of small cores
  • sublinear speedup in single-thread performance (Pollack’s law)
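A rough sketch of why this formula favors many small cores. It assumes, following Hill and Marty, that a big core built from r base core equivalents delivers performance sqrt(r) (Pollack's rule) and that each small core is one BCE; the function names and the brute-force search are illustrative.

```python
from math import sqrt


def asymmetric_speedup(f: float, p: float, n: int) -> float:
    """Speedup with one big core of performance p and n small cores of performance 1."""
    return 1.0 / ((1.0 - f) / p + f / (n + p))


def best_big_core(f: float, total_bces: int = 256) -> tuple[int, float]:
    """Search the big-core size r (in BCEs) that maximizes speedup.

    Assumption: big-core performance is sqrt(r); the remaining
    total_bces - r BCEs are spent on small cores of one BCE each.
    """
    def speedup_for(r: int) -> float:
        return asymmetric_speedup(f, sqrt(r), total_bces - r)

    best_r = max(range(1, total_bces + 1), key=speedup_for)
    return best_r, speedup_for(best_r)


if __name__ == "__main__":
    for f in (0.9, 0.99, 0.999):
        print(f, best_big_core(f))
```

Because the parallel term improves linearly with the number of small cores while the big core improves only with the square root of its area, the search spends most of the budget on small cores when f is close to 1.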

SLIDE 17

Critical sections have big impact on asymmetric multicore performance

$$\lim_{n \to \infty} S = \frac{1}{\frac{f_{seq}}{p} + f_{par,cs} \cdot P_{ctn}}$$

  • $\frac{f_{seq}}{p}$: sequential part is executed on big core
  • $f_{par,cs} \cdot P_{ctn}$: sequential part due to critical sections is executed on small cores
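A small sketch of the asymmetric limit, showing that a faster big core only shrinks the first term. The function name and the sample parameters are illustrative, not taken from the slides.

```python
def asymmetric_speedup_limit(f_seq: float, f_par_cs: float, p_ctn: float, p: float) -> float:
    """Speedup bound for n -> infinity on an asymmetric multicore.

    The sequential part runs on the big core (performance p); contending
    critical sections run on small cores (performance 1).
    """
    return 1.0 / (f_seq / p + f_par_cs * p_ctn)


if __name__ == "__main__":
    # A larger big core helps, but the bound saturates at 1 / (f_par_cs * p_ctn).
    for p in (1, 2, 4, 16, 1e6):
        print(p, round(asymmetric_speedup_limit(0.1, 0.1, 0.5, p), 2))
```

With these sample values, the bound can never exceed 1 / (0.1 * 0.5) = 20, regardless of how fast the big core is, because the contending critical sections still execute on the small cores.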

SLIDE 18

Implication: small cores in asymmetric multicore should not be wimpy but middle-of-the-road

256 BCEs (base core equivalents) – Hill & Marty

Intuition: small cores should be sufficiently large to execute critical sections quickly

SLIDE 19

Asymmetric vs symmetric multicores

SLIDE 20

Accelerating Critical Sections (ACS)

by Suleman et al. [ASPLOS’09]

  • Execute critical sections on big core
  • Naive ACS
    – Accelerate all critical sections
  • Perfect ACS
    – Accelerate contending critical sections only
  • Selective ACS
    – Predict whether critical sections will contend
    – Mitigate false serialization

SLIDE 21

Evaluating ACS

SLIDE 22

Conclusions

  • Model impact of critical sections in Amdahl’s Law
  • Theoretical result
    – Parallel performance is fundamentally limited by critical sections
  • Implications for multicore design
    – Small cores in asymmetric multicore should not be wimpy but middle-of-the-road
    – Symmetric multicores may yield better performance than asymmetric multicores (w/ wimpy small cores)
    – Accelerating critical sections is a promising idea
      • ACS, DVFS, SMT, scalable cores
  • Long Live Microarchitecture!

SLIDE 23

Modeling Critical Sections in Amdahl’s Law and its Implications for Multicore Design

Stijn Eyerman and Lieven Eeckhout

Ghent University, Belgium

Thank you!