
SLIDE 1

Modeling Critical Sections in Amdahl’s Law and its Implications for Multicore Design

Stijn Eyerman and Lieven Eeckhout

Ghent University, Belgium

ISCA, Saint-Malo, France, June 23, 2010

SLIDE 2

Amdahl’s Law

S. Eyerman & L. Eeckhout -- ISCA 2010 -- June 23, 2010


Speedup by parallelizing fraction f across n processors:

$S = \frac{1}{(1 - f) + \frac{f}{n}}$

Parallel performance is bounded by sequential part:

$\lim_{n \to \infty} S = \frac{1}{1 - f}$
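As a quick numerical illustration of the formula above, the following minimal sketch evaluates Amdahl's Law for a growing processor count. The function names and the example fraction f = 0.95 are chosen here for illustration and do not come from the slides.

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Speedup when a fraction f of the work is parallelized across n processors."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - f) + f / n)


def amdahl_limit(f: float) -> float:
    """Upper bound on speedup as n grows without bound: 1 / (1 - f)."""
    return float("inf") if f == 1.0 else 1.0 / (1.0 - f)


# Hypothetical example: 95% of the work is parallelizable.
for n in (1, 4, 16, 64, 1024):
    print(n, round(amdahl_speedup(0.95, n), 2))
print("bound:", amdahl_limit(0.95))
```

The printed speedups approach, but never exceed, the bound set by the sequential part.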

SLIDE 3

Amdahl’s software model


sequential fraction: $f_{seq}$

parallel fraction: $f_{par} = 1 - f_{seq}$

Can we model critical sections in Amdahl’s Law?

SLIDE 4

Extending Amdahl’s software model


$f_{seq} + f_{par,cs} + f_{par,ncs} = 1$

$f_{seq}$: sequential part

$f_{par,cs}$: parallel part inside critical sections

$f_{par,ncs}$: parallel part outside critical sections

$P_{ctn}$ = probability for two critical sections to contend

SLIDE 5

Extending Amdahl’s software model


Assumptions

Each thread executes an equal share of the critical sections

Critical sections are entered at random times

Critical sections contend randomly

SLIDE 6

Compute parallel speedup in the presence of critical sections?

Case #1: Low contention: all threads execute equally long
total exec time ≅ avg per-thread exec time

Case #2: High contention
total exec time ≅ avg exec time of slowest thread


SLIDE 7

Case #1


Each thread executes a fraction $\frac{f_{par,cs}}{n}$ of critical sections

If no contention: exec time $= \frac{f_{par,cs}}{n}$

If contention with j threads: exec time $= (j+1) \frac{f_{par,cs}}{n}$

Avg time spent in critical section: $= \sum_{j=0}^{n-1} \Pr[\text{contend with } j \text{ threads}] \cdot (j+1) \frac{f_{par,cs}}{n}$

SLIDE 8

Avg time spent in critical section:

$\sum_{j=0}^{n-1} \Pr[\text{contend with } j \text{ threads}] \cdot (j+1) \frac{f_{par,cs}}{n}$

SLIDE 9

Avg time spent in critical section =

$\sum_{i=0}^{n-1} \binom{n-1}{i} P_{cs}^{i} (1 - P_{cs})^{n-1-i} \cdot \sum_{j=0}^{i} \binom{i}{j} P_{ctn}^{j} (1 - P_{ctn})^{i-j} \cdot (j+1) \frac{f_{par,cs}}{n}$

$= f_{par,cs} \cdot \left( P_{cs} P_{ctn} + \frac{1 - P_{cs} P_{ctn}}{n} \right)$

sequential part: $f_{par,cs} \cdot P_{cs} P_{ctn}$

parallel part: $f_{par,cs} \cdot \frac{1 - P_{cs} P_{ctn}}{n}$
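The step from the double sum to the closed form can be checked numerically. The sketch below evaluates both sides for a few parameter settings. It reads $P_{cs}$ as the probability that another thread is inside a critical section, which is an interpretation made here because the extracted slides do not define it. The function names and the test values are hypothetical.

```python
from math import comb


def avg_cs_time_sum(f_par_cs: float, p_cs: float, p_ctn: float, n: int) -> float:
    """Evaluate the double binomial sum term by term."""
    total = 0.0
    for i in range(n):  # i of the n-1 other threads are in a critical section (assumed reading)
        pr_i = comb(n - 1, i) * p_cs**i * (1 - p_cs) ** (n - 1 - i)
        for j in range(i + 1):  # j of those i critical sections contend
            pr_j = comb(i, j) * p_ctn**j * (1 - p_ctn) ** (i - j)
            total += pr_i * pr_j * (j + 1) * f_par_cs / n
    return total


def avg_cs_time_closed(f_par_cs: float, p_cs: float, p_ctn: float, n: int) -> float:
    """Closed form: f_par_cs * (P_cs*P_ctn + (1 - P_cs*P_ctn) / n)."""
    p = p_cs * p_ctn
    return f_par_cs * (p + (1 - p) / n)


# Hypothetical parameter values, chosen only to exercise both expressions.
for n in (1, 2, 8, 32):
    lhs = avg_cs_time_sum(0.5, 0.5, 0.5, n)
    rhs = avg_cs_time_closed(0.5, 0.5, 0.5, n)
    print(n, lhs, rhs)
```

The two columns agree up to floating-point rounding, because the expected value of $j + 1$ under the nested binomials is $1 + (n-1) P_{cs} P_{ctn}$.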

SLIDE 10

Back to Amdahl’s Law


$S = \frac{1}{f_{seq} + f_{par,cs} \cdot P_{cs} P_{ctn} + \frac{f_{par,cs} \cdot (1 - P_{cs} P_{ctn}) + f_{par,ncs}}{n}}$

Impact of critical sections can be modeled as a sequential plus a parallel part
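A minimal sketch of the case #1 speedup, splitting the denominator into its sequential and parallel parts as the slide describes. $P_{cs}$ is passed in directly because the extracted slides do not show how it is obtained. The function name and the example values are illustrative assumptions.

```python
def speedup_low_contention(
    f_seq: float, f_par_cs: float, f_par_ncs: float,
    p_cs: float, p_ctn: float, n: int,
) -> float:
    """Case #1 (low contention) speedup with critical sections."""
    p = p_cs * p_ctn
    sequential = f_seq + f_par_cs * p              # serialized work
    parallel = f_par_cs * (1 - p) + f_par_ncs      # work that scales with n
    return 1.0 / (sequential + parallel / n)


# Hypothetical workload: 10% sequential, 20% in critical sections, 70% outside.
for n in (1, 2, 4, 8, 16):
    print(n, round(speedup_low_contention(0.1, 0.2, 0.7, 0.2, 0.5, n), 3))
```

With `p_ctn = 0` the critical sections never serialize and the expression reduces to classic Amdahl's Law with sequential fraction `f_seq`.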

SLIDE 11

Case #2


Exec time determined by chain of contending critical sections

Approx total exec time as the avg exec time of slowest thread

SLIDE 12

Avg exec time of slowest thread


Length of chain of contending critical sections $= f_{par,cs} P_{ctn}$

Minimum execution time $= f_{seq} + f_{par,cs} P_{ctn}$

Maximum execution time $= f_{seq} + f_{par,cs} P_{ctn} + \frac{f_{par,cs} (1 - P_{ctn}) + f_{par,ncs}}{n}$

Average execution time $= f_{seq} + f_{par,cs} P_{ctn} + \frac{f_{par,cs} (1 - P_{ctn}) + f_{par,ncs}}{2 \cdot n}$
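The three execution-time estimates for the slowest thread can be sketched as follows. The function name, the returned tuple layout, and the example values are assumptions made for illustration.

```python
def slowest_thread_times(
    f_seq: float, f_par_cs: float, f_par_ncs: float, p_ctn: float, n: int,
) -> tuple[float, float, float]:
    """Return (minimum, maximum, average) exec time of the slowest thread (case #2)."""
    chain = f_par_cs * p_ctn                        # chain of contending critical sections
    rest = f_par_cs * (1 - p_ctn) + f_par_ncs       # remaining parallel work
    t_min = f_seq + chain
    t_max = f_seq + chain + rest / n
    t_avg = f_seq + chain + rest / (2 * n)
    return t_min, t_max, t_avg


# Hypothetical values for illustration.
print(slowest_thread_times(0.0, 0.5, 0.5, 0.5, 4))
```

The average is the midpoint of the minimum and maximum, which is where the factor $2 \cdot n$ in the last expression comes from.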

SLIDE 13

Putting it together & validation

Q: Total exec time for parallel workload?

A: Max (case #1, case #2)


[Figure: normalized exec time vs. number of threads (2 to 10); series: formula 1, formula 2, case #1, case #2, synthetic simulation; $f_{par,cs} = 0.5$, $f_{par,ncs} = 0.5$, $P_{ctn} = 0.5$]

Avg error of 3% compared to synthetic simulation
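The "max of the two cases" rule can be sketched as below. The code combines the case #1 denominator with the case #2 average execution time of the slowest thread. The value `p_cs = 0.5` in the example is an assumption, since the extracted slide lists only $f_{par,cs}$, $f_{par,ncs}$, and $P_{ctn}$. This sketch does not reproduce the synthetic simulation.

```python
def normalized_exec_time(
    f_seq: float, f_par_cs: float, f_par_ncs: float,
    p_cs: float, p_ctn: float, n: int,
) -> float:
    """Total exec time estimate: the larger of the case #1 and case #2 formulas."""
    p = p_cs * p_ctn
    # Case #1: low contention, average per-thread execution time.
    case1 = f_seq + f_par_cs * p + (f_par_cs * (1 - p) + f_par_ncs) / n
    # Case #2: high contention, average execution time of the slowest thread.
    case2 = f_seq + f_par_cs * p_ctn + (f_par_cs * (1 - p_ctn) + f_par_ncs) / (2 * n)
    return max(case1, case2)


# Parameters from the slide; p_cs = 0.5 is assumed for illustration.
for n in (2, 4, 6, 8, 10):
    print(n, normalized_exec_time(0.0, 0.5, 0.5, 0.5, 0.5, n))
```

For small thread counts case #1 dominates; as n grows the fixed chain of contending critical sections in case #2 takes over.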

SLIDE 14

Theoretical result:


Parallel performance is fundamentally limited by critical sections

$\lim_{n \to \infty} S = \frac{1}{f_{seq} + f_{par,cs} \cdot P_{ctn}}$

[Figure: limiting speedup S (axis up to 10000) as a function of $f_{par,cs}$ and $P_{ctn}$ (each from 0.01 to 0.1), with $f_{seq} = 0$]
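A short sketch of the limiting speedup, evaluated at the corner points of the ranges shown on the plot axes. The function name is hypothetical.

```python
def limiting_speedup(f_seq: float, f_par_cs: float, p_ctn: float) -> float:
    """Speedup bound for n -> infinity: 1 / (f_seq + f_par_cs * P_ctn)."""
    denom = f_seq + f_par_cs * p_ctn
    return float("inf") if denom == 0.0 else 1.0 / denom


# f_seq = 0 as on the slide; the (f_par_cs, P_ctn) pairs span the plotted ranges.
for f_par_cs, p_ctn in ((0.01, 0.01), (0.01, 0.1), (0.1, 0.01), (0.1, 0.1)):
    print(f_par_cs, p_ctn, round(limiting_speedup(0.0, f_par_cs, p_ctn)))
```

Even with no sequential code at all, a small fraction of contending critical sections caps the achievable speedup.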

SLIDE 15

What are the implications for multicore design?


SLIDE 16

Amdahl’s Law suggests wimpy small cores in asymmetric multicore


$S = \frac{1}{\frac{1 - f}{p} + \frac{f}{n + p}}$

[M. Hill and M. Marty, IEEE Computer, 2008]

$n$: linear speedup w/ increasing no. small cores

$p$: sublinear speedup in single-thread performance (Pollack’s law)
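To see why this formula favors many small cores, the sketch below sweeps the size of the big core under a fixed resource budget. Several things here are assumptions for illustration: the budget of 256 base core equivalents, small cores of one unit each, Pollack's rule modeled as p = sqrt(r) for a big core built from r units, and the parallel fraction f = 0.99.

```python
from math import sqrt


def asymmetric_speedup(f: float, p: float, n: int) -> float:
    """S = 1 / ((1 - f)/p + f/(n + p)): one big core of performance p plus n small cores."""
    return 1.0 / ((1.0 - f) / p + f / (n + p))


BUDGET = 256  # assumed total resources, in base core equivalents
for r in (1, 4, 16, 64, 256):
    p = sqrt(r)            # assumed Pollack's rule: performance grows as sqrt(resources)
    n_small = BUDGET - r   # remaining resources become 1-unit small cores
    print(r, n_small, round(asymmetric_speedup(0.99, p, n_small), 1))
```

Each unit spent on small cores adds one unit of parallel throughput, while each unit spent on the big core adds less than one unit of single-thread performance.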

SLIDE 17

Critical sections have big impact on asymmetric multicore performance


$\lim_{n \to \infty} S = \frac{1}{\frac{f_{seq}}{p} + f_{par,cs} \cdot P_{ctn}}$

$\frac{f_{seq}}{p}$: sequential part is executed on big core

$f_{par,cs} \cdot P_{ctn}$: sequential part due to critical sections is executed on small cores
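A small sketch of this limit shows how little a faster big core helps once contending critical sections dominate. The function name and the workload values (1% sequential, 20% in critical sections, contention probability 0.1) are hypothetical.

```python
def asymmetric_limit(f_seq: float, f_par_cs: float, p_ctn: float, p: float) -> float:
    """Limit for n -> infinity: 1 / (f_seq/p + f_par_cs * P_ctn).

    Only the f_seq term is divided by the big-core performance p; the serialized
    critical sections run on small cores at normalized performance 1.
    """
    return 1.0 / (f_seq / p + f_par_cs * p_ctn)


# Hypothetical workload; sweep the big-core performance p.
for p in (1, 2, 4, 8, 16):
    print(p, round(asymmetric_limit(0.01, 0.2, 0.1, p), 1))
```

However large p becomes, the result stays below $1 / (f_{par,cs} \cdot P_{ctn})$, because the big core does not speed up the critical sections.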

SLIDE 18

Implication: small cores in asymmetric multicore should not be wimpy but middle-of-the-road


256 BCEs (base core equivalents) – Hill & Marty

Intuition: small cores should be sufficiently large to execute critical sections quickly

SLIDE 19

Asymmetric vs symmetric multicores


SLIDE 20

Accelerating Critical Sections (ACS)

Execute critical sections on big core
Naive ACS

– Accelerate all critical sections

Perfect ACS

– Accelerate contending critical sections only

Selective ACS

– Predict whether critical sections will contend
– Mitigate false serialization


by Suleman et al. [ASPLOS’09]

SLIDE 21

Evaluating ACS


SLIDE 22

Conclusions

Model impact of critical sections in Amdahl’s Law
Theoretical result

– Parallel performance is fundamentally limited by critical sections

Implications for multicore design

– Small cores in asymmetric multicore should not be wimpy but middle-of-the-road
– Symmetric multicores may yield better performance than asymmetric multicores (w/ wimpy small cores)
– Accelerating critical sections is a promising idea

ACS, DVFS, SMT, scalable cores
Longue Vie à la Microarchitecture!

SLIDE 23

Modeling Critical Sections in Amdahl’s Law and its Implications for Multicore Design

Stijn Eyerman and Lieven Eeckhout

Ghent University, Belgium