

SLIDE 1

Measuring the performance improvements as you parallelize and optimize your software

SLIDE 2

Multiplication of Transposed Sparse Matrix by Vector

  • -O2 -march=native -mtune=native -ftree-vectorize: 0.25 s
  • -O2 -march=native -mtune=native -ftree-vectorize: 0.27 s
  • Managing race conditions with OpenMP pragma atomic:
    ○ -O2 -march=native -mtune=native -ftree-vectorize -fopenmp: 0.359 s (0.7x)
    ○ -O2 -march=native -mtune=native -ftree-vectorize -fopenmp: 0.3470 s (0.78x)
  • OpenMP privatization (f arrays):
    ○ -O2 -march=native -mtune=native -ftree-vectorize -fopenmp: 0.1172 s (2.13x)
    ○ -O2 -march=native -mtune=native -ftree-vectorize -fopenmp: 0.1177 s (2.29x)

SLIDE 3
  • Evaluate the performance of your serial, parallel and optimized code


  • Tuning and optimization:



SLIDE 4
  • Avoiding ‘too much parallel’

[Figure: Speedup vs. number of threads/processes/cores, comparing ideal and real scaling]

SLIDE 5
  • Serial and parallel performance
  • Take regular parallel performance measurements as you progress


  • Understand your performance limits


Use Speedup and Efficiency measures


SLIDE 6
  • Measure the relative performance between serial and parallel code.
  • Improvement in speed of execution of a task executed on the same architecture but with different resources.
  • Speedup, S, for problem size N on P processes/threads/cores:

  • Tips:



S(N, P) = T(N, 1) / T(N, P)

SLIDE 7
  • Measure the efficiency of the parallel code.
  • 100% efficiency = using double the resources but taking half the runtime (i.e. the same total resources are used).
  • Parallel efficiency, E, for problem size N on P processes/threads/cores:


E(N, P) = S(N, P) / P = T(N, 1) / (P * T(N, P))

SLIDE 8
  • We can never parallelize every single part of code (e.g. initialising and distributing the data).
  • A fraction of the runtime, α, is completely serial, limiting the parallel runtime even with 100% efficiency of the parallel fraction on P processors/threads/cores.

  • Limited by the serial fraction:


For runtime T, using problem size N for P processes

Known as ‘Amdahl’s Law’


T = α T(N, 1) + (1 - α) T(N, 1) / P

SLIDE 9

Gene Amdahl, 1967

[Figure: Amdahl's Law speedup vs. number of processors (1, 2, 4, 8) for serial fractions α = 50%, 25%, 10% and 5%; for α = 50% the speedup grows as 1x, 1.33x, 1.6x, 1.8x. Bar charts compare the parallel and serial portions for α = 10% ((1-α) = 90%) and α = 50% ((1-α) = 50%). Source: Wikipedia]

SLIDE 10
  • Use the spreadsheet assigned to your team to record timings


  • Use this regularly, particularly once you start trying multiple parallelization methods and tuning your implementations.

  • Try to gain an understanding of your serial fraction, α
