PlaFRIM: Introduction, General Exploration, The Roofline model, Performance Methodology

SLIDE 1

PlaFRIM Courtès L., Rué F. Introduction General Exploration The Roofline model Performance Methodology

PlaFRIM

Courtès L., Rué F., November 8, 2019

SLIDE 2

Table of contents

1. Introduction
2. General Exploration
3. The Roofline model
4. Performance Methodology

SLIDE 4

The hard way

printf("%ld\n", (long)time(NULL));

SLIDE 8

The optimization objectives

Improve the speed of execution
Reduce memory footprint
Reduce energy consumption
Consume fewer resources

SLIDE 9

The Process

Identify bottlenecks (Profiling)
Choose better algorithms or improve implementation (Optimization)

SLIDE 13

How profilers do it

Call stack sampling
Optional function call instrumentation
Hardware simulation
Hardware counters

SLIDE 14

Memory

Understanding memory locality

SLIDE 15

General Exploration

Optimization and granularity

SLIDE 17

The easiest way

The time command
Real, user & sys time
Best way to evaluate scalability
Accuracy of the evaluation?

SLIDE 19

Profiler

Static instrumentation - gprof
  Sampling technique
  No manual instrumentation needed
  2 types of view (flat profile and call graph)
  Annotated code

SLIDE 21

Profiler

Static instrumentation - gprof
  Use the -pg option to compile
  Evaluate the output: gprof 'binary name' gmon.out

SLIDE 23

Profiler

Static instrumentation - gprof
  gprof -A -l 'binary name' gmon.out

SLIDE 24

Profiler

And for memory usage?

SLIDE 25

Profiler

Dynamic instrumentation - valgrind
  Done at execution time
  No instrumentation needed
  Different tools for different analyses:
    massif - heap profiler
    callgrind - call history among functions
    cachegrind - interactions with machine cache

SLIDE 27

Profiler

Dynamic instrumentation - valgrind
  valgrind --tool=massif --time-unit=ms ./bin/wave0 5 5 5 100 100 100 0.0005 50
  ms_print massif.out.%pid

SLIDE 28

Profiler

What kind of expertise?

SLIDE 30

Profiler

What kind of picture of your program do you need?

SLIDE 31

The Roofline model

roofline

SLIDE 32

The model

cache aware roofline model

Figure: IBM - ICSC 2014, Shanghai, China

SLIDE 33

The model

Figure: PICSAR Project

SLIDE 34

The model

Figure: Thomas Jefferson National Accelerator Facility

SLIDE 35

The model

cache aware roofline model

Figure: IBM - ICSC 2014, Shanghai, China

SLIDE 37

The model

How to construct this model?
How to evaluate your Arithmetic Intensity?

SLIDE 39

Roofline evaluation

evaluate the performance you can achieve

SLIDE 40

Performance achievement

Understanding memory locality

Figure: Memory Bound
Figure: Compute Bound

SLIDE 41

Intel Advisor tool

One tool to do that ...

SLIDE 42

Intel Advisor tool

Figure: The 3D stencil: its memory access pattern (a) and the data points it uses (b). - Raul de la Cruz, BSC

SLIDE 43

Intel Advisor tool

Figure: Stencil 1 thread - roofline

SLIDE 45

Intel Advisor tool

module load compiler/gcc/9.1.0 compiler/intel/2019_update4 intel/vtune-advisor
advixe-cl -collect roofline --project-dir=wave0 --ignore-checksums ./bin/wave0 5 5 5 100 100 100 0.0005 500
advixe-gui
RTFM: the README file

SLIDE 46

Intel Advisor tool

Figure: Stencil 1 thread - roofline

SLIDE 47

Intel Advisor tool

Figure: Stencil 1 thread - memory access pattern

SLIDE 48

Intel Advisor tool

Figure: Stencil 1 thread - inverse loop - roofline

SLIDE 50

Intel Advisor tool

Stride distribution - better performance
Cache blocking technique?

SLIDE 51

Intel Advisor tool

Figure: Stencil 1 thread - inverse loop & cache blocking - roofline

SLIDE 52

Intel Advisor tool

OpenMP?

SLIDE 53

Intel Advisor tool

Figure: Stencil 20 threads - inverse loop & OpenMP - roofline

SLIDE 55

Intel Advisor tool

Figure: Intel Methodology to achieve performance

SLIDE 56

Intel Advisor tool

and beyond ...

SLIDE 59

Intel Advisor tool

With MPI - do it in 2 steps:

mpirun -np 1 advixe-cl -collect survey --project-dir=wave0 --ignore-checksums --no-auto-finalize ./bin/wave0 5 5 5 100 100 100 0.0005 500
mpirun -np 1 advixe-cl --collect tripcounts --ignore-checksums --project-dir=wave0 --flop --no-trip-counts -- ./bin/wave0 5 5 5 100 100 100 0.0005 500

One trace per rank

SLIDE 60

Intel Advisor tool

Try it with hou10ni

SLIDE 61

Intel Advisor tool

guix environment --pure maphys --ad-hoc maphys pastix starpu vim -- /bin/bash --norc
export PATH=$PATH:/cm/shared/modules/intel/ivybridge/parallel_studio/2019_update4/advisor/bin64
mpirun -np 1 advixe-cl -collect survey --project-dir=Hou10ni --ignore-checksums --no-auto-finalize ./hou10ni_lite.out < param_simple_maphys.txt
mpirun -np 1 advixe-cl --collect tripcounts --ignore-checksums --project-dir=Hou10ni --flop --no-trip-counts -- ./hou10ni_lite.out < param_simple_maphys.txt

SLIDE 62

Intel Advisor tool

Figure: hou10ni - profiling

SLIDE 66

KEEP CALM this is my LAST SLIDE