

SLIDE 1

Overview of Common Strategies for Parallelization

Ivan Girotto – igirotto@ictp.it

International Centre for Theoretical Physics (ICTP)

Cinvestav Abacus, 16 Feb 2018

SLIDE 2

Serial Programming

(Diagram: a single CPU issuing Load/Store operations to a Memory holding the Program and its Data.)

A problem is broken into a discrete series of instructions. Instructions are executed one after another. Only one instruction may execute at any moment in time.

SLIDE 3

Parallel Programming

(Diagram: two CPU–Memory pairs connected by communication.)

SLIDE 4

  • A problem is broken into discrete parts that can be solved concurrently
  • Each part is further broken down into a series of instructions
  • Instructions from each part execute simultaneously on different processors
  • An overall control/coordination mechanism is employed

The first step in developing a parallel algorithm is to decompose the problem into tasks that can be executed concurrently

Concurrency

SLIDE 5

What is a Parallel Program?

(Flowchart: two processes run side by side. Each process: init → read and distribute data → compute on its sub-domain (A or B) → communicate (comm.) → reduce data and update the sub-domain → terminate.)

SLIDE 6

Fundamental Steps of Parallel Design

  • Identify portions of the work that can be performed concurrently
  • Map the concurrent pieces of work onto multiple processes running in parallel
  • Distribute the input, output, and intermediate data associated with the program
  • Manage accesses to data shared by multiple processors
  • Synchronize the processors at the various stages of the parallel program execution

SLIDE 7

Type of Parallelism

  • Functional (or task) parallelism:
different people are performing different tasks at the same time
  • Data parallelism:
different people are performing the same task, but on different, equivalent, and independent objects (both are sketched below)
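
As a minimal illustration (my example, not from the slides), the C/OpenMP sketch below contrasts the two: omp sections assigns two different tasks to different threads (task parallelism), while omp parallel for applies the same operation to independent array elements (data parallelism).

#include <stdio.h>
#include <omp.h>

#define N 8

int main(void) {
    double a[N], b[N];

    /* Task (functional) parallelism: two different tasks run concurrently */
    #pragma omp parallel sections
    {
        #pragma omp section
        { for (int i = 0; i < N; ++i) a[i] = 2.0 * i; }   /* task 1 */
        #pragma omp section
        { for (int i = 0; i < N; ++i) b[i] = i + 1.0; }   /* task 2 */
    }

    /* Data parallelism: the same operation on independent elements */
    #pragma omp parallel for
    for (int i = 0; i < N; ++i)
        a[i] += b[i];

    printf("a[%d] = %g\n", N - 1, a[N - 1]);
    return 0;
}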

SLIDE 8

Process Interactions

  • The effective speed-up obtained by parallelization depends on the amount of overhead we introduce when making the algorithm parallel
  • There are mainly two key sources of overhead:
  • 1. Time spent in inter-process interactions (communication)
  • 2. Time some processes may spend being idle (synchronization)

SLIDE 9

Barrier and Synchronization

(Cartoon: processes waiting at a barrier; one asks "all here?")

SLIDE 10

Limitations of Parallel Computing

  • The fraction of serial code limits the parallel speedup (quantified below)
  • The degree to which tasks/data can be subdivided limits concurrency and parallel execution
  • Load imbalance:
  • parallel tasks have a different amount of work
  • CPUs are partially idle
  • redistributing work helps but has limitations
  • communication and synchronization overhead
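
The first bullet is Amdahl's law: if a fraction p of the runtime can be parallelized, the speedup on N processors is bounded by

    S(N) = 1 / ((1 - p) + p / N)

so even with p = 0.95 the speedup can never exceed 1 / (1 - p) = 20, however many processors are used.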

SLIDE 11

Shared Resources

  • In parallel programming, developers must manage exclusive access to shared resources
  • Resources come in different forms:
– concurrent read/write (including parallel write) to shared memory locations
– concurrent read/write (including parallel write) to shared devices
– a message that must be sent and received

SLIDE 12

(Diagram: two threads increment a shared variable, initially 10. Thread 1: load a (reads 10), add 1, store a. Thread 2 interleaves: load a (also reads 10), add 1, store a. Both store 11, so one update is lost and the shared value ends at 11 instead of 12.)
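
A minimal C/OpenMP sketch of this lost update (my example, not from the slides): without protection both threads may read the same value of a, while an atomic update restores the expected result.

#include <stdio.h>
#include <omp.h>

int main(void) {
    int a = 10;

    /* Race: both threads may load a == 10 and both store 11 (lost update) */
    #pragma omp parallel num_threads(2)
    {
        int tmp = a;   /* load a into private data */
        tmp = tmp + 1; /* add 1                    */
        a = tmp;       /* store a                  */
    }
    printf("unprotected: a = %d (12 expected, 11 possible)\n", a);

    a = 10;
    /* Fix: make the read-modify-write indivisible */
    #pragma omp parallel num_threads(2)
    {
        #pragma omp atomic
        a += 1;
    }
    printf("atomic:      a = %d\n", a);
    return 0;
}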

SLIDE 13

Parallelism - 101

  • There are two main reasons to write a parallel program:
  • access to a larger amount of memory (aggregated memory, going bigger)
  • reduced time to solution (going faster)

SLIDE 14

Scalable Programming

SLIDE 15

(Diagram: compute nodes connected through a NETWORK.)

SLIDE 16

Granularity

  • Granularity is determined by the decomposition level (the number of tasks) into which we want to divide the problem
  • The degree to which tasks/data can be subdivided limits concurrency and parallel execution
  • Parallelization has to become "topology aware":
§ coarse-grained and fine-grained parallelization has to be mapped to the topology to reduce memory and I/O contention
§ make your code modular to support different levels of granularity and consequently become more "platform adaptable"

SLIDE 17

Static Data Partitioning

The simplest data decomposition schemes for dense matrices are 1-D block distribution schemes (a sketch follows).
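
In code, a 1-D block distribution reduces to a little index arithmetic. A minimal C sketch (mine, assuming the number of rows divides evenly by the number of processes; the uneven case is treated after slide 22):

/* 1-D block distribution: process `rank` of `nprocs` owns the contiguous
 * rows [first, first + nlocal) of a matrix with nrows rows.
 * Assumes nrows is a multiple of nprocs. */
void my_block(int nrows, int nprocs, int rank, int *first, int *nlocal) {
    *nlocal = nrows / nprocs;   /* rows per process          */
    *first  = rank * (*nlocal); /* first row owned by `rank` */
}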

SLIDE 18

Block Array Distribution Schemes

Block distribution schemes can be generalized to higher dimensions as well.

The degree to which tasks/data can be subdivided limits concurrency and parallel execution!

SLIDE 19

1-D Distribution of a 3-D Domain

SLIDE 20

Distributed Data vs. Replicated Data

  • A replicated-data distribution is useful if it helps to reduce the communication among processes, at the cost of bounding scalability
  • Distributed data is the ideal data distribution, but it is not always applicable to all data sets
  • Usually, complex applications are a mix of these techniques => distribute large data sets; replicate small data

SLIDE 21

Global vs. Local Indexes

  • In sequential code you always refer to global indexes
  • With distributed data you must handle the distinction between global and local indexes (and possibly implement utilities for transparent conversion; see the sketch below)

Local Idx:   1 2 3 | 1 2 3 | 1 2 3
Global Idx:  1 2 3 | 4 5 6 | 7 8 9
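
With equal blocks, the conversion utilities reduce to offset arithmetic. A minimal C sketch (mine), using 1-based indexes and blocks of n_local elements as in the figure above:

/* Conversion between 1-based local and global indexes for equal blocks:
 * rank 0 owns global 1..n_local, rank 1 owns n_local+1..2*n_local, ... */
int global_idx(int local, int rank, int n_local) {
    return rank * n_local + local;      /* local 1..n_local -> global  */
}

int local_idx(int global, int n_local) {
    return (global - 1) % n_local + 1;  /* global -> local 1..n_local  */
}

int owner_rank(int global, int n_local) {
    return (global - 1) / n_local;      /* which process owns `global` */
}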

SLIDE 22

Collaterals to Domain Decomposition /1

Are all the domain's dimensions always a multiple of the number of tasks/processes we are willing to use?
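
Usually not, and a common remedy (a sketch of mine, not from the slides) is to spread the remainder over the first ranks so that block sizes differ by at most one element:

/* Balanced 1-D distribution of n elements over nprocs processes when n is
 * not a multiple of nprocs: the first n % nprocs ranks get one extra. */
void balanced_block(int n, int nprocs, int rank, int *first, int *nlocal) {
    int base = n / nprocs;
    int rest = n % nprocs;
    *nlocal = base + (rank < rest ? 1 : 0);
    *first  = rank * base + (rank < rest ? rank : rest);
}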

SLIDE 23

Again on Domain Decomposition

SLIDE 24

(Diagram: the whole data set initially resides on P0 only.)

SLIDE 25

P0 (root)   P1   P2   P3

call MPI_BCAST( ... )

SLIDE 26

P0   P1   P2   P3

call evolve( d]act )

SLIDE 27

P0 (root)   P1   P2   P3

call MPI_Gather( ..., ..., ... )

SLIDE 28

Replicated Data

  • Compute the domain (and workload) distribution among processes
  • Master-slave: P0 drives all processes (see the sketch below)
  • Large amount of data communication:
– at each step P0 distributes data to all processes and collects the contribution of each process
  • Problem-size scaling is limited by memory capacity
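
A minimal C sketch of one step of this replicated-data pattern (my reconstruction of slides 25–28; compute_on_subdomain and the buffer sizes are hypothetical):

#include <mpi.h>

/* Hypothetical application kernel: fill `part` from the full data set. */
void compute_on_subdomain(const double *data, int n, int rank,
                          double *part, int nlocal);

/* One step: P0 broadcasts the whole data set, every rank computes on its
 * own sub-domain, P0 gathers all contributions back. */
void replicated_step(double *data, int n, double *part, int nlocal,
                     double *all, int rank, MPI_Comm comm) {
    MPI_Bcast(data, n, MPI_DOUBLE, 0, comm);           /* cf. slide 25 */
    compute_on_subdomain(data, n, rank, part, nlocal); /* cf. slide 26 */
    MPI_Gather(part, nlocal, MPI_DOUBLE,               /* cf. slide 27 */
               all, nlocal, MPI_DOUBLE, 0, comm);
}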

SLIDE 29

Collaterals to Domain Decomposition /2

SLIDE 30

The Transport Code – Parallel Version

P0   P1   P2   P3

call evolve( d]act )

SLIDE 31

P0 P1 P2 P3

Data exchange among processes

SLIDE 32

P0 P1 P2 P3

proc_up   = mod(proc_me + 1, nprocs)
proc_down = mod(proc_me - 1 + nprocs, nprocs)

SLIDE 33

Sendrecv
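
MPI_Sendrecv combines the send and the receive into a single deadlock-free call. A minimal C sketch of the ghost-cell exchange over the ring neighbours computed on slide 32 (buffer names are mine):

#include <mpi.h>

/* Exchange one row of nx ghost cells with the ring neighbours: send the
 * top row up while receiving the lower ghost row from below, then the
 * reverse. */
void exchange_ghosts(const double *top_row, const double *bottom_row,
                     double *ghost_up, double *ghost_down,
                     int nx, int proc_up, int proc_down) {
    MPI_Sendrecv(top_row,    nx, MPI_DOUBLE, proc_up,   0,
                 ghost_down, nx, MPI_DOUBLE, proc_down, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(bottom_row, nx, MPI_DOUBLE, proc_down, 1,
                 ghost_up,   nx, MPI_DOUBLE, proc_up,   1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}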

SLIDE 34

Distributed Data

  • Global and Local Indexes
  • Ghost Cells Exchange Between Processes
– Compute Neighbor Processes
  • Parallel Output

SLIDE 35

(Diagram: matrices A, B, and C of a matrix-matrix multiplication.)

SLIDE 36

(Diagram: block-distributed matrix multiplication across P0–P3.)

At every step all the processes receive a block of columns of the matrix B.

SLIDE 37

SLIDE 38

MPI Allgather

int MPI_Allgather(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
                  void *recvbuf, int recvcount, MPI_Datatype recvtype,
                  MPI_Comm comm)

sendbuf     starting address of send buffer (choice)
sendcount   number of elements in send buffer (integer)
sendtype    data type of send buffer elements (handle)
recvbuf     address of receive buffer (choice)
recvcount   number of elements received from any process (integer)
recvtype    data type of receive buffer elements (handle)
comm        communicator (handle)
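
For the column-block scheme on slide 36, each process can contribute its own block of B and receive everyone else's in a single call. A minimal C sketch (names are mine):

#include <mpi.h>

/* Every rank contributes `nlocal` doubles (its block of B) and receives the
 * concatenation of all blocks in `ball` (which must hold nlocal * nprocs). */
void gather_all_blocks(const double *bloc, double *ball,
                       int nlocal, MPI_Comm comm) {
    MPI_Allgather(bloc, nlocal, MPI_DOUBLE,
                  ball, nlocal, MPI_DOUBLE, comm);
}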

SLIDE 39

Master/Slave

(Diagram: a Master process exchanging work and results with workers W1–W4.)

SLIDE 40

Task Farming

  • Many independent programs (tasks) running at once
– each task can be serial or parallel
– "independent" means they don't communicate directly
– processes possibly driven by the mpirun framework

[igirotto@localhost]$ more my_shell_wrapper.sh
#!/bin/bash
# example for the OpenMPI implementation
./prog.x --input input_${OMPI_COMM_WORLD_RANK}.dat
[igirotto@localhost]$ mpirun -np 400 ./my_shell_wrapper.sh

SLIDE 41

Easy Parallel Computing

  • Farming, embarrassingly parallel
– executing multiple instances of the same program with different inputs/initial conditions
– reading large binary files by splitting the workload among processes
– searching elements in large data sets
– other parallel executions of embarrassingly parallel problems (no communication among tasks)
  • Ensemble simulations (weather forecast)
  • Parameter space (find the best wing shape)

SLIDE 42

Parallel I/O

(Diagram: processes P0–P4 and a File System; only P0 accesses the File System, so all traffic shares a single I/O-bandwidth link.)

SLIDE 43

Parallel I/O

(Diagram: P0–P3 each access a separate File System, each over its own I/O-bandwidth link.)

SLIDE 44

Parallel I/O

(Diagram: P0–P3 all perform I/O concurrently into a single Parallel File System.)

MPI I/O & parallel I/O libraries (HDF5, NetCDF, etc.)
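
A minimal MPI-IO sketch of this last layout (mine; the file name and offset scheme are assumptions): every rank writes its own block of one shared file through the parallel file system.

#include <mpi.h>

/* Each rank writes `nlocal` doubles at its own offset in one shared file. */
void write_shared_file(const double *buf, int nlocal, int rank) {
    MPI_File fh;
    MPI_Offset offset = (MPI_Offset)rank * nlocal * sizeof(double);

    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_write_at_all(fh, offset, buf, nlocal, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);
    MPI_File_close(&fh);
}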

SLIDE 45

Make Use of Freely Available Parallel Libraries

  • Scalable Parallel Random Number Generators Library (SPRNG)
  • Parallel Linear Algebra (ScaLAPACK)
  • Parallel Library for the Solution of Finite Elements (deal.II)
  • Parallel Library for FFT (FFTW)
  • Parallel Linear Solver for Sparse Matrices (PETSc)

SLIDE 46

Programming Parallel Paradigms

  • These are the tools we use to express the parallelism on a given architecture (see also SPMD, SIMD, etc.)
  • They differ in how programmers can manage and define key features like:
– parallel regions
– concurrency
– process communication
– synchronization

SLIDE 47

Fundamental Tools of Parallel Programming

SLIDE 48

Phases of an MPI Program

  • 1. Startup
– Parse arguments (mpirun may add some!)
– Identify the parallel environment and the rank of the process
– Read and distribute all data
  • 2. Execution
– Proceed to the subroutine with the parallel work (can be the same or different for all parallel tasks)
  • 3. Cleanup

CAUTION: this sequence may be run only once

SLIDE 49

program bcast
  implicit none
  include "mpif.h"
  integer :: myrank, ncpus, imesg, ierr
  integer, parameter :: comm = MPI_COMM_WORLD
  call MPI_INIT(ierr)
  call MPI_COMM_RANK(comm, myrank, ierr)
  call MPI_COMM_SIZE(comm, ncpus, ierr)
  imesg = myrank
  print *, "Before Bcast operation I'm ", myrank, &
           " and my message content is ", imesg
  call MPI_BCAST(imesg, 1, MPI_INTEGER, 0, comm, ierr)
  print *, "After Bcast operation I'm ", myrank, &
           " and my message content is ", imesg
  call MPI_FINALIZE(ierr)
end program bcast

SLIDE 50

(The bcast program again; the frames on the following slides track the variables on each process.)

P0, P1, P2, P3 — state after the declarations, before MPI_INIT:
myrank = ??   ncpus = ??   imesg = ??   ierr = ??   comm = MPI_COMM_WORLD

SLIDE 51

P0, P1, P2, P3 — state after MPI_INIT returns:
myrank = ??   ncpus = ??   imesg = ??   ierr = MPI_SUCCESS   comm = MPI_COMM_WORLD

SLIDE 52

P0, P1, P2, P3 — state after MPI_COMM_SIZE (every process knows how many processes there are):
myrank = ??   ncpus = 4   imesg = ??   ierr = MPI_SUCCESS   comm = MPI_COMM_WORLD

SLIDE 53

State after MPI_COMM_RANK (every process knows its own rank):
P0: myrank = 0   P1: myrank = 1   P2: myrank = 2   P3: myrank = 3
(on each: ncpus = 4, imesg = ??, ierr = MPI_SUCCESS, comm = MPI_COMM_WORLD)

SLIDE 54

State after imesg = myrank:
P0: imesg = 0   P1: imesg = 1   P2: imesg = 2   P3: imesg = 3
(on each: ncpus = 4, ierr = MPI_SUCCESS, comm = MPI_COMM_WORLD)

SLIDE 55

State immediately before the broadcast (unchanged by the "Before Bcast" print):
P0: imesg = 0   P1: imesg = 1   P2: imesg = 2   P3: imesg = 3

SLIDE 56

call MPI_BCAST( imesg, 1, MPI_INTEGER, 0, comm, ierr )

Rank 0 is the root, so its imesg is about to be copied to all processes:
P0: imesg = 0   P1: imesg = 1   P2: imesg = 2   P3: imesg = 3

SLIDE 57

call MPI_BCAST( imesg, 1, MPI_INTEGER, 0, comm, ierr )

After the broadcast every process holds the root's value:
P0: imesg = 0   P1: imesg = 0   P2: imesg = 0   P3: imesg = 0

SLIDE 58

(The "After Bcast" print executes on every process; imesg = 0 everywhere.)

SLIDE 59

(Final state after the broadcast: myrank = 0..3, ncpus = 4, imesg = 0, ierr = MPI_SUCCESS on P0–P3.)

SLIDE 60

(MPI_FINALIZE completes and the program ends; the state above is unchanged.)

SLIDE 61

  • MPI: domain partition
  • OpenMP: node-level shared memory
  • CUDA/OpenCL/OpenACC: floating-point accelerators
  • Python: ensemble simulations, workflows
  • Workload management: system level, high throughput

SLIDE 62