Ivan Girotto - igirotto@ictp.it | Cinvestav Abacus, 16 Feb 2018 | Overview of Common Strategies for Parallelization
Overview of Common Strategies for Parallelization
Ivan Girotto – igirotto@ictp.it
International Centre for Theoretical Physics (ICTP)
[Diagram: serial model — a CPU performing load/store operations on a memory that holds program and data]
A problem is broken into a discrete series of instructions. Instructions are executed one after another. Only one instruction may execute at any moment in time.
[Diagram: parallel model — multiple CPU–memory pairs connected by communication]
The first step in developing a parallel algorithm is to decompose the problem into tasks that can be executed concurrently.
[Diagram: flow of a two-process parallel program — each process: init → read and distribute data → compute on sub-domain A / sub-domain B → communication → reduce data, update sub-domain → communication → terminate]
– concurrently
– processes running in parallel
– associated within the program
– parallel program execution
Task parallelism: different people are performing different tasks at the same time.
Data parallelism: different people are performing the same task, but on different, equivalent and independent objects.
the amount of overhead we introduce by making the algorithm parallel
– concurrent read/write (including parallel write) to shared memory locations
– concurrent read/write (including parallel write) to shared devices
– a message that must be sent and received
Thread 1: load a → add a 1 → store a
Thread 2: load a → add a 1 → store a
(the two load/add/store sequences can interleave on the shared variable a, losing an update)
[Diagram: distributed-memory systems connected through a NETWORK]
the number of tasks into which we want to divide the problem
concurrency and parallel execution
§ coarse-grained and fine-grained parallelization has to be mapped to the topology to reduce memory and I/O contention
§ make your code modular to enable different levels of granularity and consequently to become more “platform adaptable”
The simplest data decomposition schemes for dense matrices are 1-D block distribution schemes.
Block distribution schemes can be generalized to higher dimensions as well.
The degree to which tasks/data can be subdivided limits concurrency and parallel execution!
[Diagram: mapping between local indices and global indices of a block-distributed array]
proc_up   = mod(proc_me + 1, nprocs)
proc_down = mod(proc_me - 1 + nprocs, nprocs)
[Diagram: distributed matrix product — row blocks of the matrix distributed over processes P0–P3]
At every step, all processes receive a block of columns of the matrix B.
int MPI_Allgather(const void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, MPI_Comm comm)
sendbuf    starting address of send buffer (choice)
sendcount  number of elements in send buffer (integer)
sendtype   data type of send buffer elements (handle)
recvcount  number of elements received from any process (integer)
recvtype   data type of receive buffer elements (handle)
comm       communicator (handle)
[Diagram: master-worker scheme — a master process distributes tasks to workers W1–W4 and collects results]
– each task can be serial or parallel
– “independent” means they don’t communicate directly
– processes possibly driven by the mpirun framework
[igirotto@localhost]$ more my_shell_wrapper.sh
#!/bin/bash
# example for the OpenMPI implementation
./prog.x --input input_${OMPI_COMM_WORLD_RANK}.dat

[igirotto@localhost]$ mpirun -np 400 ./my_shell_wrapper.sh
– Executing multiple instances of the same program with different inputs/initial conditions
– Reading large binary files by splitting the workload among processes
– Searching elements in large data sets
– Other parallel execution of embarrassingly parallel problems (no communication among tasks)
[Diagram: all processes funnelling through a single shared I/O bandwidth channel]
[Diagram: each process performing its own independent I/O]
– parallel regions
– concurrency
– process communication
– synchronism
– Parse arguments (mpirun may add some!)
– Identify parallel environment and rank of process
– Read and distribute all data
– Proceed to subroutine with parallel work (can be the same or different for all parallel tasks)
CAUTION: this sequence may be run only once
program bcast
  implicit none
  include "mpif.h"
  integer :: myrank, ncpus, imesg, ierr
  integer, parameter :: comm = MPI_COMM_WORLD

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(comm, myrank, ierr)
  call MPI_COMM_SIZE(comm, ncpus, ierr)
  imesg = myrank
  print *, "Before Bcast operation I'm ", myrank, &
           " and my message content is ", imesg
  call MPI_BCAST(imesg, 1, MPI_INTEGER, 0, comm, ierr)
  print *, "After Bcast operation I'm ", myrank, &
           " and my message content is ", imesg
  call MPI_FINALIZE(ierr)
end program bcast
Before MPI_INIT, on each of the four processes: myrank = ??, ncpus = ??, imesg = ??, ierr = ??, comm = MPI_COMM_WORLD
After MPI_INIT: ierr = MPI_SUCCESS on every process; myrank, ncpus and imesg still undefined
After MPI_COMM_SIZE: ncpus = 4 on every process (ierr = MPI_SUCCESS, imesg still undefined)
After MPI_COMM_RANK: myrank = 0, 1, 2, 3 on the respective processes (ncpus = 4, ierr = MPI_SUCCESS, imesg still undefined)
After imesg = myrank: imesg = 0, 1, 2, 3 on the respective processes (ncpus = 4, ierr = MPI_SUCCESS)
After MPI_BCAST with root 0: imesg = 0 on all four processes (myrank = 0..3, ncpus = 4, ierr = MPI_SUCCESS)
MPI: domain partitioning
OpenMP: node-level shared memory
CUDA/OpenCL/OpenACC: floating-point accelerators
Python: ensemble simulations, workflows
Workload management: system level, high throughput