SLIDE 1

Porting the Plasma Simulation PIConGPU to Heterogeneous Architectures with Alpaka

René Widera (1), Erik Zenker (1,2), Guido Juckeland (1), Benjamin Worpitz (1,2), Axel Huebl (1,2), Andreas Knüpfer (2), Wolfgang E. Nagel (2), Michael Bussmann (1)

(1) Helmholtz-Zentrum Dresden – Rossendorf
(2) Technische Universität Dresden

SLIDE 2

Member of the Helmholtz Association

René Widera, Erik Zenker, Guido Juckeland · Computational Radiation Physics · www.hzdr.de/crp { r.widera, e.zenker, g.juckeland }@hzdr.de

Electron Acceleration with Lasers

  • Compact X-Ray sources

Ion Acceleration with Lasers

  • Tumor Therapy

Plasma Instabilities

  • Astrophysics

PIConGPU

SLIDE 3

Domain Decomposition - Field and Particle Domain

SLIDE 4

Domain Decomposition - Field and Particle Domain

  • Moving particles create fields
  • Fields act back on particles
  • Particles change cells
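The three bullets describe one particle-in-cell time step. A minimal 1D sketch of that loop follows. The `Particle` struct, the unit cell size, the unit mass and the nearest-cell deposition are all assumptions chosen for brevity; this is not PIConGPU's actual scheme.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1D particle: position in units of cells, velocity, charge.
struct Particle
{
    double x;
    double v;
    double q;
};

// Index of the cell that currently contains the particle (cell size is 1).
inline std::size_t cellOf(Particle const& p)
{
    assert(p.x >= 0.0);
    return static_cast<std::size_t>(p.x);
}

// "Moving particles create fields": deposit q * v into the particle's cell.
inline void depositCurrent(std::vector<Particle> const& particles, std::vector<double>& current)
{
    for(double& j : current)
        j = 0.0;
    for(Particle const& p : particles)
    {
        assert(cellOf(p) < current.size());
        current[cellOf(p)] += p.q * p.v;
    }
}

// "Fields act back on particles" and "particles change cells":
// accelerate with the local field, then move. Returns how many
// particles ended the step in a different cell than they started in.
inline std::size_t pushParticles(std::vector<Particle>& particles, std::vector<double> const& field, double dt)
{
    std::size_t moved = 0;
    for(Particle& p : particles)
    {
        std::size_t const before = cellOf(p);
        assert(before < field.size());
        p.v += p.q * field[before] * dt; // unit mass assumed
        p.x += p.v * dt;
        if(cellOf(p) != before)
            ++moved;
    }
    return moved;
}
```

Particles that change cells are the ones a domain-decomposed code has to hand over to a neighbouring cell or device.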

SLIDE 5

Creating Vectorized Data Structures for Particles and Fields

[Figure labels: Field Domain, Particle Domain; cells 1 to 4]

SLIDE 6

Creating Vectorized Data Structures for Particles and Fields

[Figure labels: Field Domain, Particle Domain; cells 1 to 4; Cell 1, Cell 1, Cell 2, Cell 4]

SLIDE 7

Creating Vectorized Data Structures for Particles and Fields

[Figure labels: Field Domain, Particle Domain; cells 1 to 4; Cell 1, Cell 1, Cell 2, Cell 4]

  • chunked in supercells
  • line-wise aligned

SLIDE 8

Creating Vectorized Data Structures for Particles and Fields

[Figure labels: Field Domain, Particle Domain; cells 1 to 4; Cell 1, Cell 1, Cell 2, Cell 4]

  • fixed-size frames
  • struct of aligned arrays
  • chunked in supercells
  • line-wise aligned
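A fixed-size frame stored as a struct of aligned arrays could look roughly like the sketch below. The frame size of 256, the 64-byte alignment, the attribute names and the `Supercell` container are assumptions made for illustration, not PIConGPU's definitions. The sketch also assumes C++17 so that the over-aligned `Frame` is allocated with the requested alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Assumed frame size; the slides only say that frames have a fixed size.
constexpr std::size_t frameSize = 256;

// Struct of aligned arrays: one contiguous, aligned array per attribute,
// so that neighbouring particles are read with unit stride.
struct Frame
{
    alignas(64) float posX[frameSize];
    alignas(64) float momX[frameSize];
    alignas(64) float weight[frameSize];
    std::size_t used = 0; // number of occupied slots in this frame
};

// Hypothetical supercell: owns the frames of all particles in its cells.
struct Supercell
{
    std::vector<std::unique_ptr<Frame>> frames;

    // Append a particle, opening a new frame when the last one is full.
    void push(float x, float p, float w)
    {
        if(frames.empty() || frames.back()->used == frameSize)
            frames.push_back(std::make_unique<Frame>());
        Frame& f = *frames.back();
        assert(f.used < frameSize);
        f.posX[f.used] = x;
        f.momX[f.used] = p;
        f.weight[f.used] = w;
        ++f.used;
    }

    // Total number of particles stored in this supercell.
    std::size_t particleCount() const
    {
        std::size_t n = 0;
        for(auto const& f : frames)
            n += f->used;
        return n;
    }
};
```

Because every frame has the same size, a block of threads can process one frame with one slot per thread, and only the last frame of a supercell is partially filled.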

SLIDE 10

Algorithm Driven Cache Strategy

[Figure labels: cells 1 to 4; Cell 1, Cell 1, Cell 2, Cell 4]

SLIDE 11

Algorithm Driven Cache Strategy

[Figure labels: Global Memory; cells 1 to 4; Cell 1, Cell 1, Cell 2, Cell 4]

SLIDE 12

Algorithm Driven Cache Strategy

[Figure labels: Global Memory, Shared Memory; cells 1 to 4; Cell 1, Cell 1, Cell 2, Cell 4]

SLIDE 13

Algorithm Driven Cache Strategy

[Figure labels: Shared Memory, Global Memory; cells 1 to 4; Cell 1, Cell 1, Cell 2, Cell 4]

SLIDE 14

High Utilization of Threads

[Figure labels: Global Memory, Shared Memory; cells 1 to 4; Cell 1, Cell 1, Cell 2, Cell 4; thread block with Thread 1, Thread 2, Thread 3, Thread 4]

SLIDE 15

Task-Parallel Execution of Kernels + Asynchronous Communication

SLIDE 16

PIConGPU - Scales up to 16,384 GPUs

[Figure: strong scaling; speedup vs. number of GPUs (both axes 1 to 10000); curves: ideal, 1 to 32, 8 to 256, 64 to 2048, 512 to 16384, 4096 to 16384]

SLIDE 17

PIConGPU - Scales up to 16,384 GPUs

[Figure: weak scaling efficiency; efficiency [%] (80 to 105) vs. number of GPUs (1 to 10000); curves: ideal, PIConGPU]
[Figure: strong scaling; speedup vs. number of GPUs (both axes 1 to 10000); curves: ideal, 1 to 32, 8 to 256, 64 to 2048, 512 to 16384, 4096 to 16384]

SLIDE 18

PIConGPU - Scales up to 16,384 GPUs

Efficiency >95%

[Figure: weak scaling efficiency; efficiency [%] (80 to 105) vs. number of GPUs (1 to 10000); curves: ideal, PIConGPU]
[Figure: strong scaling; speedup vs. number of GPUs (both axes 1 to 10000); curves: ideal, 1 to 32, 8 to 256, 64 to 2048, 512 to 16384, 4096 to 16384]

SLIDE 19

PIConGPU - Scales up to 16,384 GPUs

Efficiency >95%

6.9 PFlop/s (SP)

[Figure: weak scaling efficiency; efficiency [%] (80 to 105) vs. number of GPUs (1 to 10000); curves: ideal, PIConGPU]
[Figure: strong scaling; speedup vs. number of GPUs (both axes 1 to 10000); curves: ideal, 1 to 32, 8 to 256, 64 to 2048, 512 to 16384, 4096 to 16384]
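The two plots use the usual scaling metrics. As a reminder of how such numbers are typically computed, here is a small sketch; the function names are hypothetical and the definitions are the textbook ones, which the slides do not spell out.

```cpp
#include <cassert>

// Strong scaling: the total problem size is fixed while GPUs are added.
// Speedup is the baseline runtime divided by the runtime on more GPUs.
inline double strongSpeedup(double baseSeconds, double seconds)
{
    assert(seconds > 0.0);
    return baseSeconds / seconds;
}

// Weak scaling: the problem size grows with the number of GPUs, so the
// ideal runtime stays constant. Efficiency is given in percent.
inline double weakEfficiencyPercent(double baseSeconds, double seconds)
{
    assert(seconds > 0.0);
    return 100.0 * baseSeconds / seconds;
}
```

With these definitions, a weak-scaling efficiency above 95% means the runtime per step grows by only a few percent relative to the baseline run.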

SLIDE 20

More Physics, More Computations, More Power!

[Figure labels: s_1, s_2]

SLIDE 21

More Physics, More Computations, More Power!

[Figure labels: s_1, s_2; old atom state (s_{1,1}, s_{1,2}, s_{1,3}, ..., s_{n,m})]

SLIDE 22

More Physics, More Computations, More Power!

[Figure labels: s_1, s_2; old atom state (s_{1,1}, s_{1,2}, s_{1,3}, ..., s_{n,m}); atom-physical effects (matrix t_{1,1} ... t_{n,n})]

SLIDE 23

More Physics, More Computations, More Power!

[Figure labels: s_1, s_2; old atom state (s_{1,1}, s_{1,2}, s_{1,3}, ..., s_{n,m}); atom-physical effects (matrix t_{1,1} ... t_{n,n}); = new atom state (s_{1,1}, s_{1,2}, s_{1,3}, ..., s_{n,m})]

SLIDE 24

More Physics, More Computations, More Power!

[Figure labels: s_1, s_2; old atom state (s_{1,1}, s_{1,2}, s_{1,3}, ..., s_{n,m}); atom-physical effects (matrix t_{1,1} ... t_{n,n}); = new atom state (s_{1,1}, s_{1,2}, s_{1,3}, ..., s_{n,m})]

Really Big Data Task

  • Random access on big amounts of data > 100 GB
  • Good job for powerful CPUs
  • Efficient CPU/GPU cooperation
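One possible reading of the diagram is that the new atom state is obtained by applying an n x n matrix of atom-physical effects to the old n x m state. The sketch below implements that reading as a plain matrix product. The interpretation, the row-major layout and the name `applyEffects` are assumptions; the slides do not define the operation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// newState(i, k) = sum over j of t(i, j) * oldState(j, k)
// t is n x n, the states are n x m, all stored row-major.
inline std::vector<double> applyEffects(
    std::vector<double> const& t,
    std::vector<double> const& oldState,
    std::size_t n,
    std::size_t m)
{
    assert(t.size() == n * n);
    assert(oldState.size() == n * m);
    std::vector<double> newState(n * m, 0.0);
    for(std::size_t i = 0; i < n; ++i)
        for(std::size_t j = 0; j < n; ++j)
            for(std::size_t k = 0; k < m; ++k)
                newState[i * m + k] += t[i * n + j] * oldState[j * m + k];
    return newState;
}
```

For large n the matrix no longer fits into accelerator memory and its entries are read in a data-dependent order, which is the kind of workload the bullets assign to the CPU.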

SLIDE 25

Small Open Source Communities need Maintainable Codes

Single Source

  • Heterogeneity: Write once, execute everywhere
  • Testability: Validate once, get correct results everywhere
  • Sustainability: Porting implies minimal code changes
  • Optimizability: Tune for good performance at minimum coding effort
  • Openness: Open source and open standards

SLIDE 26

Alpaka

SLIDE 27

Good News: there are Alpakas on the Compute Meadow

[Figure labels: Fibers, C++, Threads, Next parallelism model]

  • Single zero-overhead interface to existing parallelism models
  • Single-source C++11 kernels
  • Data-agnostic memory model

SLIDE 28

Abstract Hierarchical Redundant Parallelism Model

[Figure: levels: Grid; legend: Synchronize, Parallel, Sequential]

SLIDE 29

Abstract Hierarchical Redundant Parallelism Model

[Figure: levels: Grid, Block; legend: Synchronize, Parallel, Sequential]

SLIDE 30

Abstract Hierarchical Redundant Parallelism Model

[Figure: levels: Grid, Block, Thread; legend: Synchronize, Parallel, Sequential]

SLIDE 31

Abstract Hierarchical Redundant Parallelism Model

[Figure: levels: Grid, Block, Thread, Element; legend: Synchronize, Parallel, Sequential]

  • Element level is an explicit sequential layer
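The four levels can be pictured as nested loops. The sketch below walks a one-dimensional grid sequentially on the host; the `WorkDiv1D` struct and `forEachElement` are hypothetical helpers, not Alpaka types. A real back-end would run the block and thread loops in parallel and keep only the innermost element loop sequential.

```cpp
#include <cstddef>

// Hypothetical 1D work division: extent of each level of the hierarchy.
struct WorkDiv1D
{
    std::size_t blocksPerGrid;
    std::size_t threadsPerBlock;
    std::size_t elemsPerThread;
};

// Visit every element index exactly once, walking
// grid -> block -> thread -> element.
template<typename F>
void forEachElement(WorkDiv1D const& wd, F&& f)
{
    for(std::size_t b = 0; b < wd.blocksPerGrid; ++b)
    {
        for(std::size_t t = 0; t < wd.threadsPerBlock; ++t)
        {
            std::size_t const globalThread = b * wd.threadsPerBlock + t;
            // Element level: explicit sequential loop inside one thread.
            for(std::size_t e = 0; e < wd.elemsPerThread; ++e)
                f(globalThread * wd.elemsPerThread + e);
        }
    }
}
```

Keeping the element loop explicit gives the compiler a plain, contiguous loop that it can vectorize on CPUs.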

SLIDE 32

Data Structure Agnostic Memory Model

[Figure labels: Host Memory, Explicit deep copy, Device, Global Memory, Grid]

SLIDE 33

Data Structure Agnostic Memory Model

[Figure labels: Host Memory, Explicit deep copy, Device, Global Memory, Grid, Shared Memory, Block]

SLIDE 34

Data Structure Agnostic Memory Model

[Figure labels: Host Memory, Explicit deep copy, Device, Global Memory, Grid, Shared Memory, Block, Register Memory (x2), Thread]

SLIDE 35

Map the Abstraction Model to your Desired Acceleration Back-End

  • Explicit mapping of parallelization levels to hardware

[Figure: CPU diagram; labels: RAM, Package (x2), L3 (x2), Core with L1/2 and R (x4), AVX (x4)]

SLIDE 36

Map the Abstraction Model to your Desired Acceleration Back-End

  • Explicit mapping of parallelization levels to hardware

[Figure: CPU diagram; labels: RAM, Package (x2), L3 (x2), Core with L1/2 and R (x4), AVX (x4); mapped levels: Grid, Global Memory]

SLIDE 37

Map the Abstraction Model to your Desired Acceleration Back-End

  • Explicit mapping of parallelization levels to hardware

[Figure: CPU diagram; labels: RAM, Package (x2), L3 (x2), Core with L1/2 and R (x8), AVX (x4); mapped levels: Grid, Block, Shared Memory, Global Memory]

SLIDE 38

Map the Abstraction Model to your Desired Acceleration Back-End

  • Explicit mapping of parallelization levels to hardware

[Figure: CPU diagram; labels: RAM, Package (x2), L3 (x2), Core with L1/2 and R (x8), AVX (x4); mapped levels: Grid, Block, Thread, Register Memory, Shared Memory, Global Memory]

SLIDE 39

Map the Abstraction Model to your Desired Acceleration Back-End

  • Explicit mapping of parallelization levels to hardware

[Figure: CPU diagram; labels: RAM, Package (x2), L3 (x2), Core with L1/2 and R (x8), AVX (x4); mapped levels: Grid, Block, Thread, Element, Register Memory, Shared Memory, Global Memory]

SLIDE 40

Map the Abstraction Model to your Desired Acceleration Back-End

  • Levels of the model that a back-end does not support can be ignored
  • The abstract interface allows the set of mappings to be extended

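Ignoring a level can be as simple as giving it an extent of one. The sketch below chooses a one-dimensional work division for two kinds of back-end. The `Division` struct, the `BackEnd` enum and `mapWork` are hypothetical and only illustrate the idea; they are not part of Alpaka.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 1D work division (not Alpaka's WorkDivMembers).
struct Division
{
    std::size_t blocksPerGrid;
    std::size_t threadsPerBlock;
    std::size_t elemsPerThread;
};

enum class BackEnd
{
    GpuLike, // many lightweight threads per block, one element each
    CpuLike  // thread level ignored: one thread per block loops over elements
};

// Split n elements into blocks that each cover `chunk` elements.
inline Division mapWork(std::size_t n, std::size_t chunk, BackEnd backEnd)
{
    assert(chunk > 0);
    std::size_t const blocks = (n + chunk - 1) / chunk; // round up
    if(backEnd == BackEnd::GpuLike)
        return Division{blocks, chunk, 1};
    return Division{blocks, 1, chunk};
}
```

Both divisions cover the same index range, so a kernel written against the abstract levels does not need to change when the mapping does.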
SLIDE 41

Alpaka: Vector Addition Kernel

    struct VectorAdd
    {
        template<typename TAcc, typename TElem, typename TSize>
        ALPAKA_FN_ACC auto operator()(
            TAcc const & acc,
            TSize const & numElements,
            TElem const * const X,
            TElem * const Y) const -> void
        {
        }
    };

SLIDE 42

Alpaka: Vector Addition Kernel

    struct VectorAdd
    {
        template<typename TAcc, typename TElem, typename TSize>
        ALPAKA_FN_ACC auto operator()(
            TAcc const & acc,
            TSize const & numElements,
            TElem const * const X,
            TElem * const Y) const -> void
        {
            namespace alp = alpaka;

            // Global index of this thread and number of elements it processes
            auto globalIdx = alp::idx::getIdx<alp::Grid, alp::Threads>(acc)[0u];
            auto elemsPerThread = alp::workdiv::getWorkDiv<alp::Thread, alp::Elems>(acc)[0u];
        }
    };

SLIDE 43

Alpaka: Vector Addition Kernel

    struct VectorAdd
    {
        template<typename TAcc, typename TElem, typename TSize>
        ALPAKA_FN_ACC auto operator()(
            TAcc const & acc,
            TSize const & numElements,
            TElem const * const X,
            TElem * const Y) const -> void
        {
            namespace alp = alpaka;

            // Global index of this thread and number of elements it processes
            auto globalIdx = alp::idx::getIdx<alp::Grid, alp::Threads>(acc)[0u];
            auto elemsPerThread = alp::workdiv::getWorkDiv<alp::Thread, alp::Elems>(acc)[0u];

            // Range of elements owned by this thread, clipped to the vector length.
            // min is written as on the slide; it must be callable on the chosen back-end.
            auto begin = globalIdx * elemsPerThread;
            auto end = min(begin + elemsPerThread, numElements);

            // Element level: sequential loop inside one thread
            for(TSize i = begin; i < end; ++i)
            {
                Y[i] = X[i] + Y[i];
            }
        }
    };
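The index arithmetic of the kernel can be checked without Alpaka. The sketch below runs the same per-thread range computation sequentially on the host; `vectorAddThread` and `vectorAddHost` are hypothetical names, and `std::min` stands in for the `min` used on the slide.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Work of one thread, written without any Alpaka calls.
inline void vectorAddThread(
    std::size_t globalIdx,
    std::size_t elemsPerThread,
    std::size_t numElements,
    float const* x,
    float* y)
{
    std::size_t const begin = globalIdx * elemsPerThread;
    std::size_t const end = std::min(begin + elemsPerThread, numElements);
    for(std::size_t i = begin; i < end; ++i)
        y[i] = x[i] + y[i];
}

// Sequential stand-in for the grid: run every thread one after another.
inline void vectorAddHost(std::size_t elemsPerThread, std::vector<float> const& x, std::vector<float>& y)
{
    assert(x.size() == y.size());
    assert(elemsPerThread > 0);
    std::size_t const n = x.size();
    std::size_t const numThreads = (n + elemsPerThread - 1) / elemsPerThread; // round up
    for(std::size_t t = 0; t < numThreads; ++t)
        vectorAddThread(t, elemsPerThread, n, x.data(), y.data());
}
```

The clipping with `min` matters for the last thread, whose range would otherwise run past the end of the vectors when the length is not a multiple of the elements per thread.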

SLIDE 44

Alpaka: Initialization

    // Configure Alpaka
    using Dim = alpaka::dim::DimInt<3u>;
    using Size = std::size_t;
    using Acc = alpaka::acc::AccCpuSerial<Dim, Size>;
    using Host = alpaka::acc::AccCpuSerial<Dim, Size>;
    using Stream = alpaka::stream::StreamCpuSync;
    using WorkDiv = alpaka::workdiv::WorkDivMembers<Dim, Size>;
    using Elem = float;

SLIDE 45

Alpaka: Initialization

    // Configure Alpaka
    using Dim = alpaka::dim::DimInt<3u>;
    using Size = std::size_t;
    using Acc = alpaka::acc::AccCpuSerial<Dim, Size>;
    using Host = alpaka::acc::AccCpuSerial<Dim, Size>;
    using Stream = alpaka::stream::StreamCpuSync;
    using WorkDiv = alpaka::workdiv::WorkDivMembers<Dim, Size>;
    using Elem = float;

    // Retrieve devices and stream
    // (the slide does not show the definitions of the types DevHost and DevAcc)
    DevHost devHost(alpaka::dev::DevMan<Host>::getDevByIdx(0));
    DevAcc devAcc(alpaka::dev::DevMan<Acc>::getDevByIdx(0));
    Stream stream(devAcc);

SLIDE 46

Alpaka: Initialization

    // Configure Alpaka
    using Dim = alpaka::dim::DimInt<3u>;
    using Size = std::size_t;
    using Acc = alpaka::acc::AccCpuSerial<Dim, Size>;
    using Host = alpaka::acc::AccCpuSerial<Dim, Size>;
    using Stream = alpaka::stream::StreamCpuSync;
    using WorkDiv = alpaka::workdiv::WorkDivMembers<Dim, Size>;
    using Elem = float;

    // Retrieve devices and stream
    // (the slide does not show the definitions of the types DevHost and DevAcc)
    DevHost devHost(alpaka::dev::DevMan<Host>::getDevByIdx(0));
    DevAcc devAcc(alpaka::dev::DevMan<Acc>::getDevByIdx(0));
    Stream stream(devAcc);

    // Specify work division
    auto elementsPerThread(alpaka::Vec<Dim, Size>::ones());
    auto threadsPerBlock(alpaka::Vec<Dim, Size>::all(2u));
    auto blocksPerGrid(alpaka::Vec<Dim, Size>(4u, 8u, 16u));
    WorkDiv workDiv(alpaka::workdiv::WorkDivMembers<Dim, Size>(
        blocksPerGrid, threadsPerBlock, elementsPerThread));
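The size of the work division on this slide follows from multiplying the extents of all three levels. The sketch below does that arithmetic at compile time with plain `std::array`; `Vec3`, `product` and `totalElements` are hypothetical helpers, not Alpaka functions, and the code assumes C++14 or later.

```cpp
#include <array>
#include <cstddef>

using Vec3 = std::array<std::size_t, 3>;

// Product of the three extents of one level.
constexpr std::size_t product(Vec3 const& v)
{
    return v[0] * v[1] * v[2];
}

// Total number of elements covered by a work division.
constexpr std::size_t totalElements(
    Vec3 const& blocksPerGrid,
    Vec3 const& threadsPerBlock,
    Vec3 const& elemsPerThread)
{
    return product(blocksPerGrid) * product(threadsPerBlock) * product(elemsPerThread);
}

// Values from the slide: 4 x 8 x 16 blocks, 2 x 2 x 2 threads, 1 element.
static_assert(
    totalElements({4, 8, 16}, {2, 2, 2}, {1, 1, 1}) == 4096,
    "512 blocks * 8 threads per block * 1 element per thread");
```

So the example launches 512 blocks of 8 threads each, and every thread handles a single element.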

SLIDE 47

Alpaka: Call the Kernel

    // Memory allocation and host to device memory copy
    // (extent is not defined on the slide)
    auto X_h = alpaka::mem::buf::alloc<Elem, Size>(devHost, extent);
    auto Y_h = alpaka::mem::buf::alloc<Elem, Size>(devHost, extent);
    auto X_d = alpaka::mem::buf::alloc<Elem, Size>(devAcc, extent);
    auto Y_d = alpaka::mem::buf::alloc<Elem, Size>(devAcc, extent);
    alpaka::mem::view::copy(stream, X_d, X_h, extent);
    alpaka::mem::view::copy(stream, Y_d, Y_h, extent);

SLIDE 48

Alpaka: Call the Kernel

    // Memory allocation and host to device memory copy
    // (extent is not defined on the slide)
    auto X_h = alpaka::mem::buf::alloc<Elem, Size>(devHost, extent);
    auto Y_h = alpaka::mem::buf::alloc<Elem, Size>(devHost, extent);
    auto X_d = alpaka::mem::buf::alloc<Elem, Size>(devAcc, extent);
    auto Y_d = alpaka::mem::buf::alloc<Elem, Size>(devAcc, extent);
    alpaka::mem::view::copy(stream, X_d, X_h, extent);
    alpaka::mem::view::copy(stream, Y_d, Y_h, extent);

    // Kernel creation and execution
    VectorAdd kernel;
    auto const exec(alpaka::exec::create<Acc>(
        workDiv,
        kernel,
        numElements,
        alpaka::mem::view::getPtrNative(X_d),
        alpaka::mem::view::getPtrNative(Y_d)));
    alpaka::stream::enqueue(stream, exec);

SLIDE 49

Alpaka: Call the Kernel

    // Memory allocation and host to device memory copy
    // (extent is not defined on the slide)
    auto X_h = alpaka::mem::buf::alloc<Elem, Size>(devHost, extent);
    auto Y_h = alpaka::mem::buf::alloc<Elem, Size>(devHost, extent);
    auto X_d = alpaka::mem::buf::alloc<Elem, Size>(devAcc, extent);
    auto Y_d = alpaka::mem::buf::alloc<Elem, Size>(devAcc, extent);
    alpaka::mem::view::copy(stream, X_d, X_h, extent);
    alpaka::mem::view::copy(stream, Y_d, Y_h, extent);

    // Kernel creation and execution
    VectorAdd kernel;
    auto const exec(alpaka::exec::create<Acc>(
        workDiv,
        kernel,
        numElements,
        alpaka::mem::view::getPtrNative(X_d),
        alpaka::mem::view::getPtrNative(Y_d)));
    alpaka::stream::enqueue(stream, exec);

    // Copy memory back to host
    alpaka::mem::view::copy(stream, Y_h, Y_d, extent);

SLIDE 51

Cupla to the rescue: very fast porting!

Alpaka

SLIDE 52

Cupla to the rescue: very fast porting!

Alpaka

Fast enough for a live hack session!

SLIDE 53

Live Cupla Hack Session

Tiling matrix-matrix multiplication algorithm

  • Live port of a CUDA application to Cupla
  • Starting point: matrix multiplication algorithm of the CUDA samples
  • Aim: single source code executed on GPU and CPU hardware
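For readers unfamiliar with the tiling idea, here is a plain host-side sketch of a tiled matrix-matrix multiplication. It is not the CUDA sample and uses no Cupla calls; the function name `matMulTiled`, the square row-major layout and the tile parameter are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// C = A * B for square n x n row-major matrices, processed tile by tile.
inline std::vector<double> matMulTiled(
    std::vector<double> const& a,
    std::vector<double> const& b,
    std::size_t n,
    std::size_t tile)
{
    assert(a.size() == n * n);
    assert(b.size() == n * n);
    assert(tile > 0);
    std::vector<double> c(n * n, 0.0);
    for(std::size_t i0 = 0; i0 < n; i0 += tile)
        for(std::size_t j0 = 0; j0 < n; j0 += tile)
            for(std::size_t k0 = 0; k0 < n; k0 += tile)
                // Multiply one tile of A with one tile of B; edge tiles are clipped.
                for(std::size_t i = i0; i < std::min(i0 + tile, n); ++i)
                    for(std::size_t k = k0; k < std::min(k0 + tile, n); ++k)
                        for(std::size_t j = j0; j < std::min(j0 + tile, n); ++j)
                            c[i * n + j] += a[i * n + k] * b[k * n + j];
    return c;
}
```

On a GPU each output tile is typically handled by one thread block that stages the input tiles in shared memory; on a CPU the same blocking keeps the working set inside the caches.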

SLIDE 54

Single Source Alpaka DGEMM Kernel on Various Architectures

DGEMM: C ← αAB + βC

[Figure: bar chart against theoretical peak performance; peak labels: 1450, 150, 540, 480, 560 GFLOPS; bar values: 283, 121, 74, 117, 35; architecture names and the pairing of values are not recoverable]
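The DGEMM update on this slide can be written as a short reference loop. The sketch below is a naive host version for square row-major matrices, together with the usual 2n^3 approximation of the operation count used to convert a runtime into GFLOPS. The function names are hypothetical, and this is not the Alpaka kernel that was benchmarked.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// C <- alpha * A * B + beta * C for square n x n row-major matrices.
inline void dgemmReference(
    std::size_t n,
    double alpha,
    std::vector<double> const& a,
    std::vector<double> const& b,
    double beta,
    std::vector<double>& c)
{
    assert(a.size() == n * n);
    assert(b.size() == n * n);
    assert(c.size() == n * n);
    for(std::size_t i = 0; i < n; ++i)
        for(std::size_t j = 0; j < n; ++j)
        {
            double sum = 0.0;
            for(std::size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = alpha * sum + beta * c[i * n + j];
        }
}

// Achieved GFLOPS, counting roughly 2 * n^3 floating-point operations.
inline double dgemmGflops(std::size_t n, double seconds)
{
    assert(seconds > 0.0);
    return 2.0 * static_cast<double>(n) * static_cast<double>(n) * static_cast<double>(n) / seconds / 1e9;
}
```

Dividing the achieved GFLOPS by the theoretical peak of a device gives the fraction of peak that charts like this one compare across architectures.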

SLIDE 55

What happened so far...

SLIDE 60

PIConGPU Runtime on Various Architectures

SLIDE 61

PIConGPU Efficiency on Various Architectures

SLIDE 62

Clone us from GitHub

https://github.com/ComputationalRadiationPhysics

    git clone https://github.com/ComputationalRadiationPhysics/picongpu
    git clone https://github.com/ComputationalRadiationPhysics/alpaka
    git clone https://github.com/ComputationalRadiationPhysics/cupla

Alpaka paper pre-print: http://arxiv.org/abs/1602.08477
