Image Compositing on GPU-Accelerated Supercomputers - Pascal Grosset


slide-1
SLIDE 1

Image Compositing on GPU-Accelerated Supercomputers

Pascal Grosset & Charles (Chuck) Hansen

Tuesday 5 April 2016, GTC 2016

slide-2
SLIDE 2

Outline

  • Direct Volume Rendering
  • Distributed Volume Rendering
  • Rendering Pipeline
  • Setup
  • Rendering
  • Compositing
  • Test Setup
  • Results & Discussion
  • Conclusion & Future Work

slide-3
SLIDE 3

Direct Volume Rendering

Block of scalar values

slide-4
SLIDE 4

Distributed Volume Rendering

Sort-last Parallel Rendering

1. Partition the data among the nodes (loading)
2. Form an image from the data on each node (rendering)
3. Assemble the partial images into the final image (compositing)
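The three sort-last steps can be sketched in a few lines of Python. This is a toy illustration, not the presenters' code: the "volume" is a single pixel column of premultiplied (color, alpha) samples ordered front to back, and `partition`, `render`, and `composite` are hypothetical helper names. The blend used is the standard "over" operator.

```python
# Sort-last rendering in miniature: partition, render locally, composite.
# Pixels are premultiplied (color, alpha) pairs; all names are illustrative.

def over(front, back):
    """Blend a front pixel over a back pixel (premultiplied alpha)."""
    fc, fa = front
    bc, ba = back
    return (fc + (1.0 - fa) * bc, fa + (1.0 - fa) * ba)

def partition(volume, num_nodes):
    """Step 1: split the depth-ordered samples into contiguous slabs."""
    size = (len(volume) + num_nodes - 1) // num_nodes
    return [volume[i * size:(i + 1) * size] for i in range(num_nodes)]

def render(slab):
    """Step 2: each node blends its own samples front to back."""
    pixel = (0.0, 0.0)
    for sample in slab:
        pixel = over(pixel, sample)
    return pixel

def composite(partials):
    """Step 3: blend the per-node partial images in depth order."""
    result = (0.0, 0.0)
    for partial in partials:
        result = over(result, partial)
    return result

volume = [(0.1, 0.2), (0.2, 0.3), (0.05, 0.1), (0.3, 0.5)]  # front to back
partials = [render(slab) for slab in partition(volume, 2)]
final = composite(partials)
serial = render(volume)  # single-node reference result
```

Because "over" is associative, compositing the partial images in depth order matches the single-node result up to floating-point rounding. That property is what allows rendering to be split across nodes.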

slide-9
SLIDE 9

Distributed Volume Rendering on GPU

Rendering:

  • OpenGL: the most common way to render

Compositing:

  • Transfer to the CPU and composite there?

slide-10
SLIDE 10

Inter-node GPU Communication

slide-11
SLIDE 11

Inter-node GPU Communication

Data path between GPUs on different nodes (figure): CUDA driver buffer, network driver buffer, network driver buffer, CUDA driver buffer

5 operations!

slide-12
SLIDE 12

Inter-node GPU Communication

  • Without GPU Direct RDMA: 5 operations
  • With GPU Direct RDMA: 1 operation
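To make the 5-versus-1 count concrete, here is a small Python sketch that totals the operations for a compositing exchange. The slides give only the counts and the buffer names. The step-by-step breakdown in `STAGED_PATH` is an assumption about how those five operations are split, and `operations_per_exchange` is a hypothetical helper.

```python
# Assumed breakdown of the staged path; the slides state only "5 operations".
STAGED_PATH = [
    "sender GPU memory -> sender CUDA driver buffer",
    "sender CUDA driver buffer -> sender network driver buffer",
    "sender network driver buffer -> receiver network driver buffer",
    "receiver network driver buffer -> receiver CUDA driver buffer",
    "receiver CUDA driver buffer -> receiver GPU memory",
]

# With GPU Direct RDMA the transfer is a single operation.
DIRECT_PATH = ["sender GPU memory -> receiver GPU memory"]

def operations_per_exchange(path, num_messages):
    """Total operations when every message follows the given path."""
    return len(path) * num_messages

staged_total = operations_per_exchange(STAGED_PATH, num_messages=8)
direct_total = operations_per_exchange(DIRECT_PATH, num_messages=8)
```

The sketch counts operations only. It says nothing about their relative cost, which depends on the hardware.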

slide-13
SLIDE 13

Distributed Volume Rendering on GPU

Rendering:

  • OpenGL: the most common way to render

Compositing:

  • Transfer to the CPU and composite there?
  • Use the GPU: CUDA

slide-14
SLIDE 14

Distributed Volume Rendering on GPU

Rendering: OpenGL shaders

Compositing: CUDA (computation + communication)

Using OpenGL would imply 5 copies when compositing!

slide-15
SLIDE 15

Distributed Volume Rendering on GPU

Rendering: OpenGL shaders

Compositing: CUDA (computation + communication)

CUDA OpenGL Interop links OpenGL with CUDA

CUDA and OpenGL can run together on Tesla-class GPUs

slide-16
SLIDE 16

Pipeline

Stages: Setup, Volume Rendering, Compositing, plus CUDA OpenGL Interop

slide-17
SLIDE 17

Pipeline

Stages: Setup, Volume Rendering, Compositing, plus CUDA OpenGL Interop

Volume Rendering: OpenGL with shaders, offscreen rendering

GPU Direct RDMA does NOT work with texture memory!

slide-18
SLIDE 18

Pipeline

Stages: Setup, Volume Rendering, Compositing, plus CUDA OpenGL Interop

Volume Rendering: OpenGL with shaders, offscreen rendering to GL_TEXTURE_BUFFER

slide-19
SLIDE 19

Pipeline

Stages: Setup, Volume Rendering, Compositing, plus CUDA OpenGL Interop

Compositing:

  • CUDA kernels
  • GPU Direct RDMA

Constraint:

  • Computation >> Communication: computation is much faster than communication
  • Needs an algorithm that minimizes communication

slide-20
SLIDE 20

TOD-Tree

Task-Overlapped Direct send Tree (TOD-Tree):

1. Direct Send
2. K-ary Tree compositing
3. Gather

Aim:

  • Minimize communication
  • Overlap communication with computation

slide-21
SLIDE 21

TOD-Tree: Direct Send (stage 1)

Each node:

  • Determines the nodes in its locality of size r
  • Creates and advertises a receiving buffer
  • Performs a parallel Direct Send
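A minimal Python sketch of a direct-send stage follows. The slide does not spell out these details, so several things are assumed: nodes are numbered in depth order (front first), consecutive groups of r nodes form a locality, each image is a flat list of premultiplied (color, alpha) pixels, and each node in a locality ends up owning one of r equal image regions. `direct_send` is a hypothetical name. The real implementation exchanges buffers between GPUs, while here the "send" is just a list slice.

```python
def over(front, back):
    """Blend a front pixel over a back pixel (premultiplied alpha)."""
    fc, fa = front
    bc, ba = back
    return (fc + (1.0 - fa) * bc, fa + (1.0 - fa) * ba)

def direct_send(images, r):
    """Return {node: (region_start, blended_pixels)} after the direct send.

    images[i] is node i's full image; nodes are assumed to be in depth order.
    """
    n, width = len(images), len(images[0])
    if n % r or width % r:
        raise ValueError("sketch assumes node count and width divisible by r")
    region = width // r
    owned = {}
    for first in range(0, n, r):
        locality = range(first, first + r)
        for local_rank, owner in enumerate(locality):
            lo, hi = local_rank * region, (local_rank + 1) * region
            blended = [(0.0, 0.0)] * region
            # Every node in the locality contributes its piece of this region.
            for sender in locality:
                piece = images[sender][lo:hi]
                blended = [over(acc, px) for acc, px in zip(blended, piece)]
            owned[owner] = (lo, blended)
    return owned

# Four nodes, four-pixel images, localities of size r = 2.
images = [[(0.1 * (node + 1), 0.5)] * 4 for node in range(4)]
owned = direct_send(images, r=2)
```

After this stage, each node holds one region blended across its locality, so later stages only need to move regions rather than full images.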

slide-22
SLIDE 22

TOD-Tree: K-ary Tree (stage 2)

Each node:

  • Determines if it is sending or receiving

Sending node:

  • Sends to the receiving node

Receiving node:

  • Creates a buffer and advertises it
  • Blends the images
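The sender/receiver decision can be sketched as a schedule of rounds. This is a hedged illustration rather than the authors' scheduling logic. `kary_tree_rounds` is a hypothetical helper, `holders` is assumed to be the depth-ordered list of nodes that hold the same image region, and making the first node of each group the receiver is an arbitrary choice for this sketch.

```python
def kary_tree_rounds(holders, k):
    """Group depth-ordered holders k at a time until one node remains.

    Returns a list of rounds; each round is a list of (receiver, senders).
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    rounds = []
    active = list(holders)
    while len(active) > 1:
        this_round = []
        survivors = []
        for i in range(0, len(active), k):
            group = active[i:i + k]
            receiver, senders = group[0], group[1:]
            this_round.append((receiver, senders))
            survivors.append(receiver)
        rounds.append(this_round)
        active = survivors
    return rounds

# Eight nodes hold the same region; composite them with a 4-ary tree.
schedule = kary_tree_rounds([0, 2, 4, 6, 8, 10, 12, 14], k=4)
```

Grouping consecutive holders keeps each group contiguous in depth, so the receiver can blend what it receives with the order-dependent "over" operator.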

slide-23
SLIDE 23

TOD-Tree: Gather (stage 3)

Display node:

  • Receives images from the other nodes

Other nodes:

  • Nodes that have images send their data to the display node
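The gather step amounts to placing each received region at its offset in the final image. The Python sketch below assumes each remaining node holds a `(region_start, pixels)` pair. `gather` is a hypothetical name, and the network receive is replaced by a dictionary lookup.

```python
def gather(regions, width):
    """Assemble the final image on the display node.

    regions maps node id -> (region_start, pixels) for nodes holding data.
    """
    image = [None] * width
    for start, pixels in regions.values():
        image[start:start + len(pixels)] = pixels
    if any(px is None for px in image):
        raise ValueError("some part of the image was not received")
    return image

# Two nodes each hold half of a four-pixel image.
regions = {0: (0, ["a", "b"]), 1: (2, ["c", "d"])}
final_image = gather(regions, width=4)
```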

slide-24
SLIDE 24

TOD-Tree vs Radix-K and Binary Swap

CPU comparison against IceT (figure; series: Radix-k, TOD-Tree, Binary Swap)

slide-25
SLIDE 25

Pipeline

Setup:

  • Activate X Server
  • Create OpenGL context using GLX
  • Note: driver 358 requires no X Server for an OpenGL context

Volume Rendering:

  • Set up an OpenGL Buffer Object
  • Write offscreen using shaders

Compositing:

  • CUDA kernel: blending
  • GPU Direct RDMA: communication
  • TOD-Tree: logic

OpenGL CUDA Interop

slide-26
SLIDE 26

Setup for testing

Test Data: Cube dataset, one cube per node

Test Platform:

  • Piz Daint at the Swiss National Supercomputing Center (CSCS)
  • Cray XC30 with 5,272 Tesla K20X GPUs
  • 7th in the Top 500 supercomputers

Algorithm: TOD-Tree

slide-27
SLIDE 27

Results: TOD-Tree Edison vs Piz Daint

(Two plots; vertical axis: Time (ms))

slide-28
SLIDE 28

Results: TOD-Tree Edison vs Piz Daint

slide-29
SLIDE 29

Conclusion

Image compositing on GPUs is now feasible!

Rendering:

  • OpenGL shaders, offscreen to GL_TEXTURE_BUFFER
  • CUDA OpenGL Interop

Compositing:

  • Blending: CUDA kernels
  • Communication: GPU Direct RDMA
  • Logic: TOD-Tree

Scales very well as we increase the size of images

slide-30
SLIDE 30

Future Work

  • Test in-situ rendering
  • Scale to a larger number of nodes
  • Vulkan for OpenGL volume rendering
slide-31
SLIDE 31

More details ...

Paper:

  • A. V. Pascal Grosset, Manasa Prasad, Cameron Christensen, Aaron Knoll, Charles Hansen, "TOD-Tree: Task-Overlapped Direct send Tree Image Compositing for Hybrid MPI Parallelism and GPUs", IEEE Transactions on Visualization & Computer Graphics, no. 1, pp. 1, PrePrints, doi:10.1109/TVCG.2016.2542069

slide-32
SLIDE 32

Thank you! Any Questions?

Special thanks to Tom Fogal, Peter Messmer and Jean Favre.

My email: pgrosset@sci.utah.edu
