Image Compositing on
GPU-Accelerated Supercomputers
Pascal Grosset & Charles (Chuck) Hansen
Tuesday 5 April 2016 GTC 2016
Outline
- Direct Volume Rendering
- Distributed Volume Rendering
- Rendering Pipeline
- Setup
- Rendering
- …
Block of scalar values
1. Partition the data among the nodes (loading)
2. Form an image from the data (rendering)
3. Assemble the image (compositing)
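A minimal sketch of step 1 (loading), assuming the simplest possible split: the volume's nz slices are cut into contiguous slabs along z, one slab per node. The talk does not say which decomposition was used, and `Slab` and `partitionSlabs` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: the z-range of the volume owned by one node.
struct Slab {
    std::size_t zBegin;  // first slice owned (inclusive)
    std::size_t zEnd;    // one past the last slice owned
};

// Split nz slices as evenly as possible among numNodes nodes;
// the first (nz % numNodes) nodes receive one extra slice.
std::vector<Slab> partitionSlabs(std::size_t nz, std::size_t numNodes) {
    assert(numNodes > 0);
    std::vector<Slab> slabs;
    slabs.reserve(numNodes);
    const std::size_t base = nz / numNodes;
    const std::size_t extra = nz % numNodes;
    std::size_t z = 0;
    for (std::size_t i = 0; i < numNodes; ++i) {
        const std::size_t count = base + (i < extra ? 1 : 0);
        slabs.push_back({z, z + count});
        z += count;
    }
    return slabs;
}
```

For example, `partitionSlabs(10, 4)` gives the ranges [0,3), [3,6), [6,8) and [8,10). Contiguous slabs keep the front-to-back order between nodes easy to determine, which the compositing step relies on for correct blending.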
OpenGL: Most common way to render
Transfer to CPU and composite there?
Without GPU Direct RDMA: CUDA driver buffer → network driver buffer → network → driver buffer → CUDA driver buffer, i.e. 5 operations!
With GPU Direct RDMA: 1 operation
Use the GPU: CUDA
Using OpenGL would imply 5 copies when compositing!
CUDA OpenGL Interop for linking OpenGL with CUDA
CUDA and OpenGL can run together on Tesla-class GPUs
(Figure: pipeline of Setup, Volume Rendering and Compositing, with CUDA OpenGL Interop.)
GPU Direct RDMA does NOT work with texture memory!!!
Compositing: CUDA kernels (computation) and GPU Direct RDMA (communication)
Computation >> Communication: use an algorithm that minimizes communication
TOD-Tree compositing stages:
1. Direct send
2. K-ary tree compositing
3. Gather
(Figure labels: computation; locality of size r; receiving buffer.)
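The three stages listed above (direct send, k-ary tree compositing, gather) can be sketched as a communication schedule. This is a simplified model under stated assumptions, not the TOD-Tree implementation: it assumes p ranks with p divisible by r, groups formed from consecutive ranks, and groups already ordered front to back. The helper names are hypothetical, and the gather stage (owners of the finished regions sending them to the display node) is not modeled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical schedule helpers; assumes p ranks with p divisible by r,
// groups of r consecutive ranks, and groups ordered front to back.

// Stage 1 (direct send): the rank's group and the image region it will own.
int groupOf(int rank, int r) { return rank / r; }
int regionOwned(int rank, int r) { return rank % r; }

// Ranks in the same group that send their piece of this rank's region to it.
std::vector<int> directSendSources(int rank, int r) {
    assert(rank >= 0 && r > 0);
    std::vector<int> sources;
    const int first = groupOf(rank, r) * r;
    for (int i = first; i < first + r; ++i)
        if (i != rank) sources.push_back(i);
    return sources;
}

// Stage 2 (k-ary tree): owners of the same region, one per group, reduce
// over group indices. In round `round` the stride is k^round; a group whose
// index is a multiple of k * stride receives from up to k - 1 groups.
// Returns group indices; the owner of region j in group g is rank g * r + j.
std::vector<int> treeSenders(int group, int round, int numGroups, int k) {
    assert(k >= 2 && round >= 0);
    int stride = 1;
    for (int i = 0; i < round; ++i) stride *= k;
    std::vector<int> senders;
    if (group % (stride * k) != 0) return senders;  // not a receiver this round
    for (int m = 1; m < k; ++m) {
        const int src = group + m * stride;
        if (src < numGroups) senders.push_back(src);
    }
    return senders;
}

// Rounds needed to reduce numGroups partial images with fan-in k.
int treeRounds(int numGroups, int k) {
    assert(numGroups > 0 && k >= 2);
    int rounds = 0;
    for (long span = 1; span < numGroups; span *= k) ++rounds;
    return rounds;
}
```

For example, with r = 4, rank 5 is in group 1, owns region 1 and receives from ranks 4, 6 and 7. With 5 groups and k = 4, group 0 receives from groups 1, 2 and 3 in round 0 and from group 4 in round 1, so `treeRounds(5, 4)` is 2.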
(Figure: comparison of Radix-k, TOD-Tree and Binary Swap.)
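Binary swap, one of the algorithms in the comparison, pairs ranks so that each round halves the image region a rank is responsible for. A minimal sketch of its pairing schedule, assuming p is a power of two and tracking only pixel ranges (the per-round blending and the depth order of partners are left out); `Range`, `swapPartner` and `rangeAfterRounds` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Pixel range [begin, end) of the image a rank is responsible for.
struct Range {
    std::size_t begin, end;
};

// In round i, a rank pairs with the rank that differs only in bit i.
int swapPartner(int rank, int round) { return rank ^ (1 << round); }

// Range owned by `rank` after `rounds` rounds: each round the pair splits
// its shared range in half and the lower-numbered partner keeps the lower
// half (it would then receive and blend the partner's pixels for that half).
Range rangeAfterRounds(int rank, int rounds, std::size_t numPixels) {
    assert(rank >= 0 && rounds >= 0);
    Range r{0, numPixels};
    for (int i = 0; i < rounds; ++i) {
        const std::size_t mid = r.begin + (r.end - r.begin) / 2;
        const bool keepLower = rank < swapPartner(rank, i);
        if (keepLower)
            r.end = mid;
        else
            r.begin = mid;
    }
    return r;
}
```

With p = 4 and a 16-pixel image, after log2(4) = 2 rounds ranks 0, 2, 1 and 3 own [0,4), [4,8), [8,12) and [12,16). Every rank stays busy in every round, but the pairing only works for powers of two, a restriction radix-k relaxes by allowing other group sizes per round.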
NVIDIA driver 358 does not require an X server to create an OpenGL context
Setup:
- Activate X server
- Create OpenGL context using GLX
Volume Rendering:
- Set up OpenGL buffer object
- Write offscreen using shaders
Compositing:
- Blending: CUDA kernel
- Communication: GPU Direct RDMA
- Logic: TOD-Tree
Linking rendering and compositing: OpenGL CUDA interop
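The blending step listed above combines partial images pixel by pixel. A minimal CPU sketch of that per-pixel operation, assuming premultiplied RGBA and the standard "over" operator with the local image in front of the received one; in a CUDA kernel each loop iteration would map to one thread. `RGBA`, `over` and `blendBehind` are hypothetical names, not the talk's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Premultiplied-alpha RGBA pixel.
struct RGBA {
    float r, g, b, a;
};

// "Over" operator for premultiplied colors: front over back.
RGBA over(const RGBA& front, const RGBA& back) {
    const float t = 1.0f - front.a;  // transmittance of the front layer
    return {front.r + t * back.r, front.g + t * back.g,
            front.b + t * back.b, front.a + t * back.a};
}

// Blend a received image that lies behind the local one, in place.
void blendBehind(std::vector<RGBA>& local, const std::vector<RGBA>& received) {
    assert(local.size() == received.size());
    for (std::size_t i = 0; i < local.size(); ++i)
        local[i] = over(local[i], received[i]);
}
```

Because "over" is associative but not commutative, partial images may be combined in any grouping (as in a tree) but must keep their front-to-back order.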
Piz Daint at the Swiss National Supercomputing Centre (CSCS): a Cray XC30 with 5,272 Tesla K20X GPUs, 7th in the Top 500 list of supercomputers
(Figures: two timing plots, y-axis Time (ms).)
Rendering: OpenGL shaders, offscreen to GL_TEXTURE_BUFFER
CUDA OpenGL Interop
Compositing:
- Blending: CUDA kernels
- Communication: GPU Direct RDMA
- Logic: TOD-Tree
Charles Hansen, "TOD-Tree: Task-Overlapped Direct send Tree Image Compositing for Hybrid MPI Parallelism and GPUs", IEEE Transactions on Visualization & Computer Graphics, no. 1, pp. 1, PrePrints, doi:10.1109/TVCG.2016.2542069
Special thanks to Tom Fogal, Peter Messmer and Jean Favre.
My email: pgrosset@sci.utah.edu