Ken Turkowski & Trevor Smith, GTC 2017
ACCELERATING STEREO 360 STITCHING USING MULTI-GPUS
2
OVERVIEW
What is a stereo panorama? [Ken]
How do we stitch? [Ken]
How do we handle real-time videos? GPUs! [Trevor]
Demo in the VR Village, until noon today!
SDKs
3
NON-INTERACTIVE IMAGERY
4
INTERACTIVE IMAGERY
5
WHY 360° INTERACTIVE STEREO VIDEO?
360°
Pan and zoom interactively
Stereo
Real-time
Immersion in a real-world situation
More so than still photographs, directed videos, and simple panoramas
6
SINGLE CAMERA
Light rays everywhere
2D sampling of a light field
At one location
Towards a preferred direction
7
PANORAMA
Sampled at one point, like a photograph
More directions: 360° horizontally × ±90° vertically
8
HOW TO MAKE A PANORAMA?
Rotating slit camera with a fisheye lens
Single camera rotated to different directions
Multiple camera rig
9
PANORAMA FORMATS
Equirectangular
Cube Map
Single 360° Fisheye
Double 180° Fisheye
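Of these formats, equirectangular is the one the rest of the talk projects onto. As a sketch of what that mapping looks like (the axis convention of +Z forward, +X right, +Y up and the [0, 1] output ranges are assumptions for illustration, not taken from the SDK):

```cpp
#include <cmath>

// Map a unit view direction to equirectangular texture coordinates in [0, 1]:
// u covers 360° of azimuth, v covers ±90° of elevation.
struct UV { double u, v; };

UV dirToEquirect(double x, double y, double z) {
    const double kPi = 3.14159265358979323846;
    double lon = std::atan2(x, z);        // azimuth in [-pi, pi]
    double lat = std::asin(y);            // elevation in [-pi/2, pi/2]
    return { lon / (2.0 * kPi) + 0.5,     // 360° horizontally
             lat / kPi + 0.5 };           // ±90° vertically
}
```

The inverse of this mapping is what a stitcher evaluates per output pixel when projecting camera images onto the sphere.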
10
PANORAMA PROS AND CONS
Gives a good sense of space
Compact
Interactive
Facilitates individual exploration
Frozen in time? → stitch videos
NVIDIA has a mono stitching SDK!
No sense of scale or depth
Can we do better? How about stereo?
11
IDEAL STEREO PANORAMA
Light field for each eye
Eyes separated by the IPD
For each direction (360° around), capture:
a spray of rays for the left eye
a spray of rays for the right eye
2D spray × 1D direction = 3D data set
A lot of data!
12
OMNIDIRECTIONAL STEREO
Omnidirectional Stereo (ODS) [Ishiguro '90, Peleg '01]
2D subset of the ideal 3D stereo rays
1D fan of rays, not a 2D spray
Imagine 2 linear sensors + a rotating motor
Whereas mono rays converge radially, stereo rays converge tangentially
[Diagram: mono (radial) vs. stereo (tangential) ray convergence]
13
MONO & STEREO LIGHT FIELDS
Mono: radial sampling
Stereo: tangential sampling
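The tangential sampling above has a compact closed form: each output column at azimuth theta gets a ray whose origin lies on the interpupillary circle and whose direction is tangent to it. A sketch of that geometry, assuming +Z forward, +X right, and eye = -1 for the left eye, +1 for the right:

```cpp
#include <cmath>

// ODS ray for one panorama column: origin on the IPD circle (radius ipd/2),
// direction tangent to it. Mono rays, by contrast, all pass through the center.
struct Ray { double ox, oz, dx, dz; };  // origin and direction in the XZ plane

Ray odsRay(double theta, double ipd, int eye) {
    double r = ipd / 2.0;
    double dx = std::sin(theta), dz = std::cos(theta);  // column's azimuth direction
    // Offset the origin perpendicular to the ray, so the ray grazes the circle.
    double ox = eye * r * dz;
    double oz = -eye * r * dx;
    return { ox, oz, dx, dz };
}
```

At theta = 0 (looking down +Z), the right eye's ray originates at (+IPD/2, 0), exactly where a viewer's right eye would be, and the origin sweeps around the circle as theta does.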
14
ADVANTAGES OF STEREO PANORAMA
Can drive an HMD
Compelling
Immersive
Sense of depth & scale
Compact (~2 mono panos)
A lot of bang per buck!
15
STEREO PANORAMA STITCHING
Camera size precludes capturing the rays that we want
Cameras are not on the interpupillary circle
Very few of the rays that we want are captured directly
Need to interpolate on the rig circle
Project to the IPD circle
16
INTERPOLATION BETWEEN CAMERAS
(1) Compute pixel motion between adjacent cameras
Assuming Lambertian, photometrically consistent, rig-calibrated, epipolar → disparity
(2) Interpolate/reproject pixel motion to the virtual camera
But there are problems with real-world images
Need to filter and sometimes fake it
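Step (2) in miniature: given the per-pixel motion from camera A toward camera B, a virtual view at fraction t of the way between them can be synthesized by forward-warping A's pixels by t times their motion. This is a 1D grayscale sketch with nearest-pixel splatting; a real stitcher warps 2D images, warps from both neighbors, and blends. All names here are illustrative.

```cpp
#include <cmath>
#include <vector>

// Forward-warp a 1D image by a fraction t of its per-pixel motion field.
std::vector<double> interpolateView(const std::vector<double>& imgA,
                                    const std::vector<double>& motion,
                                    double t) {
    std::vector<double> out(imgA.size(), 0.0);
    for (size_t x = 0; x < imgA.size(); ++x) {
        long xv = std::lround(x + t * motion[x]);  // destination in the virtual view
        if (xv >= 0 && xv < (long)out.size())
            out[xv] = imgA[x];                     // splat (last writer wins)
    }
    return out;
}
```

The gaps this leaves (pixels no source maps to) are exactly the holes and occlusions the next slides deal with.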
17
INTERPOLATION CHALLENGES
textureless regions
occlusion
periodic textures
specular surfaces
noise
18
PIXEL MOTION POST-PROCESSING
Detect occlusion boundaries
Reduce pixel motion noise between occlusion boundaries
Fill holes (textureless regions) with plausible motion
Enforce temporal coherence
Calibrate Camera → Compute Stereo Overlap → Project to Sphere (equirectangular) → Generate Disparity Map → Post-Process Disparity Map → Interpolate, Reproject & Blend
360 STEREO STITCHING PIPELINE
real-time GPU accelerated
Input
- Capture: SDI, USB 3.0, TCP/IP
- Queuing & synchronization
- MP4 (hardware accelerated decode)
- Compressed and uncompressed input

Stitch
- Real-time, CUDA accelerated
- High quality
- Scalable across multiple GPUs
- Scalable across multiple rigs

Output
- Uncompressed outputs (RGBA)
- Render to output devices (HMD)
- MP4 (hardware accelerated encode)
- Live stream
STEREO PANORAMA PIPELINE
21
STITCHING IN REAL-TIME
22
CHALLENGE:
Need to decode 8 separate 4K streams at 30 fps (similar to 240 fps!)
After getting frames to the GPU, will we have any time left to stitch?
How can we decode in real time?
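The "similar to 240 fps!" comparison is just arithmetic: eight 4K streams at 30 fps deliver the same decoded-frame rate as a single 240 fps stream, which works out to roughly 2 gigapixels per second that must land on the GPU before stitching even starts.

```cpp
// Eight 4K (3840x2160) streams at 30 fps:
constexpr long long kStreams = 8, kFps = 30;
constexpr long long kFramesPerSec = kStreams * kFps;                  // 240 frames/s
constexpr long long kPixelsPerSec = kFramesPerSec * 3840LL * 2160LL;  // ~2.0e9 px/s
```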
23
PIPELINING DECODE AND STITCH
Using the dedicated NVDEC hardware decoder for better throughput
Time →
Decode:  frame N  | frame N+1 | frame N+2
Stitch:             frame N   | frame N+1 | frame N+2
Output:                         frame N   | frame N+1 | frame N+2
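The payoff of the staggered timeline above is easy to model: run serially, each frame costs decode + stitch + output; pipelined, once the pipe is full, a new frame completes every max of the three stage times. A sketch, with stage times in milliseconds chosen purely for illustration:

```cpp
#include <algorithm>

// Total time to process `frames` frames with stage costs d (decode),
// s (stitch), o (output), run serially vs. as a 3-stage pipeline.
double serialTime(int frames, double d, double s, double o) {
    return frames * (d + s + o);
}
double pipelinedTime(int frames, double d, double s, double o) {
    double bottleneck = std::max({d, s, o});          // slowest stage sets the rate
    return (d + s + o) + (frames - 1) * bottleneck;   // fill latency + steady state
}
```

Steady-state throughput is bounded by the slowest stage alone, which is why moving decode onto NVDEC (freeing the SMs for stitching) helps even when decode wasn't the bottleneck.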
24
CHALLENGE:
Must copy input/output between GPUs and CPU
Synchronous memory copy injects bubbles into the compute workload
Dealing with memory copy latency
25
STREAMS AND ASYNC MEMCPY
Using CUDA streams to overlap compute and copy
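A CPU analogue of the pattern: with double buffering, the fetch of chunk k+1 (in CUDA, a `cudaMemcpyAsync` on a copy stream from pinned host memory) overlaps the processing of chunk k. Here `std::async` stands in for the copy stream and a sum stands in for the kernel; the structure, not the speed, is the point.

```cpp
#include <future>
#include <numeric>
#include <vector>

// Process chunks while the next chunk's "copy" is in flight.
double processChunks(const std::vector<std::vector<double>>& chunks) {
    double total = 0.0;
    if (chunks.empty()) return total;
    // Prefetch ("copy") the first chunk.
    auto pending = std::async(std::launch::async, [&] { return chunks[0]; });
    for (size_t k = 0; k < chunks.size(); ++k) {
        std::vector<double> buf = pending.get();       // join the copy "stream"
        if (k + 1 < chunks.size())                     // kick off the next copy early
            pending = std::async(std::launch::async,
                                 [&, k] { return chunks[k + 1]; });
        total += std::accumulate(buf.begin(), buf.end(), 0.0);  // the "kernel"
    }
    return total;
}
```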
26
CHALLENGE:
Synchronizing CUDA streams without blocking
Synchronizing with the host can leave bubbles in compute work
Can do better when we just need to sync/join streams with each other
27
<fork kernels off in stream1>
<fork kernels off in stream2>
cudaEventRecord(event, stream2);
cudaStreamWaitEvent(stream1, event, 0);
<launch kernels in stream1 that need to wait on both streams>
FORK AND JOIN WITH STREAMS
Avoid host synchronization with stream events
[Diagram: stream1 and stream2 each launch kernels; an event recorded on stream2 joins the streams before the dependent kernels run on stream1]
28
CHALLENGE:
More cameras, higher output resolution, faster refresh rate → higher quality
Can only do so much with a single GPU
End-to-end pipeline is the same for each stereo pair (task parallelism!)
Achieving maximum quality in real-time
[Pipeline per stereo pair: Project → Disparity Map → Post-Process → Interpolate & Blend]
29
MULTI-GPU SCALING
Distribute stereo pairs among available devices
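Because each stereo pair runs the same end-to-end pipeline, distribution is plain task parallelism: deal the pairs round-robin across devices, and have each worker select its GPU (in CUDA, via `cudaSetDevice`) before launching its pipeline. A sketch of the assignment:

```cpp
#include <vector>

// Assign stereo pairs to GPUs round-robin; device[p] is the GPU that runs
// the full pipeline for pair p. Illustrative; a real scheduler might also
// weigh per-device load.
std::vector<int> assignPairsToGpus(int numPairs, int numGpus) {
    std::vector<int> device(numPairs);
    for (int p = 0; p < numPairs; ++p)
        device[p] = p % numGpus;
    return device;
}
```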
30
PERFORMANCE: DECODE
              SINGLE P6000   TWO P6000    FOUR P6000
4K 30fps       6 streams     12 streams   24 streams
1080p 60fps    12 streams    24 streams   48 streams
1080p 30fps    24 streams    48 streams   96 streams
31
PERFORMANCE: STITCHING
8x 4K input streams
                    SINGLE P6000   TWO P6000   FOUR P6000
5K x 5K output        14 FPS        26 FPS      37 FPS
4K x 4K output        22 FPS        39 FPS      51 FPS
2.8K x 2.8K output    38 FPS        60 FPS      62 FPS
32
WORKS WITH MULTIPLE RIGS
33
VRWORKS 360 VIDEO SDK
34
STEREO SDK COMING SOON
Optimized stereo pipeline
Real-time, low-latency
Ambisonic audio support
I/O formats: MP4, H264, CUDA memory
GPU-accelerated camera calibration
Custom calibration of fisheye lenses
Mono SDK beta out now!
35
SAMPLE APPLICATION
// Set initial guesses for properties
HelperCalibSetProperties(hCalibration, cam_props);
// Feed in images for auto-calibration
HelperCalibSetImages(hCalibration, input_frames);
// Calibrate intrinsics, extrinsics, and distortion characteristics
NVCALIB_Calibrate(hCalibration);
Working with calibration API
36
SAMPLE APPLICATION
// Apply calibrated parameters to stitcher
NVSS_Stitcher_SetCalibration(hStitcher, hCalibration);
// Decode streams into device memory
// Run stitching pipeline
NVSS_Stitcher_Stitch(hStitcher, &cam_images, &pano_image);
// Encode output for streaming or interop to OpenGL
Working with stitching API
37
VRWORKS 360 VIDEO SDK NOW AVAILABLE!
Features:
Real-Time & Offline Stitching
Up to 32 x 4K Camera Rigs and different fisheye lenses
GPU-Accelerated Decode, Stitching, and Encode
Inputs: MP4 files, RGBA files, RGBA CUDA arrays
Outputs: MP4 files, RGBA files, or OpenGL textures
3x2 cube map and equirectangular 360 projection
Audio stitching in offline mode
Mono SDK Available in Beta Now!
Stereo SDK Available Soon
38