ACCELERATING STEREO 360 STITCHING USING MULTI-GPUS
Ken Turkowski & Trevor Smith, GTC 2017


SLIDE 1

Ken Turkowski & Trevor Smith, GTC 2017

ACCELERATING STEREO 360 STITCHING USING MULTI-GPUS

SLIDE 2

OVERVIEW

• What is a stereo panorama? [Ken]
• How do we stitch? [Ken]
• How do we handle real-time video? GPUs! [Trevor]
• Demo in the VR Village, until noon today!
• SDKs

SLIDE 3

NON-INTERACTIVE IMAGERY

SLIDE 4

INTERACTIVE IMAGERY

SLIDE 5

WHY 360° INTERACTIVE STEREO VIDEO?

• 360°: pan and zoom interactively
• Stereo
• Real-time
• Immersion in a real-world situation, more so than still photographs, directed videos, and simple panoramas

SLIDE 6

SINGLE CAMERA

• Light rays everywhere
• A camera takes a 2D sampling of the light field:
  • at one location
  • towards a preferred direction

SLIDE 7

PANORAMA

• Sampled at one point, like a photograph
• More directions: 360° horizontally x ±90° vertically

SLIDE 8

HOW TO MAKE A PANORAMA?

• Rotating slit camera with a fisheye lens
• Single camera rotated to different directions
• Multiple-camera rig

SLIDE 9

PANORAMA FORMATS

• Equirectangular
• Cube map
• Single 360° fisheye
• Double 180° fisheye

SLIDE 10

PANORAMA PROS AND CONS

Pros:
• Gives a good sense of space
• Compact
• Interactive: facilitates individual exploration

Cons:
• Frozen in time? → stitch videos (NVIDIA has a mono stitching SDK!)
• No sense of scale or depth

Can we do better? How about stereo?

SLIDE 11

IDEAL STEREO PANORAMA

• A light field for each eye, the eyes separated by the IPD
• For each direction (360° around), capture:
  • a spray of rays for the left eye
  • a spray of rays for the right eye
• 2D spray x 1D direction = 3D data set. A lot of data!

SLIDE 12

OMNIDIRECTIONAL STEREO

• Omnidirectional Stereo (ODS) [Ishiguro '90, Peleg '01]
• A 2D subset of the ideal 3D stereo rays: a 1D fan of rays, not a 2D spray
• Imagine 2 linear sensors + a rotating motor
• Whereas mono rays converge radially, stereo rays converge tangentially

SLIDE 13

MONO & STEREO LIGHT FIELDS

• Mono: radial sampling
• Stereo: tangential sampling

SLIDE 14

ADVANTAGES OF STEREO PANORAMA

• Can drive an HMD
• Compelling, immersive
• Sense of depth & scale
• Compact (~2 mono panos)
• A lot of bang per buck!

SLIDE 15

STEREO PANORAMA STITCHING

Camera size precludes capturing the rays that we want:
• The cameras are not on the interpupillary circle
• They capture very few of the rays that we want

So we need to:
• Interpolate on the rig circle
• Project to the IPD circle

SLIDE 16

INTERPOLATION BETWEEN CAMERAS

(1) Compute pixel motion between adjacent cameras
• Lambertian, photometrically consistent, rig calibrated, epipolar → disparity

(2) Interpolate/reproject pixel motion to the virtual camera
• But there are problems with real-world images
• Need to filter and sometimes fake it

SLIDE 17

INTERPOLATION CHALLENGES

• Textureless regions
• Occlusion
• Periodic textures
• Specular surfaces
• Noise

SLIDE 18

PIXEL MOTION POST-PROCESSING

• Detect occlusion boundaries
• Reduce pixel-motion noise between occlusion boundaries
• Fill holes (textureless regions) with plausible motion
• Enforce temporal coherence

SLIDE 19

360 STEREO STITCHING PIPELINE

Real-time, GPU accelerated:
1. Calibrate camera
2. Compute stereo overlap
3. Project to sphere (equirectangular)
4. Generate disparity map
5. Post-process disparity map
6. Interpolate, reproject & blend

SLIDE 20

STEREO PANORAMA PIPELINE

Input:
• Capture: SDI, USB 3.0, TCP/IP
• Queuing & synchronization
• MP4 (hardware-accelerated decode)
• Compressed and uncompressed input

Stitch:
• Real-time, CUDA accelerated
• High quality
• Scalable across multiple GPUs
• Scalable across multiple rigs

Output:
• Uncompressed outputs (RGBA)
• Render to output devices (HMD)
• MP4 (hardware-accelerated encode)
• Live stream

SLIDE 21

STITCHING IN REAL-TIME

SLIDE 22

CHALLENGE:

How can we decode in real time?

• Need to decode 8 separate 4K streams at 30 fps (comparable to one 240 fps stream!)
• After getting frames to the GPU, will we have any time left to stitch?

SLIDE 23

PIPELINING DECODE AND STITCH

Using NVDEC dedicated hardware decoder for better throughput

Time →
Decode:  Frame N   Frame N+1   Frame N+2
Stitch:            Frame N     Frame N+1   Frame N+2
Output:                        Frame N     Frame N+1   Frame N+2

SLIDE 24

CHALLENGE:

Dealing with memory-copy latency

• Must copy input/output between GPUs and CPU
• Synchronous memory copy injects bubbles into the compute workload

SLIDE 25

STREAMS AND ASYNC MEMCPY

Using CUDA streams to overlap compute and copy

SLIDE 26

CHALLENGE:

Synchronizing CUDA streams without blocking

• Synchronizing with the host can leave bubbles in compute work
• We can do better when we only need to sync/join streams with each other

SLIDE 27

<fork kernels off in stream1>
<fork kernels off in stream2>
cudaEventRecord(event, stream2);
cudaStreamWaitEvent(stream1, event, 0);
<launch kernels that need to wait on both streams>

FORK AND JOIN WITH STREAMS

Avoid host synchronization with stream events


SLIDE 28

CHALLENGE:

Achieving maximum quality in real time

• More cameras, higher output resolution, faster refresh rate → higher quality
• Can only do so much with a single GPU
• The end-to-end pipeline is the same for each stereo pair (task parallelism!)

(per-pair pipeline: Project → Disparity Map → Post-Process → Interpolate & Blend)

SLIDE 29

MULTI-GPU SCALING

Distribute stereo pairs among available devices

SLIDE 30

PERFORMANCE: DECODE

              SINGLE P6000   TWO P6000    FOUR P6000
4K 30fps       6 streams     12 streams   24 streams
1080p 60fps   12 streams     24 streams   48 streams
1080p 30fps   24 streams     48 streams   96 streams

SLIDE 31

PERFORMANCE: STITCHING

8x 4K input streams

                     SINGLE P6000   TWO P6000   FOUR P6000
5K x 5K output          14 FPS        26 FPS      37 FPS
4K x 4K output          22 FPS        39 FPS      51 FPS
2.8K x 2.8K output      38 FPS        60 FPS      62 FPS

SLIDE 32

WORKS WITH MULTIPLE RIGS

SLIDE 33

VRWORKS 360 VIDEO SDK

SLIDE 34

STEREO SDK COMING SOON

• Optimized stereo pipeline
• Real-time, low-latency
• Ambisonic audio support
• I/O formats: MP4, H264, CUDA memory
• GPU-accelerated camera calibration
• Custom calibration of fisheye lenses

Mono SDK beta out now!

SLIDE 35

SAMPLE APPLICATION

// Set initial guesses for properties
HelperCalibSetProperties(hCalibration, cam_props);
// Feed in images for auto-calibration
HelperCalibSetImages(hCalibration, input_frames);
// Calibrate intrinsics, extrinsics, and distortion characteristics
NVCALIB_Calibrate(hCalibration);

Working with calibration API

SLIDE 36

SAMPLE APPLICATION

// Apply calibrated parameters to stitcher
NVSS_Stitcher_SetCalibration(hStitcher, hCalibration);
// Decode streams into device memory
// Run stitching pipeline
NVSS_Stitcher_Stitch(hStitcher, &cam_images, &pano_image);
// Encode output for streaming or interop to OpenGL

Working with stitching API

SLIDE 37

VRWORKS 360 VIDEO SDK NOW AVAILABLE!

Features:

• Real-time & offline stitching
• Up to 32 x 4K camera rigs and different fisheye lenses
• GPU-accelerated decode, stitching, and encode
• Inputs: MP4 files, RGBA files, RGBA CUDA arrays
• Outputs: MP4 files, RGBA files, or OpenGL textures
• 3x2 cube map and equirectangular 360 projection
• Audio stitching in offline mode

Mono SDK Available in Beta Now! Stereo SDK Available Soon

SLIDE 38