NVIDIA OPTICAL FLOW Abhijit Patait, 3/18/2019 Optical Flow in - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Abhijit Patait, 3/18/2019

NVIDIA OPTICAL FLOW

slide-2
SLIDE 2

2

AGENDA

  • Optical Flow in Turing GPUs
  • NVIDIA Optical Flow SDK
  • Benchmarks
  • End-to-end applications
  • Roadmap

slide-3
SLIDE 3

3

BACKGROUND

slide-4
SLIDE 4

4

slide-5
SLIDE 5

5

slide-6
SLIDE 6

6

ESTIMATING PIXEL MOTION

➢ “Video” motion vectors
    ➢ Minimize encoding cost
    ➢ SAD, SATD, RDO, intra modes, partitions
➢ Optical flow vectors
    ➢ Visual motion
    ➢ Current and surrounding pixels/blocks

slide-7
SLIDE 7

7

ESTIMATING PIXEL MOTION USING NV GPUS

  • ME-only mode – Maxwell, Pascal, Volta
      • Optimized for encoding – up to 8×8 granularity motion vectors
      • Video Codec SDK 7.0+
  • Optical flow (OF) – Turing & beyond
      • New hardware in NVENC
      • Optical flow and stereo disparity
      • Optical Flow SDK 1.0 (released Feb 2019)
slide-8
SLIDE 8

8

OPTICAL FLOW ENGINE

  • Hardware
      • Up to 150* fps at 4K
      • 4 × 4 pixel granularity
      • ¼ pixel resolution
      • Accuracy comparable to best DL methods
      • Advanced algorithms to find true flow vectors
  • Software
      • SDK (Windows, Linux, CUDA, DirectX)

Capabilities

*Dependent on device clock speed
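
To make the granularity and resolution figures concrete, here is a minimal C++ sketch that is not part of the original slides. It assumes one flow vector per 4 × 4 pixel block, with partial blocks at the frame edges rounded up, and it assumes vector components stored as signed fixed-point integers. The names `FlowGrid`, `flowGridFor` and `fixedToPixels` are hypothetical. The default of two fractional bits only illustrates quarter-pixel steps; the SDK's actual storage format may use a different number of fractional bits, so check the SDK headers before relying on it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: dimensions of the flow-vector grid.
struct FlowGrid {
    int cols;
    int rows;
};

// Number of flow vectors for a frame, assuming one vector per
// gridSize x gridSize block and rounding partial edge blocks up.
FlowGrid flowGridFor(int width, int height, int gridSize = 4) {
    assert(width > 0 && height > 0 && gridSize > 0);
    return {(width + gridSize - 1) / gridSize,
            (height + gridSize - 1) / gridSize};
}

// Convert a signed fixed-point flow component to pixels.
// fractionalBits = 2 gives quarter-pixel steps (an assumption made
// for illustration, not a statement about the SDK's storage format).
float fixedToPixels(std::int16_t value, int fractionalBits = 2) {
    assert(fractionalBits >= 0 && fractionalBits < 15);
    return static_cast<float>(value) /
           static_cast<float>(1 << fractionalBits);
}
```

Under these assumptions a 3840 × 2160 frame yields a 960 × 540 grid of vectors, and a stored value of 5 corresponds to a displacement of 1.25 pixels.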

slide-9
SLIDE 9

9

INTENSITY DIFFERENCES

136 118  26  31  39
110 115  33  40  30
 98 102  78  67  45
 48  57  23 221 112
 39  86  99 155 200

 70  62  14  16  20
 58  59  17  20  15
 49  56  40  33  23
 24  29  12 112  62
 20  43  55  78 111

Optical flow must be insensitive to intensity
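
The 50 values on this slide read naturally as two 5 × 5 pixel blocks showing the same content at roughly half the brightness; that reading is an assumption, since the slide image is not available here. The C++ sketch below, which is not from the slides, compares the two blocks with the sum of absolute differences (SAD, one of the encoder-style costs listed earlier) and with zero-mean normalized cross-correlation (ZNCC). ZNCC is used only as one well-known intensity-robust measure; the slides do not say which measure the hardware uses. The names `Block`, `kBright`, `kDark`, `sad` and `zncc` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdlib>

using Block = std::array<int, 25>;  // 5x5 block, row-major

// The two blocks as read from the slide (assumed 5x5 layout).
const Block kBright = {136, 118, 26,  31,  39,  110, 115, 33,  40,
                       30,  98,  102, 78,  67,  45,  48,  57,  23,
                       221, 112, 39,  86,  99,  155, 200};
const Block kDark = {70, 62, 14, 16, 20, 58,  59, 17, 20,
                     15, 49, 56, 40, 33, 23,  24, 29, 12,
                     112, 62, 20, 43, 55, 78, 111};

// Sum of absolute differences: grows with any brightness change.
int sad(const Block& a, const Block& b) {
    int sum = 0;
    for (std::size_t i = 0; i < a.size(); ++i) sum += std::abs(a[i] - b[i]);
    return sum;
}

// Zero-mean normalized cross-correlation: unchanged by gain and offset.
double zncc(const Block& a, const Block& b) {
    const double n = static_cast<double>(a.size());
    double meanA = 0.0, meanB = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        meanA += a[i];
        meanB += b[i];
    }
    meanA /= n;
    meanB /= n;
    double cov = 0.0, varA = 0.0, varB = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double da = a[i] - meanA;
        const double db = b[i] - meanB;
        cov += da * db;
        varA += da * da;
        varB += db * db;
    }
    assert(varA > 0.0 && varB > 0.0);  // flat blocks have no defined ZNCC
    return cov / std::sqrt(varA * varB);
}
```

For these two blocks the SAD is large even though the content matches, while the ZNCC stays close to 1. A matcher driven by SAD would therefore penalize the correct match after a lighting change, which is the sensitivity the slide warns about.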

slide-10
SLIDE 10

10

TURING OPTICAL FLOW VS MOTION VECTORS

| | Turing Optical Flow | Pascal/Volta Motion Vectors |
| --- | --- | --- |
| Granularity | Up to 4x4 | Up to 8x8 |
| Algorithm used | Visual motion optimization | Encoding cost optimization |
| Quality | Robust to intensity changes | Sensitive to intensity changes |
| Accuracy | Close to true motion; low average EPE (end-point error) | May deviate from true motion; higher EPE |

slide-11
SLIDE 11

11

NVIDIA OPTICAL FLOW SDK

slide-12
SLIDE 12

12

NVIDIA OPTICAL FLOW SDK

➢ New Optical Flow C-API
➢ Scalable, accommodates needs of future hardware
➢ Linux, Windows 8.1, 10, server, …
➢ DirectX, CUDA interoperability
➢ OpenCV
➢ Publicly released – Feb 2019
➢ Legacy ME-only mode API continues to be supported

slide-13
SLIDE 13

13

OPTICAL FLOW API

Main Functionality (nvOpticalFlowCommon.h)

typedef NV_OF_STATUS(NVOFAPI* PFNNVOFINIT) (NvOFHandle hOf, const NV_OF_INIT_PARAMS *initParams);
typedef NV_OF_STATUS(NVOFAPI* PFNNVOFEXECUTE) (NvOFHandle hOf, const NV_OF_EXECUTE_INPUT_PARAMS *executeInParams, NV_OF_EXECUTE_OUTPUT_PARAMS *executeOutParams);
typedef NV_OF_STATUS(NVOFAPI* PFNNVOFDESTROY) (NvOFHandle hOf);

Basic functionality

CUDA and DirectX buffer management: nvOpticalFlowCuda.h & nvOpticalFlowD3D11.h

slide-14
SLIDE 14

14

REUSABLE CLASSES

  • NvOF: Base class for all core functionality
  • NvOFCUDA: Input and output in CUDA buffers
  • NvOFD3D11: Input and output in DirectX buffers

slide-15
SLIDE 15

15

USE VIA OPENCV

// Common setup
Mat frameL = imread(pathL, IMREAD_GRAYSCALE);
Mat frameR = imread(pathR, IMREAD_GRAYSCALE);
GpuMat d_flowL(frameL), d_flowR(frameR), d_flow;
Mat flowx, flowy, flowxy;
int gpuId = 0;
int width = frameL.size().width, height = frameL.size().height;

// Alternative 1: NVIDIA hardware optical flow (perfPreset is not defined on the slide)
Ptr<cuda::NvidiaOpticalFlow> OpticalFlow = cuda::NvidiaOpticalFlow::create(perfPreset, width, height, gpuId);
OpticalFlow->calc(frameL, frameR, d_flow);
d_flow.download(flowxy);

// Alternative 2: Farneback optical flow, shown for comparison (use one alternative or the other)
Ptr<cuda::FarnebackOpticalFlow> OpticalFlow = cuda::FarnebackOpticalFlow::create();
OpticalFlow->calc(d_flowL, d_flowR, d_flow);
d_flow.download(flowxy);

slide-16
SLIDE 16

16

BENCHMARKS

slide-17
SLIDE 17

17

OPTICAL FLOW QUALITY

  • Objective quality
  • KITTI 2012/2015, Sintel, Middlebury
  • Average end point error (EPE)
  • Percentage of outliers – background, foreground and all
  • Subjective quality
  • Flow maps
  • Frame-rate-up-conversion (video interpolation)

Evaluation Methodology

slide-18
SLIDE 18

18

OPTICAL FLOW QUALITY

EPE – KITTI 2015

| Method | Average EPE |
| --- | --- |
| Legacy ME-only mode | 11.17 |
| OF raw | 7.99 |
| OF post-processed | 5.42 |
| PWC-DC (DL method) | 4.44 |
| FlowNet2 (DL method) | 4.84 |

  • Avg. EPE: lower is better
  • EPE = end-point error = Euclidean distance between OF vector & ground truth
  • Non-occluded EPE
  • Occluded EPE higher but same trend
  • KITTI 2012 EPE = 2.31
  • Sintel EPE = 8
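
The end-point error defined on this slide can be sketched in a few lines of C++. This is an illustration rather than the benchmark's evaluation code: the names `Flow`, `endPointError` and `averageEpe` are hypothetical, and masking of occluded or invalid pixels is left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One flow vector in pixels (hypothetical type for this sketch).
struct Flow {
    float x;
    float y;
};

// End-point error: Euclidean distance between estimate and ground truth.
double endPointError(const Flow& est, const Flow& gt) {
    return std::hypot(static_cast<double>(est.x) - gt.x,
                      static_cast<double>(est.y) - gt.y);
}

// Average EPE over all vectors; lower is better.
double averageEpe(const std::vector<Flow>& est, const std::vector<Flow>& gt) {
    assert(est.size() == gt.size() && !est.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < est.size(); ++i) {
        sum += endPointError(est[i], gt[i]);
    }
    return sum / static_cast<double>(est.size());
}
```

For example, an estimate of (3, 4) against a ground truth of (0, 0) has an EPE of 5 pixels; averaged with one perfectly estimated vector, the average EPE is 2.5.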
slide-19
SLIDE 19

19

OPTICAL FLOW QUALITY

Outliers – KITTI 2015

| Method | Background outliers | Foreground outliers |
| --- | --- | --- |
| Legacy ME-only mode | 31.09% | 43.01% |
| OF raw | 21.33% | 36.37% |
| OF post-processed | 16.76% | 27.57% |
| PWC-DC (DL method) | 21.21% | 42.29% |
| FlowNet2 (DL method) | 21.08% | 23.57% |

Outlier percentage: lower is better

Outlier = Euclidean distance > 3 between OF vector and ground truth
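
The outlier criterion stated on this slide (Euclidean distance greater than 3) can be sketched as follows. This is an illustration that follows the slide's definition only; benchmark suites may apply further conditions that are not modeled here. The names `Vec2` and `outlierPercent` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One flow vector in pixels (hypothetical type for this sketch).
struct Vec2 {
    float x;
    float y;
};

// Percentage of vectors whose end-point error exceeds the threshold.
// Lower is better.
double outlierPercent(const std::vector<Vec2>& est,
                      const std::vector<Vec2>& gt,
                      double threshold = 3.0) {
    assert(est.size() == gt.size() && !est.empty());
    std::size_t outliers = 0;
    for (std::size_t i = 0; i < est.size(); ++i) {
        const double err = std::hypot(static_cast<double>(est[i].x) - gt[i].x,
                                      static_cast<double>(est[i].y) - gt[i].y);
        if (err > threshold) ++outliers;  // strictly greater, as on the slide
    }
    return 100.0 * static_cast<double>(outliers) /
           static_cast<double>(est.size());
}
```

Running the function separately over background and foreground pixels would give the two percentages reported per method.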

slide-20
SLIDE 20

20

OPTICAL FLOW QUALITY

  • NVIDIA frame-rate-up-conversion
      • Video frame interpolation
      • ME-only mode (8×8), optical flow (4×4), optical flow with post-processing (1×1)
      • Subjective and objective quality comparison
  • Results
      • Raw optical flow (4×4) based video interpolation better than ME-only mode (8×8) interpolation
      • Some video quality improvement with OF-post-processed (1×1) – content-dependent

Subjective Quality
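
As a rough illustration of how flow vectors drive frame interpolation, the C++ sketch below builds an in-between frame by moving each pixel of the first frame part of the way along its flow vector (forward warping). It is not NVIDIA's frame-rate-up-conversion algorithm, which the slides do not describe. The names `Motion` and `interpolateFrame` are hypothetical, the flow is assumed to be per-pixel (1×1), collisions are resolved in favor of the faster-moving pixel as a simple heuristic, and holes are left at zero where a real implementation would fill them.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-pixel flow from frame 0 to frame 1, in pixels (hypothetical type).
struct Motion {
    float dx;
    float dy;
};

// Build the frame at time t in [0, 1] by forward-warping frame0.
// Each source pixel is written to the nearest target pixel; when two
// sources collide, the one with the larger motion wins. Targets that
// receive no pixel stay 0 (no hole filling in this sketch).
std::vector<std::uint8_t> interpolateFrame(
        const std::vector<std::uint8_t>& frame0,
        const std::vector<Motion>& flow,
        int width, int height, float t = 0.5f) {
    assert(width > 0 && height > 0);
    const std::size_t count =
        static_cast<std::size_t>(width) * static_cast<std::size_t>(height);
    assert(frame0.size() == count && flow.size() == count);

    std::vector<std::uint8_t> out(count, 0);
    std::vector<float> bestMag(count, -1.0f);  // motion of current winner

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t src = static_cast<std::size_t>(y) *
                                    static_cast<std::size_t>(width) +
                                    static_cast<std::size_t>(x);
            const int tx = x + static_cast<int>(std::lround(t * flow[src].dx));
            const int ty = y + static_cast<int>(std::lround(t * flow[src].dy));
            if (tx < 0 || tx >= width || ty < 0 || ty >= height) continue;

            const std::size_t dst = static_cast<std::size_t>(ty) *
                                    static_cast<std::size_t>(width) +
                                    static_cast<std::size_t>(tx);
            const float mag = std::hypot(flow[src].dx, flow[src].dy);
            if (mag > bestMag[dst]) {
                bestMag[dst] = mag;
                out[dst] = frame0[src];
            }
        }
    }
    return out;
}
```

With a single bright pixel that moves two pixels to the right between frames, the midpoint frame (t = 0.5) places it one pixel to the right of its starting position. Finer flow granularity gives each pixel its own trajectory, which is consistent with the results listed above.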

slide-21
SLIDE 21

21

VIDEO FRAME INTERPOLATION

Original 30 fps video

slide-22
SLIDE 22

22

VIDEO FRAME INTERPOLATION

Upconverted 60 fps video

slide-23
SLIDE 23

23

PERFORMANCE

➢ 3 presets
    ➢ Fast/Medium – no CUDA processing
    ➢ Slow – pre/post-processing in CUDA
➢ Performance scales with resolution
➢ Cost calculation in CUDA (enable only if needed)

Chart: Optical Flow quality vs performance. Average EPE plotted against performance (fps) at 3840 x 2160 for the Fast, Medium and Slow presets.

slide-24
SLIDE 24

24

slide-25
SLIDE 25

25

END-TO-END APPLICATIONS

slide-26
SLIDE 26

26

END-TO-END USE-CASES

  • Video comprehension/classification
      • 2x better accuracy compared to no optical flow with UCF-101
      • Makes OF-assisted-video-comprehension usable
  • Optical-flow-assisted video inter/extrapolation
      • Objective and subjective quality comparable to FlowNet2
      • Turing enables real-time optical-flow-assisted video interpolation

Applications

slide-27
SLIDE 27

27

OPTICAL FLOW-ASSISTED VIDEO CLASSIFICATION

slide-28
SLIDE 28

28

VIDEO CLASSIFICATION

Enables world class classification accuracy with real time performance

Image-only video classification has high error rates
Optical flow significantly reduces error rates, but DL-based OF is unusably slow
Turing hardware:
→ Optical flow reduces error rates by 2x
→ 20+ streams of 720p inference

slide-29
SLIDE 29

30

TURING OPTICAL FLOW

High quality video frame interpolation at 4K in real-time

Chart: Video Interpolation, performance vs quality – 2160p streams. Performance (fps) and interpolated frame PSNR (dB) for Turing and FlowNet2.

Video Interpolation

  • 60 fps ➔ 120 fps at 4K in real-time
  • 7x perf vs FlowNet2
  • 1 dB better objective quality (PSNR) than FlowNet2-assisted interpolation
  • Similar visual quality as FlowNet2-assisted interpolation
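
For reference, the PSNR figure used above to compare interpolated frames against the true frames can be computed as in the following C++ sketch. It assumes 8-bit samples (peak value 255) and equally sized frames; the function name `psnr` is hypothetical and the slides do not show the evaluation code actually used.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Peak signal-to-noise ratio in dB between a reference frame and an
// interpolated frame, assuming 8-bit samples. Higher is better.
double psnr(const std::vector<std::uint8_t>& reference,
            const std::vector<std::uint8_t>& interpolated) {
    assert(reference.size() == interpolated.size() && !reference.empty());
    double squaredError = 0.0;
    for (std::size_t i = 0; i < reference.size(); ++i) {
        const double diff = static_cast<double>(reference[i]) -
                            static_cast<double>(interpolated[i]);
        squaredError += diff * diff;
    }
    const double mse = squaredError / static_cast<double>(reference.size());
    if (mse == 0.0) {
        return std::numeric_limits<double>::infinity();  // identical frames
    }
    return 10.0 * std::log10((255.0 * 255.0) / mse);
}
```

Because the scale is logarithmic, a gain of 1 dB corresponds to roughly a 21% reduction in mean squared error.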

slide-30
SLIDE 30

31

ROADMAP

slide-31
SLIDE 31

32

ROADMAP

➢ Q3 2018
    ➢ Improved quality via post-processing
    ➢ 1x1 flow vectors
    ➢ Integration into DALI, PyTorch and other DL frameworks

Optical Flow SDK 1.1

slide-32
SLIDE 32

33

RESOURCES

Optical Flow SDK: https://developer.nvidia.com/opticalflow-sdk
Support: video-devtech-support@nvidia.com
Video & Optical Flow SDK forums: https://devtalk.nvidia.com/default/board/175/video-technologies/
Connect with Experts (CE9103): Wednesday, March 20, 2019, 3:00 pm

slide-33
SLIDE 33