CNN Based Pipeline for Optical Flow - PowerPoint PPT Presentation

CNN Based Pipeline for Optical Flow. Tal Schuster, June 2017. Based on: PatchBatch: a Batch Augmented Loss for Optical Flow (Gadot, Wolf), CVPR 2016; Optical Flow Requires Multiple Strategies (but only one network) (Schuster, Wolf, Gadot), CVPR 2017.


SLIDE 1

CNN Based Pipeline for Optical Flow

Based on: PatchBatch: a Batch Augmented Loss for Optical Flow (Gadot, Wolf), CVPR 2016; Optical Flow Requires Multiple Strategies (but only one network) (Schuster, Wolf, Gadot), CVPR 2017

Tal Schuster, June 2017

SLIDE 2

Overview

Goal: get SOTA results on the main optical flow benchmarks. This was done by:

  • Constructing a Deep Learning based pipeline (modular)
  • Architectures exploration
  • Loss function augmentations
  • Per-batch statistics
  • Learning methods

SLIDE 3

Problem Definition

SLIDE 4

Problem Definition - Optical Flow

Given 2 images, compute a dense optical flow field describing the motion between them (i.e. pure optical flow): 2 × (h, w, 1 or 3) → (h, w, 2), where:

  • h - image height, w - image width
  • (h,w,1 or 3) - a grayscale or RGB image
  • (h,w,2) - a 3D tensor describing, for each point (x,y) in image A, a 2D flow vector: (Δx, Δy)

Accuracy measures:

  • Based on GT (synthetic or physically obtained) - KITTI, MPI-Sintel
  • F_err - % of pixels with Euclidean error > z pixels (usually z=3)
  • Avg_err - mean of Euclidean errors over all pixels (both measures are sketched below)
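
As a concrete reading of these two measures, here is a minimal numpy sketch (function and argument names are mine, not from the slides):

```python
import numpy as np

def flow_errors(flow_pred, flow_gt, valid=None, thresh=3.0):
    """Compute Avg_err and F_err between predicted and ground-truth flow.

    flow_pred, flow_gt: (h, w, 2) arrays of (dx, dy) vectors.
    valid: optional (h, w) boolean mask (e.g. ~50% GT coverage on KITTI).
    thresh: Euclidean-error threshold z in pixels (usually 3).
    """
    err = np.linalg.norm(flow_pred - flow_gt, axis=-1)  # per-pixel Euclidean error
    if valid is not None:
        err = err[valid]                                # score only covered pixels
    avg_err = err.mean()                                # Avg_err
    f_err = 100.0 * (err > thresh).mean()               # F_err, in percent
    return avg_err, f_err
```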

SLIDE 5

DB - KITTI 2012


  • LIDAR based
  • ~50% coverage
SLIDE 6

DB - KITTI 2012

SLIDE 7

DB - KITTI 2015

SLIDE 8

DB - MPI-Sintel


  • Synthetic (computer graphics)
  • ~100% coverage
SLIDE 9

Solutions

Traditional computer vision methods

  • Global constraints (Horn-Schunck, 1981) – brightness constancy + smoothness assumption
  • Local constraints (Lucas-Kanade, 1981)

Main disadvantage – small objects and fast movements

Descriptor-based methods

  • Sparse to dense (Brox-Malik, 2010)

Descriptors: SIFT, SURF, HOG, DAISY, etc. (handcrafted)

CNN methods

  • End-to-end – FlowNet (Fischer et al., 2015)

SLIDE 10

Reference Work – Zbontar & LeCun, 2015

Solving stereo matching vs. optical flow. Classification-based vs. metric learning. To compute the classification, the network needs to observe both patches simultaneously.

SLIDE 11

The PatchBatch pipeline

SLIDE 12

PatchBatch - DNN

  • Siamese DNN - i.e., tied weights due to symmetry
  • Leaky ReLU
  • Should be FAST: matching function = L2, conv only, independent descriptor computation
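
A minimal PyTorch sketch of this design (layer sizes and names are placeholders, not the actual PatchBatch architecture): one conv-only tower is applied to both patches, which ties the weights by construction, and matching is a plain L2 distance between the two descriptors:

```python
import torch
import torch.nn as nn

class SiameseDescriptor(nn.Module):
    """One conv-only tower; applying it to both patches ties the weights."""
    def __init__(self, in_ch=1, dim=64):
        super().__init__()
        self.tower = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3), nn.LeakyReLU(0.1),
            nn.Conv2d(32, 64, 3), nn.LeakyReLU(0.1),
            nn.Conv2d(64, dim, 3),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # -> (batch, dim) descriptor
        )

    def forward(self, patch_a, patch_b):
        da = self.tower(patch_a)                # descriptors are computed
        db = self.tower(patch_b)                # independently per patch
        return torch.norm(da - db, dim=1)       # matching function = L2 distance
```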

SLIDE 13

PatchBatch - Overall Pipeline

PatchMatch - Barnes et al., 2010; EpicFlow - Revaud et al., 2015


(Normalized) Keeping only large connected components

SLIDE 14

PatchBatch - ANN

PatchMatch:

(Descriptors, matching function) → ANN. ANN and not ENN: O(N²) → O(N·log N). 2 iterations are enough.

1. Initialization (random)
2. Propagation: f(x,y) = argmin{ D(f(x,y)), D(f(x-1,y)), D(f(x,y-1)) } (+1 on even iterations)
3. Search: u_i = v_0 + w·α^i·R_i, where R_i ∈ [-1,1] × [-1,1], w - max search radius, α - step ratio (= 1/2)
4. Return to step 2
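
A schematic numpy rendering of the loop above (a sketch under simplifying assumptions - per-pixel descriptors, squared-L2 cost, names mine):

```python
import numpy as np

def patchmatch(desc_a, desc_b, iters=2, w=None, alpha=0.5):
    """Approximate NN field f: pixels of A -> offsets into B, via PatchMatch.

    desc_a, desc_b: (h, w, d) descriptor maps.
    """
    h, wid, _ = desc_a.shape
    w = w or max(h, wid)                       # max search radius
    rng = np.random.default_rng(0)
    f = rng.integers(-8, 9, size=(h, wid, 2))  # 1. random initialization

    def D(y, x, off):                          # matching cost at candidate offset
        ty = np.clip(y + off[0], 0, h - 1)
        tx = np.clip(x + off[1], 0, wid - 1)
        return np.sum((desc_a[y, x] - desc_b[ty, tx]) ** 2)

    for it in range(iters):
        s = 1 if it % 2 == 0 else -1           # scan direction flips each pass
        ys = range(h) if s == 1 else range(h - 1, -1, -1)
        xs = range(wid) if s == 1 else range(wid - 1, -1, -1)
        for y in ys:
            for x in xs:
                # 2. propagation: adopt a neighbor's offset if it is cheaper
                cands = [f[y, x]]
                if 0 <= y - s < h:
                    cands.append(f[y - s, x])
                if 0 <= x - s < wid:
                    cands.append(f[y, x - s])
                f[y, x] = min(cands, key=lambda o: D(y, x, o))
                # 3. random search: u_i = v0 + w * alpha^i * R_i, R_i in [-1,1]^2
                r, v0 = float(w), f[y, x].astype(float)
                while r >= 1:
                    cand = (v0 + r * rng.uniform(-1, 1, 2)).astype(int)
                    if D(y, x, cand) < D(y, x, f[y, x]):
                        f[y, x] = cand
                    r *= alpha                 # shrink the search radius
    return f
```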

SLIDE 15

PatchBatch - Post-Processing

EpicFlow (Edge-Preserving Interpolation of Correspondences)

Sparse → dense: averages the affine transformations of support matches, weighted by geodesic distance computed on top of an edge map (SED algorithm), as sketched below.
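
A toy numpy sketch of the sparse-to-dense step. Note the simplification: it weights support matches by Euclidean distance, whereas EpicFlow proper uses geodesic distance over the SED edge map so support does not leak across object boundaries; all names here are mine:

```python
import numpy as np

def sparse_to_dense(matches, shape, k=25, sigma=10.0):
    """Interpolate sparse correspondences into a dense flow field.

    matches: (n, 4) array of sparse correspondences (x, y, dx, dy).
    shape:   (h, w) of the output flow field.
    For each pixel, fit an affine flow model to the k nearest support
    matches, weighted by distance, then evaluate it at the pixel.
    """
    h, w = shape
    pts, flows = matches[:, :2], matches[:, 2:]
    dense = np.zeros((h, w, 2))
    for y in range(h):
        for x in range(w):
            d = np.linalg.norm(pts - np.array([x, y]), axis=1)
            idx = np.argsort(d)[:k]                            # nearest support matches
            wgt = np.exp(-d[idx] / sigma)[:, None]             # distance-based weights
            A = np.hstack([pts[idx], np.ones((len(idx), 1))])  # affine model [x y 1] @ M
            M, *_ = np.linalg.lstsq(wgt * A, wgt * flows[idx], rcond=None)
            dense[y, x] = np.array([x, y, 1.0]) @ M            # evaluate local model
    return dense
```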

SLIDE 16

PatchBatch - CNN


Batch Normalization - addresses the "internal covariate shift" problem

Per pixel instead of per feature map
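
The distinction can be written out directly (a PyTorch sketch I am adding for illustration, not the paper's code): standard BatchNorm2d keeps one statistic per channel, averaged over batch and spatial dimensions, while the per-pixel variant keeps a separate statistic for every (channel, y, x) location, averaging over the batch dimension only:

```python
import torch

def batchnorm_per_feature_map(x, eps=1e-5):
    # x: (N, C, H, W). One statistic per channel: average over N, H, W.
    mu = x.mean(dim=(0, 2, 3), keepdim=True)
    var = x.var(dim=(0, 2, 3), keepdim=True, unbiased=False)
    return (x - mu) / torch.sqrt(var + eps)

def batchnorm_per_pixel(x, eps=1e-5):
    # One statistic per (channel, y, x) location: average over the batch only.
    mu = x.mean(dim=0, keepdim=True)
    var = x.var(dim=0, keepdim=True, unbiased=False)
    return (x - mu) / torch.sqrt(var + eps)
```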

SLIDE 17

PatchBatch - Loss

  • DrLIM - Dimensionality Reduction by Learning an Invariant Mapping (LeCun, 2006)

[Figure: loss variants on positive and negative pairs - original DrLIM (SPRING) vs. CENT and CENT+SD]
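
For reference, a sketch of the classic DrLIM contrastive ("spring") loss, plus one way to add a per-batch standard-deviation (SD) term. The exact CENT and CENT+SD formulations are in the PatchBatch paper; the SD term below is an illustrative assumption only:

```python
import torch

def drlim_loss(dist, is_match, margin=1.0):
    """DrLIM spring loss on L2 distances between descriptor pairs.

    dist: (batch,) L2 distances; is_match: (batch,) 1.0 for positive pairs.
    Positive pairs are pulled together, negatives pushed past the margin.
    """
    pos = is_match * dist.pow(2)
    neg = (1 - is_match) * torch.clamp(margin - dist, min=0).pow(2)
    return (pos + neg).mean()

def sd_term(dist, is_match):
    # Per-batch statistic (my illustration): penalize the spread of
    # positive-pair distances within the batch.
    return dist[is_match.bool()].std()
```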

SLIDE 18

PatchBatch - Training Method

Negative sample – random, 1-8 pixels from the true match. Data augmentation – flipping, rotating by 90°.

SLIDE 19

Results

SLIDE 20

Benchmarks

SLIDE 21

How Can We Improve the Results?

SLIDE 22

Architecture Modifications

SLIDE 23

PatchBatch - CNN


Increased Patch and Descriptor sizes

SLIDE 24

Hinge Loss with SD

  • Hinge loss instead of DrLIM
  • Trained on triplets - <A, B-match, B-non-match>
  • Keeping the additional SD component
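
A minimal sketch of such a triplet hinge loss (margin value is a placeholder of mine):

```python
import torch

def triplet_hinge_loss(anchor, pos, neg, margin=0.2):
    """Hinge loss on triplets <A, B-match, B-non-match>: push the non-match
    at least `margin` farther (in L2) from the anchor than the true match."""
    d_pos = torch.norm(anchor - pos, dim=1)   # distance to the true match
    d_neg = torch.norm(anchor - neg, dim=1)   # distance to the false match
    return torch.clamp(d_pos - d_neg + margin, min=0).mean()
```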

SLIDE 25

Failed Attempts

Data augmentation

  • Rotations (random ±α)
  • Colored patches (HSV or other decomposition)

Loss function

  • Foursome (A, B, A’, B’)

A, B – matching patches. A' – the patch from image A that is closest to B (B' defined symmetrically from image B). H(A,B) = max(0, m - L2(A,B))

L = L2(A,B) + I(B ≠ B') · H(A,B') + I(A ≠ A') · H(A',B), where I(·) is the indicator function
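
Spelled out as code (a sketch; the margin m and helper names are mine):

```python
import torch

def hinge(d, m=1.0):
    # H = max(0, m - L2), applied to a precomputed distance d
    return torch.clamp(m - d, min=0)

def foursome_loss(dA_B, dA_Bp, dAp_B, b_differs, a_differs):
    """L = L2(A,B) + I(B != B') * H(A,B') + I(A != A') * H(A',B).

    dA_B, dA_Bp, dAp_B: (batch,) L2 distances for (A,B), (A,B'), (A',B).
    b_differs, a_differs: (batch,) 0/1 indicators for B != B' and A != A'.
    """
    return (dA_B + b_differs * hinge(dA_Bp) + a_differs * hinge(dAp_B)).mean()
```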

Sample Mining

  • PatchMatch output
  • Patches distance
  • Descriptor distance

SLIDE 26

Optical Flow as a Multifaceted Problem

SLIDE 27

Success of Methods

The challenge of large displacements. KITTI 2015 average error:

  • Foreground – 26.43 %
  • Background – 11.43 %

Possible causes:

  • Matching algorithm
  • Descriptors quality

[Figures: MPI-Sintel results table; PatchBatch on KITTI 2012 - distance between true matches]

SLIDE 28

Descriptors Evaluation

Defined a quality measure of descriptors for matching: d_p is a distractor of pixel p if the L2 distance between d_p and p's descriptor is lower than the L2 distance between p's descriptor and that of its matching pixel. Counted up to 25 pixels from the examined pixel.
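
A numpy sketch of this count (names are mine; searching a window around the true match is my reading of "up to 25 pixels from the examined pixel"):

```python
import numpy as np

def count_distractors(desc_a, desc_b, flow_gt, p, radius=25):
    """Count the distractors of pixel p = (y, x) in image A.

    A pixel in image B is a distractor if its descriptor is closer (in L2)
    to p's descriptor than the descriptor of p's true matching pixel is.
    desc_a, desc_b: (h, w, d) descriptor maps; flow_gt: (h, w, 2) as (dx, dy).
    """
    y, x = p
    ty, tx = y + int(flow_gt[y, x, 1]), x + int(flow_gt[y, x, 0])  # true match in B
    d_true = np.linalg.norm(desc_a[y, x] - desc_b[ty, tx])
    h, w, _ = desc_b.shape
    count = 0
    for qy in range(max(0, ty - radius), min(h, ty + radius + 1)):
        for qx in range(max(0, tx - radius), min(w, tx + radius + 1)):
            if (qy, qx) != (ty, tx):
                if np.linalg.norm(desc_a[y, x] - desc_b[qy, qx]) < d_true:
                    count += 1
    return count
```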

SLIDE 29

Distractors by Displacement

The amount of distractors increases with the displacement range. Goal: improve results for large displacements without degrading results for other ranges.

SLIDE 30

Distractors by Displacement

The amount of distractors increases with the displacement range. Goal: improve results for large displacements without degrading results for other ranges.

Expert models:

  • Training only on sub-ranges
  • Improving results for large displacements is possible
  • Implies the need for different features for different patches

SLIDE 31

Learning with Varying Difficulty

SLIDE 32

Gradual Learning Methods

Dealing with varying difficulty.

Curriculum (Bengio et al., 2009):

  • Samples are pre-ordered
  • Curriculum by displacement
  • Curriculum by distance (of false sample)

Self-Paced (Kumar et al. 2010):

  • No need to pre-order
  • Sample hardness increases with time (by loss value)

Hard Mining (Simo-Serra et al. 2015):

  • Backpropagate only some ratio of the harder samples (sketched after this list)
  • Used for training local descriptors with triplets
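
A sketch of that mining step (the keep ratio is a placeholder): compute unreduced per-sample losses, keep only the largest fraction, and backpropagate through those alone:

```python
import torch

def hard_mining_step(per_sample_loss, keep_ratio=0.5):
    """Backpropagate only the hardest fraction of a batch.

    per_sample_loss: (batch,) unreduced losses, still attached to the graph.
    """
    k = max(1, int(keep_ratio * per_sample_loss.numel()))
    hard, _ = torch.topk(per_sample_loss, k)  # largest losses = hardest samples
    loss = hard.mean()
    loss.backward()                           # gradients flow only through `hard`
    return loss.detach()
```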

None of these methods improved matching over the baseline – why?

SLIDE 33

Need for Variant Extracting Strategies

Large motions are mostly correlated with more changes in appearance:

1. Background changes
2. Viewpoint changes → occluded parts
3. Distance and angle to the light source → illumination
4. Scale (when moving along the Z-axis)

SLIDE 34

Learning for Multiple Strategies and Varying Difficulty

SLIDE 35

Our Interleaving Learning Method

Goal: deal with multiple sub-tasks.

Learning ML models:

  • Mostly in random order (SGD)
  • Applying gradual methods can affect randomness

Interleaving Learning

  • Maintaining the random order of categories while adjusting the difficulty

Motivated by psychological research (Kornell & Bjork):

  • Massing vs. Interleaving
  • Experiments on classification tasks, sports, etc.


Learning Concepts and Categories – Kornell and Bjork (2008)

[Figure: classification of paintings to artists - massing vs. interleaving]

SLIDE 36

Same Class of Objects?

SLIDE 37

Interleaving Learning for Optical Flow

Controlling the negative sample to balance difficulty

[Figure: negative sampling - original method vs. interleaving]
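
One way to picture "controlling the negative sample" (a hypothetical sketch of mine, not the paper's exact scheme): rather than always drawing the false match 1-8 pixels from the true one, draw it from a distance range set by a difficulty knob, so negatives get closer (and harder) as training progresses while categories stay in random order:

```python
import numpy as np

def sample_negative(true_xy, difficulty, rng, d_far=8.0, d_near=1.0):
    """Draw a false-match location at a distance set by `difficulty`.

    difficulty in [0, 1]: 0 draws from the full 1-8 pixel range (easy),
    1 forces the negative right next to the true match (hard).
    """
    d_max = d_far - difficulty * (d_far - d_near)  # shrink range as training hardens
    d = rng.uniform(d_near, max(d_near + 1e-6, d_max))
    theta = rng.uniform(0.0, 2.0 * np.pi)          # random direction from the match
    return (true_xy[0] + d * np.cos(theta), true_xy[1] + d * np.sin(theta))
```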

SLIDE 38

Interleaving Learning for Optical Flow


  • Drawing the line from p improved matching results but did not affect the distractor measurement

(Due to PatchMatch initialization)

SLIDE 39

Self-Paced Curriculum Interleaving Learning (SPCI)

l_i - validation loss at epoch i; l_init - initial loss value (taken at epoch #5); m - total number of epochs

SLIDE 40

Experiments

SLIDE 41

Optical Flow

SLIDE 42

Results

SLIDE 43

Benchmarks - KITTI 2012

SLIDE 44

Benchmarks - KITTI 2015

SLIDE 45

SLIDE 46

SLIDE 47

SLIDE 48

Benchmarks - MPI-Sintel

SLIDE 49

SLIDE 50

SLIDE 51

Summary

SLIDE 52

Summary

  • Computing Optical Flow as a matching problem with a modular pipeline
  • Using a CNN to generate descriptors
  • Per-batch statistics (SD, batch normalization)
  • Interleaving Learning Method & SPCI

Addressing difficulty while maintaining a random order of the categories

  • One model to generate descriptors for both small and large displacements

SLIDE 53

THANK YOU!
