JOINT DETECTION AND SEGMENTATION WITH DEEP HIERARCHICAL NETWORKS

SLIDE 1

JOINT DETECTION AND SEGMENTATION WITH DEEP HIERARCHICAL NETWORKS

Zhao Chen, Machine Learning Intern, NVIDIA

SLIDE 2

ABOUT ME

  • 5th-year PhD student in physics at Stanford by day, deep learning computer vision scientist by night.
  • Intern with Deep Learning Applied Research (Autonomous Vehicles) at NVIDIA, Oct-Dec 2016.

Zhao Chen, Joint Detection and Segmentation with Deep Hierarchical Networks, GTC 2017.

SLIDE 3

TALK OVERVIEW

(1) Problem statement and summary.
(2) Dataset and preliminaries.
(3) Model motivation.
(4) Results and visualizations.

SLIDE 6

FROM SINGLE TO MULTITASK LEARNING

Putting deep learning to work in the real world

Detection Model → Object Bounding Boxes
Segmentation Model → Segmentation Mask

Poor scalability + inefficient use of information!

SLIDE 8

FROM SINGLE TO MULTITASK LEARNING

How do we use one model to perform multiple tasks faster and better?

Putting deep learning to work in the real world

Shared Model → Object Bounding Boxes, Segmentation Mask, + edge detection, + surface normals, + distance estimation…

SLIDE 9

FROM SINGLE TO MULTITASK LEARNING

How do we use one model to perform multiple tasks faster and better?

Putting deep learning to work in the real world

Shared Model → Object Bounding Boxes, Segmentation Mask

How do you relate various tasks to each other in a multi-task neural network?

SLIDE 10

WHAT WE WILL SHOW

  • By ordering tasks based on receptive field and information density, we improve segmentation and detection accuracy by ~2% and ~8% over single networks, respectively.
  • The joint network is robust and easy to tune compared to non-hierarchical baselines.

SLIDE 12

CITYSCAPES DATASET

  • 2975 training images at resolution 1024 x 2048.
  • 20 classes for semantic segmentation, including 8 object classes. Of these 8, 4 are much more represented (car, bicycle, person, rider): the “easy classes.”
  • Segmentation, bounding box, and edge ground truth can all be generated.

[Figure panels: Raw Image, Edge Detection, Semantic Seg., Bounding Box]
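The slide says bounding box and edge ground truth can be generated but does not say how. The following is a minimal sketch of one plausible route, assuming an instance-id mask where 0 means background; the function names, the toy mask, and the binary edge definition are illustrative assumptions, not the speaker's pipeline.

```python
from typing import Dict, List, Tuple

Mask = List[List[int]]  # 2D grid of instance ids; 0 is assumed to mean background


def boxes_from_instances(mask: Mask) -> Dict[int, Tuple[int, int, int, int]]:
    """Return {instance_id: (x_min, y_min, x_max, y_max)} with inclusive pixel bounds."""
    boxes: Dict[int, Tuple[int, int, int, int]] = {}
    for y, row in enumerate(mask):
        for x, inst in enumerate(row):
            if inst == 0:
                continue  # background pixels belong to no box
            if inst not in boxes:
                boxes[inst] = (x, y, x, y)
            else:
                x0, y0, x1, y1 = boxes[inst]
                boxes[inst] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return boxes


def edges_from_labels(mask: Mask) -> Mask:
    """Mark a pixel as edge (1) if its right or lower neighbour has a different label."""
    h, w = len(mask), len(mask[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            right_differs = x + 1 < w and mask[y][x + 1] != mask[y][x]
            below_differs = y + 1 < h and mask[y + 1][x] != mask[y][x]
            edges[y][x] = int(right_differs or below_differs)
    return edges


# Hypothetical 4x4 mask with two instances (ids 1 and 2).
toy_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 2],
    [0, 0, 0, 2],
]
toy_boxes = boxes_from_instances(toy_mask)
toy_edges = edges_from_labels(toy_mask)
```

Deriving every label type from the same mask keeps the three ground truths pixel-aligned, which is presumably what makes joint training on a single dataset practical.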

SLIDE 13

HOW TO TRAIN A SEGMENTATION NETWORK

  • Standard FCN (Shelhamer 2015) architecture: convolutions followed by a deconvolution to retrieve a pixel-dense prediction mask.
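To make the "convolutions followed by a deconvolution" shape bookkeeping concrete, here is a small sketch using the standard output-size formulas for convolution and transposed convolution. The layer counts, kernel sizes, strides, and padding are hypothetical; the talk does not specify them.

```python
def conv_out(size: int, kernel: int, stride: int = 1, pad: int = 0) -> int:
    """Output length of a standard convolution along one axis."""
    return (size + 2 * pad - kernel) // stride + 1


def deconv_out(size: int, kernel: int, stride: int = 1, pad: int = 0) -> int:
    """Output length of a transposed convolution (deconvolution) along one axis."""
    return (size - 1) * stride - 2 * pad + kernel


# Hypothetical encoder: five stride-2, 3x3, pad-1 convolutions (32x downsampling).
height, width = 1024, 2048
for _ in range(5):
    height = conv_out(height, kernel=3, stride=2, pad=1)
    width = conv_out(width, kernel=3, stride=2, pad=1)
low_res = (height, width)

# Hypothetical decoder: one stride-32 transposed convolution back to input size.
full_res = (
    deconv_out(height, kernel=64, stride=32, pad=16),
    deconv_out(width, kernel=64, stride=32, pad=16),
)
```

Under these assumed settings the low-resolution map is 32 x 64 and the single deconvolution restores the 1024 x 2048 input resolution, which is what makes the prediction pixel-dense.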

SLIDE 14

HOW TO TRAIN A DETECTION NETWORK

  • Network outputs confidence that a pixel lies near the center of an object.
  • Points of high confidence produce bounding box coordinates.
  • Confidences are rougher than full segmentation but robust to occlusion.
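The decoding step described above (high-confidence points yield boxes) might look roughly like the sketch below. The threshold value, the absolute-coordinate box encoding, and the name `decode_boxes` are assumptions made for illustration; the talk does not give these details, and a real pipeline would likely also merge overlapping boxes.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def decode_boxes(
    confidence: List[List[float]],
    coords: List[List[Box]],
    threshold: float = 0.5,
) -> List[Tuple[float, Box]]:
    """Collect (score, box) pairs from grid cells whose confidence passes the threshold."""
    detections = []
    for y, row in enumerate(confidence):
        for x, score in enumerate(row):
            if score >= threshold:
                detections.append((score, coords[y][x]))
    # Highest-confidence detections first.
    detections.sort(key=lambda d: d[0], reverse=True)
    return detections


# Hypothetical 2x2 output grid: two cells are confident, two are not.
toy_confidence = [[0.1, 0.9], [0.6, 0.2]]
toy_coords = [
    [(0, 0, 0, 0), (10, 5, 30, 20)],
    [(2, 12, 8, 25), (0, 0, 0, 0)],
]
toy_detections = decode_boxes(toy_confidence, toy_coords)
```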

SLIDE 16

[Diagram components: Input (1024 x 2048); Shared Feature Map (from base CNN); Low-Res Seg Predictions (W x H x 20), with Deconv; Obj. Confidence Positions; Bbox Coordinate Positions]

$L = \alpha L_{\text{seg}} + (1 - \alpha) L_{\text{det}}$
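As a worked instance of the weighted loss on this slide, the sketch below combines a segmentation loss and a detection loss with a single weight α. The function name and the restriction of α to [0, 1] are assumptions; the loss values are made-up numbers.

```python
def joint_loss(seg_loss: float, det_loss: float, alpha: float) -> float:
    """Convex combination L = alpha * L_seg + (1 - alpha) * L_det."""
    if not 0.0 <= alpha <= 1.0:
        # Assumed constraint: the slide does not state a range for alpha.
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * seg_loss + (1.0 - alpha) * det_loss


seg_only = joint_loss(2.0, 4.0, alpha=1.0)  # training attends only to segmentation
det_only = joint_loss(2.0, 4.0, alpha=0.0)  # training attends only to detection
balanced = joint_loss(2.0, 4.0, alpha=0.5)  # equal attention to both tasks
```

Because both tasks share one feature map, α is the only knob trading one task off against the other in this baseline.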

SLIDE 17

OUR BASELINE MODEL PERFORMANCE

[Figure not recovered; surviving labels: Seg. Weight, Det. Weight, = α]

(α controls how much attention we pay to segmentation vs detection at training)

SLIDE 28

A LABEL HIERARCHY ALONG TWO AXES

[Diagram axes: Density of Information, Required Receptive Field. Tasks shown: Object Bounding Boxes, Edge Detection, Semantic Segmentation, Object Confidence. The label "(plus)" also appears.]

SLIDE 31

[Diagram components: Input (1024 x 2048); Shared Feature Map (from base CNN); Segmentation Features and Low-Res Seg Predictions (W x H x 20), with Deconv; Obj. Confidence Features and Obj. Confidence Positions; Obj. BBox Features and Bbox Coordinate Positions. Arrow label: Decreasing information density]

SLIDE 32

[Diagram components: Input (1024 x 2048); Shared Feature Map (from base CNN); Edge Features and Low-Res Edge Predictions (W x H x 3), with Deconv; Segmentation Features and Low-Res Seg Predictions (W x H x 20), with Deconv; Obj. Confidence Features and Obj. Confidence Positions; Obj. BBox Features and Bbox Coordinate Positions. Arrow label: Decreasing information density]

SLIDE 34

[Diagram components: Input (1024 x 2048); Shared Feature Map (from base CNN); Edge Features and Low-Res Edge Predictions (W x H x 3), with Deconv; Segmentation Features and Low-Res Seg Predictions (W x H x 20), with Deconv; Obj. Confidence Features and Obj. Confidence Positions; Obj. BBox Features and Bbox Coordinate Positions; an "X" mark. Arrow label: Decreasing information density]

SLIDE 35

[Diagram components: Input (1024 x 2048); Shared Feature Map (from base CNN); Edge Features and Low-Res Edge Predictions (W x H x 3), with Deconv; Segmentation Features and Low-Res Seg Predictions (W x H x 20), with Deconv; Obj. Confidence Features and Obj. Confidence Positions; Obj. BBox Features and Bbox Coordinate Positions; an "X" mark. Arrow label: Increasing receptive field]

SLIDE 36

[Diagram components: Input (1024 x 2048); Shared Feature Map (from base CNN); Edge Features and Low-Res Edge Predictions (W x H x 3), with Deconv; Segmentation Features and Low-Res Seg Predictions (W x H x 20), with Deconv; Obj. Confidence Features and Obj. Confidence Positions; Obj. BBox Features and Bbox Coordinate Positions; labels "Dilated" and "Dilated Convs". Arrow label: Increasing receptive field]
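The diagram pairs dilated convolutions with an increasing receptive field. The sketch below shows why, using the standard receptive-field recurrence for a stack of convolutions; the specific layer stacks are hypothetical and not taken from the talk.

```python
from typing import List, Tuple

Layer = Tuple[int, int, int]  # (kernel, stride, dilation)


def receptive_field(layers: List[Layer]) -> int:
    """Receptive field (in input pixels, one axis) of a stack of conv layers."""
    rf, jump = 1, 1  # jump = spacing of adjacent outputs, measured in input pixels
    for kernel, stride, dilation in layers:
        rf += (kernel - 1) * dilation * jump
        jump *= stride
    return rf


# Three plain 3x3 convolutions versus three 3x3 convolutions with dilation 1, 2, 4.
plain = receptive_field([(3, 1, 1)] * 3)
dilated = receptive_field([(3, 1, 1), (3, 1, 2), (3, 1, 4)])
```

With the same number of layers and parameters, the dilated stack sees 15 input pixels per axis instead of 7, which suits a task like bounding box regression that needs to take in whole objects.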

SLIDE 37

[Diagram components: Input (1024 x 2048); Shared Feature Map (from base CNN); Edge Features and Low-Res Edge Predictions (W x H x 3), with Deconv; Segmentation Features and Low-Res Seg Predictions (W x H x 20), with Deconv; Obj. Confidence Features and Obj. Confidence Positions; Obj. BBox Features and Bbox Coordinate Positions; labels "Dilated" and "Dilated Convs". Title: Deep Hierarchical Network (DHM)]

SLIDE 39

RESULTS: HIGH ROBUSTNESS

[Figure not recovered]

SLIDE 42

[Figure panels: Raw Image, Edge Predictions, Segmentation Predictions, Bounding Box Predictions]

SLIDE 43

VISUALIZATIONS

[Figure panel labels: SINGLE NETWORK, DHM (OURS); DETECTION, SEGMENTATION]

SLIDE 44

VISUALIZATIONS

[Figure panel labels: SINGLE NETWORK, DHM (OURS); SALIENCY (CAR), SEGMENTATION]

SLIDE 47

VISUALIZATIONS

[Figure panel labels: SINGLE NETWORK, DHM (OURS); SALIENCY (BUS), SEGMENTATION]

SLIDE 51

SUMMARY

  • Our two hierarchies within our model allow our network to reason about inter-task relationships:
      • Information density: (Seg +) Edge > Seg > Object Conf > Bbox
      • Receptive field: (Seg +) Edge = Bbox >> Object Conf > Seg
  • With these relationships wired in, our network is:
      • More accurate
      • Robust to tuning
      • Simultaneously better at fine detail and more instance aware
      • Efficient and scalable (3 tasks, 1 network!)
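The two orderings in the summary can be written down as a small data structure, which is one way to see that the same four tasks sort differently along each axis. In this sketch the numeric ranks are illustrative (only their order comes from the slide), "(Seg +) Edge" is shortened to `edge`, and the names are assumptions.

```python
# Ordinal ranks transcribed from the two summary orderings; larger means "more".
# The numeric values are illustrative: only their order comes from the slide.
INFO_DENSITY = {"edge": 3, "seg": 2, "obj_conf": 1, "bbox": 0}
RECEPTIVE_FIELD = {"edge": 3, "bbox": 3, "obj_conf": 1, "seg": 0}


def order_by(ranks: dict, descending: bool = True) -> list:
    """Sort task names by rank; ties are broken alphabetically for determinism."""
    return sorted(ranks, key=lambda t: (-ranks[t] if descending else ranks[t], t))


# Densest task first, matching "decreasing information density".
density_order = order_by(INFO_DENSITY)
# Smallest receptive field first, matching "increasing receptive field".
field_order = order_by(RECEPTIVE_FIELD, descending=False)
```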

SLIDE 52

REFERENCES

  • J. Yao, S. Fidler, and R. Urtasun. Describing the scene as a whole: Joint object detection, scene classification and semantic segmentation. In CVPR, 2012.
  • S. Gidaris and N. Komodakis. Object detection via a multi-region and semantic segmentation-aware CNN model. In ICCV, 2015.
  • B. Hariharan, P. Arbelaez, R. Girshick, and J. Malik. Simultaneous detection and segmentation. In ECCV, 2014.
  • S. Liu, X. Qi, J. Shi, H. Zhang, and J. Jia. Multi-scale patch aggregation (MPA) for simultaneous detection and segmentation. In CVPR, 2016.
  • E. Shelhamer, J. Long, and T. Darrell. Fully convolutional networks for semantic segmentation. In CVPR, 2015.
  • B. Hariharan, P. Arbelaez, R. Girshick, and J. Malik. Hypercolumns for object segmentation and fine-grained localization. In CVPR, 2015.
  • J. Dai, K. He, and J. Sun. Instance-aware semantic segmentation via multi-task network cascades. In https://arxiv.org/pdf/1512.04412.pdf, 2015.

SLIDE 53

THANK YOU!

Special thanks to:
My internship mentor: Jian Yao
My managers: John Zedlewski and Andrew Tao
All the wonderful people in DLAR/DLAV.

Additional questions/comments: zchen89@stanford.edu