
SLIDE 1

Zhao Chen Machine Learning Intern, NVIDIA

JOINT DETECTION AND SEGMENTATION WITH DEEP HIERARCHICAL NETWORKS

SLIDE 2

ABOUT ME

5th year PhD student in physics @ Stanford by day, deep learning computer vision scientist by night.

Intern with Deep Learning Applied Research (Autonomous Vehicles) @ NVIDIA, Oct-Dec 2016.

Zhao Chen, Joint Detection and Segmentation with Deep Hierarchical Networks, GTC 2017.

SLIDE 3

TALK OVERVIEW

(1) Problem statement and summary.
(2) Dataset and preliminaries.
(3) Model motivation.
(4) Results and visualizations.


SLIDE 5

FROM SINGLE TO MULTITASK LEARNING

Putting deep learning to work in the real world

[Diagram: Detection Model and Segmentation Model, with outputs Object Bounding Boxes and Segmentation Mask]

SLIDE 6

FROM SINGLE TO MULTITASK LEARNING

Putting deep learning to work in the real world

[Diagram: Detection Model and Segmentation Model, with outputs Object Bounding Boxes and Segmentation Mask]

Poor scalability + inefficient use of information!

SLIDE 7

FROM SINGLE TO MULTITASK LEARNING

How do we use one model to perform multiple tasks faster and better?

Putting deep learning to work in the real world

[Diagram: Shared Model, with outputs Object Bounding Boxes and Segmentation Mask]

SLIDE 8

FROM SINGLE TO MULTITASK LEARNING

How do we use one model to perform multiple tasks faster and better?

Putting deep learning to work in the real world

[Diagram: Shared Model, with outputs Object Bounding Boxes and Segmentation Mask]

+ edge detection, + surface normals, + distance estimation…

SLIDE 9

FROM SINGLE TO MULTITASK LEARNING

How do we use one model to perform multiple tasks faster and better?

Putting deep learning to work in the real world

[Diagram: Shared Model, with outputs Object Bounding Boxes and Segmentation Mask]

How do you relate various tasks to each other in a multi-task neural network?

SLIDE 10

WHAT WE WILL SHOW

By ordering tasks based on receptive field and information density, we improve segmentation and detection accuracy by ~2% and ~8% over single networks, respectively.

The joint network is robust and easy to tune compared to non-hierarchical baselines.

SLIDE 11

TALK OVERVIEW

(1) Problem statement and summary.
(2) Dataset and preliminaries.
(3) Model motivation.
(4) Results and visualizations.

SLIDE 12

CITYSCAPES DATASET

2975 training images @ resolution 1024 x 2048.

20 classes for semantic segmentation, including 8 object classes. Of these 8, 4 are much more represented (car, bicycle, person, rider): the "easy classes."

Segmentation, bounding box, and edge ground truth can all be generated.

[Figure panels: Raw Image, Edge Detection, Semantic Seg., Bounding Box]
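As a rough illustration of how bounding box and edge ground truth could be generated from label images, here is a minimal NumPy sketch. It assumes an instance map in which 0 means "no instance" and every object has its own positive id; the function names and the toy array are hypothetical and are not taken from the talk or from the Cityscapes tooling.

```python
import numpy as np

def boxes_from_instances(instance_map):
    """Return {instance_id: (x_min, y_min, x_max, y_max)} for every nonzero id."""
    boxes = {}
    for inst_id in np.unique(instance_map):
        if inst_id == 0:  # assumption: 0 marks "no instance"
            continue
        ys, xs = np.nonzero(instance_map == inst_id)
        boxes[int(inst_id)] = (int(xs.min()), int(ys.min()),
                               int(xs.max()), int(ys.max()))
    return boxes

def edges_from_labels(label_map):
    """Mark pixels whose right or lower neighbour carries a different label."""
    edges = np.zeros(label_map.shape, dtype=bool)
    edges[:, :-1] |= label_map[:, :-1] != label_map[:, 1:]
    edges[:-1, :] |= label_map[:-1, :] != label_map[1:, :]
    return edges

# Toy 6 x 8 label image with two rectangular "objects".
toy = np.zeros((6, 8), dtype=np.int32)
toy[1:4, 2:5] = 1
toy[3:6, 6:8] = 2

toy_boxes = boxes_from_instances(toy)
toy_edges = edges_from_labels(toy)
```

The same label image therefore yields all three kinds of supervision, which is what makes a joint network trainable on a single dataset.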

SLIDE 13

HOW TO TRAIN A SEGMENTATION NETWORK

Standard FCN (Shelhamer 2015) architecture: convolutions followed by a deconvolution to retrieve a pixel-dense prediction mask.
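To make the "convolutions followed by a deconvolution" idea concrete, the sketch below does only the shape arithmetic. The layer settings (five stride-2 convolutions, then one transposed convolution with kernel 64, stride 32, padding 16) are illustrative assumptions, not the configuration used in the talk; the formulas are the standard output-size rules for convolution and transposed convolution without dilation or output padding.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * pad - kernel) // stride + 1

def deconv_out(size, kernel, stride=1, pad=0):
    """Output length of a transposed convolution along one spatial axis."""
    return (size - 1) * stride - 2 * pad + kernel

# Hypothetical encoder: five 3x3, stride-2, pad-1 convolutions (32x downsampling).
h, w = 1024, 2048
for _ in range(5):
    h, w = conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1)
low_res = (h, w)

# Hypothetical decoder: one transposed convolution back to input resolution.
full_res = (deconv_out(h, 64, 32, 16), deconv_out(w, 64, 32, 16))
```

Under these assumptions the low-resolution map is 32 x 64 and the deconvolution restores the 1024 x 2048 input size, giving one prediction per pixel.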

SLIDE 14

HOW TO TRAIN A DETECTION NETWORK

Network outputs confidence that a pixel lies near the center of an object.
Points of high confidence produce bounding box coordinates.
Confidences are rougher than full segmentation but robust to occlusion.
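One way to read the detection scheme above is as a thresholding step over a confidence map, with box coordinates looked up at the surviving positions. The sketch below assumes a (H, W) confidence map, a (H, W, 4) coordinate map stored as (x_min, y_min, x_max, y_max), and a threshold of 0.5; all of these are illustrative choices, and the talk does not specify how overlapping boxes are merged, so that step is left out.

```python
import numpy as np

def decode_boxes(confidence, bbox_coords, threshold=0.5):
    """Return [(score, (x_min, y_min, x_max, y_max)), ...] above the threshold."""
    ys, xs = np.nonzero(confidence > threshold)
    return [
        (float(confidence[y, x]), tuple(float(v) for v in bbox_coords[y, x]))
        for y, x in zip(ys, xs)
    ]

# Toy maps: one confident position, one weak position that is discarded.
conf = np.zeros((4, 6))
conf[1, 2] = 0.9
conf[3, 4] = 0.3
coords = np.zeros((4, 6, 4))
coords[1, 2] = (10, 20, 50, 60)

detections = decode_boxes(conf, coords)
```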

SLIDE 15

TALK OVERVIEW

(1) Problem statement and summary.
(2) Dataset and preliminaries.
(3) Model motivation.
(4) Results and visualizations.

SLIDE 16

[Diagram labels: Input (1024 x 2048); Shared Feature Map (from base CNN); Deconv; Low-Res Seg Predictions (W x H x 20); Obj. Confidence Positions; Bbox Coordinate Positions]

$L = \alpha L_{\text{seg}} + (1 - \alpha) L_{\text{det}}$
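The weighted loss on this slide can be written as a one-line function. The sketch below is a plain-Python illustration: the function name and the range check on α are assumptions added here, and the per-task losses are passed in as already-computed numbers.

```python
def joint_loss(seg_loss, det_loss, alpha):
    """Combine task losses as alpha * L_seg + (1 - alpha) * L_det."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * seg_loss + (1.0 - alpha) * det_loss

balanced = joint_loss(2.0, 4.0, alpha=0.5)   # equal attention to both tasks
seg_only = joint_loss(2.0, 4.0, alpha=1.0)   # detection term switched off
det_only = joint_loss(2.0, 4.0, alpha=0.0)   # segmentation term switched off
```

Setting α to 1 or 0 recovers a single-task objective, so the single networks are the two endpoints of the same family.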

SLIDE 17

OUR BASELINE MODEL PERFORMANCE

[Plot: baseline model performance versus the segmentation/detection loss weighting α (labels: Seg. Weight, Det. Weight)]

(α controls how much attention we pay to segmentation vs. detection at training.)


SLIDE 25

A LABEL HIERARCHY ALONG TWO AXES

[Figure: tasks placed along two axes, Density of Information and Required Receptive Field. Shown: Object Bounding Boxes]

SLIDE 26

A LABEL HIERARCHY ALONG TWO AXES

[Figure: tasks placed along two axes, Density of Information and Required Receptive Field. Shown: Object Bounding Boxes, Object Confidence]

SLIDE 27

A LABEL HIERARCHY ALONG TWO AXES

[Figure: tasks placed along two axes, Density of Information and Required Receptive Field. Shown: Object Bounding Boxes, Semantic Segmentation, Object Confidence]

SLIDE 28

A LABEL HIERARCHY ALONG TWO AXES

[Figure: tasks placed along two axes, Density of Information and Required Receptive Field. Shown: Object Bounding Boxes, Edge Detection (plus), Semantic Segmentation, Object Confidence]

SLIDE 29

[Diagram labels: Input (1024 x 2048); Shared Feature Map (from base CNN); Deconv; Low-Res Seg Predictions (W x H x 20); Obj. Confidence Positions; Bbox Coordinate Positions]

SLIDE 30

[Diagram labels: Input (1024 x 2048); Shared Feature Map (from base CNN); Segmentation Features; Deconv; Low-Res Seg Predictions (W x H x 20); Obj. Confidence Features; Obj. Confidence Positions; Obj. BBox Features; Bbox Coordinate Positions]

SLIDE 31

[Diagram labels: Input (1024 x 2048); Shared Feature Map (from base CNN); Segmentation Features; Deconv; Low-Res Seg Predictions (W x H x 20); Obj. Confidence Features; Obj. Confidence Positions; Obj. BBox Features; Bbox Coordinate Positions]

Decreasing information density

SLIDE 32

[Diagram labels: Input (1024 x 2048); Shared Feature Map (from base CNN); Edge Features; Deconv; Low-Res Edge Predictions (W x H x 3); Segmentation Features; Deconv; Low-Res Seg Predictions (W x H x 20); Obj. Confidence Features; Obj. Confidence Positions; Obj. BBox Features; Bbox Coordinate Positions]

Decreasing information density


SLIDE 34

[Diagram labels: Input (1024 x 2048); Shared Feature Map (from base CNN); Edge Features; Deconv; Low-Res Edge Predictions (W x H x 3); Segmentation Features; Deconv; Low-Res Seg Predictions (W x H x 20); Obj. Confidence Features; Obj. Confidence Positions; Obj. BBox Features; Bbox Coordinate Positions; X]

Decreasing information density

SLIDE 35

[Diagram labels: Input (1024 x 2048); Shared Feature Map (from base CNN); Edge Features; Deconv; Low-Res Edge Predictions (W x H x 3); Segmentation Features; Deconv; Low-Res Seg Predictions (W x H x 20); Obj. Confidence Features; Obj. Confidence Positions; Obj. BBox Features; Bbox Coordinate Positions; X]

Increasing receptive field

SLIDE 36

[Diagram labels: Input (1024 x 2048); Shared Feature Map (from base CNN); Edge Features; Deconv; Low-Res Edge Predictions (W x H x 3); Segmentation Features; Deconv; Low-Res Seg Predictions (W x H x 20); Obj. Confidence Features; Obj. Confidence Positions; Obj. BBox Features; Bbox Coordinate Positions; Dilated Convs]

Increasing receptive field
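Dilated convolutions enlarge the receptive field without adding parameters or reducing resolution. The short calculation below shows the effect for stacks of stride-1 layers; the layer counts and dilation rates (1, 2, 4) are illustrative assumptions and are not the values used in the network on this slide.

```python
def effective_kernel(kernel, dilation):
    """Spatial extent covered by one dilated kernel."""
    return dilation * (kernel - 1) + 1

def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions.

    layers: list of (kernel, dilation) pairs, applied in order.
    """
    rf = 1
    for kernel, dilation in layers:
        rf += effective_kernel(kernel, dilation) - 1
    return rf

plain = receptive_field([(3, 1), (3, 1), (3, 1)])      # three ordinary 3x3 convs
dilated = receptive_field([(3, 1), (3, 2), (3, 4)])    # same depth, growing dilation
```

With the same three 3x3 layers, the plain stack sees a 7-pixel window while the dilated stack sees 15 pixels, which suits tasks such as bounding box regression that need wide context.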

SLIDE 37

[Diagram labels: Input (1024 x 2048); Shared Feature Map (from base CNN); Edge Features; Deconv; Low-Res Edge Predictions (W x H x 3); Segmentation Features; Deconv; Low-Res Seg Predictions (W x H x 20); Obj. Confidence Features; Obj. Confidence Positions; Obj. BBox Features; Bbox Coordinate Positions; Dilated Convs]

Deep Hierarchical Network (DHM)
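The ordering of task heads in this diagram can be sketched as a chain in which each task's features feed the next, less information-dense task. The NumPy code below is only one plausible reading of the diagram: the random 1x1 "convolutions", the feature width of 32, the concatenation with the shared map, and the 1-channel confidence and 4-channel box outputs are all assumptions made for illustration. Only the 3-channel edge and 20-channel segmentation outputs come from the slide labels.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, out_channels, relu=True):
    """Random 1x1 convolution over the channel axis; stands in for a learned layer."""
    weights = rng.standard_normal((x.shape[-1], out_channels)) * 0.1
    y = x @ weights
    return np.maximum(y, 0.0) if relu else y

def hierarchical_heads(shared):
    """Chain task features in order: edge -> seg -> obj. confidence -> bbox."""
    edge_feat = conv1x1(shared, 32)
    seg_feat = conv1x1(np.concatenate([shared, edge_feat], axis=-1), 32)
    conf_feat = conv1x1(np.concatenate([shared, seg_feat], axis=-1), 32)
    bbox_feat = conv1x1(np.concatenate([shared, conf_feat], axis=-1), 32)
    return {
        "edge": conv1x1(edge_feat, 3, relu=False),    # W x H x 3 on the slide
        "seg": conv1x1(seg_feat, 20, relu=False),     # W x H x 20 on the slide
        "conf": conv1x1(conf_feat, 1, relu=False),    # assumed 1 channel
        "bbox": conv1x1(bbox_feat, 4, relu=False),    # assumed 4 coordinates
    }

# Toy shared feature map: 8 x 16 spatial positions, 64 channels.
outputs = hierarchical_heads(rng.standard_normal((8, 16, 64)))
output_shapes = {name: out.shape for name, out in outputs.items()}
```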

SLIDE 38

TALK OVERVIEW

(1) Problem statement and summary.
(2) Dataset and preliminaries.
(3) Model motivation.
(4) Results and visualizations.

SLIDE 39

RESULTS: HIGH ROBUSTNESS


SLIDE 42

[Figure panels: Raw Image, Edge Predictions, Segmentation Predictions, Bounding Box Predictions]

SLIDE 43

VISUALIZATIONS

[Figure panels: Single Network, DHM (Ours); Detection, Segmentation]

SLIDE 44

VISUALIZATIONS

[Figure panels: Single Network, DHM (Ours); Saliency (Car), Segmentation]


SLIDE 47

VISUALIZATIONS

[Figure panels: Single Network, DHM (Ours); Saliency (Bus), Segmentation]


SLIDE 51

SUMMARY

Our two hierarchies within our model allow our network to reason about intra-task relationships:

Information density: (Seg +) Edge > Seg > Object Conf > Bbox
Receptive field: (Seg +) Edge = Bbox >> Object Conf > Seg

With these relationships wired in, our network is:
More accurate
Robust to tuning
Simultaneously better at fine detail and more instance aware
Efficient and scalable (3 tasks, 1 network!)

SLIDE 52

REFERENCES

J. Yao, S. Fidler, and R. Urtasun. Describing the scene as a whole: Joint object detection, scene classification and semantic segmentation. In CVPR, 2012.

S. Gidaris and N. Komodakis. Object detection via a multiregion and semantic segmentation-aware cnn model. In ICCV, 2015.

B. Hariharan, P. Arbelaez, R. Girshick, and J. Malik. Simultaneous detection and segmentation. In ECCV, 2014.

S. Liu, X. Qi, J. Shi, H. Zhang, and J. Jia. Multi-scale patch aggregation (mpa) for simultaneous detection and segmentation. In CVPR, 2016.

E. Shelhamer, J. Long, and T. Darrell. Fully convolutional networks for semantic segmentation. In CVPR, 2015.

B. Hariharan, P. Arbelaez, R. Girshick, and J. Malik. Hypercolumns for object segmentation and fine-grained localization. In CVPR, 2015.

J. Dai, K. He, and J. Sun. Instance-aware semantic segmentation via multi-task network cascades. In https://arxiv.org/pdf/1512.04412.pdf, 2015.

SLIDE 53

THANK YOU!

Special thanks to:
My internship mentor: Jian Yao
My managers: John Zedlewski and Andrew Tao
All the wonderful people in DLAR/DLAV.

Additional questions/comments: zchen89@stanford.edu