Depth and Surface Normal Estimation from a Single Image - Mian Wei (PowerPoint PPT Presentation)



SLIDE 1

Depth and Surface Normal Estimation from a Single Image

Mian Wei, University of Toronto

SLIDE 2

What is the problem?

SLIDE 3

Given one image

SLIDE 4

• N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, “Indoor segmentation and support inference from RGBD images,” Proc. Eur. Conf. Comput. Vision, 2012, pp. 746–760.

SLIDE 5

Estimate the following:

SLIDE 6

• D. Eigen and R. Fergus, “Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture,” Proc. IEEE Int. Conf. Comput. Vision, 2015.

SLIDE 7

• D. Eigen and R. Fergus, “Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture,” Proc. IEEE Int. Conf. Comput. Vision, 2015.

SLIDE 8

Why is this hard?

SLIDE 9

Multiple ambiguities

SLIDE 10

Scale ambiguity
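The scale ambiguity can be sketched with a pinhole projection: scaling the whole scene (including its distance from the camera) by a factor a leaves every pixel unchanged, so a single image cannot pin down absolute scale. A minimal illustration; the focal length and the point coordinates are invented:

```python
def project(X, Y, Z, f=1.0):
    """Pinhole projection of a 3D point onto the image plane."""
    return (f * X / Z, f * Y / Z)

# A point, and the same point with the whole scene scaled by a = 3:
# both land on identical pixel coordinates.
p1 = project(2.0, 1.0, 4.0)
p2 = project(6.0, 3.0, 12.0)
print(p1 == p2)  # True
```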


SLIDE 13

Bas-relief ambiguity

• P. Belhumeur, D. Kriegman, and A. Yuille, “The Bas-Relief Ambiguity,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1997, pp. 1040-1046.

SLIDE 14

Let’s play a game

SLIDE 15

Spot the Difference

SLIDE 16

• P. Belhumeur, D. Kriegman, and A. Yuille, “The Bas-Relief Ambiguity,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1997, pp. 1040-1046.

SLIDE 17

• P. Belhumeur, D. Kriegman, and A. Yuille, “The Bas-Relief Ambiguity,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1997, pp. 1040-1046.

SLIDE 18

All the same

SLIDE 19

• P. Belhumeur, D. Kriegman, and A. Yuille, “The Bas-Relief Ambiguity,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1997, pp. 1040-1046.

SLIDE 20

• P. Belhumeur, D. Kriegman, and A. Yuille, “The Bas-Relief Ambiguity,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1997, pp. 1040-1046.

SLIDE 21

Family of transformations

SLIDE 22

Generalized Bas-Relief

SLIDE 23

Change shape and illumination

SLIDE 24

Yield the same image
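The generalized bas-relief (GBR) invariance can be checked numerically. In one standard formulation, the albedo-scaled normal transforms as b' = G^{-T} b and the light as s' = G s, so the Lambertian intensity b·s is unchanged: the transformed shape under the transformed illumination yields the same image. A small numpy sketch; the particular G, b, and s values are invented:

```python
import numpy as np

# GBR matrix: identity except the last row (mu, nu, lam),
# which shears and flattens the surface.
mu, nu, lam = 0.3, -0.2, 0.7
G = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [mu,  nu,  lam]])

b = np.array([0.1, 0.4, 1.0])   # albedo-scaled surface normal
s = np.array([0.2, 0.3, 0.9])   # light direction times intensity

b_new = np.linalg.inv(G).T @ b  # transformed surface
s_new = G @ s                   # transformed light

# Both scene/light pairs produce the same image intensity.
print(np.isclose(b @ s, b_new @ s_new))  # True
```

The cancellation is exact: b'·s' = (G^{-T} b)·(G s) = bᵀ G^{-1} G s = b·s.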

SLIDE 25

Existing work

SLIDE 26

Multi-view Stereo

• R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge, UK: Cambridge University Press, 2000.
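For a rectified stereo pair, the multi-view idea reduces to triangulation: depth is Z = f·B/d for focal length f (in pixels), baseline B, and disparity d. A toy conversion; the numbers are invented:

```python
def depth_from_disparity(f_px, baseline_m, disparity_px):
    """Depth of a point seen by a rectified stereo pair: Z = f * B / d."""
    return f_px * baseline_m / disparity_px

# 700-pixel focal length, 25 cm baseline, 35-pixel disparity -> 5 m away.
print(depth_from_disparity(700.0, 0.25, 35.0))  # 5.0
```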


SLIDE 29

Photometric Stereo

• R. J. Woodham, “Photometric method for determining surface orientation from multiple images,” Optical Engineering, vol. 19, no. 1, pp. 139-144, 1980.

SLIDE 30

Collimated Light Sources

SLIDE 31

Light rays are parallel
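Photometric stereo inverts the Lambertian model: with three images under known collimated (parallel-ray) lights, the intensities satisfy I = L n with L stacking the light directions as rows, so the normal follows by solving the linear system and renormalizing. A numpy sketch; the light directions and the test normal are invented:

```python
import numpy as np

# Three known collimated light directions, stacked as rows.
L = np.array([[0.0, 0.0, 1.0],
              [0.7, 0.0, 0.7],
              [0.0, 0.7, 0.7]])

n_true = np.array([0.0, 0.6, 0.8])  # unit surface normal (albedo = 1)
I = L @ n_true                      # observed intensity under each light

# Invert the Lambertian model and renormalize to recover the normal.
g = np.linalg.solve(L, I)
n_hat = g / np.linalg.norm(g)
print(np.allclose(n_hat, n_true))  # True
```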


SLIDE 35

Shape from Focus

• S. Nayar and Y. Nakagawa, “Shape from Focus,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 16, no. 8, pp. 824-831, 1994.


SLIDE 48

Light Fall-off Stereo

• M. Liao, L. Wang, R. Yang, and M. Gong, “Light fall-off stereo,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2007, pp. 1-8.


SLIDE 51

Specialized Hardware

SLIDE 52

Laser Scanner

SLIDE 53

Active Illumination

SLIDE 54

Time of Flight

SLIDE 55

Estimating Depth

• D. Eigen, C. Puhrsch, and R. Fergus, “Depth map prediction from a single image using a multi-scale deep network,” Proc. Advances in Neural Information Processing Systems (NIPS), 2014.


SLIDE 57

Train 2 networks

SLIDE 58

Global coarse-scale network

SLIDE 59

Local fine-scale network

SLIDE 60

Global coarse-scale network

SLIDE 61

Learns a coarse depth map


SLIDE 65

Used as input to local network

SLIDE 66

Intuition:

SLIDE 67

Coarse info learnt already

SLIDE 68

Focus on learning finer info
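The coarse-to-fine wiring described on slides 57-68 can be sketched in numpy: the coarse network's low-resolution depth prediction is upsampled and stacked with the image as extra input channels for the fine-scale network, so the fine network only has to learn local refinements. The shapes and the nearest-neighbour upsampling here are illustrative, not the paper's exact architecture:

```python
import numpy as np

def upsample_nearest(coarse, factor):
    """Nearest-neighbour upsampling of a 2D coarse depth map."""
    return np.repeat(np.repeat(coarse, factor, axis=0), factor, axis=1)

rgb = np.zeros((8, 8, 3))                     # input image
coarse_depth = np.arange(16.0).reshape(4, 4)  # coarse network's output

# Fine network input: image channels + the upsampled coarse prediction.
up = upsample_nearest(coarse_depth, 2)        # (8, 8)
fine_input = np.concatenate([rgb, up[..., None]], axis=2)
print(fine_input.shape)  # (8, 8, 4)
```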


SLIDE 71

Scale ambiguity

SLIDE 72

Scale-invariant error function

SLIDE 73

D(y, y^*) = \frac{1}{2n} \sum_{i=1}^{n} \left( \log y_i - \log y_i^* + \alpha(y, y^*) \right)^2

\alpha(y, y^*) = \frac{1}{n} \sum_{i=1}^{n} \left( \log y_i^* - \log y_i \right)

SLIDE 74

D(ay, ay^*) = \frac{1}{2n} \sum_{i=1}^{n} \left( \log a y_i - \log a y_i^* + \alpha(ay, ay^*) \right)^2

D(ay, ay^*) = \frac{1}{2n} \sum_{i=1}^{n} \left( \log a - \log a + \log y_i - \log y_i^* + \alpha(ay, ay^*) \right)^2

D(ay, ay^*) = \frac{1}{2n} \sum_{i=1}^{n} \left( \log y_i - \log y_i^* + \log a - \log a + \alpha(y, y^*) \right)^2

D(ay, ay^*) = D(y, y^*)
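The invariance D(ay, ay*) = D(y, y*) on slide 74 is easy to check numerically: the log turns the common scale a into an additive constant that cancels inside each squared term. A direct transcription of D and α, on made-up depth values:

```python
import math

def alpha(y, y_star):
    """alpha(y, y*) = (1/n) sum_i (log y*_i - log y_i)."""
    n = len(y)
    return sum(math.log(ys) - math.log(yi) for yi, ys in zip(y, y_star)) / n

def D(y, y_star):
    """Scale-invariant error: (1/2n) sum_i (log y_i - log y*_i + alpha)^2."""
    n = len(y)
    a = alpha(y, y_star)
    return sum((math.log(yi) - math.log(ys) + a) ** 2
               for yi, ys in zip(y, y_star)) / (2 * n)

y      = [1.0, 2.0, 4.0]
y_star = [1.5, 1.8, 5.0]
scaled = lambda v, a: [a * x for x in v]

# Multiplying predictions and ground truth by any a > 0 leaves D unchanged.
print(math.isclose(D(y, y_star), D(scaled(y, 3.0), scaled(y_star, 3.0))))  # True
```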

SLIDE 75

Loss Function

SLIDE 76

Scale invariant

SLIDE 77

L(y, y^*) = \frac{1}{n} \sum_{i=1}^{n} d_i^2 - \frac{\lambda}{n^2} \left( \sum_{i=1}^{n} d_i \right)^2, \qquad d_i = \log y_i - \log y_i^*
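Slide 77's training loss is the same quantity written through d_i = log y_i − log y*_i, with λ interpolating between a plain L2 loss in log space (λ = 0) and the fully scale-invariant error (λ = 1); Eigen et al. train with λ = 0.5. A direct transcription on made-up values:

```python
import math

def loss(y, y_star, lam=0.5):
    """L(y, y*) = (1/n) sum d_i^2 - (lam/n^2) (sum d_i)^2."""
    d = [math.log(yi) - math.log(ys) for yi, ys in zip(y, y_star)]
    n = len(d)
    return sum(di ** 2 for di in d) / n - lam * sum(d) ** 2 / n ** 2

y, y_star = [1.0, 2.0, 4.0], [1.5, 1.8, 5.0]

# The subtracted term is nonnegative, so raising lam can only lower the loss.
print(loss(y, y_star) >= loss(y, y_star, lam=1.0))  # True
```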

SLIDE 78

2 Datasets

SLIDE 79

NYUDepthV2

• N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, “Indoor segmentation and support inference from RGBD images,” Proc. Eur. Conf. Comput. Vision, 2012, pp. 746–760.

SLIDE 80

Indoor Rooms


SLIDE 82

KITTI

• A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, “Vision meets robotics: The KITTI dataset,” International Journal of Robotics Research (IJRR), 2013.

SLIDE 83

Outdoor images taken from a car


SLIDE 85

How do you get ground truth?

SLIDE 86

NYUDepthV2

SLIDE 87

Kinect


SLIDE 89

KITTI


SLIDE 91

Time of Flight

SLIDE 92

Times how long light travels

SLIDE 93

From light source to camera
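A time-of-flight sensor (slides 91-93) times a light pulse's round trip from the emitter to the scene and back to the camera, so depth is d = c·t/2. A toy conversion; the pulse time is invented:

```python
C = 299_792_458.0  # speed of light, m/s

def tof_depth(round_trip_s):
    """Depth from a time-of-flight measurement: light travels out and back."""
    return C * round_trip_s / 2.0

# A ~6.67 ns round trip corresponds to roughly 1 m of depth.
print(round(tof_depth(6.671e-9), 3))  # 1.0
```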

SLIDE 94

Results


SLIDE 98

Estimating Surface Normals

• X. Wang, D. F. Fouhey, and A. Gupta, “Designing deep networks for surface normal estimation,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2015.

SLIDE 99

Similar to Eigen


SLIDE 101

Trains 3 networks

SLIDE 102

Global coarse-scale network


SLIDE 104

Trains for room layout as well


SLIDE 106

Local fine-scale network


SLIDE 108

Trains for edge labels as well

SLIDE 109

Convex, concave, occlusion, N/A


SLIDE 111

Difference: Global and Local trained separately

SLIDE 112

Fusion Network

SLIDE 113

Combines both networks

SLIDE 114

How to represent normals

SLIDE 115

Normals lie in a continuous space

SLIDE 116

Regression as Classification

SLIDE 117

Surface normal triangular coding


SLIDE 119

Codebook with k-means


SLIDE 121

Delaunay Triangulation cover


SLIDE 123

Triangles as classes

SLIDE 124

Represent Surface Normals

SLIDE 125

Weighted sum of triangle corners
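Slides 117-125 encode a continuous normal as a triangle class plus corner weights: the codebook triangle containing the normal is the class, and the normal itself is a weighted sum of that triangle's corners. A pure-Python sketch of the decoding step; the corner normals and weights here are invented, whereas the real codebook comes from k-means plus Delaunay triangulation:

```python
import math

def decode_normal(corners, weights):
    """Weighted sum of a triangle's corner normals, renormalized to unit length."""
    v = [sum(w * c[i] for w, c in zip(weights, corners)) for i in range(3)]
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v)

# Three codebook normals forming one triangle class, plus corner weights.
corners = [(0.0, 0.0, 1.0), (0.6, 0.0, 0.8), (0.0, 0.6, 0.8)]
weights = [0.5, 0.25, 0.25]
n = decode_normal(corners, weights)
print(abs(sum(x * x for x in n) - 1.0) < 1e-9)  # True: unit length by construction
```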

SLIDE 126

Loss Function

SLIDE 127

L(I, Y) = - \sum_{k=1}^{K} \sum_{i=1}^{M \times M} 1(y_i = k) \log F_{i,k}(I)
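Slide 127's loss is a per-pixel cross-entropy over the K triangle classes: for each pixel i, take the log of the network's probability for the ground-truth class. A direct transcription on tiny made-up numbers, where F plays the role of the network's softmax outputs F_{i,k}(I):

```python
import math

def classification_loss(labels, F):
    """-sum_i sum_k 1(y_i = k) * log F[i][k]: cross-entropy over triangle classes."""
    return -sum(math.log(F[i][y]) for i, y in enumerate(labels))

# Two pixels, three triangle classes; each row of F sums to 1.
F = [[0.7, 0.2, 0.1],
     [0.1, 0.1, 0.8]]
labels = [0, 2]  # ground-truth triangle index per pixel
print(round(classification_loss(labels, F), 4))  # 0.5798
```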


SLIDE 130

Thoughts

SLIDE 131

Do not address the bas-relief ambiguity

SLIDE 132

Incorporate Computer Graphics

SLIDE 133

Inverse problem

SLIDE 134

Given surface normals

SLIDE 135

How should the scene look?

SLIDE 136

What is the correct image?

SLIDE 137

Incorporate image formation model

SLIDE 138

Why depth from a single image