

SLIDE 1

Automatic Portrait Segmentation and Matting

Xiaoyong Shen

The Chinese University of Hong Kong, goodshenxy@gmail.com

SLIDE 2

Research on CV

  • Pixel based (low level / early vision): filtering, restoration, denoising, enhancement, deblurring, editing, dehazing, etc.
  • Region/patch based (middle level vision): matching, optical flow, stereo matching, tracking, segmentation, etc.
  • Object/semantic based (high level vision): semantic segmentation, object detection, image classification, recognition, etc.

SLIDE 3

My Research on CV

  • Pixel based (low level vision): filtering, restoration, denoising, enhancement, deblurring, editing, dehazing, etc.
  • Region/patch based (middle level vision): matching, optical flow, stereo matching, tracking, segmentation, etc.
  • Object based (high level vision): semantic segmentation, object detection, image classification, recognition, etc.

SLIDE 4

SLIDE 5

Multi-Spectral Image Restoration

  • Input
  • Noisy RGB image I0, e.g. captured at night
  • Clean guidance image G, e.g. dark-flashed NIR or flashed RGB
  • Output
  • Denoised image I
  • Structures as clear as in the guidance G
  • Appearance the same as in the input I0
  • Unaffected by shadows/highlights in the guidance

[TPAMI 2015]

SLIDE 6

Scale Map

  • Given J*, the expected ground-truth noise-free image, our scale map s is defined by the condition min_s ‖∇J* − s∇H‖
  • It adapts the structures of H to those of J*.
  • It is an ideal ratio map between ∇H and ∇J*.
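The condition min_s ‖∇J* − s∇H‖ has a pointwise least-squares solution: at each pixel, s is the ratio that best maps the guidance gradient onto the target gradient. A minimal numpy sketch (forward differences and the `scale_map` helper are my illustration, not the paper's full optimization, which also regularizes s):

```python
import numpy as np

def scale_map(J_star, H, eps=1e-6):
    """Pointwise least-squares scale map s minimizing |grad(J*) - s*grad(H)|^2."""
    def grads(I):
        gx = np.diff(I, axis=1, append=I[:, -1:])
        gy = np.diff(I, axis=0, append=I[-1:, :])
        return gx, gy

    jx, jy = grads(J_star.astype(float))
    hx, hy = grads(H.astype(float))
    # closed-form per-pixel ratio of gradients
    return (jx * hx + jy * hy) / (hx * hx + hy * hy + eps)

# toy check: if J* = 2*H, the scale map is ~2 wherever H has gradient
H = np.tile(np.arange(5.0), (5, 1))
s = scale_map(2 * H, H)
```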


SLIDE 7

Result

[Figure: input noisy image, input NIR image, our result, ground truth]

SLIDE 8

RGB Input I

SLIDE 9

NIR Input G

SLIDE 10

BM3D

SLIDE 11

Our Result

SLIDE 12

Mutual-Structure Filter

[ICCV 2015 Oral Presentation]

SLIDE 13

Depth/RGB Restoration

Noisy Depth

SLIDE 14

Depth/RGB Restoration

Noisy RGB Image

SLIDE 15

Depth/RGB Restoration

Ground truth

SLIDE 16

Depth/RGB Restoration

Ours PSNR = 37.19

SLIDE 17

Rolling Guidance Filter

One line of code only: J_{u+1} = K_G(J_0, J_u)

[ECCV 2014 Oral Presentation]
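The rolling-guidance iteration can be sketched in a few lines: each pass re-filters the input using the previous result as the range guidance, so small structures vanish while large edges are gradually recovered. A 1-D numpy sketch (the brute-force `joint_bilateral_1d` and the constant initialization are my simplifications of the paper's 2-D filter):

```python
import numpy as np

def joint_bilateral_1d(I, guide, radius=3, sigma_s=2.0, sigma_r=0.1):
    """Brute-force 1-D joint bilateral filter: range weights come from
    `guide`, filtered values come from the input signal I."""
    n = len(I)
    out = np.empty(n)
    for p in range(n):
        lo, hi = max(0, p - radius), min(n, p + radius + 1)
        q = np.arange(lo, hi)
        w = (np.exp(-((q - p) ** 2) / (2 * sigma_s ** 2))
             * np.exp(-((guide[q] - guide[p]) ** 2) / (2 * sigma_r ** 2)))
        out[p] = np.sum(w * I[q]) / np.sum(w)
    return out

def rolling_guidance_1d(I, iters=5, **kw):
    # start from a structure-free (constant) guidance, then iterate
    # J_{u+1} = JointBilateral(I, J_u)
    J = np.full_like(I, I.mean())
    for _ in range(iters):
        J = joint_bilateral_1d(I, J, **kw)
    return J

# small-amplitude texture on a large step edge: the texture is removed
# while the step survives
x = np.concatenate([np.zeros(20), np.ones(20)])
noisy = x + 0.05 * np.sin(np.arange(40))
smooth = rolling_guidance_1d(noisy, iters=5)
```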

SLIDE 18

Texture Removal

SLIDE 19

Halftone Image

SLIDE 20

De-Filter

One line of code only: J_{u+1} = J_u + (J_0 − G(J_u))
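The de-filter iteration is a fixed-point scheme: it keeps adding back the part of the observed image J_0 that the filter G has not yet explained, converging when G only shrinks frequency components (gain in (0, 2)). A 1-D numpy sketch (the binomial `smooth` filter stands in for G; it is my choice, not the paper's):

```python
import numpy as np

def smooth(x):
    """The forward filter G to be inverted: a small binomial blur."""
    return np.convolve(x, [0.25, 0.5, 0.25], mode='same')

def defilter(J0, G, iters=200):
    # fixed-point iteration from the slide: J_{u+1} = J_u + (J0 - G(J_u));
    # it drives G(J) toward the observed filtered image J0
    J = J0.copy()
    for _ in range(iters):
        J = J + (J0 - G(J))
    return J

rng = np.random.default_rng(0)
original = rng.random(32)
observed = smooth(original)          # what the filter produced
recovered = defilter(observed, smooth)
# G(recovered) reproduces the observed filtered signal
```

Frequencies the filter nearly destroys converge slowly, which is why strongly lossy filters can only be partially reversed.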

SLIDE 21

Reverse Skin Retouch

Retouched input

SLIDE 22

Reverse Skin Retouch

Reversed

SLIDE 23

Reverse Skin Retouch

Before retouch

SLIDE 24

SLIDE 25

Multi-Spectral Matching

  • Match general multi-spectral images with significant displacement and obvious structure inconsistency

[Examples: different exposures, RGB/Depth, RGB/NIR, flash/no-flash]

SLIDE 26

Result

  • Match RGB/NIR image pair

[Figure: inputs, our result, blended]

SLIDE 27

Applications

  • HDR construction

[Figure: without alignment, with alignment, constructed HDR]

SLIDE 28

Internet Image Matching

[Figure: reference, input, dense correspondences? Existing correspondence vs. no correspondence]

[SIGGRAPH ASIA 2016]

SLIDE 29

Our Motivation

[Figure: reference, input, dense correspondences? Foremost region matching]

SLIDE 30

Time-lapse Generation

SLIDE 31

Automatic Morphing

SLIDE 32

Automatic Morphing

SLIDE 33

Object-based Matching

Achieves higher accuracy with the help of the object (person) information

SLIDE 34

Object-based Matching

State-of-the-art Ours

SLIDE 35

SLIDE 36

Classification and Segmentation

  • Fine-grained Classification
  • DeepLAC (CVPR 2015)
  • Text detection and recognition
  • Semantic object segmentation
  • Portrait segmentation and matting
  • VOC challenge
SLIDE 37

Automatic Portrait Segmentation

SLIDE 38

Motivation

  • Abundant portraits in smartphone photos
  • Portrait 30%, others 70% (Samsung UK)
  • Portrait 90%, others 10% (Symon Whitehorn, HTC)

SLIDE 39

Portrait Post-processing

SLIDE 40

Foreground Selection

SLIDE 41

Quick Selection

SLIDE 42

Automatic Segmentation

Automatic?

SLIDE 43

Challenges

Similar colors, complex backgrounds, various accessories, low contrast, diverse poses, complicated edges

SLIDE 44

Possible Solutions

  • Graph-cut with face tracker

SLIDE 45

Possible Solutions

  • CNNs for semantic segmentation

SLIDE 46

Most Related Work

  • Interactive Image Selection
  • Lazy snapping [Li et al. 2004]
  • Grabcut [Rother et al. 2004]
  • Paint Selection [Li et al. 2009]
  • CNNs for Semantic Object Segmentation
  • FCN [Long et al. 2014]
  • DeepLab [Chen et al. 2014]
  • CRFasRNN [Zheng et al. 2015]
  • Image Matting
  • Bayesian matting [Chuang et al. 2001]
  • Closed-form matting [Levin et al. 2008]
  • KNN matting [Chen et al. 2013]


SLIDE 47

Our Approach

PortraitFCN and PortraitFCN+

SLIDE 48

Our System

[Pipeline: Detector → Conv → ReLU → Pooling → Conv → Conv → Pooling → ReLU → DeConv → Mask]

[Long et al. 2015]

PortraitFCN Model

RGB channels, 2 outputs

SLIDE 49

PortraitFCN

  • Fine-tuned from the original FCN-8s model

Portrait Knowledge

SLIDE 50

PortraitFCN+

[Pipeline: Detector → Conv → ReLU → Pooling → Conv → Conv → Pooling → ReLU → DeConv → Mask]

[Long et al. 2015]

PortraitFCN+ Model

RGB + shape + position channels, 2 outputs

SLIDE 51

Shape Channel

Labeled masks → Align → Canonical pose → Mean → Shape channel

M = (Σ_j x_j ∘ U_j(P_j)) / (Σ_j x_j)

Align to the test image

SLIDE 52

Position Channel

Canonical pose: x-coordinate and y-coordinate channels → align to the test image
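The position channels are just normalized coordinate grids defined in the canonical pose. A numpy sketch (the [-1, 1] normalization range is my assumption; in the full system these channels would then be warped to the test image with the same alignment transform as the shape channel):

```python
import numpy as np

def position_channels(h, w):
    """Normalized x- and y-coordinate channels in the canonical pose,
    each spanning [-1, 1]."""
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    x_chan = 2 * xs / (w - 1) - 1
    y_chan = 2 * ys / (h - 1) - 1
    return x_chan, y_chan

x_chan, y_chan = position_channels(3, 3)
# center pixel is (0, 0); corners reach +/-1
```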

SLIDE 53

Effectiveness

Input

SLIDE 54

Effectiveness

PortraitFCN

SLIDE 55

Effectiveness

PortraitFCN+

SLIDE 56

Experiments and Applications

SLIDE 57

Our Dataset

  • 1,800 portraits from Flickr with labeled masks
  • 1,500 portraits as training data
  • 300 for testing
  • Large variation in portrait types
  • Age, color, background, clothing, accessories, head position, hair style, lighting, etc.

SLIDE 58

SLIDE 59

Training

  • Fine-tune the model starting from FCN-8s
  • Synthesize more data with different transforms
  • Use the person-class and background weights
  • Find the best learning rate
  • Monitor loss and accuracy

SLIDE 60

Find the Best LR

SLIDE 61

Evaluation

Methods | Mean IoU (%)
Graph-cut | 80.02
FCN (Person Class) | 73.09

IoU = area(output ∩ ground truth) / area(output ∪ ground truth)
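The IoU metric used throughout the evaluation is straightforward to compute for binary masks. A minimal numpy sketch:

```python
import numpy as np

def mean_iou(pred, gt):
    """IoU = |pred AND gt| / |pred OR gt| for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

pred = np.array([[1, 1], [0, 0]])
gt   = np.array([[1, 0], [0, 0]])
# intersection is 1 pixel, union is 2 pixels -> IoU = 0.5
```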

SLIDE 62

Evaluation

Methods | Mean IoU (%)
Graph-cut | 80.02
FCN (Person Class) | 73.09
PortraitFCN | 94.20

IoU = area(output ∩ ground truth) / area(output ∪ ground truth)

SLIDE 63

Evaluation

Methods | Mean IoU (%)
Graph-cut | 80.02
FCN (Person Class) | 73.09
PortraitFCN | 94.20
PortraitFCN+ (Only with Mean Mask) | 94.89
PortraitFCN+ (Only with Normalized x and y) | 94.61

IoU = area(output ∩ ground truth) / area(output ∪ ground truth)

SLIDE 64

Evaluation

Methods | Mean IoU (%)
Graph-cut | 80.02
FCN (Person Class) | 73.09
PortraitFCN | 94.20
PortraitFCN+ (Only with Mean Mask) | 94.89
PortraitFCN+ (Only with Normalized x and y) | 94.61
PortraitFCN+ | 95.91

IoU = area(output ∩ ground truth) / area(output ∪ ground truth)

SLIDE 65

Comparisons

Input

SLIDE 66

Comparisons

Ground Truth

SLIDE 67

Comparisons

Graph-cut

SLIDE 68

Comparisons

FCN-8s (Person)

SLIDE 69

Comparisons

PortraitFCN

SLIDE 70

Comparisons

PortraitFCN+

SLIDE 71

Comparisons

[Figure: input and ground truth; FCN-8s and Graph-cut results (IoU = 0.83, 0.42, 0.91, 0.85); ours (IoU = 0.99, 0.98)]

SLIDE 72

Comparisons

[Figure: input and ground truth; FCN-8s and Graph-cut results (IoU = 0.77, 0.95, 0.38, 0.84); ours (IoU = 0.98, 0.98)]

SLIDE 73

Comparisons

[Figure: input and ground truth; FCN-8s and Graph-cut results (IoU = 0.83, 0.53, 0.81, 0.89); ours (IoU = 0.99, 0.98)]

SLIDE 74

Robustness

[Examples: color, scale, rotation, occlusion]

SLIDE 75

User Study

  • Our result provides a very good initialization for further refinement

SLIDE 76

Segmentation is not enough

  • Automatic Portrait Matting
SLIDE 77

Portrait Matting

[Figure: input image, alpha matte; applications: color transform, depth-of-field, portrait stylization, cartoon, background edit]

SLIDE 78

Problem Definition

J = βG + (1 − β)C

where J is the image, G the foreground, C the background, and β the alpha (foreground opacity).
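The compositing equation, J = βG + (1 − β)C, is a per-pixel blend of foreground and background weighted by the alpha. A minimal numpy sketch (the `composite` helper name is mine):

```python
import numpy as np

def composite(beta, G, C):
    """Matting equation: J = beta*G + (1 - beta)*C, with G the
    foreground, C the background, and beta the per-pixel alpha."""
    if G.ndim == 3:                 # broadcast alpha over color channels
        beta = beta[..., None]
    return beta * G + (1 - beta) * C

# a half-transparent pixel mixes foreground and background equally
G = np.array([[1.0]])
C = np.array([[0.0]])
beta = np.array([[0.5]])
J = composite(beta, G, C)
```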

SLIDE 79

Natural Image Matting

  • Color sampling methods
  • Given a manually labeled trimap
  • Bayesian Matting [Y-Y Chuang, 2001], etc.

[Figure: image, trimap, alpha matte]

SLIDE 80

Natural Image Matting

  • Propagation approaches
  • Given manually labeled strokes & trimap
  • Closed-form Matting [Levin, 2008], etc.

β = argmin_β  β^T M β + μ (β − β_s)^T E (β − β_s)

with M the matting Laplacian, β_s the user-provided strokes, and E the diagonal stroke mask.

SLIDE 81

Motivation

  • It is very hard to specify a trimap or strokes

[Figure: input, labeled strokes, closed-form matting result with errors]

SLIDE 82

Motivation

  • It is very hard to specify a trimap or strokes

[Figure: input, labeled trimap, closed-form matting result with errors]

SLIDE 83

Motivation

Usually we need to refine the trimap many times to get a good alpha matte…

SLIDE 84

Segmentation to Matting

SLIDE 85

Segmentation to Matting

SLIDE 86

SLIDE 87

Learning for Automatic Matting

  • Challenges
  • Data preparation
  • Learning framework
  • We propose end-to-end Convolutional Neural Networks (CNNs) for portrait matting

SLIDE 88

Learning Data Collection

  • 2,000 portraits from Flickr with large variation
  • Keywords…
  • Different age, gender, pose, hairstyle, background…
  • Different camera types…
  • Data example

SLIDE 89

SLIDE 90

Data Labeling

  • Apply closed-form matting and robust matting
  • Gradually refine the input trimap
  • Choose the better result of closed-form or robust matting
  • User interface
  • Ground truth example

SLIDE 91

SLIDE 92

Learn Automatic Matting

SLIDE 93

Our Method

Trimap labeling

  • Input: RGB image
  • Output: trimap
  • Network: fine-tuned from FCN
SLIDE 94

Our Method

Image Matting Layer

  • Input: trimap
  • Output: alpha matte
  • Newly designed structure
SLIDE 95

Our Method

Image Matting Layer

  • Feed-forward: solve min_B  μ B^T C B + μ (B − 1)^T G (B − 1) + B^T M B
  • Back-propagation: ∂g/∂C = −μ E^{−1} diag(E^{−1} G),  ∂g/∂G = ∂g/∂C + E^{−1},  ∂g/∂μ = −μ E^{−1} diag(G + C) (E^{−1} G)

SLIDE 96

Our Method

Image Matting Layer

  • Loss function: L(B, B^gt) = Σ_j w(B_j^gt) |B_j − B_j^gt|, with w(B_j^gt) = −log(p(B = B_j^gt))

SLIDE 97

Model Training

  • Data augmentation
  • 4 scales {0.6, 0.8, 1.2, 1.5}
  • 4 rotations {−45, −22, 22, 45} degrees
  • Gamma values {0.5, 0.8, 1.2, 1.5}
  • Network initialization
  • Fine-tuned from the FCN-8s model [J. Long, 2015]
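One of the listed augmentations can be sketched directly: gamma adjustment on a [0, 1] image (the helper below is my illustration; scale and rotation variants would be generated analogously with an image-warping routine):

```python
import numpy as np

def gamma_augment(image, gammas=(0.5, 0.8, 1.2, 1.5)):
    """Per-gamma copies of a [0, 1] image: out = image ** gamma.
    Gamma < 1 brightens, gamma > 1 darkens."""
    return [np.power(image, g) for g in gammas]

img = np.full((2, 2), 0.25)
variants = gamma_augment(img)
# gamma 0.5 brightens: 0.25 ** 0.5 = 0.5
```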
SLIDE 98

Experiments

  • Running time
  • Training: 20k iterations, one day on a Titan X GPU
  • Testing: 0.6 s for a 600×800 color image
  • Comparisons
  • Graph-cut
  • FCN baseline: direct FCN segmentation followed by closed-form matting

SLIDE 99

Results

Input Graph-cut FCN Ours

SLIDE 100

Results

Input Graph-cut FCN Ours

SLIDE 101

Results

Input Graph-cut FCN Ours

SLIDE 102

Results

Input Graph-cut FCN Ours

SLIDE 103

Failure Cases

[Figure: input and alpha matte pairs]

SLIDE 104

Applications

[Figure: input, stylization (PS GS Stick, PS Fresco), depth-of-field]

SLIDE 105

Applications

[Figure: input, stylization (PS Palette Knife, PS GS Stick, PS Sketch, PS Oil Paint), depth-of-field]

SLIDE 106

Applications

[Figure: input, stylization (PS Palette Knife, PS Dark Stroke, PS Paint Daubs), depth-of-field]

SLIDE 107

Conclusions

  • A high-accuracy automatic portrait segmentation and matting approach
  • A novel CNN framework
  • Training and testing dataset
  • Benefits many applications
  • Future work
  • Video segmentation
  • Human segmentation
  • Single portrait image depth estimation
  • Weakly supervised version

SLIDE 108

Q & A

Thanks