

SLIDE 1

Automatic Portrait Segmentation and Matting

Xiaoyong Shen

The Chinese University of Hong Kong, goodshenxy@gmail.com

SLIDE 2

Research on CV

  • Pixel based (low level / early vision): filtering, restoration, denoising, enhancement, deblurring, editing, dehazing, etc.
  • Region/patch based (middle level vision): matching, optical flow, stereo matching, tracking, segmentation, etc.
  • Object/semantic based (high level vision): semantic segmentation, object detection, image classification, recognition, etc.

SLIDE 3

My Research on CV

  • Pixel based (low level vision): filtering, restoration, denoising, enhancement, deblurring, editing, dehazing, etc.
  • Region/patch based (middle level vision): matching, optical flow, stereo matching, tracking, segmentation, etc.
  • Object based (high level vision): semantic segmentation, object detection, image classification, recognition, etc.

SLIDE 4

SLIDE 5

Multi-Spectral Image Restoration

  • Input
  • Noisy RGB image I0, e.g. captured at night
  • Clean guidance image G, e.g. dark-flashed NIR or flashed RGB
  • Output
  • Denoised image I
  • Structures as clear as in the guidance G
  • Appearance the same as in the input I0
  • Unaffected by shadows/highlights in the guidance

[TPAMI 2015]

SLIDE 6

Scale Map

  • Given J*, the expected ground-truth noise-free image, our scale map s is defined by the condition min_s ‖∇J* − s∇H‖
  • It adapts the structures of H to those of J*.
  • It is an ideal ratio map between ∇H and ∇J*.
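The condition min_s ‖∇J* − s∇H‖ has a pointwise least-squares solution: at each pixel, s is the ratio that best maps the guidance gradient onto the target gradient. A minimal numpy sketch (forward differences and the `scale_map` helper are my illustration, not the paper's full optimization, which also regularizes s):

```python
import numpy as np

def scale_map(J_star, H, eps=1e-6):
    """Pointwise least-squares scale map s minimizing |grad(J*) - s*grad(H)|^2."""
    def grads(I):
        gx = np.diff(I, axis=1, append=I[:, -1:])
        gy = np.diff(I, axis=0, append=I[-1:, :])
        return gx, gy

    jx, jy = grads(J_star.astype(float))
    hx, hy = grads(H.astype(float))
    # closed-form per-pixel ratio of gradients
    return (jx * hx + jy * hy) / (hx * hx + hy * hy + eps)

# toy check: if J* = 2*H, the scale map is ~2 wherever H has gradient
H = np.tile(np.arange(5.0), (5, 1))
s = scale_map(2 * H, H)
```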


SLIDE 7

Result

[Figure: input noisy image, input NIR image, our result, ground truth]

SLIDE 8

RGB Input I

SLIDE 9

NIR Input G

SLIDE 10

BM3D

SLIDE 11

Our Result

SLIDE 12

Mutual-Structure Filter

[ICCV 2015 Oral Presentation]

SLIDE 13

Depth/RGB Restoration

Noisy Depth

SLIDE 14

Depth/RGB Restoration

Noisy RGB Image

SLIDE 15

Depth/RGB Restoration

Ground truth

SLIDE 16

Depth/RGB Restoration

Ours PSNR = 37.19

SLIDE 17

Rolling Guidance Filter

One line of code only: J_{u+1} = K_G(J_0, J_u)

[ECCV 2014 Oral Presentation]
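The rolling-guidance iteration can be sketched in a few lines: each pass re-filters the input using the previous result as the range guidance, so small structures vanish while large edges are gradually recovered. A 1-D numpy sketch (the brute-force `joint_bilateral_1d` and the constant initialization are my simplifications of the paper's 2-D filter):

```python
import numpy as np

def joint_bilateral_1d(I, guide, radius=3, sigma_s=2.0, sigma_r=0.1):
    """Brute-force 1-D joint bilateral filter: range weights come from
    `guide`, filtered values come from the input signal I."""
    n = len(I)
    out = np.empty(n)
    for p in range(n):
        lo, hi = max(0, p - radius), min(n, p + radius + 1)
        q = np.arange(lo, hi)
        w = (np.exp(-((q - p) ** 2) / (2 * sigma_s ** 2))
             * np.exp(-((guide[q] - guide[p]) ** 2) / (2 * sigma_r ** 2)))
        out[p] = np.sum(w * I[q]) / np.sum(w)
    return out

def rolling_guidance_1d(I, iters=5, **kw):
    # start from a structure-free (constant) guidance, then iterate
    # J_{u+1} = JointBilateral(I, J_u)
    J = np.full_like(I, I.mean())
    for _ in range(iters):
        J = joint_bilateral_1d(I, J, **kw)
    return J

# small-amplitude texture on a large step edge: the texture is removed
# while the step survives
x = np.concatenate([np.zeros(20), np.ones(20)])
noisy = x + 0.05 * np.sin(np.arange(40))
smooth = rolling_guidance_1d(noisy, iters=5)
```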

SLIDE 18

Texture Removal

SLIDE 19

Halftone Image

SLIDE 20

De-Filter

One line of code only: J_{u+1} = J_u + (J_0 − G(J_u))
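The de-filter iteration is a fixed-point scheme: it keeps adding back the part of the observed image J_0 that the filter G has not yet explained, converging when G only shrinks frequency components (gain in (0, 2)). A 1-D numpy sketch (the binomial `smooth` filter stands in for G; it is my choice, not the paper's):

```python
import numpy as np

def smooth(x):
    """The forward filter G to be inverted: a small binomial blur."""
    return np.convolve(x, [0.25, 0.5, 0.25], mode='same')

def defilter(J0, G, iters=200):
    # fixed-point iteration from the slide: J_{u+1} = J_u + (J0 - G(J_u));
    # it drives G(J) toward the observed filtered image J0
    J = J0.copy()
    for _ in range(iters):
        J = J + (J0 - G(J))
    return J

rng = np.random.default_rng(0)
original = rng.random(32)
observed = smooth(original)          # what the filter produced
recovered = defilter(observed, smooth)
# G(recovered) reproduces the observed filtered signal
```

Frequencies the filter nearly destroys converge slowly, which is why strongly lossy filters can only be partially reversed.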

SLIDE 21

Reverse Skin Retouch

Retouched input

SLIDE 22

Reverse Skin Retouch

Reversed

SLIDE 23

Reverse Skin Retouch

Before retouch

SLIDE 24

SLIDE 25

Multi-Spectral Matching

  • Match general multi-spectral images with significant displacement and obvious structure inconsistency

[Examples: different exposures, RGB/Depth, RGB/NIR, flash/no-flash]

SLIDE 26

Result

  • Match RGB/NIR image pair

[Figure: inputs, our result, blended]

SLIDE 27

Applications

  • HDR construction

[Figure: without alignment, with alignment, constructed HDR]

SLIDE 28

Internet Image Matching

[Figure: reference, input, dense correspondences? Existing correspondence vs. no correspondence]

[SIGGRAPH ASIA 2016]

SLIDE 29

Our Motivation

[Figure: reference, input, dense correspondences? Foremost region matching]

SLIDE 30

Time-lapse Generation

SLIDE 31

Automatic Morphing

SLIDE 32

Automatic Morphing

SLIDE 33

Object-based Matching

Achieves higher accuracy with the help of the object (person) information

SLIDE 34

Object-based Matching

State-of-the-art Ours

SLIDE 35

SLIDE 36

Classification and Segmentation

  • Fine-grained Classification
  • DeepLAC (CVPR 2015)
  • Text detection and recognition
  • Semantic object segmentation
  • Portrait segmentation and matting
  • VOC challenge
SLIDE 37

Automatic Portrait Segmentation

SLIDE 38

Motivation

  • Abundant portraits in smartphone photos
  • Portrait 30%, others 70% (Samsung UK)
  • Portrait 90%, others 10% (Symon Whitehorn, HTC)

SLIDE 39

Portrait Post-processing

SLIDE 40

Foreground Selection

SLIDE 41

Quick Selection

SLIDE 42

Automatic Segmentation

Automatic?

SLIDE 43

Challenges

Similar colors, complex backgrounds, various accessories, low contrast, diverse poses, complicated edges

SLIDE 44

Possible Solutions

  • Graph-cut with face tracker

SLIDE 45

Possible Solutions

  • CNNs for semantic segmentation

SLIDE 46

Most Related Work

  • Interactive Image Selection
  • Lazy snapping [Li et al. 2004]
  • Grabcut [Rother et al. 2004]
  • Paint Selection [Li et al. 2009]
  • CNNs for Semantic Object Segmentation
  • FCN [Long et al. 2014]
  • DeepLab [Chen et al. 2014]
  • CRFasRNN [Zheng et al. 2015]
  • Image Matting
  • Bayesian matting [Chuang et al. 2001]
  • Closed-form matting [Levin et al. 2008]
  • KNN matting [Chen et al. 2013]


SLIDE 47

Our Approach

PortraitFCN and PortraitFCN+

SLIDE 48

Our System

[Pipeline: Detector → Conv → ReLU → Pooling → Conv → Conv → Pooling → ReLU → DeConv → Mask]

[Long et al. 2015]

PortraitFCN Model

RGB channels, 2 outputs

SLIDE 49

PortraitFCN

  • Fine-tuned from the original FCN-8s model

Portrait Knowledge

SLIDE 50

PortraitFCN+

[Pipeline: Detector → Conv → ReLU → Pooling → Conv → Conv → Pooling → ReLU → DeConv → Mask]

[Long et al. 2015]

PortraitFCN+ Model

RGB + shape + position channels, 2 outputs

SLIDE 51

Shape Channel

Labeled masks → Align → Canonical pose → Mean → Shape channel

M = (Σ_j x_j ∘ U_j(P_j)) / (Σ_j x_j)

Align to the test image

SLIDE 52

Position Channel

Canonical pose: x-coordinate and y-coordinate channels → align to the test image
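The position channels are just normalized coordinate grids defined in the canonical pose. A numpy sketch (the [-1, 1] normalization range is my assumption; in the full system these channels would then be warped to the test image with the same alignment transform as the shape channel):

```python
import numpy as np

def position_channels(h, w):
    """Normalized x- and y-coordinate channels in the canonical pose,
    each spanning [-1, 1]."""
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    x_chan = 2 * xs / (w - 1) - 1
    y_chan = 2 * ys / (h - 1) - 1
    return x_chan, y_chan

x_chan, y_chan = position_channels(3, 3)
# center pixel is (0, 0); corners reach +/-1
```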

SLIDE 53

Effectiveness

Input

SLIDE 54

Effectiveness

PortraitFCN

SLIDE 55

Effectiveness

PortraitFCN+

SLIDE 56

Experiments and Applications

SLIDE 57

Our Dataset

  • 1,800 portraits from Flickr with labeled masks
  • 1,500 portraits as training data
  • 300 for testing
  • Large variation in portrait types
  • Age, color, background, clothing, accessories, head position, hair style, lighting, etc.

SLIDE 58

SLIDE 59

Training

  • Fine-tune the model starting from FCN-8s
  • Synthesize more data with different transforms
  • Use the person-class and background weights
  • Find the best learning rate
  • Monitor loss and accuracy

SLIDE 60

Find the Best LR

SLIDE 61

Evaluation

Methods | Mean IoU (%)
Graph-cut | 80.02
FCN (Person Class) | 73.09

IoU = area(output ∩ ground truth) / area(output ∪ ground truth)
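The IoU metric used throughout the evaluation is straightforward to compute for binary masks. A minimal numpy sketch:

```python
import numpy as np

def mean_iou(pred, gt):
    """IoU = |pred AND gt| / |pred OR gt| for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

pred = np.array([[1, 1], [0, 0]])
gt   = np.array([[1, 0], [0, 0]])
# intersection is 1 pixel, union is 2 pixels -> IoU = 0.5
```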

SLIDE 62

Evaluation

Methods | Mean IoU (%)
Graph-cut | 80.02
FCN (Person Class) | 73.09
PortraitFCN | 94.20

IoU = area(output ∩ ground truth) / area(output ∪ ground truth)

SLIDE 63

Evaluation

Methods | Mean IoU (%)
Graph-cut | 80.02
FCN (Person Class) | 73.09
PortraitFCN | 94.20
PortraitFCN+ (Only with Mean Mask) | 94.89
PortraitFCN+ (Only with Normalized x and y) | 94.61

IoU = area(output ∩ ground truth) / area(output ∪ ground truth)

SLIDE 64

Evaluation

Methods | Mean IoU (%)
Graph-cut | 80.02
FCN (Person Class) | 73.09
PortraitFCN | 94.20
PortraitFCN+ (Only with Mean Mask) | 94.89
PortraitFCN+ (Only with Normalized x and y) | 94.61
PortraitFCN+ | 95.91

IoU = area(output ∩ ground truth) / area(output ∪ ground truth)

SLIDE 65

Comparisons

Input

SLIDE 66

Comparisons

Ground Truth

SLIDE 67

Comparisons

Graph-cut

SLIDE 68

Comparisons

FCN-8s (Person)

SLIDE 69

Comparisons

PortraitFCN

SLIDE 70

Comparisons

PortraitFCN+

SLIDE 71

Comparisons

[Figure: input and ground truth; FCN-8s and Graph-cut results (IoU = 0.83, 0.42, 0.91, 0.85); ours (IoU = 0.99, 0.98)]

SLIDE 72

Comparisons

[Figure: input and ground truth; FCN-8s and Graph-cut results (IoU = 0.77, 0.95, 0.38, 0.84); ours (IoU = 0.98, 0.98)]

SLIDE 73

Comparisons

[Figure: input and ground truth; FCN-8s and Graph-cut results (IoU = 0.83, 0.53, 0.81, 0.89); ours (IoU = 0.99, 0.98)]

SLIDE 74

Robustness

[Examples: color, scale, rotation, occlusion]

SLIDE 75

User Study

  • Our result provides a very good initialization for further refinement

SLIDE 76

Segmentation is not enough

  • Automatic Portrait Matting
SLIDE 77

Portrait Matting

[Figure: input image, alpha matte; applications: color transform, depth-of-field, portrait stylization, cartoon, background edit]

SLIDE 78

Problem Definition

J = βG + (1 − β)C

where J is the image, G the foreground, C the background, and β the alpha (foreground opacity).
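The compositing equation, J = βG + (1 − β)C, is a per-pixel blend of foreground and background weighted by the alpha. A minimal numpy sketch (the `composite` helper name is mine):

```python
import numpy as np

def composite(beta, G, C):
    """Matting equation: J = beta*G + (1 - beta)*C, with G the
    foreground, C the background, and beta the per-pixel alpha."""
    if G.ndim == 3:                 # broadcast alpha over color channels
        beta = beta[..., None]
    return beta * G + (1 - beta) * C

# a half-transparent pixel mixes foreground and background equally
G = np.array([[1.0]])
C = np.array([[0.0]])
beta = np.array([[0.5]])
J = composite(beta, G, C)
```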

SLIDE 79

Natural Image Matting

  • Color sampling methods
  • Given a manually labeled trimap
  • Bayesian Matting [Y-Y Chuang, 2001], etc.

[Figure: image, trimap, alpha matte]

SLIDE 80

Natural Image Matting

  • Propagation approaches
  • Given manually labeled strokes & trimap
  • Closed-form Matting [Levin, 2008], etc.

β = argmin_β  β^T M β + μ (β − β_s)^T E (β − β_s)

with M the matting Laplacian, β_s the user-provided strokes, and E the diagonal stroke mask.

SLIDE 81

Motivation

  • It is very hard to specify a trimap or strokes

[Figure: input, labeled strokes, closed-form matting result with errors]

SLIDE 82

Motivation

  • It is very hard to specify a trimap or strokes

[Figure: input, labeled trimap, closed-form matting result with errors]

SLIDE 83

Motivation

Usually we need to refine the trimap many times to get a good alpha matte…

SLIDE 84

Segmentation to Matting

SLIDE 85

Segmentation to Matting

SLIDE 86

SLIDE 87

Learning for Automatic Matting

  • Challenges
  • Data preparation
  • Learning framework
  • We propose end-to-end Convolutional Neural Networks (CNNs) for portrait matting

SLIDE 88

Learning Data Collection

  • 2,000 portraits from Flickr with large variation
  • Keywords…
  • Different age, gender, pose, hairstyle, background…
  • Different camera types…
  • Data example

SLIDE 89

SLIDE 90

Data Labeling

  • Apply closed-form matting and robust matting
  • Gradually refine the input trimap
  • Choose the better result of closed-form or robust matting
  • User interface
  • Ground truth example

SLIDE 91

SLIDE 92

Learn Automatic Matting

SLIDE 93

Our Method

Trimap labeling

  • Input: RGB image
  • Output: trimap
  • Network: fine-tuned from FCN
SLIDE 94

Our Method

Image Matting Layer

  • Input: trimap
  • Output: alpha matte
  • Newly designed structure
SLIDE 95

Our Method

Image Matting Layer

  • Feed-forward: solve min_B  μ B^T C B + μ (B − 1)^T G (B − 1) + B^T M B
  • Back-propagation: ∂g/∂C = −μ E^{−1} diag(E^{−1} G),  ∂g/∂G = ∂g/∂C + E^{−1},  ∂g/∂μ = −μ E^{−1} diag(G + C) (E^{−1} G)

SLIDE 96

Our Method

Image Matting Layer

  • Loss function: L(B, B^gt) = Σ_j w(B_j^gt) |B_j − B_j^gt|, with w(B_j^gt) = −log(p(B = B_j^gt))

SLIDE 97

Model Training

  • Data augmentation
  • 4 scales {0.6, 0.8, 1.2, 1.5}
  • 4 rotations {−45, −22, 22, 45} degrees
  • Gamma values {0.5, 0.8, 1.2, 1.5}
  • Network initialization
  • Fine-tuned from the FCN-8s model [J. Long, 2015]
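One of the listed augmentations can be sketched directly: gamma adjustment on a [0, 1] image (the helper below is my illustration; scale and rotation variants would be generated analogously with an image-warping routine):

```python
import numpy as np

def gamma_augment(image, gammas=(0.5, 0.8, 1.2, 1.5)):
    """Per-gamma copies of a [0, 1] image: out = image ** gamma.
    Gamma < 1 brightens, gamma > 1 darkens."""
    return [np.power(image, g) for g in gammas]

img = np.full((2, 2), 0.25)
variants = gamma_augment(img)
# gamma 0.5 brightens: 0.25 ** 0.5 = 0.5
```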
SLIDE 98

Experiments

  • Running time
  • Training: 20k iterations, one day on a Titan X GPU
  • Testing: 0.6 s for a 600×800 color image
  • Comparisons
  • Graph-cut
  • FCN baseline: direct FCN segmentation followed by closed-form matting

SLIDE 99

Results

Input Graph-cut FCN Ours

SLIDE 100

Results

Input Graph-cut FCN Ours

SLIDE 101

Results

Input Graph-cut FCN Ours

SLIDE 102

Results

Input Graph-cut FCN Ours

SLIDE 103

Failure Cases

[Figure: input and alpha matte pairs]

SLIDE 104

Applications

[Figure: input, stylization (PS GS Stick, PS Fresco), depth-of-field]

SLIDE 105

Applications

[Figure: input, stylization (PS Palette Knife, PS GS Stick, PS Sketch, PS Oil Paint), depth-of-field]

SLIDE 106

Applications

[Figure: input, stylization (PS Palette Knife, PS Dark Stroke, PS Paint Daubs), depth-of-field]

SLIDE 107

Conclusions

  • A high-accuracy automatic portrait segmentation and matting approach
  • A novel CNN framework
  • Training and testing dataset
  • Benefits many applications
  • Future work
  • Video segmentation
  • Human segmentation
  • Single portrait image depth estimation
  • Weakly supervised version

SLIDE 108

Q & A

Thanks