SLIDE 1

Synthesis and Analysis Sparse Representation Models for Image Restoration Shuhang Gu 顾舒航

  • Dept. of Computing

The Hong Kong Polytechnic University

SLIDE 2

Outline


 Sparse representation models for image modeling

  • Synthesis-based representation model
  • Analysis-based representation model
  • Synthesis & analysis models for image modeling

 Weighted nuclear norm and its applications in low level vision

  • Low rank models
  • Weighted nuclear norm minimization (WNNM)
  • WNNM for image denoising
  • WNNM-RPCA, WNNM-MC, and their applications

 Convolutional sparse coding for single image super-resolution

  • Convolutional sparse coding (CSC)
  • CSC for single image super-resolution
SLIDE 3

Synthesis and analysis sparse representation models


SLIDE 4

Synthesis-based sparse representation model

The synthesis-based sparse representation model assumes that a signal $x$ can be represented as a linear combination of a small number of atoms chosen from a dictionary $D$: $x = D\alpha$, s.t. $\|\alpha\|_0 < \varepsilon$ (a toy coding sketch follows below).

[Figure: a dense solution vs. a sparse solution of $x = D\alpha$]

Elad, M., Milanfar, P., Rubinstein, R. Analysis versus synthesis in signal priors. Inverse problems 2007.
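To make the synthesis view concrete, here is a minimal sketch of sparse coding with an $\ell_1$ relaxation of the $\ell_0$ constraint (ISTA); the dictionary, signal, and all parameter values are toy assumptions of mine, not from the talk.

```python
# A minimal synthesis sparse coding sketch, assuming an l1 relaxation
# of the l0 constraint (ISTA). Toy data only.
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft-thresholding, the proximal operator of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_sparse_code(y, D, lam=0.1, n_iter=200):
    """Solve min_a 0.5*||y - D a||_2^2 + lam*||a||_1 by iterative
    shrinkage-thresholding."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - y)           # gradient of the data-fidelity term
        a = soft_threshold(a - grad / L, lam / L)
    return a

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))         # overcomplete toy dictionary
D /= np.linalg.norm(D, axis=0)             # unit-norm atoms
a_true = np.zeros(256)
a_true[[3, 50, 200]] = [1.5, -2.0, 1.0]    # signal uses only 3 atoms
y = D @ a_true
a_hat = ista_sparse_code(y, D)
print("non-zeros in estimate:", np.sum(np.abs(a_hat) > 1e-3))
```

With only three active atoms generating $y$, the recovered code concentrates on a few entries, which is the "sparse solution" the slide contrasts with a dense one.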

SLIDE 5

Analysis-based sparse representation model

  • The analysis model generates representation coefficients by a simple matrix multiplication, and assumes the coefficients are sparse: $\|Px\|_0 < \varepsilon$ (see the sketch below).


Elad, M., Milanfar, P., Rubinstein, R. Analysis versus synthesis in signal priors. Inverse problems 2007.
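A minimal numeric sketch of the analysis view, assuming a first-order finite-difference operator as the analysis matrix $P$ (the TV-like case); the coefficients come from one multiplication, with no coding step.

```python
# Analysis sparsity sketch: a piecewise-constant signal has a sparse
# coefficient vector P x under a finite-difference operator P.
import numpy as np

n = 100
P = np.eye(n, k=1)[: n - 1] - np.eye(n)[: n - 1]   # first-difference operator
x = np.concatenate([np.ones(40), 3 * np.ones(35), 2 * np.ones(25)])
coeffs = P @ x                                     # one multiplication, done
print("non-zero analysis coefficients:", np.count_nonzero(coeffs))  # -> 2
```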

SLIDE 6

S&A representation models for image modeling

  • A geometry perspective


Synthesis model: $x = D\alpha$, where $\alpha$ is sparse. Analysis model: $\beta = Px$, where $\beta$ is sparse.

The synthesis model emphasizes the non-zero entries of the sparse coefficient vector $\alpha$, because these non-zero entries select the dictionary atoms that span the subspace containing the input signal.

The analysis model emphasizes the zero entries of the coefficient vector $Px$, because these zero entries select the rows of the projection matrix that span the complement of the signal subspace: each zero coefficient confines the signal to a hyperplane.

[Figure: geometric illustration of the two models]

Elad, M., Milanfar, P., Rubinstein, R. Analysis versus synthesis in signal priors. Inverse problems 2007.

SLIDE 7

S&A representation models for image modeling

 Image restoration/enhancement problems (illustrated in code below):
  • Image denoising: $y = x + n$
  • Image super-resolution: $y = D(k \otimes x) + n$
  • Image deconvolution: $y = k \otimes x + n$
  • Image inpainting: $y = M \odot x + n$
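In code, the four degradation models might look as follows; the kernel, mask, noise level, and the use of scipy.ndimage.convolve are illustrative assumptions of mine.

```python
# A sketch of the four degradation models above, assuming a 1-channel
# image x, a box blur kernel k, 2x decimation, and a binary mask M.
import numpy as np
from scipy.ndimage import convolve

rng = np.random.default_rng(0)
x = rng.random((64, 64))                    # latent clean image (toy)
k = np.ones((5, 5)) / 25.0                  # box blur kernel

def noise(shape):
    return 0.05 * rng.standard_normal(shape)

y_denoise = x + noise(x.shape)                          # y = x + n
y_deconv  = convolve(x, k) + noise(x.shape)             # y = k (*) x + n
y_sr      = convolve(x, k)[::2, ::2] + noise((32, 32))  # y = D(k (*) x) + n
M = rng.random(x.shape) > 0.3                           # 70% pixels observed
y_inpaint = M * x + noise(x.shape)                      # y = M .* x + n
```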

SLIDE 8

S&A representation models for image modeling


  • Priors for image restoration
    – Sparsity priors
    – Non-local similarity priors
    – Color line priors
    – …

Buades A, Coll B, Morel JM. A non-local algorithm for image denoising. In CVPR 2005.

SLIDE 9

S&A representation models for image modeling

  • Sparsity prior

MAP estimation: $\arg\max_x p(x|y) = \arg\max_x p(y|x)\,p(x)$. Minimizing $-\log p(x|y)$ under a Gaussian likelihood: $\hat{x} = \arg\min_x \tfrac{1}{2}\|x - y\|_F^2 - \log p(x)$.

[Figure: prior modeling $p(x)$ in the original signal domain, a transformation (analysis) domain, and a decomposition (synthesis) domain]

The Gaussian likelihood yields the quadratic fidelity term; the distribution in the original signal domain is not discriminative enough, so the prior is modeled on coefficients, where a long-tailed distribution leads to a sparse solution: $\rho(Px)$ for the analysis model, $\psi(\alpha)$ for the synthesis model.

SLIDE 10

S&A representation models for image modeling

Synthesis model

$\min_\alpha \tfrac{1}{2}\|y - D\alpha\|_F^2 + \psi(\alpha)$, with the estimate $\hat{x} = D\hat{\alpha}$

  • Representative methods

K-SVD, BM3D, LSSC, NCSR, etc.

  • Pros
  • Can yield sparser representations
  • Easier to embed non-local priors
  • Cons
  • Patch-based prior modeling needs aggregation
  • Time consuming


Analysis model

$\min_x \tfrac{1}{2}\|y - x\|_F^2 + \rho(Px)$

  • Representative methods

TV, wavelet methods, FRAME, FoE, CSF, TRD, etc.

  • Pros
  • Free of patch division
  • Efficient in the inference phase
  • Easier to learn task-specific priors
  • Cons
  • Hard to embed non-local priors
  • Not as sparse as the synthesis model
SLIDE 11

S&A representation models for image modeling

Synthesis model

Methods: K-SVD, BM3D, LSSC, NCSR, etc.

  • Pros
  • Can yield sparser representations
  • Easier to embed non-local priors
  • Cons
  • Patch-based prior modeling needs aggregation
  • Time consuming


Analysis model

Methods: TV, wavelet methods, FoE, CSF, TRD, etc.

  • Pros
  • Free of patch division
  • Efficient in the inference phase
  • Easier to learn task-specific priors
  • Cons
  • Hard to embed non-local priors
  • Not as sparse as the synthesis model

Patch-based vs. filter-based implementations

SLIDE 12

S&A representation models for image modeling

Synthesis model

Methods: K-SVD, BM3D, LSSC, NCSR, etc.

  • Pros
  • Can yield sparser representations
  • Easier to embed non-local priors
  • Cons
  • Patch-based prior modeling needs aggregation
  • Time consuming

?


Analysis model

Methods: Analysis K-SVD, etc. (patch based); TV, wavelet methods, FoE, CSF, TRD, etc. (filter based)

  • Pros
  • Free of patch division
  • Efficient in the inference phase
  • Easier to learn task-specific priors
  • Cons
  • Hard to embed non-local priors
  • Not as sparse as the synthesis model

Patch-based vs. filter-based implementations

SLIDE 13

S&A representation models for image modeling

Notes:

– Aggregation: overlap aggregation may over-smooth the image or generate ringing artifacts.
– Non-local prior: the non-local prior helps to generate visually plausible results in highly noisy situations.

Patch based vs. filter based:
– Analysis model: models structure better; a patch-based implementation can embed the non-local prior, while a filter-based one is aggregation free.
– Synthesis model: models texture/details better; a patch-based implementation can embed the non-local prior, while a filter-based one is aggregation free.
SLIDE 14

S&A representation models for image modeling

Notes:

– Aggregation: overlap aggregation may over-smooth the image or generate ringing artifacts.
– Non-local prior: the non-local prior helps to generate visually plausible results in highly noisy situations.

Patch based vs. filter based:
– Analysis model: models structure better; a patch-based implementation can embed the non-local prior, while a filter-based one is aggregation free.
– Synthesis model: models texture/details better; a patch-based implementation can embed the non-local prior, while a filter-based one is aggregation free.

Different applications may be better solved by different models!

SLIDE 15

S&A representation models for image modeling


  • Weighted nuclear norm minimization denoising model
    – An analysis model with a patch-based implementation
      • Non-local prior
      • The analysis model is good at structure modeling
  • Convolutional sparse coding super-resolution
    – A synthesis model with a filter-based implementation
      • Aggregation free
      • The synthesis model is good at texture modeling

[Figure: denoising and SR examples]

SLIDE 16

Weighted nuclear norm minimization and its applications in low level vision


SLIDE 17

Low rank models

  • Matrix factorization methods

$\min_{U,V} \mathrm{loss}(Y - X) \quad \text{s.t. } X = UV$

The loss function is determined by the noise model:
  – Gaussian noise: PCA, probabilistic PCA
  – Sparse noise: robust PCA
  – Partial observations: matrix completion
  – Complex noise: MoG, etc.
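As a small numerical check of the factorization view under the Gaussian (Frobenius) loss: the best rank-$r$ factorization is given by the truncated SVD (classical PCA). Toy data, for illustration only.

```python
# Under Frobenius loss, the rank-r minimizer of ||Y - X||_F is the
# truncated SVD of Y; X = U V with U, V the truncated factors.
import numpy as np

rng = np.random.default_rng(0)
Y = rng.standard_normal((50, 5)) @ rng.standard_normal((5, 40))  # rank 5
Y += 0.01 * rng.standard_normal(Y.shape)                         # + noise

U, s, Vt = np.linalg.svd(Y, full_matrices=False)
r = 5
X = (U[:, :r] * s[:r]) @ Vt[:r]            # rank-r minimizer of ||Y - X||_F
print("relative error:", np.linalg.norm(Y - X) / np.linalg.norm(Y))
```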


SLIDE 18

Low rank models

  • Regularization methods

$\min_X \mathrm{loss}(Y - X) + R(X)$

A commonly used regularization term is the nuclear norm of the matrix $X$:

$\|X\|_* = \sum_i \sigma_i(X) = \|\sigma(X)\|_1$

Pros:

  • Exact recovery property (theoretically proved)
  • The nuclear norm proximal problem has a closed-form solution

Characteristics:

  • The regularization method balances fidelity and low-rankness via a parameter.
  • The factorization method sets a hard upper bound on the rank.


Candès, E. J., et al. Robust principal component analysis? Journal of the ACM, 2011.
Candès, E. J., and Recht, B. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 2009.
Cai, J. F., Candès, E. J., and Shen, Z. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 2010.

SLIDE 19

Low rank models

  • Regularization methods: a 2D analysis sparse perspective

Analysis sparse model

$\min_x \tfrac{1}{2}\|y - x\|_F^2 + \rho(Px)$

Nuclear norm regularization model

$\min_X \tfrac{1}{2}\|Y - X\|_F^2 + \|U^T X V\|_1$, where the orthogonal matrices $U$ and $V$ play the role of a 2D analysis operator

The nuclear norm regularization model can be interpreted as a 2D analysis sparse model! (A quick numerical check follows.)
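With $U$, $V$ taken from the SVD of $X$, the transformed coefficient matrix $U^T X V$ is diagonal, and its entry-wise $\ell_1$ norm equals the nuclear norm (the $\ell_1$ norm of the singular values); a toy check:

```python
# Numerical check of the 2D analysis reading of the nuclear norm.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))
U, s, Vt = np.linalg.svd(X, full_matrices=False)
coeffs = U.T @ X @ Vt.T                    # "analysis coefficients" of X
print(np.abs(coeffs).sum(), s.sum())       # the two values agree
```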


SLIDE 20

Weighted nuclear norm minimization

  • Nuclear norm proximal

$\min_X \tfrac{1}{2}\|Y - X\|_F^2 + \lambda\|X\|_*$, with closed-form solution $\hat{X} = U S_\lambda(\Sigma) V^T$, where $Y = U\Sigma V^T$ is the SVD of $Y$ and $S_\lambda$ soft-thresholds the singular values (sketched in code below)

  • Pros
    – The nuclear norm is the tightest convex envelope of rank minimization.
    – Closed-form solution.
  • Cons
    – Treats all singular values equally, ignoring their different significance.


Cai, J. F., Candès, E. J., & Shen, Z. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 2010.
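A sketch of the singular value thresholding operator behind this proximal problem; note how every singular value is shrunk by the same $\lambda$, which is exactly the "equal treatment" listed as a con above. The threshold value and test matrix are arbitrary.

```python
# Singular value thresholding (Cai et al. 2010): the closed-form
# nuclear norm proximal operator.
import numpy as np

def svt(Y, lam):
    """Solve min_X 0.5*||Y - X||_F^2 + lam*||X||_* in closed form."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vt   # shrink every sigma by lam

rng = np.random.default_rng(0)
Y = rng.standard_normal((8, 6))
X = svt(Y, lam=1.0)
print("rank before/after:", np.linalg.matrix_rank(Y), np.linalg.matrix_rank(X))
```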

SLIDE 21

Weighted nuclear norm minimization

Weighted nuclear norm:

$\|X\|_{w,*} = \sum_i w_i \sigma_i(X)$

Weighted nuclear norm proximal (WNNP):

$\hat{X} = \arg\min_X \|X - Y\|_F^2 + \|X\|_{w,*}$

  • Difficulties
    – WNNM is not convex for general weight vectors.
    – The sub-gradient method cannot be used to analyze its optimization.


SLIDE 22

Weighted nuclear norm minimization

Theorem 1. For any $Y \in \mathbb{R}^{m \times n}$, let $Y = U\Sigma V^T$ be its SVD. The optimal solution of the WNNP problem $\hat{X} = \arg\min_X \|X - Y\|_F^2 + \|X\|_{w,*}$ is $\hat{X} = U D V^T$, where $D$ is a diagonal matrix whose diagonal entries $d = [d_1, d_2, \ldots, d_r]$, $r = \min(m, n)$, are determined by

$\min_{d_1, \ldots, d_r} \sum_{i=1}^{r} (d_i - \sigma_i)^2 + w_i d_i, \quad \text{s.t. } d_1 \ge d_2 \ge \cdots \ge d_r \ge 0.$


SLIDE 23

Weighted nuclear norm minimization

Corollary 1. If the weights satisfy $0 \le w_1 \le w_2 \le \cdots \le w_n$, the non-convex WNNP problem has a closed-form optimal solution:

$\hat{X} = U S_w(\Sigma) V^T$

where $Y = U\Sigma V^T$ is the SVD of $Y$, and $S_w(\Sigma)_{ii} = \max(\Sigma_{ii} - w_i, 0)$.
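Corollary 1 translated into code: a weighted singular value thresholding sketch, valid under the non-descending weight condition; the weight values are illustrative.

```python
# Weighted SVT per Corollary 1 (non-descending weights required).
import numpy as np

def weighted_svt(Y, w):
    """Closed-form WNNP solution for 0 <= w_1 <= ... <= w_n:
    X = U * max(sigma_i - w_i, 0) * V^T, with Y = U Sigma V^T."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    assert np.all(np.diff(w) >= 0), "weights must be non-descending"
    return (U * np.maximum(s - w, 0.0)) @ Vt

rng = np.random.default_rng(0)
Y = rng.standard_normal((8, 6))
w = np.linspace(0.1, 2.0, 6)      # small weights on large singular values
X = weighted_svt(Y, w)
print("rank after weighted shrinkage:", np.linalg.matrix_rank(X))
```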


SLIDE 24

WNNM for image denoising


  1. For each noisy patch, search the image for its non-local similar patches and stack them into a matrix $Y$.
  2. Solve the WNNM problem to estimate the clean patches $X$ from $Y$.
  3. Put the clean patches back into the image.
  4. Repeat the above procedure several times to obtain the denoised image.

WNNM

$\hat{X} = \arg\min_X \|X - Y\|_F^2 + \|X\|_{w,*}$

SLIDE 25

WNNM for image denoising


  • Weight setting (sketched in code below): a reweighting strategy to promote sparsity,

$w_i = \dfrac{C}{\sigma_i(X) + \varepsilon}$

  • Still has only one parameter $C$.
  • Introduces little extra computational burden.
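Putting the four steps of the previous slide together with this weight rule, a heavily condensed sketch of the WNNM denoising loop might look as follows; the brute-force similar-patch search, the approximation $\sigma_i(X) \approx \sigma_i(Y)$ inside the weight rule, and all parameter values are simplifications of mine, not the paper's exact settings.

```python
# Condensed WNNM denoising sketch: block matching + weighted SVT.
import numpy as np

def wnnm_denoise(noisy, patch=6, n_similar=16, C=2.0, eps=1e-6, n_iter=3):
    img = noisy.copy()
    for _ in range(n_iter):
        out = np.zeros_like(img)
        weight = np.zeros_like(img)
        H, W = img.shape
        step = patch // 2
        coords = [(i, j) for i in range(0, H - patch + 1, step)
                         for j in range(0, W - patch + 1, step)]
        patches = np.stack([img[i:i + patch, j:j + patch].ravel()
                            for i, j in coords])
        for idx, (i, j) in enumerate(coords):
            # step 1: group non-local similar patches into a matrix Y
            d = np.sum((patches - patches[idx]) ** 2, axis=1)
            group = patches[np.argsort(d)[:n_similar]].T
            # step 2: weighted SVT solves the WNNM problem on the group;
            # sigma_i(X) is crudely approximated by sigma_i(Y) in the weights
            U, s, Vt = np.linalg.svd(group, full_matrices=False)
            w = C / (s + eps)                      # w_i = C / (sigma_i + eps)
            est = (U * np.maximum(s - w, 0.0)) @ Vt
            # step 3: put the denoised reference patch back and average
            out[i:i + patch, j:j + patch] += est[:, 0].reshape(patch, patch)
            weight[i:i + patch, j:j + patch] += 1.0
        # step 4: iterate on the partially denoised image
        img = np.where(weight > 0, out / np.maximum(weight, 1e-8), img)
    return img

rng = np.random.default_rng(0)
clean = np.tile(np.linspace(0.0, 1.0, 32), (32, 1))
denoised = wnnm_denoise(clean + 0.1 * rng.standard_normal((32, 32)))
```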

SLIDE 26

WNNM for image denoising


  • Denoising experimental results
SLIDE 27

WNNM-RPCA


$\min_{X,E} \|E\|_1 + \|X\|_{w,*} \quad \text{s.t. } Y = X + E$

Synthetic experiment: $X, Y \in \mathbb{R}^{m \times m}$, $\operatorname{Rank}(X) = p_r \cdot m$, $\|E\|_0 = p_e \cdot m^2$.
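An ADMM-style alternating sketch for this WNNM-RPCA problem, reusing the weighted shrinkage above; the penalty parameter, weight rule, and fixed iteration count are assumptions rather than the paper's algorithm.

```python
# WNNM-RPCA sketch: min ||E||_1 + ||X||_{w,*}  s.t.  Y = X + E.
import numpy as np

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def wnnm_rpca(Y, C=5.0, eps=1e-6, mu=1.0, n_iter=100):
    X = np.zeros_like(Y)
    E = np.zeros_like(Y)
    L = np.zeros_like(Y)          # dual variable for Y = X + E
    for _ in range(n_iter):
        # X-step: weighted nuclear norm proximal of (Y - E + L/mu)
        U, s, Vt = np.linalg.svd(Y - E + L / mu, full_matrices=False)
        w = C / (s + eps)
        X = (U * np.maximum(s - w / mu, 0.0)) @ Vt
        # E-step: l1 proximal keeps the error sparse
        E = soft(Y - X + L / mu, 1.0 / mu)
        # dual ascent on the constraint Y = X + E
        L += mu * (Y - X - E)
    return X, E

rng = np.random.default_rng(0)
low_rank = rng.standard_normal((40, 5)) @ rng.standard_normal((5, 40))
sparse = 10.0 * (rng.random((40, 40)) < 0.05) * rng.standard_normal((40, 40))
X_hat, E_hat = wnnm_rpca(low_rank + sparse)
print("rank of X_hat:", np.linalg.matrix_rank(X_hat, tol=1e-3))
```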

SLIDE 28

WNNM-RPCA


SLIDE 29

WNNM-MC


$\min_X \|X\|_{w,*} \quad \text{s.t. } Y = X + E,\ P_\Omega(E) = 0$

Synthetic experiment: $X, Y \in \mathbb{R}^{m \times m}$, $\operatorname{Rank}(X) = p_r \cdot m$, $\|E\|_0 = p_e \cdot m^2$.
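For matrix completion, $P_\Omega(E) = 0$ just pins the observed entries, so a simple iterative-imputation sketch suffices (same caveats as the RPCA sketch above).

```python
# WNNM-MC sketch: weighted shrinkage with observed entries held fixed.
import numpy as np

def wnnm_mc(Y, mask, C=5.0, eps=1e-6, n_iter=200):
    X = np.where(mask, Y, 0.0)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        w = C / (s + eps)                       # reweighted thresholds
        X = (U * np.maximum(s - w, 0.0)) @ Vt   # weighted shrinkage
        X = np.where(mask, Y, X)                # observed entries stay fixed
    return X

rng = np.random.default_rng(0)
full = rng.standard_normal((40, 5)) @ rng.standard_normal((5, 40))
mask = rng.random((40, 40)) < 0.6               # 60% of entries observed
err = np.linalg.norm(wnnm_mc(full, mask) - full) / np.linalg.norm(full)
print("relative recovery error:", err)
```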

SLIDE 30

WNNM-MC


SLIDE 31

WNNM summary


  • We analyzed the weighted nuclear norm proximal (WNNP) problem.
  • Based on WNNP, we proposed a new image denoising algorithm and achieved state-of-the-art performance.
  • We then extended the weighted nuclear norm to WNNM-RPCA and WNNM-MC; WNNM achieved superior performance to NNM in both applications.
SLIDE 32

Convolutional sparse coding for single image super-resolution


SLIDE 33

Convolutional sparse coding

Consistency constraint


SLIDE 34

Convolutional sparse coding

Aggregation in patch-based algorithms (EPLL):

$\min_X \|X - Y\|^2 + \sum_i \left( \|R_i X - z_i\|^2 + P(z_i) \right)$

where $R_i$ extracts the $i$-th patch and $z_i$ is its auxiliary patch estimate.

[Figure: noisy input; non-overlapping, center-pixel, and overlapping aggregation]

Zoran D, Weiss Y. From learning models of natural image patches to whole image restoration. In: ICCV 2011.

SLIDE 35

Convolutional sparse coding

Sparse coding:

$\min_\alpha \|y - D\alpha\|_F^2 + \psi(\alpha)$

Convolutional sparse coding:

$\min_{\{Z_i\}} \big\| Y - \sum_i f_i \otimes Z_i \big\|_F^2 + \sum_i \varphi(Z_i)$

[Figure: CSC in matrix form]

  • M. D. Zeiler, D. Krishnan, G. W. Taylor, and R. Fergus. Deconvolutional networks. In CVPR, 2010.
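A shape-level sketch of the CSC objective: the whole image is synthesized as a sum of filter/feature-map convolutions, so no patch division or aggregation is involved; the filters and maps here are random toys.

```python
# Evaluating the CSC objective: Y approximated by sum_i f_i (*) Z_i.
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
Y = rng.random((32, 32))                       # observed image (toy)
filters = rng.standard_normal((4, 5, 5))       # 4 small filters f_i
maps = rng.standard_normal((4, 32, 32)) * (rng.random((4, 32, 32)) < 0.05)

recon = sum(convolve2d(Z, f, mode="same") for f, Z in zip(filters, maps))
data_term = 0.5 * np.sum((Y - recon) ** 2)
sparsity_term = np.abs(maps).sum()             # l1 surrogate for the penalty
print("objective:", data_term + sparsity_term)
```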
SLIDE 36

Convolutional sparse coding for image SR

[Figure: CSC-SR pipeline. Training phase: CSC-based learning of the $N$ LR filters (and LR feature maps), then joint learning of the $M$ HR filters and the mapping function from LR to HR feature maps. Testing phase: LR feature map estimation, HR feature map estimation via the mapping function, and convolution with the HR filters.]

SLIDE 37

Convolutional sparse coding for image SR

[Figure: training phase, as above]

– Pre-processing

SLIDE 38

Convolutional sparse coding for image SR

[Figure: training phase, as above]

– LR filter training

  • B. Wohlberg. Efficient convolutional sparse coding. In ICASSP, 2014.
SLIDE 39

Convolutional sparse coding for image SR

[Figure: training phase, as above]

– Joint HR filter and mapping function learning

SLIDE 40

Convolutional sparse coding for image SR

[Figure: testing phase. HR feature maps are estimated from the $N$ LR feature maps via the mapping function, then convolved with the $M$ HR filters.]
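A shape-level sketch of this testing phase, with a plain linear map standing in for the learned mapping function and the LR/HR spatial sizes kept equal for brevity (the real model upsamples):

```python
# CSC-SR test phase sketch: LR maps -> mapping function -> HR maps ->
# convolution with HR filters -> HR estimate. All values are toys.
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
N, M, size = 4, 6, 32
lr_maps = rng.standard_normal((N, size, size))     # from CSC on the LR image
mapping = rng.standard_normal((M, N)) / N          # stand-in N -> M map
hr_filters = rng.standard_normal((M, 5, 5))        # learned HR filters (toy)

hr_maps = np.einsum("mn,nhw->mhw", mapping, lr_maps)
hr_image = sum(convolve2d(Z, f, mode="same")
               for f, Z in zip(hr_filters, hr_maps))
print("HR estimate shape:", hr_image.shape)
```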

SLIDE 41

Convolutional sparse coding for image SR


Optimization: SA-ADMM. The original problem can be written as:

  • L. W. Zhong and J. T. Kwok. Fast stochastic alternating direction method of multipliers. In ICML, 2013.
SLIDE 42

Convolutional sparse coding for image SR


Optimization: SA-ADMM

SLIDE 43

Convolutional sparse coding for image SR


SLIDE 44

Convolutional sparse coding for image SR


SLIDE 45

Convolutional sparse coding for image SR


SLIDE 46

Convolutional sparse coding for image SR


SLIDE 47

CSC-SR: summary and future work


  • Summary
    – To avoid patch aggregation in super-resolution, we use convolutional sparse coding for the SR problem.
    – The SA-ADMM algorithm is used to train the CSC-SR model on large-scale training data.
    – State-of-the-art SR results with high PSNR and visual quality.
  • Future work
    – An end-to-end training strategy may be better.
    – Is there an optimization algorithm better suited to CSC training?

SLIDE 48

Related Publications and References

Related Publications

  • S. Gu, L. Zhang, W. Zuo, and X. Feng, "Weighted Nuclear Norm Minimization with Application to Image Denoising," In CVPR 2014.
  • Q. Xie, D. Meng, S. Gu, L. Zhang, W. Zuo, X. Feng, and Z. Xu, "On the Optimal Solution of Weighted Nuclear Norm Minimization," Technical Report.
  • S. Gu, Q. Xie, D. Meng, W. Zuo, X. Feng, and L. Zhang, "Weighted Nuclear Norm Minimization and Its Applications to Low Level Vision," Submitted to IJCV (minor revision).
  • S. Gu, W. Zuo, Q. Xie, D. Meng, X. Feng, and L. Zhang, "Convolutional Sparse Coding for Image Super-resolution," In ICCV 2015.

References

  • Elad, M., Milanfar, P., Rubinstein, R. Analysis versus synthesis in signal priors. Inverse Problems, 2007.
  • Buades, A., Coll, B., Morel, J. M. A non-local algorithm for image denoising. In CVPR 2005.
  • Aharon, M., Elad, M., Bruckstein, A. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. TSP, 2006.
  • Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K. Image denoising by sparse 3-D transform-domain collaborative filtering. TIP, 2007.

SLIDE 49

Related Publications and References

References

  • Mairal, J., Bach, F., Ponce, J., Sapiro, G., Zisserman, A. Non-local sparse models for image restoration. In ICCV 2009.
  • Dong, W., Zhang, L., Shi, G. Centralized sparse representation for image restoration. In ICCV 2011.
  • Rudin, L. I., Osher, S., Fatemi, E. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena, 1992.
  • Zhu, S. C., Wu, Y., Mumford, D. Filters, random fields and maximum entropy (FRAME): Towards a unified theory for texture modeling. IJCV, 1998.
  • Roth, S., Black, M. J. Fields of experts. IJCV, 2009.
  • Schmidt, U., Roth, S. Shrinkage fields for effective image restoration. In CVPR 2014.
  • Chen, Y., Yu, W., Pock, T. On learning optimized reaction diffusion processes for effective image restoration. In CVPR 2015.
  • Rubinstein, R., Peleg, T., Elad, M. Analysis K-SVD: A dictionary-learning algorithm for the analysis sparse model. TSP, 2013.
  • Tipping, M. E., Bishop, C. M. Probabilistic principal component analysis. Journal of the Royal Statistical Society: Series B, 61(3):611-622, 1999.

SLIDE 50

Related Publications and References

References

  • Ke, Q., Kanade, T. Robust L1 norm factorization in the presence of outliers and missing data by alternative convex programming. In CVPR 2005.
  • Meng, D., De la Torre, F. Robust matrix factorization with unknown noise. In ICCV 2013.
  • Candès, E. J., et al. Robust principal component analysis? Journal of the ACM, 2011.
  • Candès, E. J., Recht, B. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 2009.
  • Cai, J. F., Candès, E. J., Shen, Z. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 2010.
  • Zoran, D., Weiss, Y. From learning models of natural image patches to whole image restoration. In ICCV 2011.
  • Zeiler, M. D., Krishnan, D., Taylor, G. W., Fergus, R. Deconvolutional networks. In CVPR 2010.
  • Wohlberg, B. Efficient convolutional sparse coding. In ICASSP 2014.
  • Zhong, L. W., Kwok, J. T. Fast stochastic alternating direction method of multipliers. In ICML 2013.

Note: References for the comparison methods in the tables are omitted; all of them can be found in my corresponding publications.

SLIDE 51

THANKS!
