SLIDE 1

On the use of Gaussian models on patches for image denoising

Antoine Houdard
Young Researchers in Imaging Seminars, Institut Henri Poincaré

Wednesday, February 27th

slide-2
SLIDE 2

Digital photography: noise in images

Different ISO settings with constant exposure – 25600 ISO

SLIDE 3

Digital photography: noise in images

Different ISO settings with constant exposure – 200 ISO

SLIDE 4

Noise modeling and denoising problem

SLIDE 5

Patch-based image denoising

Many denoising methods rely on the description of the image by patches:

• NL-means, Buades, Coll, Morel (2005);
• BM3D, Dabov, Foi, Katkovnik (2007);
• PLE, Yu, Sapiro, Mallat (2012);
• NL-Bayes, Lebrun, Buades, Morel (2012);
• LDMM, Shi, Osher, Zhu (2017);
• and many others...

slide-6
SLIDE 6

Patch-based image denoising

Hypothesis: the noise patches $N_i$ are i.i.d.

SLIDE 7

Patch-based image denoising: the Bayesian paradigm

• We consider each clean patch $x$ as a realization of a random vector $X$ with prior distribution $P_X$.
→ Under the Gaussian white noise model, $Y = X + N$ with $N \sim \mathcal{N}(0, \sigma^2 I_p)$, and Bayes' theorem yields the posterior distribution:
$$P_{X|Y}(x \mid y) = \frac{P_{Y|X}(y \mid x)\, P_X(x)}{P_Y(y)}.$$

SLIDE 8

Patch-based image denoising

Denoising strategies

• $\hat{x} = E[X \mid Y = y]$: the minimum mean square error (MMSE) estimator;
• $\hat{x} = Dy + \alpha$, where $D$ and $\alpha$ minimize $E[\|DY + \alpha - X\|^2]$: the linear MMSE, also called the Wiener estimator;
• $\hat{x} = \arg\max_{x \in \mathbb{R}^p} p(x \mid y)$: the maximum a posteriori (MAP) estimator.

SLIDE 9

Outline

  • 1. Gaussian priors for X: why are they widely used?
  • 2. How to infer parameters in high dimension?
  • 3. Presentation of the HDMI method.
  • 4. Limitations of model-based patch-based approaches.

SLIDE 10
1. Modeling the clean patches $X_i$

SLIDE 11

Choice of the model

In the literature:

Local Gaussian models
• patch-based PCA, Deledalle, Salmon, Dalalyan (2011);
• NL-Bayes, Lebrun, Buades, Morel (2012);
• ...

Gaussian mixture models
• EPLL, Zoran, Weiss (2011);
• PLE, Yu, Sapiro, Mallat (2012);
• Single-frame Image Denoising, Teodoro, Almeida, Figueiredo (2015);
• ...

Why are Gaussian models so widely used?

SLIDE 12

Gaussian is convenient

Gaussian model

If $X \sim \mathcal{N}(\mu, \Sigma)$, then
$$\hat{x}_{\mathrm{MMSE}} = \hat{x}_{\mathrm{Wiener}} = \hat{x}_{\mathrm{MAP}} = \mu + \Sigma (\Sigma + \sigma^2 I)^{-1} (y - \mu).$$

Gaussian mixture model (GMM)

If $X \sim \sum_{k=1}^K \pi_k\, \mathcal{N}(\mu_k, \Sigma_k)$, then
$$\hat{x}_{\mathrm{MMSE}} = \sum_{k=1}^K P(Z = k \mid Y = y)\, \big[\mu_k + \Sigma_k (\Sigma_k + \sigma^2 I)^{-1} (y - \mu_k)\big].$$
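To make the Gaussian case concrete, here is a minimal NumPy sketch of the closed-form estimator above; the function name and interface are my own, not from the talk.

```python
import numpy as np

def gaussian_mmse(y, mu, Sigma, sigma):
    """MMSE / Wiener / MAP estimate of a clean patch under X ~ N(mu, Sigma)
    and Y = X + N with N ~ N(0, sigma^2 I)."""
    p = y.shape[0]
    # Solve (Sigma + sigma^2 I) z = y - mu rather than forming an explicit inverse.
    z = np.linalg.solve(Sigma + sigma**2 * np.eye(p), y - mu)
    return mu + Sigma @ z
```

Solving the linear system instead of inverting the matrix is the standard numerically stable choice.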

SLIDE 13

What do Gaussian models encode?

The covariance matrix in Gaussian models and GMMs encodes geometric structures up to some contrast change.

[Figure: an $s \times s$ covariance matrix $\Sigma$, and patches generated from $\mathcal{N}(m, \Sigma)$.]

SLIDE 15

What do Gaussian models encode?

A covariance matrix cannot encode multiple translated versions of a structure.

[Figure: a set of 10000 patches representing edges with random grey levels and random translations.]

SLIDE 16

What do Gaussian models encode?

A covariance matrix cannot encode multiple translated versions of a structure.

[Figure: the $s \times s$ covariance matrix $\Sigma$, and patches generated from $\mathcal{N}(m, \Sigma)$.]

SLIDE 17

Restore with the right model

[Figure: covariance matrix, clean patch, noisy patch, denoised patch.]

SLIDE 18

Conclusion

Modeling the patches with Gaussian models is a good idea:

• they are convenient for computing the estimates;
• they are able to encode the geometric structures of the patches.

But we need good parameters for the model!

SLIDE 19
2. How to infer parameters in high dimension?

SLIDE 20

Parameter inference

Gaussian model case: $X \sim \mathcal{N}(\mu_X, \Sigma_X)$

• Observed data $\{y_1, \ldots, y_n\}$ sampled from $Y = X + N \sim \mathcal{N}(\mu_Y, \Sigma_Y)$.

Maximizing the log-likelihood
$$\ell(y; \theta) = -\frac{1}{2} \sum_{i=1}^{n} \left[ (y_i - \mu_Y)^T \Sigma_Y^{-1} (y_i - \mu_Y) + \log \det \Sigma_Y \right] + \mathrm{const}$$
yields the maximum likelihood estimators (MLE)
$$\hat{\mu}_Y = \frac{1}{n} \sum_{i=1}^{n} y_i, \qquad \hat{\Sigma}_Y = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{\mu}_Y)(y_i - \hat{\mu}_Y)^T.$$
Since $\Sigma_Y = \Sigma_X + \sigma^2 I_p$, this gives
$$\hat{\mu}_X = \hat{\mu}_Y, \qquad \hat{\Sigma}_X = \hat{\Sigma}_Y - \sigma^2 I_p.$$
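A small NumPy sketch of this estimation step; the eigenvalue clipping at the end is my own addition (the subtraction can make $\hat{\Sigma}_X$ indefinite for small $n$), not something prescribed by the talk.

```python
import numpy as np

def estimate_clean_gaussian(Y, sigma):
    """MLE of (mu_X, Sigma_X) from noisy patches Y (n x p), noise std sigma."""
    n, p = Y.shape
    mu_Y = Y.mean(axis=0)
    C = Y - mu_Y
    Sigma_Y = C.T @ C / n                       # sample covariance of Y
    Sigma_X = Sigma_Y - sigma**2 * np.eye(p)    # uses Sigma_Y = Sigma_X + sigma^2 I
    # Clip negative eigenvalues so Sigma_X stays positive semidefinite (my addition).
    w, V = np.linalg.eigh(Sigma_X)
    return mu_Y, (V * np.maximum(w, 0.0)) @ V.T
```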

SLIDE 21

How to group patches?

We need to group together the patches that represent the same structure.

Grouping with the $\|\cdot\|_2$ distance, for instance, is not robust to strong noise → Gaussian mixture models naturally provide a (more robust) grouping!

SLIDE 22

Parameter inference

Gaussian mixture model case: $X \sim \sum_k \pi_k\, \mathcal{N}(\mu_k, \Sigma_k)$

This implies a GMM on the noisy patches: $Y \sim \sum_k \pi_k\, \mathcal{N}(\mu_k, S_k)$.

EM algorithm: maximize the conditional expectation of the complete log-likelihood
$$\sum_{k=1}^{K} \sum_{i=1}^{n} t_{ik} \log\big(\pi_k\, g(y_i; \theta_k)\big), \quad \text{where } t_{ik} = P(Z_i = k \mid y_i, \theta^*)$$
and $\theta^*$ is a given set of parameters.

E-step: estimate the $t_{ik}$ knowing the current parameters.
M-step: compute the maximum likelihood estimators (MLE) of the parameters:
$$\hat{\pi}_k = \frac{n_k}{n}, \qquad \hat{\mu}_k = \frac{1}{n_k} \sum_i t_{ik}\, y_i, \qquad \hat{S}_k = \frac{1}{n_k} \sum_i t_{ik} (y_i - \hat{\mu}_k)(y_i - \hat{\mu}_k)^T,$$
with $n_k = \sum_i t_{ik}$.
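A compact sketch of one EM iteration under these update rules, using SciPy for the Gaussian densities; the function and its interface are illustrative, not the talk's implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(Y, pi, mu, S):
    """One EM iteration for the GMM on the noisy patches Y (n x p)."""
    n, _ = Y.shape
    K = len(pi)
    # E-step: responsibilities t_ik = P(Z_i = k | y_i), computed in log-space.
    log_t = np.stack([np.log(pi[k]) + multivariate_normal.logpdf(Y, mu[k], S[k])
                      for k in range(K)], axis=1)
    log_t -= log_t.max(axis=1, keepdims=True)   # numerical stability
    t = np.exp(log_t)
    t /= t.sum(axis=1, keepdims=True)
    # M-step: weighted maximum likelihood updates.
    nk = t.sum(axis=0)
    pi = nk / n
    mu = (t.T @ Y) / nk[:, None]
    S = np.stack([(t[:, k, None] * (Y - mu[k])).T @ (Y - mu[k]) / nk[k]
                  for k in range(K)])
    return pi, mu, S, t
```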

SLIDE 23

Sketch of a denoising algorithm

With all these ingredients, we can design a denoising algorithm (sketched in code below):

• extract the patches from the image with the $P_i$ operators;
• learn a GMM for the clean patches $X$ from the observations of $Y$;
• denoise each patch with the MMSE;
• aggregate all the denoised patches with the $P_i^T$ operators.
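A minimal end-to-end sketch of this pipeline. `fit_gmm` and `gmm_mmse` are hypothetical callables standing in for the learning and denoising steps above, and the aggregation is plain uniform averaging.

```python
import numpy as np

def denoise_image(u, s, sigma, fit_gmm, gmm_mmse):
    """Patch-based denoising pipeline (sketch): extract s x s patches, fit a
    GMM on them, denoise each patch, aggregate by uniform averaging."""
    H, W = u.shape
    # Extract all overlapping s x s patches (the P_i operators).
    Y = np.stack([u[i:i+s, j:j+s].ravel()
                  for i in range(H - s + 1) for j in range(W - s + 1)])
    model = fit_gmm(Y, sigma)             # e.g. EM iterations as sketched above
    X_hat = gmm_mmse(Y, model, sigma)     # MMSE estimate of each patch
    # Aggregate (the P_i^T operators) with uniform weights.
    out = np.zeros_like(u, dtype=float)
    weight = np.zeros_like(u, dtype=float)
    idx = 0
    for i in range(H - s + 1):
        for j in range(W - s + 1):
            out[i:i+s, j:j+s] += X_hat[idx].reshape(s, s)
            weight[i:i+s, j:j+s] += 1.0
            idx += 1
    return out / weight
```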

SLIDE 24

Sketch of a denoising algorithm


But...

SLIDE 25

The curse of dimensionality

Parameter estimation for Gaussian models or GMMs suffers from the curse of dimensionality: the number of samples needed to estimate a parameter grows exponentially with the dimension.

SLIDE 26

The curse of dimensionality in the patch space

We consider patches of size $p = 10 \times 10$ → high dimension.
→ The estimation of sample covariance matrices is difficult: ill-conditioned, singular...

SLIDE 27

The curse of dimensionality in the patch space

We consider patches of size $p = 10 \times 10$ → high dimension.
→ The estimation of sample covariance matrices is difficult: ill-conditioned, singular... In the literature, this issue is generally worked around by (see also the sketch below):

• using small patches ($3 \times 3$ or $5 \times 5$): NL-Bayes [Lebrun, Buades, Morel];
• adding $\varepsilon I$ to singular covariance matrices: PLE [Yu, Sapiro, Mallat];
• fixing a lower dimension for the covariance matrices: S-PLE [Wang, Morel].
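The second workaround, for instance, is a one-liner; this tiny helper is illustrative only, not the cited papers' code.

```python
import numpy as np

def regularize_covariance(Sigma, eps=1e-6):
    """Workaround for singular sample covariances: add eps * I (as in PLE above)."""
    return Sigma + eps * np.eye(Sigma.shape[0])
```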

But there is no reason to be afraid of this curse!

SLIDE 28

The blessing of dimensionality?

In high-dimensional spaces, it is easier to separate data.

Many patches represent structures that actually live in a low-dimensional subspace: using this latent lower dimension allows us to group the patches in a more robust way. This “blessing” is exploited in clustering algorithms designed for high dimensions:

High-Dimensional Data Clustering, Bouveyron, Girard, Schmid (2007)

SLIDE 29

The blessing of dimensionality?

An illustration in the context of patches: an image made of vertical stripes of width >2 pixels with random grey levels.

SLIDE 30

The blessing of dimensionality?

An illustration in the context of patches:

[Figure: two views of the patch set.]

In the patch space, we cannot distinguish three classes

SLIDE 31

The blessing of dimensionality?

An illustration in the context of patches:

[Figure: two views of the patches restricted to their first 3 pixels.]

The algorithm is now able to separate these classes!

SLIDE 32
3. High-Dimensional Mixture Models for Image Denoising

SLIDE 33

HDMI: presentation of the model

We model the clean patches $X$ with:

• $Z$, a latent random variable indicating group membership;
• $X$ living in a low-dimensional subspace specific to its latent group:
$$X \mid Z = k \sim \mathcal{N}\big(\mu_k,\, U_k \Lambda_k U_k^T\big),$$
where $U_k$ is a $p \times d_k$ orthogonal matrix and $\Lambda_k = \mathrm{diag}(\lambda_1^k, \ldots, \lambda_{d_k}^k)$ is a diagonal matrix of size $d_k \times d_k$.
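To see what one group of this model generates, here is a small sampling sketch under the stated assumptions; the function name and interface are mine.

```python
import numpy as np

def sample_hdmi_group(n, mu_k, U_k, lam_k, rng=None):
    """Sample n clean patches from one HDMI group:
    X | Z=k ~ N(mu_k, U_k diag(lam_k) U_k^T), with U_k of size p x d_k."""
    rng = np.random.default_rng(rng)
    d_k = len(lam_k)
    # Latent low-dimensional coordinates, then embedding into the patch space.
    a = rng.standard_normal((n, d_k)) * np.sqrt(lam_k)
    return mu_k + a @ U_k.T
```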

SLIDE 34

HDMI: induced model

Induced model on the noisy patches $Y$

The model on $X$ implies that $Y$ follows a full-rank GMM
$$p(y) = \sum_{k=1}^{K} \pi_k\, g(y; \mu_k, \Sigma_k),$$
where, in an orthonormal basis whose first $d_k$ vectors are the columns of $U_k$, the covariance $\Sigma_k$ is diagonal with the specific structure
$$\Delta_k = \mathrm{diag}\big(\underbrace{a_{k1}, \ldots, a_{kd_k}}_{d_k},\ \underbrace{\sigma^2, \ldots, \sigma^2}_{p - d_k}\big),$$
where $a_{kj} = \lambda_j^k + \sigma^2$ and $a_{kj} > \sigma^2$, for $j = 1, \ldots, d_k$.

SLIDE 35

Denoising with the HDMI model

Once the HDMI model is known, each patch is denoised with the MMSE
$$\hat{x}_i = E[X \mid Y = y_i] = \sum_{k=1}^{K} t_{ik}\, \psi_k(y_i),$$
where $t_{ik}$ is the posterior probability that the patch $y_i$ belongs to the $k$-th group and
$$\psi_k(y_i) = \mu_k + U_k\, \mathrm{diag}\!\left(\frac{a_{k1} - \sigma^2}{a_{k1}}, \ldots, \frac{a_{kd_k} - \sigma^2}{a_{kd_k}}\right) U_k^T (y_i - \mu_k).$$
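The group-wise estimator $\psi_k$ is just a shrinkage in the group subspace; a sketch (interface mine):

```python
import numpy as np

def psi_k(y, mu_k, U_k, a_k, sigma):
    """Group-wise Wiener-type estimator of the HDMI model (sketch).
    U_k: p x d_k basis, a_k: the d_k leading eigenvalues a_kj of Sigma_k."""
    shrink = (a_k - sigma**2) / a_k      # shrinkage factor per subspace direction
    z = U_k.T @ (y - mu_k)               # coordinates of the patch in the subspace
    return mu_k + U_k @ (shrink * z)
```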

SLIDE 36

Model inference

With an EM algorithm, the parameters are updated during the M-step:

• $\hat{U}_k$ is formed by the $d_k$ first eigenvectors of the sample covariance matrix;
• $\hat{a}_{kj}$ is the $j$-th eigenvalue of the sample covariance matrix.

SLIDE 37

Model inference

The hyper-parameters $K$ and $d_1, \ldots, d_K$ cannot be determined by maximizing the log-likelihood, since they control the model complexity.
→ Each choice of $K$ and $d_1, \ldots, d_K$ corresponds to a different model.

SLIDE 38

Model inference

We propose to set $K$ to a given value and to choose the intrinsic dimensions $d_k$:

• using a heuristic that links $d_k$ to the noise variance $\sigma^2$, when it is known;
• using a model selection tool to select the best variance $\sigma^2$, when it is unknown.

SLIDE 39

Estimation of intrinsic dimensions – known variance

With $d_k$ being fixed, the MLE of the noise variance in the $k$-th group is
$$\hat{\sigma}^2_{|k} = \frac{1}{p - d_k} \sum_{j = d_k + 1}^{p} \hat{a}_{kj}.$$
When the noise variance $\sigma^2$ is known, this gives the following heuristic.

• Heuristic. Given a value of $\sigma^2$ and for $k = 1, \ldots, K$, estimate the dimension $d_k$ by
$$\hat{d}_k = \arg\min_d \left| \frac{1}{p - d} \sum_{j = d + 1}^{p} \hat{a}_{kj} - \sigma^2 \right|.$$
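A direct transcription of this heuristic in NumPy (helper name mine):

```python
import numpy as np

def estimate_dimension(a_k, sigma):
    """Heuristic dimension choice: pick d so that the mean of the trailing
    eigenvalues a_k[d:] is closest to the known noise variance sigma^2.
    a_k: eigenvalues of the sample covariance, sorted in decreasing order."""
    p = len(a_k)
    residual_means = np.array([a_k[d:].mean() for d in range(p)])
    return int(np.argmin(np.abs(residual_means - sigma**2)))
```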

SLIDE 40

Estimation of intrinsic dimensions – convergence

By re-evaluating the dimensions, we change the model at each M-step! Question: is convergence still ensured?

SLIDE 41

Estimation of intrinsic dimensions – convergence

[Plots: evolution of the dimensions and of the groups along the EM iterations.]

The dimensions stabilize → there exists an iteration from which the algorithm becomes a classic EM.

SLIDE 42

Estimation of intrinsic dimensions – unknown variance

Each value of $\sigma$ yields a different model; we propose to select the one with the best BIC (Bayesian Information Criterion):
$$\mathrm{BIC}(\mathcal{M}) = \ell(\hat{\theta}) - \frac{\xi(\mathcal{M})}{2} \log(n),$$
where $\xi(\mathcal{M})$ is the complexity of the model $\mathcal{M}$.

Why is BIC well-adapted to the selection of $\sigma$?

• If $\sigma$ is too small, the likelihood is good but the complexity explodes;
• if $\sigma$ is too high, the complexity is low but the likelihood is bad.
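The criterion itself is one line; a sketch with my own argument names:

```python
import numpy as np

def bic(log_likelihood, complexity, n):
    """BIC as defined above (higher is better with this sign convention)."""
    return log_likelihood - 0.5 * complexity * np.log(n)
```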

SLIDE 43

Estimation of intrinsic dimensions – unknown variance

$$\Delta_k = \mathrm{diag}\big(\underbrace{a_{k1}, \ldots, a_{kd_k}}_{d_k},\ \underbrace{\sigma^2, \ldots, \sigma^2}_{p - d_k}\big)$$

Why is BIC well-adapted to the selection of $\sigma$?

• If $\sigma$ is too small, the likelihood is good but the complexity explodes;
• if $\sigma$ is too high, the complexity is low but the likelihood is bad.

SLIDE 44

Summary: the HDMI algorithm

We presented the HDMI model for image denoising, which:

• models the full generative process of the noisy patches;
• is a fully statistical model, without the usual “denoising cuisine”;
• can be used in a “blind” way thanks to the BIC selection;
• attains state-of-the-art performance!

SLIDE 45

Numerical Experiments

Clean image

SLIDE 46

Numerical Experiments

Noisy image, $\sigma = 50$

SLIDE 47

Numerical Experiments

Denoised with BM3D (Dabov et al. 2007), PSNR = 27.17 dB

SLIDE 48

Numerical Experiments

Denoised with FFDNet (Zhang et al. 2018), PSNR = 27.58 dB

SLIDE 49

Numerical Experiments

Denoised with HDMI, $K = 50$, PSNR = 27.28 dB

SLIDE 50

Numerical Experiments – zooms

Clean image

SLIDE 51

Numerical Experiments – zooms

Noisy image, $\sigma = 50$

SLIDE 52

Numerical Experiments – zooms

Denoised with BM3D (Dabov et al. 2007), PSNR = 27.17 dB

SLIDE 53

Numerical Experiments – zooms

Denoised with FFDNet (Zhang et al. 2018), PSNR = 27.58 dB

SLIDE 54

Numerical Experiments – zooms

Denoised with HDMI, $K = 50$, PSNR = 27.28 dB

SLIDE 55

Numerical Experiments

Clean image

SLIDE 56

Numerical Experiments

Noisy image, $\sigma = 50$

SLIDE 57

Numerical Experiments

Denoised with BM3D (Dabov et al. 2007), PSNR = 26.55 dB

SLIDE 58

Numerical Experiments

Denoised with FFDNet (Zhang et al. 2018), PSNR = 27.45 dB

SLIDE 59

Numerical Experiments

Denoised with HDMI, $K = 50$, PSNR = 27.05 dB

SLIDE 60

Numerical Experiments – zooms

Clean image

SLIDE 61

Numerical Experiments – zooms

Noisy image, $\sigma = 50$

SLIDE 62

Numerical Experiments – zooms

Denoised with BM3D (Dabov et al. 2007), PSNR = 26.55 dB

SLIDE 63

Numerical Experiments – zooms

Denoised with FFDNet (Zhang et al. 2018), PSNR = 27.45 dB

SLIDE 64

Numerical Experiments – zooms

Denoised with HDMI, $K = 50$, PSNR = 27.05 dB

SLIDE 65
4. Limitations of denoising in the patch space

SLIDE 66

The lower bound for patch-based image denoising

“Is denoising dead?” [Chatterjee, Milanfar, 2010] proposed a lower bound for patch-based image denoising. In this context, denoting by $m_k$ the number of patches in the $k$-th group and by $N$ the total number of patches, the bound for HDMI is
$$E\big[\|u - \hat{u}_{\mathrm{HDMI}}\|^2\big] \ \ge\ \frac{1}{N} \sum_{k=1}^{K} m_k\, \frac{\mathrm{Tr}(\Sigma_k)\, \sigma^2}{p + \sigma^2} \ \ge\ C\, \frac{\sigma^2}{N (p + \sigma^2)} \sum_{k=1}^{K} m_k \ =\ C\, \frac{\sigma^2}{p + \sigma^2},$$
which is independent of $N$: even if the number of samples increases by stretching the image size to infinity, the noise variance cannot be reduced by more than a factor $p$.

SLIDE 67

The lower bound for patch-based image denoising

HDMI (patches $3 \times 10$): PSNR = 30.12 dB. L2 grouping (patches $3 \times 10$): PSNR = 25.03 dB.

SLIDE 68

The lower bound for patch-based image denoising

HDMI (patches $3 \times 10$): PSNR = 30.27 dB. L2 grouping (patches $3 \times 10$): PSNR = 30.84 dB. (Cropped: the actual image height is 500 pixels.)

SLIDE 69

The low-frequency noise

Denoised with HDMI, $K = 50$, PSNR = 36.47 dB

SLIDE 70

Removing low-frequency noise by denoising the DC component

Define the centered observed random variable
$$Y_i^c = Y_i - \bar{Y}_i \mathbf{1}_p, \qquad \text{where } \bar{Y}_i = \frac{1}{p} \sum_{j=1}^{p} Y_i(j)$$
is the DC component of the patch.

The noise model can then be split into the two following problems:
$$\bar{Y}_i = \bar{X}_i + \bar{N}_i \in \mathbb{R}, \tag{1}$$
$$Y_i^c = X_i^c + N_i^c \in \mathbb{R}^p. \tag{2}$$
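The split is a two-liner on a patch matrix; a small sketch (names mine):

```python
import numpy as np

def split_dc(Y):
    """Split patches Y (n x p) into DC components and centered patches,
    so that Y = dc[:, None] + centered, i.e. Y_i = Ybar_i * 1_p + Y_i^c."""
    dc = Y.mean(axis=1)          # Ybar_i: mean grey level of patch i, problem (1)
    centered = Y - dc[:, None]   # Y_i^c, problem (2)
    return dc, centered
```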

SLIDE 71

Removing low-frequency noise by denoising the DC component

The DC components can be reshaped as an image.

Extracting patches from this image yields an additive Gaussian noise problem with colored noise. A change of basis brings us back to additive white Gaussian noise → it can be denoised with the HDMI method.

SLIDE 72

Results

Noisy image, $\sigma = 50$

SLIDE 73

Results

Denoised with HDMI, $K = 50$, PSNR = 36.47 dB

SLIDE 74

Results

+ corrected DC component (HDMI, $K = 30$), PSNR = 36.90 dB

SLIDE 75

Results

Denoised with FFDNet (Zhang et al. 2018), PSNR = 36.72 dB

SLIDE 76

Conclusion and future work

We explored model-based, patch-based image denoising and designed the HDMI model, which achieves state-of-the-art results. This work opens several questions and directions for future work:

Statistical modeling versus deep learning?
→ Statistical modeling is not dead yet!
→ Complementary approaches.

Lower bound on the denoising quality
→ Change of paradigm: use the HDMI model in a global way.

Some misclassifications when the noise variance is high
→ Use robust estimators such as the geometric median.

Extension to other image problems
→ Missing pixels, inpainting, texture generation.

SLIDE 77

Thank you for your attention! Any questions?

More information on the HDMI model and my new preprint: houdard.wp.imt.fr

SLIDE 78

Aggregation problem

Each pixel belongs to $p$ patches. In all the experiments here: uniform aggregation. In the literature there exist different aggregation methods, which can improve visual results, but in many cases the final pixel is still obtained from a fixed number of realizations.
SLIDE 79

Other inverse problems: missing pixels

70% missing pixels

EM is well-adapted to missing data → the model can easily be adapted for missing-pixel restoration.

SLIDE 80

Other inverse problems: missing pixels

restored with HDMI

EM is well-adapted to missing data → the model can easily be adapted for missing-pixel restoration.

SLIDE 81

Regularizing effect of the dimension reduction

SLIDE 82

The HDMI algorithm

Input: $u$ noisy image, $p$ patch size, $K$ number of groups, $\{\sigma_1, \ldots, \sigma_m\}$ list of standard deviations.
Output: $\hat{u}$ denoised image.

Extract patches $\{y_1, \ldots, y_n\}$ from $u$.
for $\sigma = \sigma_1, \ldots, \sigma_m$ do
    Initialization: a few iterations of k-means; $dl \leftarrow \infty$.
    while $dl > \epsilon$ do
        M-step: update the parameters and the dimensions $d_k$.
        E-step: compute the $t_{ik}$.
        Update the log-likelihood $l$ and compute the relative error $dl = |l - l_{\mathrm{prev}}| / |l|$; $l_{\mathrm{prev}} \leftarrow l$.
    end while
    Compute the BIC of the model associated with $\sigma$.
end for
Select the model with the best BIC.
Compute the denoised patches $\{\hat{x}_1, \ldots, \hat{x}_n\}$ with the conditional expectation.
Aggregate the patches $\hat{x}_i$ to recover the denoised image $\hat{u}$.

SLIDE 83

Learning on a sub-sample of the patches

Figure: Effect of subsampling on the computation time and the denoising performance with HDMI. Left: PSNR versus sampling size. Right: computation time versus the same sampling size. Dotted lines: 20% subsampling.

SLIDE 84

Influence of the number of groups $K$

Figure: Denoising results (PSNR) with respect to $K$ (left) and choice of $K$ with BIC (right).

SLIDE 85

Selection of $\sigma^2$ with BIC