

SLIDE 1

On Casting Importance Weighted Autoencoder to an EM Algorithm to Learn Deep Generative Models

D. Kim¹, J. Hwang², and Y. Kim¹

Speaker: Dongha Kim

¹ Department of Statistics, Seoul National University, South Korea
² SK Telecom, South Korea

XAIENCE 2019, November 07, 2019

SLIDE 2

Outline

1 Introduction
2 Proposed methods
  • IWAE as EM algorithm
  • IWEM
  • miss-IWEM
3 Empirical analysis
4 Summary
5 References


SLIDE 3

Deep generative model with latent variable

  • X : observable variable
  • Z : latent variable

Z ∼ p(z)  (e.g., N(0, I))
X | Z = z ∼ p(x|z; θ)
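As a concrete illustration of this two-stage sampling process, here is a minimal NumPy sketch. The one-layer sigmoid decoder (`decoder_probs`, `W`, `b`) and the dimensions are illustrative assumptions, not the architecture used in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def decoder_probs(z, W, b):
    # Hypothetical one-layer decoder mapping z to Bernoulli pixel probabilities.
    return 1.0 / (1.0 + np.exp(-(z @ W + b)))  # sigmoid

d_z, d_x = 40, 784  # latent / observable dimensions (illustrative)
W = rng.normal(scale=0.1, size=(d_z, d_x))
b = np.zeros(d_x)

z = rng.standard_normal(d_z)                 # Z ~ N(0, I)
x = rng.binomial(1, decoder_probs(z, W, b))  # X | Z = z ~ p(x|z; theta)
```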


SLIDE 4

Deep generative model with latent variable

  • The log-likelihood of the observable vector x:

log p(x; θ) = log ∫ p(x|z; θ) p(z) dz.

  • The marginalization is problematic → hard to compute the MLE directly.
  • An alternative approach: maximize a lower bound that is easy to compute.
    • VAE (Kingma and Welling, 2013; Rezende et al., 2014)
    • IWAE (Burda et al., 2015)


SLIDE 5

Variational autoencoders (VAE)

  • Employ a variational posterior distribution q(z|x; φ):

L_VAE(x; θ, φ) := E_{z∼q} [ log ( p(x, z; θ) / q(z|x; φ) ) ]

  • In practice, we use the Monte Carlo method:

L̂_VAE(x; θ, φ) := (1/L) Σ_{l=1}^{L} log ( p(x, z_l; θ) / q(z_l|x; φ) ),

where z_1, ..., z_L ∼ q(z|x; φ).

  • Maximize L̂_VAE w.r.t. (θ, φ).
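A minimal sketch of this estimator, assuming the caller supplies the log-joint, the sampler, and the log-density of q; the callables `log_joint`, `sample_q`, and `log_q` are placeholders, not an API from the paper.

```python
import numpy as np

def elbo_hat(x, log_joint, sample_q, log_q, L=8, rng=None):
    # L_VAE_hat: (1/L) * sum_l [ log p(x, z_l; theta) - log q(z_l | x; phi) ],
    # with z_1, ..., z_L drawn i.i.d. from q(z|x; phi).
    rng = rng or np.random.default_rng()
    zs = [sample_q(x, rng) for _ in range(L)]
    return np.mean([log_joint(x, z) - log_q(z, x) for z in zs])
```

In practice both θ and φ are updated by stochastic gradient ascent on this estimate, with the reparameterization trick making the sampling step differentiable.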


SLIDE 6

Importance weighted autoencoders (IWAE)

  • Use multiple samples from q(z|x; φ):

L_IWAE(x; θ, φ) := E_{z_1,...,z_K∼q} [ log ( (1/K) Σ_{k=1}^{K} p(x, z_k; θ) / q(z_k|x; φ) ) ]

  • A tighter lower bound than the VAE bound.
  • Use the Monte Carlo method:

L̂_IWAE(x; θ, φ) := log ( (1/K) Σ_{k=1}^{K} p(x, z_k; θ) / q(z_k|x; φ) ),

where z_1, ..., z_K ∼ q(z|x; φ).

  • Maximize L̂_IWAE w.r.t. (θ, φ).
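The only change from the VAE estimator is taking the log of the average weight rather than the average of the logs; a sketch under the same placeholder-callable assumptions as before, using log-sum-exp for numerical stability:

```python
import numpy as np
from scipy.special import logsumexp

def iwae_hat(x, log_joint, sample_q, log_q, K=8, rng=None):
    # L_IWAE_hat: log( (1/K) * sum_k p(x, z_k; theta) / q(z_k | x; phi) ),
    # evaluated stably in log space via log-sum-exp.
    rng = rng or np.random.default_rng()
    zs = [sample_q(x, rng) for _ in range(K)]
    log_w = np.array([log_joint(x, z) - log_q(z, x) for z in zs])
    return logsumexp(log_w) - np.log(K)
```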


SLIDE 7

Contents

  • Interpret IWAE as an EM algorithm with importance sampling (IS).
  • Improve IWAE by
    1 learning the proposal distribution carefully, and
    2 devising an annealing strategy.
    → IWEM (importance weighted EM algorithm)
  • Generalize IWEM to missing-data problems.
    → miss-IWEM


SLIDE 8

Proposed methods

SLIDE 9

Proposed methods: IWAE as EM algorithm

SLIDE 10

EM algorithm

1 E-step
  • θc : the current estimate of θ.
  • Calculate the expected value of the complete log-likelihood:

Q(θ|θc; x) := E_{z∼p(z|x;θc)} [log p(x, z; θ)].

2 M-step
  • Update the current estimate by maximizing Q(θ|θc; x) over θ.


SLIDE 11

EM algorithm with IS

1 E-step
  • Approximate Q by employing a proposal distribution q(z|x; φ) (see the sketch after this list):

Q̂(θ|θc, φ; x) := Σ_{k=1}^{K} ( w_k / Σ_{k'=1}^{K} w_{k'} ) · log p(x, z_k; θ),

where z_k ∼ q(z|x; φ) and w_k = p(x, z_k; θc) / q(z_k|x; φ) for k = 1, ..., K.

2 M-step
  • Update θ by maximizing Q̂(θ|θc, φ; x).

3 P-step (if necessary)
  • Update φ by encouraging q(z|x; φ) to be a good proposal distribution.
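A minimal sketch of the self-normalized importance-sampling estimate above, with weights normalized in log space to avoid overflow; `log_joint_cur` (evaluated at the frozen θc), `log_joint` (at the θ being optimized), and `log_q` are assumed callables:

```python
import numpy as np
from scipy.special import logsumexp

def q_hat(x, zs, log_joint_cur, log_joint, log_q):
    # Q_hat(theta | theta_c, phi; x) via self-normalized importance sampling:
    # w_k = p(x, z_k; theta_c) / q(z_k | x; phi), normalized to sum to one.
    log_w = np.array([log_joint_cur(x, z) - log_q(z, x) for z in zs])
    w = np.exp(log_w - logsumexp(log_w))  # normalized weights
    return np.sum(w * np.array([log_joint(x, z) for z in zs]))
```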


SLIDE 12

IWAE = EM algorithm

Proposition 1. The following equality holds for any θc:

∇_θ L̂_IWAE(x; θ, φ)|_{θ=θc} = ∇_θ Q̂(θ|θc, φ; x)|_{θ=θc}

  • Hence IWAE = EM algorithm if we use a gradient-descent-based optimization method.
  • Updating φ in IWAE can be understood as the P-step:

max_φ L̂_IWAE(x; θc, φ)


SLIDE 13

IWAE = EM algorithm (cont.)

IWAE as an EM algorithm:

1 E-step
  • Calculate Q̂(θ|θc, φ; x).
2 M-step
  • Update θ by maximizing Q̂(θ|θc, φ; x).
3 P-step
  • Update φ by maximizing L̂_IWAE(x; θc, φ).


SLIDE 14

Proposed methods: IWEM

SLIDE 15

Optimal P-step

  • Using Q̂ inevitably introduces variance due to IS.
  • Small variance → stable learning procedure.
  • The optimal proposal distribution (Owen, 2013):

q_opt(z) ∝ |log p(x, z; θc)| · p(x, z; θc).

  • IWAE uses p(x, z; θc) instead.
  • New P-step: replace p(x, z; θc) in the IWAE objective with q_opt(z), as sketched below:

L̂_opt(θc, φ; x) := log ( (1/K) Σ_{k=1}^{K} q_opt(z_k) / q(z_k|x; φ) ).
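A sketch of evaluating this P-step objective with the unnormalized q_opt; since its unknown normalizing constant only shifts the objective by a constant, the maximizer in φ is unaffected. The callables are the same placeholders as before, and the sketch assumes log p(x, z; θc) ≠ 0.

```python
import numpy as np
from scipy.special import logsumexp

def l_opt_hat(x, zs, log_joint_cur, log_q):
    # L_opt_hat: log( (1/K) * sum_k q_opt(z_k) / q(z_k | x; phi) ), with the
    # unnormalized density q_opt(z) = |log p(x, z; theta_c)| * p(x, z; theta_c).
    lj = np.array([log_joint_cur(x, z) for z in zs])
    log_q_opt = np.log(np.abs(lj)) + lj  # log|log p| + log p (assumes lj != 0)
    log_ratio = log_q_opt - np.array([log_q(z, x) for z in zs])
    return logsumexp(log_ratio) - np.log(len(zs))
```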


SLIDE 16

Annealing strategy

  • In general,

Var[ L̂_VAE(x; θ, φ) ] ≪ Var[ Q̂(θ|θ, φ; x) ].

  • Using VAE at early steps → small variance.
  • New E-step: take a convex combination with the VAE bound (see the sketch after this list):

Q̂_α(θ|θc, φ; x) := α · Q̂(θ|θc, φ; x) + (1 − α) · L̂_VAE(θ, φ; x).

  • α ∈ [0, 1] : annealing controller
    • start from zero, and
    • increase it incrementally up to one as the iterations proceed.
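A minimal sketch of the annealed objective. The slides only say α increases incrementally, so the linear warm-up schedule here is an assumption:

```python
def q_alpha_hat(q_hat_val, l_vae_val, step, warmup_steps):
    # Q_alpha_hat = alpha * Q_hat + (1 - alpha) * L_VAE_hat, with alpha
    # annealed from 0 to 1; a linear warm-up schedule is assumed.
    alpha = min(1.0, step / float(warmup_steps))
    return alpha * q_hat_val + (1.0 - alpha) * l_vae_val
```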


SLIDE 17

IWAE vs. IWEM

IWAE
1 E-step: calculate Q̂(θ|θc, φ; x).
2 M-step: update θ by maximizing Q̂.
3 P-step: update φ by maximizing L̂_IWAE(x; θc, φ).

IWEM
1 E-step: calculate Q̂_α(θ|θc, φ; x).
2 M-step: update θ by maximizing Q̂_α.
3 P-step: update φ by maximizing L̂_opt(θc, φ; x).
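Putting the steps together, one IWEM iteration can be sketched as a pair of gradient-ascent updates. Framing the M- and P-steps as single gradient steps, and the gradient callables `grad_theta_q_alpha` and `grad_phi_l_opt` (in practice obtained by automatic differentiation), are assumptions of this sketch, not the paper's stated implementation:

```python
def iwem_update(x, theta, phi, step, warmup_steps,
                grad_theta_q_alpha, grad_phi_l_opt, lr=1e-3):
    # One IWEM iteration; theta_c is the incoming theta, frozen inside
    # Q_alpha_hat while theta itself is updated.
    alpha = min(1.0, step / float(warmup_steps))
    theta = theta + lr * grad_theta_q_alpha(x, theta, phi, alpha)  # E/M-step
    phi = phi + lr * grad_phi_l_opt(x, theta, phi)                 # P-step
    return theta, phi
```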


SLIDE 18

Proposed methods: miss-IWEM

SLIDE 19

Missing data problem

  • x = (x(o), x(m)); we only observe x(o).
  • The log-likelihood is

log p(x(o); θ) = log ∫∫ p(x(o), x(m), z; θ) dz dx(m).

  • Need to formulate a proposal distribution for (x(m), z).


SLIDE 20

Formulation of proposal distribution

  • We use the following proposal distribution:

q(x(m), z|x(o); θ, φ) := p(x(m)|z; θ) · q(z|x̆; φ)

  • q(z|x; φ) : the same distribution as q in IWEM.
  • x̆ = (x(o), x̆(m)), where x̆(m) is an imputed value of x(m):
    • draw z̆ from q(z|(x(o), 0); φ),
    • then draw x̆(m) from p(x(m)|z̆; θ).
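A sketch of one draw from this proposal, following the imputation steps above. The samplers `sample_q_z(x, rng)` ~ q(z|x; φ) and `sample_p_xm(z, rng)` ~ p(x(m)|z; θ) are assumed callables, and `obs_mask` marks the observed entries of x:

```python
import numpy as np

def sample_proposal(x_obs, obs_mask, sample_q_z, sample_p_xm, rng):
    # One draw from q(x_m, z | x_o; theta, phi) = p(x_m | z; theta) q(z | x_breve; phi).
    x_zero = np.where(obs_mask, x_obs, 0.0)        # missing part filled with 0
    z_breve = sample_q_z(x_zero, rng)              # z_breve ~ q(z | (x_o, 0); phi)
    x_breve = np.where(obs_mask, x_obs,
                       sample_p_xm(z_breve, rng))  # impute x_breve_m
    z = sample_q_z(x_breve, rng)                   # z ~ q(z | x_breve; phi)
    x_m = sample_p_xm(z, rng)                      # x_m ~ p(x_m | z; theta)
    return np.where(obs_mask, x_obs, x_m), z
```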


SLIDE 21

miss-IWEM

Simply replace q(z|x; φ) in IWEM with q(x(m), z|x(o); θ, φ).

1 E-step
  • Calculate

Q̂_α,m(θ|θc, φ; x(o)) := α · Q̂_m(θ|θc, φ; x(o)) + (1 − α) · L̂_VAE,m(θ, φ; x(o)).

2 M-step
  • Update θ by maximizing Q̂_α,m(θ|θc, φ; x(o)).

3 P-step
  • Update φ by maximizing L̂_opt,m(θc, φ; x(o)).


SLIDE 22

Empirical analysis

SLIDE 23

Experimental setup

  • Model
    • p(z): N(0_40, I_40)
    • (p(x|z; θ), q(z|x; φ)): (MLP, MLP) or (DeConv, Conv)
  • Optimization algorithm
    • Adam (Kingma and Ba, 2014)
  • Performance measure
    • Approximated test log-likelihood
  • Datasets
    • Static biMNIST, Dynamic biMNIST, Omniglot, Caltech 101 Silhouettes


SLIDE 24

Complete data analysis

Performance results

MLP:

| Dataset     | VAE    | IWAE   | IWEM-woa¹ | IWEM   |
|-------------|--------|--------|-----------|--------|
| sta. MNIST  | 88.21  | 87.68  | 87.00     | 87.11  |
| dyn. MNIST  | 85.31  | 84.30  | 84.10     | 84.16  |
| Omniglot    | 108.46 | 106.80 | 106.50    | 106.38 |
| Caltech 101 | 119.67 | 118.06 | 116.92    | 116.54 |

CNN:

| Dataset     | VAE    | IWAE   | IWEM-woa¹ | IWEM   |
|-------------|--------|--------|-----------|--------|
| sta. MNIST  | 84.63  | 83.54  | 83.32     | 83.77  |
| dyn. MNIST  | 84.08  | 81.56  | 81.07     | 81.28  |
| Omniglot    | 101.63 | 100.27 | 100.15    | 100.39 |
| Caltech 101 | 109.24 | 106.94 | 106.19    | 106.05 |

¹ IWEM without the annealing strategy.

SLIDE 25

Incomplete data analysis

Generation of missing samples (see the masking sketch below):
1 Divide an image into 9 equal patches.
2 Generate an incomplete image by randomly removing a predefined number of patches.
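A minimal sketch of this masking procedure. The 3x3 grid uses integer division; how the slides handle image sizes not divisible by 3 (e.g. 28x28) is an assumption of this sketch:

```python
import numpy as np

def drop_patches(img, n_drop, rng):
    # Split an image into a 3x3 grid of patches and zero out n_drop randomly
    # chosen patches; return the incomplete image and the observed-entry mask.
    h, w = img.shape
    ph, pw = h // 3, w // 3
    out = img.copy()
    obs_mask = np.ones_like(img, dtype=bool)
    for p in rng.choice(9, size=n_drop, replace=False):
        r, c = divmod(p, 3)
        out[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw] = 0
        obs_mask[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw] = False
    return out, obs_mask
```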


SLIDE 26

Incomplete data analysis (cont.)

Performance results

  • Static biMNIST + (MLP, MLP)
  • Missing rate ↑ ⇒ margin ↑

| # of cropped patches | missIWAE² | miss-IWEM-woa³ | miss-IWEM |
|----------------------|-----------|----------------|-----------|
| 3                    | 90.29     | 89.79          | 89.71     |
| 4                    | 92.07     | 90.97          | 90.76     |
| 5                    | 95.54     | 93.33          | 92.23     |
| 6                    | 102.26    | 97.66          | 95.18     |

² Mattei and Frellsen (2018).
³ miss-IWEM without the annealing strategy.

SLIDE 27

Summary

SLIDE 28

Summary

1 Proposed a new learning algorithm called IWEM.
  • Showed that IWAE can be understood as an EM algorithm.
  • Devised two new techniques to reduce the variance due to IS in the E-step.
2 Modified IWEM for missing data, called miss-IWEM.



SLIDE 30

References

SLIDE 31

References I

Burda, Y., Grosse, R., and Salakhutdinov, R. (2015). Importance weighted autoencoders. arXiv preprint arXiv:1509.00519.
Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
Mattei, P.-A. and Frellsen, J. (2018). MIWAE: Deep generative modelling and imputation of incomplete data sets. arXiv preprint arXiv:1812.02633.
Owen, A. B. (2013). Monte Carlo Theory, Methods and Examples.
Rezende, D. J., Mohamed, S., and Wierstra, D. (2014). Stochastic backpropagation and approximate inference in deep generative models. arXiv preprint arXiv:1401.4082.
