Divide and Couple: Using Monte Carlo Variational Objectives for Posterior Approximation - PowerPoint PPT Presentation



SLIDE 1

Divide and Couple: Using Monte Carlo Variational Objectives for Posterior Approximation

Justin Domke and Daniel Sheldon

University of Massachusetts Amherst

Overview: Variational inference gives both a lower bound on the log-likelihood and an approximate posterior. It is easy to get other lower bounds. Do they also give approximate posteriors? This work: a general theory connecting likelihood bounds to posterior approximations.

SLIDE 2

[Figure: p(z, x) plotted over z]

Take p(z, x) with x fixed.

SLIDE 3

[Figure: p(z, x) plotted over z]

Take p(z, x) with x fixed. Observation: If E[R] = p(x), then E[log R] ≤ log p(x) (by Jensen's inequality).

SLIDE 4

[Figure: p(z, x) plotted over z]

Take p(z, x) with x fixed. Observation: If E[R] = p(x), then E[log R] ≤ log p(x). Example: Take R = p(x, z) / q(z) for z ∼ q Gaussian, and optimize q.
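To make the example concrete, here is a minimal numeric sketch. The toy model (p(z, x) = 0.5 · N(z; 1, 1), so p(x) = 0.5) and all constants are assumptions for illustration, not from the slides:

```python
# Sketch of the naive estimator R = p(x,z)/q(z) with z ~ q Gaussian.
# Assumed toy model: p(z, x) = 0.5 * N(z; 1, 1), so p(x) = 0.5.
import numpy as np

rng = np.random.default_rng(0)
mu_q, s_q = 0.0, 1.5                       # q(z) = N(0, 1.5^2)
z = rng.normal(mu_q, s_q, size=200_000)

log_pzx = np.log(0.5) - 0.5 * (z - 1.0)**2 - 0.5 * np.log(2 * np.pi)
log_q = -0.5 * ((z - mu_q) / s_q)**2 - np.log(s_q) - 0.5 * np.log(2 * np.pi)
log_R = log_pzx - log_q

print(np.exp(log_R).mean())   # E[R] = p(x): the weights average to ~0.5
print(log_R.mean())           # E[log R] <= log p(x) = log 0.5
```

Here q is deliberately mismatched with the posterior, so the bound E[log R] sits well below log p(x).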

SLIDE 5

[Figure: p(z, x) and q(z) (naive); log R = 0.237]

Take p(z, x) with x fixed. Observation: If E[R] = p(x), then E[log R] ≤ log p(x). Example: Take R = p(x, z) / q(z) for z ∼ q Gaussian, and optimize q.

SLIDE 6

[Figure: p(z, x) and q(z) (naive); log R = 0.237]

Take p(z, x) with x fixed. Observation: If E[R] = p(x), then E[log R] ≤ log p(x). Example: Take R = p(x, z) / q(z) for z ∼ q Gaussian, and optimize q.

Decomposition: KL(q(z) ‖ p(z|x)) = log p(x) − E[log R]. Likelihood bound: ✓ Posterior approximation: ✓
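The decomposition can be checked numerically on a toy model (all constants below are assumptions for illustration): the closed-form KL divergence matches the gap log p(x) − E[log R].

```python
# Sketch verifying KL(q(z) || p(z|x)) = log p(x) - E[log R] on an assumed
# toy model: p(z, x) = 0.5 * N(z; 1, 1), so p(x) = 0.5, p(z|x) = N(z; 1, 1).
import numpy as np

rng = np.random.default_rng(0)
log_px = np.log(0.5)
mu_p, s_p = 1.0, 1.0          # posterior p(z|x) = N(1, 1)
mu_q, s_q = 0.0, 1.5          # q(z) = N(0, 1.5^2)

# Closed-form KL(N(mu_q, s_q^2) || N(mu_p, s_p^2)).
kl = np.log(s_p / s_q) + (s_q**2 + (mu_q - mu_p)**2) / (2 * s_p**2) - 0.5

# Monte Carlo estimate of E[log R] with R = p(z,x)/q(z), z ~ q.
z = rng.normal(mu_q, s_q, size=500_000)
log_pzx = log_px - 0.5 * ((z - mu_p) / s_p)**2 - np.log(s_p) - 0.5 * np.log(2 * np.pi)
log_q = -0.5 * ((z - mu_q) / s_q)**2 - np.log(s_q) - 0.5 * np.log(2 * np.pi)
E_log_R = (log_pzx - log_q).mean()

print(kl, log_px - E_log_R)   # the two quantities agree up to MC error
```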

SLIDE 7

[Figure: p(z, x) plotted over z]

Recent work: Better Monte Carlo estimators R.

SLIDE 8

[Figure: p(z, x) and q(z) (antithetic); log R′ = 0.060]

Recent work: Better Monte Carlo estimators R. Antithetic Sampling: Let T(z) “flip” z around the mean of q. R = (1/2) [p(z, x) + p(T(z), x)] / q(z).
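A sketch of the antithetic estimator on an assumed toy model (p(z, x) = 0.5 · N(z; 1, 1); the constants are mine, not the slides'). Because q is symmetric about its mean, q(T(z)) = q(z), and the averaged estimator stays unbiased:

```python
# Antithetic estimator sketch: T(z) = 2*mu - z flips z around the mean of q,
# R = 0.5 * (p(z,x) + p(T(z),x)) / q(z), z ~ q. Toy model is an assumption.
import numpy as np

rng = np.random.default_rng(0)
mu, s = 0.0, 1.5                          # q(z) = N(0, 1.5^2)
z = rng.normal(mu, s, size=200_000)
Tz = 2 * mu - z                           # antithetic "flip"

def p_zx(z):                              # assumed: p(z, x) = 0.5 * N(z; 1, 1)
    return 0.5 * np.exp(-0.5 * (z - 1.0)**2) / np.sqrt(2 * np.pi)

q = np.exp(-0.5 * ((z - mu) / s)**2) / (s * np.sqrt(2 * np.pi))
R = 0.5 * (p_zx(z) + p_zx(Tz)) / q        # note q(T(z)) = q(z) by symmetry

print(R.mean())              # still unbiased: E[R] = p(x) = 0.5
print(np.log(R).mean())      # valid, and here much tighter, bound on log 0.5
```

Averaging the two mirror-image weights cancels much of the mismatch between q and the posterior, which is why the bound tightens.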

SLIDE 9

[Figure: p(z, x) and q(z) (antithetic); log R′ = 0.060]

Recent work: Better Monte Carlo estimators R. Antithetic Sampling: Let T(z) “flip” z around the mean of q. R = (1/2) [p(z, x) + p(T(z), x)] / q(z).

  • Likelihood bound: ✓
  • Posterior approximation: ×

SLIDE 10

[Figure: p(z, x) and q(z) (antithetic); log R′ = 0.060]

Recent work: Better Monte Carlo estimators R. Antithetic Sampling: Let T(z) “flip” z around the mean of q. R = (1/2) [p(z, x) + p(T(z), x)] / q(z).

  • Likelihood bound: ✓
  • Posterior approximation: ×

This paper: Is some other distribution close to p?

SLIDE 11

Contribution of this paper: Given an estimator with E[R] = p(x), we show how to construct Q(z) such that KL(Q(z) ‖ p(z|x)) ≤ log p(x) − E[log R].

[Figures: p(z, x) with q(z) (antithetic, log R′ = 0.060) and with Q(z) (antithetic, log R′ = 0.060)]


SLIDE 13

Contribution of this paper: Given an estimator with E[R] = p(x), we show how to construct Q(z) such that KL(Q(z) ‖ p(z|x)) ≤ log p(x) − E[log R].

[Figures: p(z, x) with q(z) (stratified, log R′ = 0.063) and with Q(z) (stratified, log R′ = 0.063)]
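A sketch of a stratified estimator under the same assumed toy model: partition [0, 1] into K equal strata, draw one uniform per stratum, and map each through the inverse CDF of q; the average of the K importance weights is still unbiased for p(x).

```python
# Stratified estimator sketch (toy model and constants are assumptions):
# p(z, x) = 0.5 * N(z; 1, 1), q(z) = N(0, 1.5^2), K strata over [0, 1].
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu, s, K = 0.0, 1.5, 16
n_rep = 20_000                                     # independent replicates of R

def p_zx(z):
    return 0.5 * np.exp(-0.5 * (z - 1.0)**2) / np.sqrt(2 * np.pi)

u = (np.arange(K) + rng.random((n_rep, K))) / K    # one uniform per stratum
z = norm.ppf(u, loc=mu, scale=s)                   # inverse-CDF transform
R = (p_zx(z) / norm.pdf(z, loc=mu, scale=s)).mean(axis=1)

print(R.mean())          # E[R] = p(x) = 0.5 (still unbiased)
print(np.log(R).mean())  # bound on log 0.5; tight because Var(R) is small
```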

SLIDE 14

Contribution of this paper: Given an estimator with E[R] = p(x), we show how to construct Q(z) such that KL(Q(z) ‖ p(z|x)) ≤ log p(x) − E[log R].

[Figures: p(z, x) with q(z) (antithetic within strata, log R′ = 0.021) and with Q(z) (antithetic within strata, log R′ = 0.021)]

SLIDE 15

Contribution of this paper: Given an estimator with E[R] = p(x), we show how to construct Q(z) such that KL(Q(z) ‖ p(z|x)) ≤ log p(x) − E[log R]. How?

SLIDE 16

Contribution of this paper: Given an estimator with E[R] = p(x), we show how to construct Q(z) such that KL(Q(z) ‖ p(z|x)) ≤ log p(x) − E[log R].

Unbiased estimator: E_ω R(ω) = p(x). Where is z?

SLIDE 17

Contribution of this paper: Given an estimator with E[R] = p(x), we show how to construct Q(z) such that KL(Q(z) ‖ p(z|x)) ≤ log p(x) − E[log R].

Unbiased estimator: E_ω R(ω) = p(x). Where is z?

We suggest: Need a coupling: E_ω [R(ω) a(z|ω)] = p(z, x).

SLIDE 18

Contribution of this paper: Given an estimator with E[R] = p(x), we show how to construct Q(z) such that KL(Q(z) ‖ p(z|x)) ≤ log p(x) − E[log R].

Unbiased estimator: E_ω R(ω) = p(x). Where is z?

We suggest: Need a coupling: E_ω [R(ω) a(z|ω)] = p(z, x). Then, there exist augmented distributions s.t. KL(Q(z, ω) ‖ p(z, ω|x)) = log p(x) − E[log R].

SLIDE 19

Contribution of this paper: Given an estimator with E[R] = p(x), we show how to construct Q(z) such that KL(Q(z) ‖ p(z|x)) ≤ log p(x) − E[log R]. Summary: Tightening a bound log p(x) − E[log R] is equivalent to VI in an augmented state space (ω, z). To sample from Q(z), draw ω, then z ∼ a(z|ω). The paper gives couplings for:

◮ Antithetic sampling
◮ Stratified sampling
◮ Quasi Monte Carlo
◮ Latin hypercube sampling
◮ Arbitrary recursive combinations of the above
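As an illustration of the recipe (draw ω, then z ∼ a(z|ω)), here is a sketch of one such coupling for the antithetic estimator: a(·|ω) puts mass on ω and T(ω) in proportion to p(ω, x) and p(T(ω), x). The toy model and all constants are my assumptions, for illustration only:

```python
# Sampling from Q(z) for the antithetic estimator (sketch; assumed toy model:
# p(z, x) = 0.5 * N(z; 1, 1), so the true posterior p(z|x) = N(z; 1, 1)).
import numpy as np

rng = np.random.default_rng(0)
mu, s = 0.0, 1.5                           # q(z) = N(0, 1.5^2)

def p_zx(z):
    return 0.5 * np.exp(-0.5 * (z - 1.0)**2) / np.sqrt(2 * np.pi)

omega = rng.normal(mu, s, size=200_000)    # step 1: draw omega ~ q
Tw = 2 * mu - omega                        # antithetic flip around mean of q
w1, w2 = p_zx(omega), p_zx(Tw)
flip = rng.random(omega.shape) < w2 / (w1 + w2)
z = np.where(flip, Tw, omega)              # step 2: z ~ a(z|omega), i.e. z ~ Q

# q has mean 0 while the posterior N(1, 1) has mean 1; the coupled samples
# shift toward the posterior.
print(omega.mean(), z.mean())
```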

SLIDE 20

Implementation: Different sampling methods with Gaussian q.

SLIDE 21

Experiments confirm: Better likelihood bounds ⇔ better posteriors.

Poster: Tue Dec 10th, 5:30-7:30pm @ East Exhibition Hall B + C #166