SLIDE 1
Divide and Couple: Using Monte Carlo Variational Objectives for Posterior Approximation
Justin Domke and Daniel Sheldon, University of Massachusetts Amherst

Overview: Variational inference gives both a lower bound on the log-likelihood and an approximation to the posterior.
SLIDE 2
SLIDE 3
[Figure: the target p(z, x) plotted against z.]
Take $p(z, x)$ with the observation $x$ fixed. Observation: if $\mathbb{E} R = p(x)$, then $\mathbb{E}\log R \le \log p(x)$ (Jensen's inequality).
SLIDE 4
[Figure: the target p(z, x) plotted against z.]
Example: take $R = p(z, x) / q(z)$ for $z \sim q$ Gaussian, and optimize $q$.
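To make this estimator concrete, here is a minimal Python sketch; the toy model `log_p_joint` and the parameters of $q$ are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of the naive estimator R = p(z,x)/q(z) on a toy 1-D model
# (log_p_joint, mu, and sigma are illustrative assumptions).
import numpy as np
from scipy import stats

def log_p_joint(z, x=1.0):
    """Hypothetical log p(z, x): Gaussian prior on z, Gaussian likelihood for x."""
    return stats.norm.logpdf(z, 0.0, 2.0) + stats.norm.logpdf(x, z, 1.0)

mu, sigma = 0.5, 1.0                                        # parameters of the Gaussian q(z)
z = np.random.normal(mu, sigma, size=10_000)                # z ~ q
log_R = log_p_joint(z) - stats.norm.logpdf(z, mu, sigma)    # log R = log p(z,x) - log q(z)
print("E log R ≈", log_R.mean())                            # a lower bound on log p(x)
```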
SLIDE 5
[Figure: p(z, x) and the optimized naive q(z); log R = 0.237.]
SLIDE 6
Decomposition: $\mathrm{KL}(q(z)\,\|\,p(z\mid x)) = \log p(x) - \mathbb{E}\log R$.
Likelihood bound: ✓. Posterior approximation: ✓ (here $q(z)$ itself approximates $p(z\mid x)$).
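For the naive choice $R = p(z,x)/q(z)$, this decomposition is the standard ELBO identity; a short derivation (standard, not specific to this paper):

```latex
\begin{align*}
\mathrm{KL}\!\left(q(z)\,\|\,p(z\mid x)\right)
  &= \mathbb{E}_{q}\log\frac{q(z)}{p(z\mid x)}
   = \mathbb{E}_{q}\log\frac{q(z)\,p(x)}{p(z,x)} \\
  &= \log p(x) - \mathbb{E}_{q}\log\frac{p(z,x)}{q(z)}
   = \log p(x) - \mathbb{E}\log R.
\end{align*}
```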
SLIDE 7
[Figure: the target p(z, x).]
Recent work: better Monte Carlo estimators $R$.
SLIDE 8
[Figure: p(z, x) and the antithetic q(z); log R′ = 0.060.]
Antithetic sampling: let $T(z)$ “flip” $z$ around the mean of $q$, and take
$$R = \frac{1}{2}\,\frac{p(z,x) + p(T(z),x)}{q(z)}.$$
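A sketch of this antithetic estimator, reusing `log_p_joint`, `mu`, and `sigma` from the snippet above (still illustrative assumptions):

```python
# Antithetic estimator: R = (1/2) * (p(z,x) + p(T(z),x)) / q(z),
# with T(z) = 2*mu - z flipping z around the mean of q.
from scipy.special import logsumexp

z = np.random.normal(mu, sigma, size=10_000)
Tz = 2.0 * mu - z                                           # the antithetic "flip"
log_avg = logsumexp(np.stack([log_p_joint(z), log_p_joint(Tz)]), axis=0) - np.log(2.0)
log_R = log_avg - stats.norm.logpdf(z, mu, sigma)
print("E log R ≈", log_R.mean())                            # typically tighter than the naive bound
```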
SLIDE 9
Likelihood bound: ✓. Posterior approximation: ✗ (the antithetic $q(z)$ is not close to $p(z\mid x)$).
SLIDE 10
This paper: is some other distribution close to $p(z\mid x)$?
SLIDE 11
Contribution of this paper: given an estimator with $\mathbb{E} R = p(x)$, we show how to construct $Q(z)$ such that $\mathrm{KL}(Q(z)\,\|\,p(z\mid x)) \le \log p(x) - \mathbb{E}\log R$.
[Figure: p(z, x) with the antithetic q(z) (left) and the constructed Q(z) (right); log R′ = 0.060.]
SLIDE 12
SLIDE 13
[Figure: p(z, x) with stratified q(z) and the constructed Q(z); log R′ = 0.063.]
SLIDE 14
[Figure: p(z, x) with antithetic-within-strata q(z) and the constructed Q(z); log R′ = 0.021.]
SLIDE 15
How?
SLIDE 16
Unbiased estimator: $\mathbb{E}_{\omega}\, R(\omega) = p(x)$. But where is $z$?
SLIDE 17
We suggest: we need a coupling $a(z \mid \omega)$ such that
$$\mathbb{E}_{\omega}\Big[ R(\omega)\,\underbrace{a(z \mid \omega)}_{\text{coupling}} \Big] = p(z, x).$$
SLIDE 18
Then there exist augmented distributions such that
$$\mathrm{KL}\big(Q(z, \omega)\,\|\,p(z, \omega \mid x)\big) = \log p(x) - \mathbb{E}\log R.$$
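For intuition, here is one concrete coupling for the antithetic estimator above: given $\omega \sim q$, place $z$ at $\omega$ or $T(\omega)$ with probability proportional to $p(\cdot, x)$. This construction is my reading of the slides' $Q(z)$, and the sketch reuses the toy `log_p_joint`, `mu`, `sigma`, and `logsumexp` from the earlier snippets.

```python
# Coupling a(z|omega) for the antithetic estimator: pick one of the two coupled
# candidates with probability proportional to its joint density p(., x).
# One can verify E_omega[ R(omega) * a(z|omega) ] = p(z, x) for this choice.
def sample_Q(n):
    omega = np.random.normal(mu, sigma, size=n)             # omega ~ q
    cand = np.stack([omega, 2.0 * mu - omega])              # the two coupled candidates
    log_w = log_p_joint(cand)                               # log p(candidate, x)
    p_first = np.exp(log_w[0] - logsumexp(log_w, axis=0))   # P(z = omega | omega)
    pick_first = np.random.rand(n) < p_first
    return np.where(pick_first, cand[0], cand[1])

z_Q = sample_Q(10_000)                                      # samples from Q(z)
```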
SLIDE 19
Summary: tightening the bound $\log p(x) - \mathbb{E}\log R$ is equivalent to VI in an augmented state space $(\omega, z)$. To sample from $Q(z)$, draw $\omega$, then $z \sim a(z \mid \omega)$. The paper gives couplings for:
◮ Antithetic sampling
◮ Stratified sampling
◮ Quasi-Monte Carlo
◮ Latin hypercube sampling
◮ Arbitrary recursive combinations of the above
SLIDE 20
Implementation: different sampling methods with a Gaussian $q$; one possibility is sketched below.
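As one example of what such an implementation might look like (an assumption, not the authors' code): stratified sampling from a Gaussian $q$ via the inverse CDF, plugged into the averaged estimator $R = \frac{1}{N}\sum_i p(z_i,x)/q(z_i)$. Reuses `log_p_joint`, `mu`, `sigma`, and `logsumexp` from the snippets above.

```python
# Stratified sampling from Gaussian q: one uniform per stratum [(i-1)/N, i/N],
# pushed through the Gaussian inverse CDF.
N = 1000
u = (np.arange(N) + np.random.rand(N)) / N                  # stratified uniforms on [0, 1]
u = np.clip(u, 1e-12, 1 - 1e-12)                            # guard against u == 0
z = mu + sigma * stats.norm.ppf(u)                          # z_i = mu + sigma * Phi^{-1}(u_i)
log_w = log_p_joint(z) - stats.norm.logpdf(z, mu, sigma)    # per-sample importance weights
log_R = logsumexp(log_w) - np.log(N)                        # R = (1/N) * sum_i w_i
print("one draw of log R:", log_R)                          # average repeats to estimate E log R
```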
SLIDE 21