SLIDE 1
Divide and Couple: Using Monte Carlo Variational Objectives for Posterior Approximation
Justin Domke and Daniel Sheldon, University of Massachusetts Amherst

Overview: Variational inference gives both a lower bound on the log-likelihood and an approximation to the posterior.
SLIDE 2
SLIDE 3
[Figure: the target p(z, x) plotted against z.]
Take $p(z, x)$ with the observation $x$ fixed. Observation: if $\mathbb{E} R = p(x)$, then $\mathbb{E}\log R \le \log p(x)$ (Jensen's inequality).
SLIDE 4
[Figure: the target p(z, x) plotted against z.]
Example: take $R = p(z, x) / q(z)$ for $z \sim q$ Gaussian, and optimize $q$.
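To make this estimator concrete, here is a minimal Python sketch; the toy model `log_p_joint` and the parameters of $q$ are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of the naive estimator R = p(z,x)/q(z) on a toy 1-D model
# (log_p_joint, mu, and sigma are illustrative assumptions).
import numpy as np
from scipy import stats

def log_p_joint(z, x=1.0):
    """Hypothetical log p(z, x): Gaussian prior on z, Gaussian likelihood for x."""
    return stats.norm.logpdf(z, 0.0, 2.0) + stats.norm.logpdf(x, z, 1.0)

mu, sigma = 0.5, 1.0                                        # parameters of the Gaussian q(z)
z = np.random.normal(mu, sigma, size=10_000)                # z ~ q
log_R = log_p_joint(z) - stats.norm.logpdf(z, mu, sigma)    # log R = log p(z,x) - log q(z)
print("E log R ≈", log_R.mean())                            # a lower bound on log p(x)
```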
SLIDE 5
[Figure: p(z, x) and the optimized naive q(z); log R = 0.237.]
SLIDE 6
Decomposition: $\mathrm{KL}(q(z)\,\|\,p(z\mid x)) = \log p(x) - \mathbb{E}\log R$.
Likelihood bound: ✓. Posterior approximation: ✓ (here $q(z)$ itself approximates $p(z\mid x)$).
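For the naive choice $R = p(z,x)/q(z)$, this decomposition is the standard ELBO identity; a short derivation (standard, not specific to this paper):

```latex
\begin{align*}
\mathrm{KL}\!\left(q(z)\,\|\,p(z\mid x)\right)
  &= \mathbb{E}_{q}\log\frac{q(z)}{p(z\mid x)}
   = \mathbb{E}_{q}\log\frac{q(z)\,p(x)}{p(z,x)} \\
  &= \log p(x) - \mathbb{E}_{q}\log\frac{p(z,x)}{q(z)}
   = \log p(x) - \mathbb{E}\log R.
\end{align*}
```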
SLIDE 7
[Figure: the target p(z, x).]
Recent work: better Monte Carlo estimators $R$.
SLIDE 8
[Figure: p(z, x) and the antithetic q(z); log R′ = 0.060.]
Antithetic sampling: let $T(z)$ “flip” $z$ around the mean of $q$, and take
$$R = \frac{1}{2}\,\frac{p(z,x) + p(T(z),x)}{q(z)}.$$
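A sketch of this antithetic estimator, reusing `log_p_joint`, `mu`, and `sigma` from the snippet above (still illustrative assumptions):

```python
# Antithetic estimator: R = (1/2) * (p(z,x) + p(T(z),x)) / q(z),
# with T(z) = 2*mu - z flipping z around the mean of q.
from scipy.special import logsumexp

z = np.random.normal(mu, sigma, size=10_000)
Tz = 2.0 * mu - z                                           # the antithetic "flip"
log_avg = logsumexp(np.stack([log_p_joint(z), log_p_joint(Tz)]), axis=0) - np.log(2.0)
log_R = log_avg - stats.norm.logpdf(z, mu, sigma)
print("E log R ≈", log_R.mean())                            # typically tighter than the naive bound
```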
SLIDE 9
Likelihood bound: ✓. Posterior approximation: ✗ (the antithetic $q(z)$ is not close to $p(z\mid x)$).
SLIDE 10
This paper: is some other distribution close to $p(z\mid x)$?
SLIDE 11
Contribution of this paper: given an estimator with $\mathbb{E} R = p(x)$, we show how to construct $Q(z)$ such that $\mathrm{KL}(Q(z)\,\|\,p(z\mid x)) \le \log p(x) - \mathbb{E}\log R$.
[Figure: p(z, x) with the antithetic q(z) (left) and the constructed Q(z) (right); log R′ = 0.060.]
SLIDE 12
SLIDE 13
[Figure: p(z, x) with stratified q(z) and the constructed Q(z); log R′ = 0.063.]
SLIDE 14
[Figure: p(z, x) with antithetic-within-strata q(z) and the constructed Q(z); log R′ = 0.021.]
SLIDE 15
How?
SLIDE 16
Unbiased estimator: $\mathbb{E}_{\omega}\, R(\omega) = p(x)$. But where is $z$?
SLIDE 17
We suggest: we need a coupling $a(z \mid \omega)$ such that
$$\mathbb{E}_{\omega}\Big[ R(\omega)\,\underbrace{a(z \mid \omega)}_{\text{coupling}} \Big] = p(z, x).$$
SLIDE 18
Then there exist augmented distributions such that
$$\mathrm{KL}\big(Q(z, \omega)\,\|\,p(z, \omega \mid x)\big) = \log p(x) - \mathbb{E}\log R.$$
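For intuition, here is one concrete coupling for the antithetic estimator above: given $\omega \sim q$, place $z$ at $\omega$ or $T(\omega)$ with probability proportional to $p(\cdot, x)$. This construction is my reading of the slides' $Q(z)$, and the sketch reuses the toy `log_p_joint`, `mu`, `sigma`, and `logsumexp` from the earlier snippets.

```python
# Coupling a(z|omega) for the antithetic estimator: pick one of the two coupled
# candidates with probability proportional to its joint density p(., x).
# One can verify E_omega[ R(omega) * a(z|omega) ] = p(z, x) for this choice.
def sample_Q(n):
    omega = np.random.normal(mu, sigma, size=n)             # omega ~ q
    cand = np.stack([omega, 2.0 * mu - omega])              # the two coupled candidates
    log_w = log_p_joint(cand)                               # log p(candidate, x)
    p_first = np.exp(log_w[0] - logsumexp(log_w, axis=0))   # P(z = omega | omega)
    pick_first = np.random.rand(n) < p_first
    return np.where(pick_first, cand[0], cand[1])

z_Q = sample_Q(10_000)                                      # samples from Q(z)
```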
SLIDE 19
Summary: tightening the bound $\log p(x) - \mathbb{E}\log R$ is equivalent to VI in an augmented state space $(\omega, z)$. To sample from $Q(z)$, draw $\omega$, then $z \sim a(z \mid \omega)$. The paper gives couplings for:
◮ Antithetic sampling
◮ Stratified sampling
◮ Quasi-Monte Carlo
◮ Latin hypercube sampling
◮ Arbitrary recursive combinations of the above
SLIDE 20
Implementation: different sampling methods with a Gaussian $q$; one possibility is sketched below.
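As one example of what such an implementation might look like (an assumption, not the authors' code): stratified sampling from a Gaussian $q$ via the inverse CDF, plugged into the averaged estimator $R = \frac{1}{N}\sum_i p(z_i,x)/q(z_i)$. Reuses `log_p_joint`, `mu`, `sigma`, and `logsumexp` from the snippets above.

```python
# Stratified sampling from Gaussian q: one uniform per stratum [(i-1)/N, i/N],
# pushed through the Gaussian inverse CDF.
N = 1000
u = (np.arange(N) + np.random.rand(N)) / N                  # stratified uniforms on [0, 1]
u = np.clip(u, 1e-12, 1 - 1e-12)                            # guard against u == 0
z = mu + sigma * stats.norm.ppf(u)                          # z_i = mu + sigma * Phi^{-1}(u_i)
log_w = log_p_joint(z) - stats.norm.logpdf(z, mu, sigma)    # per-sample importance weights
log_R = logsumexp(log_w) - np.log(N)                        # R = (1/N) * sum_i w_i
print("one draw of log R:", log_R)                          # average repeats to estimate E log R
```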
SLIDE 21