Approximate Inference: Sampling Methods
CMSC 691, UMBC
(Some) Learning Techniques
- MAP/MLE: point estimation, basic EM
- Variational inference: functional optimization
- Sampling/Monte Carlo (today)
Outline
- Monte Carlo methods
- Sampling techniques
  - Uniform sampling
  - Importance sampling
  - Rejection sampling
  - Metropolis-Hastings
  - Gibbs sampling
- Example: Collapsed Gibbs Sampler for Topic Models
Two Problems for Sampling Methods to Solve

1. Generate samples from p:

   p(x) = p*(x)/Z, x ∈ ℝ^N  →  samples x^(1), x^(2), …, x^(R)

2. Estimate the expectation of a function φ under p:

   Φ = E_{x∼p}[φ(x)] = ∫ φ(x) p(x) dx

Q: Why might sampling from p(x) be hard?
A1: Can we evaluate Z?
A2: Can we sample without enumerating? (Correct samples should be where p is big.)

Example density (ITILA, Fig 29.1): p*(x) = exp(0.4(x − 0.4)² − 0.08x⁴)

If we could sample from p, the simple Monte Carlo estimator

   Φ̂ = (1/R) Σ_r φ(x^(r))

satisfies E[Φ̂] = Φ: it is a consistent estimator.
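A minimal sketch of this estimator in Python, assuming a target we can already sample from; the standard-normal target and φ(x) = x² are illustrative choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(x):
    return x ** 2  # E_p[phi(x)] = 1 when p is a standard normal

R = 100_000
samples = rng.normal(0.0, 1.0, size=R)  # x^(r) ~ p, r = 1..R
phi_hat = phi(samples).mean()           # (1/R) * sum_r phi(x^(r))
print(phi_hat)                          # ~1.0; error shrinks as R grows
```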
Outline
Monte Carlo methods Sampling Techniques
Uniform sampling Importance Sampling Rejection Sampling Metropolis-Hastings Gibbs sampling
Example: Collapsed Gibbs Sampler for Topic Models
Uniform Sampling

Goal: Φ = E_{x∼p}[φ(x)]

- sample uniformly: x^(1), x^(2), …, x^(R)
- evaluate p*(x^(r)) at each sample and normalize by Z_R = Σ_r p*(x^(r)), giving p̂(x) = p*(x)/Z_R
- estimate: Φ̂ = Σ_r φ(x^(r)) p̂(x^(r)) = Σ_r φ(x^(r)) p*(x^(r))/Z_R

This might work if R (the number of samples) sufficiently hits high-probability regions.

Ising model example:
- 2^H states of high probability
- 2^N states total
- chance of a uniform sample landing in the high-probability region: 2^H/2^N
- minimum samples needed: ~2^(N−H)
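A minimal sketch of uniform sampling for the 1-D ITILA example density above; the sampling interval [−10, 10] and φ(x) = x are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_star(x):  # unnormalized target from ITILA, Fig 29.1
    return np.exp(0.4 * (x - 0.4) ** 2 - 0.08 * x ** 4)

def phi(x):
    return x

R = 100_000
x = rng.uniform(-10.0, 10.0, size=R)  # uniform samples x^(r)
p = p_star(x)                         # evaluate p*(x^(r))
Z_R = p.sum()                         # Z_R = sum_r p*(x^(r))
phi_hat = (phi(x) * p).sum() / Z_R    # Phi_hat = sum_r phi(x^(r)) p*(x^(r)) / Z_R
print(phi_hat)
```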
Outline
- Monte Carlo methods
- Sampling techniques
  - Uniform sampling
  - Importance sampling
  - Rejection sampling
  - Metropolis-Hastings
  - Gibbs sampling
- Example: Collapsed Gibbs Sampler for Topic Models
Importance Sampling

Goal: Φ = E_{x∼p}[φ(x)]

Sample from an approximating distribution Q(x) ∝ Q*(x): x^(1), x^(2), …, x^(R)

- where Q(x) > p(x): those x are over-represented
- where Q(x) < p(x): those x are under-represented

Correct for this with importance weights (ITILA, Fig 29.5):

   w_r = p*(x^(r)) / Q*(x^(r))

   Φ̂ = Σ_r φ(x^(r)) w_r / Σ_r w_r

Q: How reliable will this estimator be?
A: In practice, difficult to say; the weights w_r may not be a good indicator.
Q: How do you choose a good approximating distribution?
A: Task/domain specific.
Importance Sampling: Variance

The estimator may vary. (ITILA, Fig 29.6: estimates over iterations compared against the true value, with q(x) a Gaussian vs. q(x) a Cauchy distribution.)
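A minimal sketch of self-normalized importance sampling for the same example density; the Gaussian proposal Q and φ(x) = x are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_star(x):  # unnormalized target
    return np.exp(0.4 * (x - 0.4) ** 2 - 0.08 * x ** 4)

def q_star(x):  # Gaussian proposal density, mean 0, sigma = 2
    return np.exp(-x ** 2 / 8.0) / np.sqrt(8.0 * np.pi)

R = 100_000
x = rng.normal(0.0, 2.0, size=R)   # x^(r) ~ Q
w = p_star(x) / q_star(x)          # importance weights w_r = p*/Q*
phi_hat = (x * w).sum() / w.sum()  # Phi_hat = sum_r phi(x^(r)) w_r / sum_r w_r
print(phi_hat)
```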
Outline
- Monte Carlo methods
- Sampling techniques
  - Uniform sampling
  - Importance sampling
  - Rejection sampling
  - Metropolis-Hastings
  - Gibbs sampling
- Example: Collapsed Gibbs Sampler for Topic Models
Rejection Sampling

Goal: Φ = E_{x∼p}[φ(x)]

Approximating distribution Q(x) ∝ Q*(x), with a constant c such that c·Q*(x) > p*(x) for all x (ITILA, Fig 29.8).

- sample from Q: x^(1), x^(2), …, x^(R′)
- for each x^(r), sample a height uniformly: u_r ∼ Unif(0, c·Q*(x^(r)))
- select tuples: if u_r ≤ p*(x^(r)), add x^(r) to the R sampled points; otherwise, reject it

This produces samples from the p-distribution, so

   Φ̂ = (1/R) Σ_r φ(x^(r))

Q: How reliable will this estimator be?
A: It depends on how well Q approximates p.
Q: How do you choose a good approximating distribution?
A: Task/domain specific.

Rejection sampling can be difficult to use in high-dimensional spaces ☹
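A minimal sketch of rejection sampling for the same example density; the Gaussian proposal and the bound c = 100 (checked numerically for this particular target) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_star(x):  # unnormalized target
    return np.exp(0.4 * (x - 0.4) ** 2 - 0.08 * x ** 4)

def q_star(x):  # Gaussian proposal density, mean 0, sigma = 3
    return np.exp(-x ** 2 / 18.0) / np.sqrt(18.0 * np.pi)

c = 100.0                               # chosen so c * q_star(x) > p_star(x) everywhere
R_prime = 200_000
x = rng.normal(0.0, 3.0, size=R_prime)  # proposals x^(r) ~ Q
u = rng.uniform(0.0, c * q_star(x))     # u_r ~ Unif(0, c * Q*(x^(r)))
samples = x[u <= p_star(x)]             # accept x^(r) if u_r <= p*(x^(r))
phi_hat = samples.mean()                # (1/R) sum_r phi(x^(r)), with phi(x) = x
print(phi_hat, samples.size / R_prime)  # estimate and acceptance rate
```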
Outline
- Monte Carlo methods
- Sampling techniques
  - Uniform sampling
  - Importance sampling
  - Rejection sampling
  - Metropolis-Hastings
  - Gibbs sampling
- Example: Collapsed Gibbs Sampler for Topic Models
Markov Chain Monte Carlo

Construct a Markov chain whose transition kernel leaves the target p invariant; simulating the chain produces (correlated) samples from p.
Metropolis-Hastings

Goal: Φ = E_{x∼p}[φ(x)]

Importance and rejection sampling use a single proposal distribution Q(x) ∝ Q*(x). Metropolis-Hastings (and Gibbs) create a proposal distribution based on the current state:

   Q(x′|x^(t)) ∝ Q*(x′|x^(t))

Q does not need to look similar to p (ITILA, Fig 29.10).

Transition kernel/distribution: Q(x′|x^(t)) ∝ Q*(x′|x^(t))
- sample proposals from Q(x′|x^(t)): x^(1), x^(2), …, x^(R′)
- compute the acceptance ratio

     a_r = [p*(x^(r)) Q*(x^(t)|x^(r))] / [p*(x^(t)) Q*(x^(r)|x^(t))]

- if a_r ≥ 1: add x^(r) to the R sampled points; otherwise, accept it with probability a_r
- if accepted: x^(t+1) = x^(r); otherwise: x^(t+1) = x^(t)

   Φ̂ = (1/R) Σ_r φ(x^(r))

Samples are not independent.
Metropolis-Hastings can be used effectively in high-dimensional spaces ☺
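A minimal sketch of Metropolis-Hastings for the same example density, using a symmetric Gaussian random-walk proposal so the Q* terms cancel in the acceptance ratio; the step size and chain length are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_star(x):  # unnormalized target
    return np.exp(0.4 * (x - 0.4) ** 2 - 0.08 * x ** 4)

T, step = 100_000, 1.0
x = 0.0                                 # initial state x^(0)
chain = np.empty(T)
for t in range(T):
    x_prop = x + rng.normal(0.0, step)  # x' ~ Q(.|x^(t)), symmetric proposal
    a = p_star(x_prop) / p_star(x)      # acceptance ratio (Q* terms cancel)
    if a >= 1.0 or rng.uniform() < a:   # accept with probability min(1, a)
        x = x_prop                      # x^(t+1) = x'
    chain[t] = x                        # otherwise x^(t+1) = x^(t)

print(chain[1000:].mean())              # estimate of E_p[x] after burn-in
```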
Outline
- Monte Carlo methods
- Sampling techniques
  - Uniform sampling
  - Importance sampling
  - Rejection sampling
  - Metropolis-Hastings
  - Gibbs sampling
- Example: Collapsed Gibbs Sampler for Topic Models
Gibbs Sampling

Goal: Φ = E_{x∼p}[φ(x)]

Transition kernel/distribution: Q(x|x^(t)) = p(x_i | all other variables)

Next sampled value of the current variable, given the values of all other variables, both new and old:

   x_i^(t+1) ∼ p(x_i | x_1^(t+1), …, x_{i−1}^(t+1), x_{i+1}^(t), …, x_N^(t))
Remember: Markov Blanket

The Markov blanket of a node x is its parents, children, and children's parents: the set of nodes needed to form the complete conditional for a variable x_i.

   p(x_i | x_{−i}) = p(x_1, …, x_N) / ∫ p(x_1, …, x_N) dx_i

   = Π_j p(x_j | π(x_j)) / ∫ Π_j p(x_j | π(x_j)) dx_i   (factorization of the graph)

   = Π_{j: j=i or x_i∈π(x_j)} p(x_j | π(x_j)) / ∫ Π_{j: j=i or x_i∈π(x_j)} p(x_j | π(x_j)) dx_i   (factor out terms not dependent on x_i)
Gibbs Sampling

Transition kernel/distribution: Q(x|x^(t)) = p(x | MB variables)

Next sampled value of the current variable, given the values of just the Markov blanket variables, both new and old, e.g. for x_5 with blanket {x_2, x_3, x_4, x_6, x_7, x_8}:

   x_5^(t+1) ∼ p(· | x_2^(t+1), x_3^(t), x_4^(t+1), x_6^(t+1), x_7^(t), x_8^(t+1))
Gibbs Sampling

Goal: Φ = E_{x∼p}[φ(x)]

Transition kernel/distribution: Q(x|x^(t)) = p(x | MB(x^(t))), the Markov blanket
Sample (always accept) from Q(x|x^(t)): x^(1), x^(2), …, x^(R′); x^(t+1) = x^(r)

   Φ̂ = (1/R) Σ_r φ(x^(r))

Samples are not independent.
Gibbs sampling can be used effectively in high-dimensional spaces ☺
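A minimal sketch of a Gibbs sampler for a target whose complete conditionals are tractable: a 2-D standard Gaussian with correlation ρ, where x_1 | x_2 ∼ N(ρ·x_2, 1 − ρ²) and symmetrically for x_2. The target and ρ = 0.8 are illustrative assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
rho, T = 0.8, 50_000
sd = np.sqrt(1.0 - rho ** 2)      # std dev of each complete conditional
x1, x2 = 0.0, 0.0
chain = np.empty((T, 2))
for t in range(T):
    x1 = rng.normal(rho * x2, sd) # resample x1 | x2: always accepted
    x2 = rng.normal(rho * x1, sd) # resample x2 | x1, using the *new* x1
    chain[t] = (x1, x2)

print(np.corrcoef(chain[1000:].T)[0, 1])  # ~rho; successive samples are correlated
```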
Collapsed Gibbs Sampling (CGS)

Goal: Φ = E_{x∼p}[φ(x)]

Transition kernel/distribution: integrate out some of the Markov blanket (call it z):

   Q(x|x^(t)) = ∫ p(x | MB(x^(t))) dz = p(x | MB∖z(x^(t)))

Sample (always accept) from Q(x|x^(t)): x^(1), x^(2), …, x^(R′); x^(t+1) = x^(r)

   Φ̂ = (1/R) Σ_r φ(x^(r))

Samples are not independent.
Collapsed Gibbs can be used effectively in high-dimensional spaces ☺
Warning: collapsing changes the Markov blanket.
Collapsed Gibbs Sampling

Transition kernel/distribution: Q(x|x^(t)) = p(x | select vars in MB)

Let's integrate out x_4 (in a graph over x_2, …, x_9). The next sampled value of the current variable is conditioned on some of the Markov blanket variables, both new and old; collapsing x_4 changes the blanket of x_5, so x_9 now enters the conditional in its place:

   x_5^(t+1) ∼ p(· | x_2^(t+1), x_3^(t), x_9^(t+1), x_6^(t+1), x_7^(t), x_8^(t+1))
What Are the Trade-offs of CGS?

Benefits:
- Collapsing variables removes variables from the model via integration/marginalization
  - Depending on which variables are marginalized out, this can be a drastic reduction
  - The priors/hyperparameters of the collapsed variables still impact the result
- The "steps" are less incremental

Drawbacks:
- Collapsing removes conditional independences in the model
- The math may not be easy
- You may be restricted via conjugacy/other statistical properties
- You still have the drawbacks of sampling
Outline
- Monte Carlo methods
- Sampling techniques
  - Uniform sampling
  - Importance sampling
  - Rejection sampling
  - Metropolis-Hastings
  - Gibbs sampling
- Example: Collapsed Gibbs Sampler for Topic Models
Latent Dirichlet Allocation (Blei et al., 2003)

- θ_d: per-document (latent) topic usage
- w_d: per-document (unigram) word counts
- φ_k: per-topic word usage
Gibbs Sampler for LDirA

for each document d:
    resample θ_d | z_{d,1}, …, z_{d,N_d}
    for each token i in d:
        resample z_{d,i} | w_{d,i}, {φ_k}, θ_d
for each topic k:
    resample φ_k
Latent Dirichlet Allocation (Blei et al., 2003)

Integrate these out: θ_d (per-document topic usage) and φ_k (per-topic word usage).
Collapsed Gibbs Sampler for LDirA

With θ_d and φ_k integrated out, the per-document and per-topic resampling steps drop away; only the topic assignments are resampled:

for each document d:
    for each token i in d:
        resample z_{d,i} | w_{d,i}, {z_{*,−i}}

Collapsed Gibbs sampling goal:

   p(z_di | z_{*,−i}) = p(z_{*,*}) / p(z_{*,−i})
Sampling: Discrete Observations

(Griffiths and Steyvers, PNAS, 2004)

Collapsed Gibbs sampling goal: p(z_di | z_{*,−i}) = p(z_{*,*}) / p(z_{*,−i})

Maintain count tables; the key Gamma function fact is Γ(x + 1) = xΓ(x).

For the document-topic part, let c(d,k) count the tokens in document d assigned topic k, including the current token i. Each document contributes a Dirichlet-multinomial term, so the ratio for document d is

   p(z_di = k | z_{*,−i}) =
       [Γ(Σ_{k′} α_{k′}) / Γ(Σ_{k′} c(d,k′) + α_{k′})] · Π_{k′} Γ(c(d,k′) + α_{k′}) / Γ(α_{k′})
     / [Γ(Σ_{k′} α_{k′}) / Γ(Σ_{k′} c(d,k′) − 1 + α_{k′})] · Π_{k′} Γ(c(d,k′) − 1_{k′=k} + α_{k′}) / Γ(α_{k′})

where the −1 removes the current token. Only the k′ = k factor and the totals differ between numerator and denominator, so applying Γ(x + 1) = xΓ(x) cancels everything else:

   p(z_di = k | z_{*,−i}) = (c(d,k) − 1 + α_k) / (Σ_{k′} c(d,k′) − 1 + α_{k′})
                          ∝ c(d,k) − 1 + α_k
Collapsed Gibbs Sampler for LDirA

randomly assign z_{*,*}
maintain count tables:
    c(d,k): document-topic counts
    c(k,v): topic-word counts
for each document d:
    for each token i in d:
        unassign topic z_{d,i}: decrease counts
        resample z_{d,i} | w_{d,i}, {z_{*,−i}}:
            p(z_di = k | z_{*,−i}) ∝ (c(d,k) − 1 + α_k) × (analogous topic-word counts term)
        reassign topic z_{d,i}: increase counts
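A minimal end-to-end sketch of this collapsed Gibbs sampler, assuming a toy corpus, symmetric priors alpha and beta, and K = 2 topics (all illustrative; the arrays c_dk and c_kv mirror the count tables above). Because counts are decremented before resampling, the −1 is already absorbed into the tables:

```python
import numpy as np

rng = np.random.default_rng(0)
docs = [[0, 1, 2, 1], [2, 3, 3, 0], [1, 1, 0, 2]]  # toy word ids per document
V, K, alpha, beta = 4, 2, 0.5, 0.1                 # vocab size, topics, priors

# randomly assign z_{*,*} and build the count tables
z = [[int(rng.integers(K)) for _ in doc] for doc in docs]
c_dk = np.zeros((len(docs), K))                    # c(d,k): document-topic counts
c_kv = np.zeros((K, V))                            # c(k,v): topic-word counts
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        c_dk[d, z[d][i]] += 1
        c_kv[z[d][i], w] += 1

for _ in range(200):                               # sampling sweeps
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            c_dk[d, z[d][i]] -= 1                  # unassign topic: decrease counts
            c_kv[z[d][i], w] -= 1
            # p(z_di = k | z_{*,-i}) ∝ (c(d,k)+alpha) * (c(k,w)+beta) / (sum_v c(k,v) + V*beta)
            p = (c_dk[d] + alpha) * (c_kv[:, w] + beta) / (c_kv.sum(axis=1) + V * beta)
            z[d][i] = int(rng.choice(K, p=p / p.sum()))
            c_dk[d, z[d][i]] += 1                  # reassign topic: increase counts
            c_kv[z[d][i], w] += 1

print(c_dk)                                        # inferred document-topic counts
```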