Bayesian Inference for Parameter Estimation + Topic Modeling
10-418 / 10-618 Machine Learning for Structured Data
Matt Gormley
Lecture 20
Nov. 4, 2019
Machine Learning Department School of Computer Science Carnegie Mellon University
Motivation: Suppose you’re given a massive corpus and asked to carry out the following tasks
Topic Modeling: A method of (usually unsupervised) discovery of latent or hidden structure in a corpus
accommodate large-scale datasets
http://www.cs.umass.edu/~mimno/icml100.html
Dirichlet-multinomial regression (DMR) topic model on ICML (Mimno & McCallum, 2008)
https://app.nihmaps.org/
(Talley et al., 2011)
(Wang & Grimson, 2007) Manual LDA SLDA
1. Beta-Bernoulli
2. Dirichlet-Multinomial
3. Dirichlet-Multinomial Mixture Model
4. LDA

– Exact inference
– EM
– Monte Carlo EM
– Gibbs sampler
– Collapsed Gibbs sampler

– Correlated topic models
– Dynamic topic models
– Polylingual topic models
– Supervised LDA
[Figure: Beta density f(φ|α, β) versus φ for (α, β) = (0.1, 0.9), (0.5, 0.5), (1.0, 1.0), (5.0, 5.0), (10.0, 5.0)]
x1, …, x10 = H, T, T, H, H, T, T, H, H, H
φ ∼ Beta(α, β) [draw distribution over words]
For each word n ∈ {1, …, N}:
    xn ∼ Bernoulli(φ) [draw word]
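As a rough illustration of the Beta-Bernoulli story, the sketch below draws φ from a Beta prior, flips N coins, and then forms the conjugate posterior for the ten flips H T T H H T T H H H. The function names, the encoding H = 1, and the prior α = β = 1 are assumptions made for this example. The update Beta(α + #H, β + #T) is the standard conjugacy result and is not spelled out in the slide text.

```python
import random


def sample_beta_bernoulli(alpha, beta, n, rng):
    """Generative story: phi ~ Beta(alpha, beta), then x_i ~ Bernoulli(phi)."""
    phi = rng.betavariate(alpha, beta)
    xs = [1 if rng.random() < phi else 0 for _ in range(n)]
    return phi, xs


def beta_posterior(alpha, beta, xs):
    """Conjugate update: Beta(alpha + number of ones, beta + number of zeros)."""
    heads = sum(xs)
    tails = len(xs) - heads
    return alpha + heads, beta + tails


# The ten flips from the slide, with H encoded as 1 (an assumed encoding).
flips = [1 if c == "H" else 0 for c in "HTTHHTTHHH"]

# Assumed uniform prior Beta(1, 1); any positive alpha and beta would work.
post_alpha, post_beta = beta_posterior(1.0, 1.0, flips)
posterior_mean = post_alpha / (post_alpha + post_beta)
print(post_alpha, post_beta, posterior_mean)
```

With six heads and four tails, the uniform prior is updated to Beta(7, 5), so the posterior mean of φ is 7/12.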
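$$p(\vec{\phi} \mid \vec{\alpha}) = \frac{1}{B(\vec{\alpha})} \prod_{k=1}^{K} \phi_k^{\alpha_k - 1} \quad \text{where} \quad B(\vec{\alpha}) = \frac{\prod_{k=1}^{K} \Gamma(\alpha_k)}{\Gamma\left(\sum_{k=1}^{K} \alpha_k\right)}$$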
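To make the Dirichlet density concrete, here is a minimal sketch that evaluates it in log space, which avoids overflow in the Gamma functions. The function name `log_dirichlet_pdf` and the example inputs are assumptions for illustration only.

```python
import math


def log_dirichlet_pdf(phi, alpha):
    """log p(phi | alpha) = -log B(alpha) + sum_k (alpha_k - 1) * log(phi_k)."""
    if len(phi) != len(alpha):
        raise ValueError("phi and alpha must have the same length")
    if abs(sum(phi) - 1.0) > 1e-9 or min(phi) <= 0.0:
        raise ValueError("phi must lie in the interior of the simplex")
    # log B(alpha) = sum_k log Gamma(alpha_k) - log Gamma(sum_k alpha_k)
    log_b = sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))
    return -log_b + sum((a - 1.0) * math.log(p) for a, p in zip(alpha, phi))


# With alpha = (1, 1, 1) the density is flat over the simplex.
uniform_density = math.exp(log_dirichlet_pdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0]))

# With K = 2 the Dirichlet reduces to a Beta density.
beta_density = math.exp(log_dirichlet_pdf([0.5, 0.5], [2.0, 2.0]))
print(uniform_density, beta_density)
```

For α = (1, 1, 1) the normalizer is B(α) = Γ(1)³ / Γ(3) = 1/2, so the density is 2 everywhere on the simplex.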
[Figure: two plots of the Dirichlet density p(φ | α)]
x1, …, x10 = the, he, is, the, and, the, she, she, is, is
φ ∼ Dir(β) [draw distribution over words]
For each word n ∈ {1, …, N}:
    xn ∼ Mult(1, φ) [draw word]
Document 1: x11 x12 x13 = the he is
Document 2: x21 x22 x23 = the and the
Document 3: x31 x32 x33 x34 = she she is is
For each topic k ∈ {1, …, K}:
    φk ∼ Dir(β) [draw distribution over words]
θ ∼ Dir(α) [draw distribution over topics]
For each document m ∈ {1, …, M}:
    zm ∼ Mult(1, θ) [draw topic assignment]
    For each word n ∈ {1, …, Nm}:
        xmn ∼ Mult(1, φzm) [draw word]
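The generative story for the Dirichlet-multinomial mixture model can be sketched as a forward sampler. This is an illustration under stated assumptions, not a reference implementation: the helper names, the five-word vocabulary, the hyperparameter values, and the use of normalized Gamma draws to sample from a Dirichlet are all choices made for the example.

```python
import random


def sample_dirichlet(concentration, rng):
    """Draw from Dir(concentration) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in concentration]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Mult(1, probs): return the index of the single sampled outcome."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def sample_mixture_corpus(alpha, beta, doc_lengths, rng):
    """Mixture model: one topic z_m per document, shared by all its words."""
    phis = [sample_dirichlet(beta, rng) for _ in alpha]  # phi_k ~ Dir(beta)
    theta = sample_dirichlet(alpha, rng)                 # theta ~ Dir(alpha)
    topics, docs = [], []
    for length in doc_lengths:
        z_m = sample_categorical(theta, rng)             # z_m ~ Mult(1, theta)
        topics.append(z_m)
        # x_mn ~ Mult(1, phi_{z_m}) for every word in the document
        docs.append([sample_categorical(phis[z_m], rng) for _ in range(length)])
    return topics, docs


# Hypothetical vocabulary and hyperparameters, chosen only for the demo.
vocab = ["the", "he", "is", "and", "she"]
topics, docs = sample_mixture_corpus(
    alpha=[1.0, 1.0],
    beta=[0.5] * len(vocab),
    doc_lengths=[3, 3, 4],
    rng=random.Random(0),
)
for z_m, doc in zip(topics, docs):
    print(z_m, [vocab[t] for t in doc])
```

Note that θ is drawn once for the whole corpus and each document gets exactly one topic, which is what distinguishes this model from LDA.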
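Posterior over φ given the observed words X (Bayes' rule divides by the evidence P(X)):
φ | X ∼ Dir(β + n), where nt is the number of times word t appeared in X
Generative story:
φ ∼ Dir(β) [draw distribution over words]
For each word n ∈ {1, …, N}:
    xn ∼ Mult(1, φ) [draw word]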
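The posterior Dir(β + n) amounts to adding word counts to the prior parameters. The sketch below applies this to the ten-word example (the, he, is, the, and, the, she, she, is, is). The function names, the vocabulary ordering, and the symmetric prior β = 1 are assumptions for illustration.

```python
from collections import Counter


def dirichlet_posterior(beta, words, vocab):
    """Return the parameters of Dir(beta + n), where n[t] counts word t."""
    counts = Counter(words)
    return [b + counts[w] for b, w in zip(beta, vocab)]


def dirichlet_mean(params):
    """Mean of a Dirichlet: each parameter divided by the parameter sum."""
    total = sum(params)
    return [p / total for p in params]


vocab = ["the", "he", "is", "and", "she"]
words = "the he is the and the she she is is".split()

# Assumed symmetric prior with beta_t = 1 for every word type.
posterior = dirichlet_posterior([1.0] * len(vocab), words, vocab)
posterior_mean = dirichlet_mean(posterior)
print(posterior)
print(posterior_mean)
```

The counts are the = 3, he = 1, is = 3, and = 1, she = 2, so the posterior parameters are (4, 2, 4, 2, 3) in vocabulary order.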
Diagrams from Wallach, JHU 2011, slides
(Blei, Ng, & Jordan, 2003)
[Figure: six topics ϕ1, …, ϕ6 drawn from Dirichlet(β), each plotted as a distribution over words]
Words for the {hockey} topic: team, season, hockey, player, penguins, ice, canadiens, puck, montreal, stanley, cup
Topics ϕ1, …, ϕ6 ∼ Dirichlet(β), labeled in order: {Canadian gov.} {government} {hockey} {U.S. gov.} {baseball} {Japan}
Document-specific topic distributions θ1, θ2, θ3 ∼ Dirichlet(α)
Document 1: The 54/40' boundary dispute is still unresolved, and Canadian and US Coast Guard vessels regularly if infrequently detain each other's fish boats in the disputed waters off Dixon…
Document 2: In the year before Lemieux came, Pittsburgh finished with 38 points. Following his arrival, the Pens finished…
Document 3: The Orioles' pitching staff again is having a fine exhibition season. Four shutouts, low team ERA, (Well, I haven't gotten any baseball…
[Plate diagram for LDA with plates M (documents), Nm (words in document m), and K (topics)]
α: Dirichlet
θm: Document-specific topic distribution
zmn: Topic assignment
xmn: Observed word
φk: Topic
β: Dirichlet
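The plate diagram corresponds to the usual LDA generative story (Blei, Ng, & Jordan, 2003): each topic φk is drawn from Dir(β), each document draws its own θm from Dir(α), and every word gets its own topic assignment zmn. The forward sampler below is a sketch under assumptions: the helper names, vocabulary size, and hyperparameter values are invented for the example, and Dirichlet draws are produced by normalizing Gamma draws.

```python
import random


def sample_dirichlet(concentration, rng):
    """Draw from Dir(concentration) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in concentration]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Mult(1, probs): return the index of the single sampled outcome."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def sample_lda_corpus(alpha, beta, doc_lengths, rng):
    """LDA: a topic distribution per document and a topic per word."""
    phis = [sample_dirichlet(beta, rng) for _ in alpha]      # phi_k ~ Dir(beta)
    thetas, assignments, docs = [], [], []
    for length in doc_lengths:
        theta_m = sample_dirichlet(alpha, rng)               # theta_m ~ Dir(alpha)
        z_m = [sample_categorical(theta_m, rng) for _ in range(length)]
        x_m = [sample_categorical(phis[z], rng) for z in z_m]  # x_mn ~ Mult(1, phi_z)
        thetas.append(theta_m)
        assignments.append(z_m)
        docs.append(x_m)
    return phis, thetas, assignments, docs


# Hypothetical sizes: K = 3 topics, a vocabulary of 5 word types, M = 3 documents.
phis, thetas, assignments, docs = sample_lda_corpus(
    alpha=[0.5, 0.5, 0.5],
    beta=[0.1] * 5,
    doc_lengths=[3, 3, 4],
    rng=random.Random(0),
)
for z_m, x_m in zip(assignments, docs):
    print(list(zip(z_m, x_m)))
```

Compared with the mixture model, the only structural changes are that θ moves inside the document loop and z moves inside the word loop.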
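1. For each document, allocate its words to as few topics as possible.
2. For each topic, assign high probability to as few terms as possible.
These goals pull against each other:
– If a document is placed in a single topic, all of its words must have probability under that topic, which makes goal 2 hard.
– If each topic covers very few words, then to cover a document’s words the model must assign many topics to it, which makes goal 1 hard.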
Slide from David Blei, MLSS 2012
[Plate diagram of LDA without the Dirichlet labels on α and β. Annotations: Optimized, Observed]
[Plate diagram of LDA with Dirichlet priors. Annotations: Optimized, Observed]
Estimate the posterior:
[Plate diagram of LDA without the Dirichlet labels on α and β. Annotations: Optimized, Exact Inference]
[Plate diagram of LDA with Dirichlet priors. Annotations: Optimized, Exact Inference]
[Plate diagram of LDA with Dirichlet priors. Annotations: Optimized, Sampled]
[Plate diagram of LDA with Dirichlet priors. Annotations: Exact Inference? Intractable]
1. “moralization” converts directed to undirected
2. “triangulation” breaks 4-cycles by adding edges
3. Cliques arranged into a junction tree
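The first of these steps, moralization, is easy to sketch: connect every pair of parents that share a child, then drop edge directions. The example below applies it to a one-word fragment of LDA (θ → z → x ← φ). The function name and the dictionary representation of the graph are assumptions for illustration, and triangulation and junction-tree construction are not shown.

```python
from itertools import combinations


def moralize(parents):
    """Moral graph of a DAG given as {node: [parent, ...]}."""
    edges = set()
    for child, pars in parents.items():
        for p in pars:
            edges.add(frozenset((p, child)))   # keep the edge, drop its direction
        for p, q in combinations(pars, 2):
            edges.add(frozenset((p, q)))       # "marry" parents that share a child
    return edges


# One word of one LDA document: theta -> z -> x <- phi
lda_fragment = {"theta": [], "phi": [], "z": ["theta"], "x": ["z", "phi"]}
moral_edges = moralize(lda_fragment)
for edge in sorted(tuple(sorted(e)) for e in moral_edges):
    print(edge)
```

Because z and φ are both parents of x, moralization adds an undirected edge between them. In the full model, every zmn is coupled to every φk in this way, which gives some intuition for why the resulting cliques grow large.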
[Plate diagram of LDA with Dirichlet priors. Annotation: Sampled]
[Plate diagram of LDA with Dirichlet priors. Annotations: Integrated out, Sampled]
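When θ and φ are integrated out and only the topic assignments z are sampled, the result is a collapsed Gibbs sampler. The sketch below uses the commonly cited conditional p(zmn = k | rest) ∝ (n_mk + α)(n_kt + β) / (n_k + Tβ), where the counts exclude the current token and T is the vocabulary size. That formula, the scalar symmetric priors, the function name, and the tiny corpus encoding are assumptions for this example and are not taken from the slide text.

```python
import random


def collapsed_gibbs_lda(docs, num_topics, vocab_size, alpha, beta, sweeps, rng):
    """Sample z with theta and phi integrated out (symmetric scalar priors)."""
    n_mk = [[0] * num_topics for _ in docs]                # document-topic counts
    n_kt = [[0] * vocab_size for _ in range(num_topics)]   # topic-word counts
    n_k = [0] * num_topics                                 # tokens per topic
    z = []
    for m, doc in enumerate(docs):                         # random initialization
        z_m = [rng.randrange(num_topics) for _ in doc]
        z.append(z_m)
        for t, k in zip(doc, z_m):
            n_mk[m][k] += 1
            n_kt[k][t] += 1
            n_k[k] += 1
    for _ in range(sweeps):
        for m, doc in enumerate(docs):
            for n, t in enumerate(doc):
                k = z[m][n]
                # Remove the current token from all counts.
                n_mk[m][k] -= 1
                n_kt[k][t] -= 1
                n_k[k] -= 1
                # Unnormalized conditional for each candidate topic j.
                weights = [
                    (n_mk[m][j] + alpha) * (n_kt[j][t] + beta)
                    / (n_k[j] + vocab_size * beta)
                    for j in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights, k=1)[0]
                # Record the new assignment and restore the counts.
                z[m][n] = k
                n_mk[m][k] += 1
                n_kt[k][t] += 1
                n_k[k] += 1
    return z, n_mk, n_kt


# The three toy documents, encoded with the assumed vocabulary
# 0 = the, 1 = he, 2 = is, 3 = and, 4 = she.
docs = [[0, 1, 2], [0, 3, 0], [4, 4, 2, 2]]
z, n_mk, n_kt = collapsed_gibbs_lda(
    docs, num_topics=2, vocab_size=5, alpha=0.5, beta=0.1,
    sweeps=50, rng=random.Random(0),
)
print(z)
```

Only count tables are stored. If estimates of θ and φ are needed afterwards, they can be recovered from the smoothed counts n_mk and n_kt.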