Lecture 8: Word Clustering
Kai-Wei Chang CS @ University of Virginia kw@kwchang.net Course webpage: http://kwchang.net/teaching/NLP16
1 6501 Natural Language Processing
This lecture
- Brown Clustering
$$P(w_0, w_1, w_2, \dots, w_n) = P(w_1 \mid w_0)\,P(w_2 \mid w_1) \cdots P(w_n \mid w_{n-1}) = \prod_{i=1}^{n} P(w_i \mid w_{i-1})$$
$w_0$ is a dummy word representing the beginning of a sentence.
$$P(w_0, \text{"a"}, \text{"dog"}, \dots, \text{"cat"}) = P(\text{"a"} \mid w_0)\,P(\text{"dog"} \mid \text{"a"}) \cdots P(\text{"cat"} \mid \text{"a"})$$
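The bigram factorization above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's code: the function names `train_bigram` and `sentence_prob`, the `<s>` token standing in for $w_0$, and the unsmoothed count-ratio estimates are all assumptions made for the example. The toy corpus reuses the three example sentences from these slides.

```python
from collections import Counter

def train_bigram(sentences, bos="<s>"):
    """Unsmoothed count-ratio estimate of P(w_i | w_{i-1})."""
    context_counts = Counter()   # how often each word appears as the previous word
    bigram_counts = Counter()    # how often each (previous, current) pair appears
    for sentence in sentences:
        tokens = [bos] + sentence.split()  # bos plays the role of the dummy word w_0
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return {pair: n / context_counts[pair[0]] for pair, n in bigram_counts.items()}

def sentence_prob(model, sentence, bos="<s>"):
    """Product over i of P(w_i | w_{i-1}); unseen bigrams get probability 0."""
    tokens = [bos] + sentence.split()
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        prob *= model.get((prev, cur), 0.0)
    return prob

corpus = [
    "a dog is chasing a cat",
    "the boy is following a rabbit",
    "a fox was chasing a bird",
]
model = train_bigram(corpus)
p_seen = sentence_prob(model, "a dog is chasing a cat")
p_unseen = sentence_prob(model, "the dog is chasing a cat")
```

On this toy corpus `p_unseen` is zero because the bigram "the dog" never occurs, even though the sentence is perfectly plausible. That sparsity is one motivation for working with word clusters.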
Cluster 3: a, the
Cluster 46: dog, cat, fox, rabbit, bird, boy
Cluster 64: is, was
Cluster 8: chasing, following, biting, …

C3   C46  C64  C8         C3  C46
a    dog  is   chasing    a   cat
the  boy  is   following  a   rabbit
a    fox  was  chasing    a   bird
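The mapping from words to cluster ids in this example can be sketched as a plain dictionary lookup. The names `clusters`, `word_to_cluster`, and `cluster_sequence` are illustrative, and only the words actually listed on the slide are included (the slide indicates that Cluster 8 contains more words).

```python
# Cluster ids and members as listed in the example.
clusters = {
    3: ["a", "the"],
    46: ["dog", "cat", "fox", "rabbit", "bird", "boy"],
    64: ["is", "was"],
    8: ["chasing", "following", "biting"],
}

# Invert the table: each word belongs to exactly one cluster.
word_to_cluster = {word: cid for cid, words in clusters.items() for word in words}

def cluster_sequence(sentence):
    """Replace every word in the sentence by the id of its cluster."""
    return [word_to_cluster[word] for word in sentence.split()]

seq_1 = cluster_sequence("a dog is chasing a cat")
seq_2 = cluster_sequence("the boy is following a rabbit")
seq_3 = cluster_sequence("a fox was chasing a bird")
```

All three sentences collapse to the same cluster sequence, which is the point of the example: statistics gathered for one sentence can be shared with the others.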
$$P(w_0, w_1, w_2, \dots, w_n) = P(C(w_1) \mid C(w_0))\,P(C(w_2) \mid C(w_1)) \cdots P(C(w_n) \mid C(w_{n-1}))\;P(w_1 \mid C(w_1))\,P(w_2 \mid C(w_2)) \cdots P(w_n \mid C(w_n)) = \prod_{i=1}^{n} P(C(w_i) \mid C(w_{i-1}))\,P(w_i \mid C(w_i))$$
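As a rough sketch of how this class-based product is evaluated, the code below multiplies a cluster transition probability and a word emission probability for each position. The parameter values are made up purely for illustration (uniform emissions within each cluster and a deterministic cluster chain); `BOS`, `trans`, `emit`, and `class_bigram_prob` are hypothetical names, and cluster id 0 for the dummy word $w_0$ is an assumption.

```python
BOS = 0  # assumed cluster id for the dummy word w_0

clusters = {
    3: ["a", "the"],
    46: ["dog", "cat", "fox", "rabbit", "bird", "boy"],
    64: ["is", "was"],
    8: ["chasing", "following", "biting"],
}
word_to_cluster = {w: c for c, words in clusters.items() for w in words}

# Illustrative parameters only: P(w | c) uniform within each cluster,
# and P(c' | c) = 1 along the chain BOS -> 3 -> 46 -> 64 -> 8 -> 3.
emit = {(w, c): 1.0 / len(words) for c, words in clusters.items() for w in words}
trans = {(BOS, 3): 1.0, (3, 46): 1.0, (46, 64): 1.0, (64, 8): 1.0, (8, 3): 1.0}

def class_bigram_prob(sentence):
    """Product over i of P(C(w_i) | C(w_{i-1})) * P(w_i | C(w_i))."""
    prob = 1.0
    prev = BOS
    for word in sentence.split():
        cur = word_to_cluster[word]
        prob *= trans.get((prev, cur), 0.0) * emit[(word, cur)]
        prev = cur
    return prob

p_a = class_bigram_prob("a dog is chasing a cat")
p_b = class_bigram_prob("the fox was biting the bird")
```

Under these toy parameters both sentences receive the same non-zero probability, including the second one, whose word bigrams were never listed anywhere. Only the cluster sequence and the per-cluster word probabilities matter.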
- A vocabulary set $V$
- A function $C: V \to \{1, 2, 3, \dots, k\}$
  - A partition of the vocabulary into $k$ classes
- Conditional probability $P(c' \mid c)$ for $c, c' \in \{1, \dots, k\}$
- Conditional probability $P(w \mid c)$ for $c \in \{1, \dots, k\}$, $w \in V$
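Log-likelihood, where $\theta$ denotes the conditional probabilities of the model:
$$LL(\theta, C) = \log \prod_{i=1}^{n} P(C(w_i) \mid C(w_{i-1}))\,P(w_i \mid C(w_i)) = \sum_{i=1}^{n} \log P(C(w_i) \mid C(w_{i-1}))\,P(w_i \mid C(w_i))$$
$$\max_{\theta, C} LL(\theta, C)$$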
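For a fixed clustering $C$, the parameters that solve $\max_{\theta} LL(\theta, C)$ are count ratios:
$$P(c' \mid c) = \frac{\#(c', c)}{\#c}, \qquad P(w \mid c) = \frac{\#(w, c)}{\#c}$$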
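A minimal sketch of these count-based estimates, assuming the clustering is already given. The function name `estimate_parameters`, the `BOS` id for the dummy start word, and the toy corpus are assumptions for illustration. The sketch keeps separate counters for a cluster seen as a context and a cluster seen as a token, so that both distributions normalize even though the last word of a sentence is never a context.

```python
from collections import Counter

BOS = 0  # assumed cluster id for the dummy word w_0

def estimate_parameters(sentences, word_to_cluster):
    """Return (trans, emit) with trans[(c, c2)] = P(c2 | c) and emit[(w, c)] = P(w | c)."""
    transition_counts = Counter()  # #(c', c): cluster c immediately followed by c'
    context_counts = Counter()     # times cluster c appears as the previous cluster
    emission_counts = Counter()    # #(w, c): word w observed as a member of cluster c
    cluster_counts = Counter()     # number of tokens belonging to cluster c
    for sentence in sentences:
        prev = BOS
        for word in sentence.split():
            cur = word_to_cluster[word]
            transition_counts[(prev, cur)] += 1
            context_counts[prev] += 1
            emission_counts[(word, cur)] += 1
            cluster_counts[cur] += 1
            prev = cur
    trans = {(c, c2): n / context_counts[c] for (c, c2), n in transition_counts.items()}
    emit = {(w, c): n / cluster_counts[c] for (w, c), n in emission_counts.items()}
    return trans, emit

word_to_cluster = {
    "a": 3, "the": 3,
    "dog": 46, "cat": 46, "fox": 46, "rabbit": 46, "bird": 46, "boy": 46,
    "is": 64, "was": 64,
    "chasing": 8, "following": 8, "biting": 8,
}
corpus = [
    "a dog is chasing a cat",
    "the boy is following a rabbit",
    "a fox was chasing a bird",
]
trans, emit = estimate_parameters(corpus, word_to_cluster)
```

On this corpus every observed cluster transition has probability 1, while the emission table records, for instance, that "a" accounts for five of the six tokens in Cluster 3.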
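$$\frac{1}{n}\sum_{i=1}^{n} \log P(C(w_i) \mid C(w_{i-1}))\,P(w_i \mid C(w_i)) = \sum_{c=1}^{k}\sum_{c'=1}^{k} P(c, c') \log \frac{P(c, c')}{P(c)\,P(c')} + G$$
where $G$ is a constant and
$$P(c, c') = \frac{\#(c, c')}{\sum_{c, c'} \#(c, c')}, \qquad P(c) = \frac{\#(c)}{\sum_{c} \#(c)}$$
$$\frac{P(c, c')}{P(c)\,P(c')} = \frac{P(c \mid c')}{P(c)}$$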
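The double sum over cluster pairs is the mutual information between adjacent clusters, and it can be computed directly from cluster bigram counts. The sketch below is one way to do that under simplifying assumptions: the function name `clustering_quality` is hypothetical, sentence-boundary symbols are ignored, the marginals are taken as row and column sums of the bigram table, and the constant $G$ is dropped because it does not depend on the clustering.

```python
import math
from collections import Counter

def clustering_quality(sentences, word_to_cluster):
    """Sum over (c, c') of P(c, c') * log(P(c, c') / (P(c) * P(c')))."""
    pair_counts = Counter()
    for sentence in sentences:
        ids = [word_to_cluster[word] for word in sentence.split()]
        pair_counts.update(zip(ids, ids[1:]))  # adjacent cluster pairs
    total = sum(pair_counts.values())
    left = Counter()   # marginal count of a cluster in the first position of a pair
    right = Counter()  # marginal count of a cluster in the second position of a pair
    for (c, c2), n in pair_counts.items():
        left[c] += n
        right[c2] += n
    quality = 0.0
    for (c, c2), n in pair_counts.items():
        p_joint = n / total
        quality += p_joint * math.log(p_joint / ((left[c] / total) * (right[c2] / total)))
    return quality

corpus = [
    "a dog is chasing a cat",
    "the boy is following a rabbit",
    "a fox was chasing a bird",
]
example_clustering = {
    "a": 3, "the": 3,
    "dog": 46, "cat": 46, "fox": 46, "rabbit": 46, "bird": 46, "boy": 46,
    "is": 64, "was": 64,
    "chasing": 8, "following": 8,
}
single_cluster = {word: 1 for word in example_clustering}

q_example = clustering_quality(corpus, example_clustering)
q_single = clustering_quality(corpus, single_cluster)
```

Putting every word in one cluster gives a score of zero, since the cluster of the previous word then carries no information about the next one, while the four-cluster example scores higher. Comparing candidate clusterings by this score is what a search over $C$ would rely on.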
- Create a new cluster $C_{m+1}$ (we now have $m+1$ clusters)
- Choose two clusters from the $m+1$ clusters based on