Data Mining
Clustering Hamid Beigy
Sharif University of Technology
Fall 1396
Hamid Beigy (Sharif University of Technology) Data Mining Fall 1396 1 / 41
Table of contents
Introduction
Data matrix and dissimilarity matrix
Proximity
Contingency table for binary attributes of objects i and j:

                      Object j
                  1       0       sum
Object i    1     q       r       q + r
            0     s       t       s + t
            sum   q + s   r + t   p
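The counts q, r, s, and t in the contingency table are usually turned into a dissimilarity value. The sketch below tallies them for two binary vectors and applies the standard textbook formulas: the symmetric dissimilarity (r + s) / (q + r + s + t) and the asymmetric dissimilarity (r + s) / (q + r + s). These formulas are assumed here rather than taken from the slide, and the function names and sample vectors are illustrative only.

```python
def binary_counts(obj_i, obj_j):
    """Return (q, r, s, t) for two equal-length 0/1 vectors."""
    pairs = list(zip(obj_i, obj_j))
    q = sum(1 for a, b in pairs if a == 1 and b == 1)  # both 1
    r = sum(1 for a, b in pairs if a == 1 and b == 0)  # i is 1, j is 0
    s = sum(1 for a, b in pairs if a == 0 and b == 1)  # i is 0, j is 1
    t = sum(1 for a, b in pairs if a == 0 and b == 0)  # both 0
    return q, r, s, t


def symmetric_dissimilarity(obj_i, obj_j):
    """Assumed formula: (r + s) / (q + r + s + t)."""
    q, r, s, t = binary_counts(obj_i, obj_j)
    return (r + s) / (q + r + s + t)


def asymmetric_dissimilarity(obj_i, obj_j):
    """Assumed formula: (r + s) / (q + r + s); 0/0 matches are ignored."""
    q, r, s, _ = binary_counts(obj_i, obj_j)
    if q + r + s == 0:
        raise ValueError("no attribute is 1 in either object")
    return (r + s) / (q + r + s)


# Hypothetical sample objects with six binary attributes.
obj_i = [1, 0, 1, 0, 0, 0]
obj_j = [1, 0, 1, 0, 1, 0]
print(binary_counts(obj_i, obj_j))
print(symmetric_dissimilarity(obj_i, obj_j))
print(asymmetric_dissimilarity(obj_i, obj_j))
```

The asymmetric variant is the usual choice when a 1 carries more information than a 0, for example a positive test result.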
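Dissimilarity between objects i and j described by p attributes of mixed types:
\[ d(i, j) = \frac{\sum_{f=1}^{p} \delta_{ij}^{(f)} \, d_{ij}^{(f)}}{\sum_{f=1}^{p} \delta_{ij}^{(f)}} \]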
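The weighted average over attributes f can be illustrated with a small sketch. It assumes, following the common textbook convention and not anything recoverable from the slide, that the indicator δ is 0 when either value is missing and 1 otherwise, that a numeric attribute contributes |x - y| divided by the attribute's range, and that a nominal attribute contributes 0 on a match and 1 on a mismatch. All names and sample values are hypothetical.

```python
def mixed_dissimilarity(x, y, kinds, ranges):
    """Weighted mean of per-attribute dissimilarities.

    kinds[f] is "numeric" or "nominal"; ranges[f] is max - min for
    numeric attributes (ignored for nominal ones). A value of None
    marks a missing measurement, which sets the indicator to 0.
    """
    numerator = 0.0
    denominator = 0.0
    for f, kind in enumerate(kinds):
        if x[f] is None or y[f] is None:
            continue  # indicator delta = 0: attribute f is skipped
        if kind == "numeric":
            d = abs(x[f] - y[f]) / ranges[f]
        else:
            d = 0.0 if x[f] == y[f] else 1.0
        numerator += d      # delta = 1, so delta * d = d
        denominator += 1.0  # delta = 1
    if denominator == 0.0:
        raise ValueError("no attribute is observed in both objects")
    return numerator / denominator


# Hypothetical objects: (age, colour, answer); the third value of x is missing.
x = (30, "red", None)
y = (40, "blue", "yes")
kinds = ("numeric", "nominal", "nominal")
ranges = (50, None, None)
result = mixed_dissimilarity(x, y, kinds, ranges)
print(result)
```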
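Within-cluster variation, where c_i is the centroid of cluster C_i:
\[ E = \sum_{i=1}^{k} \sum_{p \in C_i} \mathrm{dist}(p, c_i)^2 \]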
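Figure: K-means in one dimension with two clusters.
(a) Initial dataset: 2 3 4 10 11 12 20 25 30
(b) Iteration t = 1: µ1 = 2, cluster {2, 3}; µ2 = 4, cluster {4, 10, 11, 12, 20, 25, 30}
(c) Iteration t = 2: µ1 = 2.5, cluster {2, 3, 4}; µ2 = 16, cluster {10, 11, 12, 20, 25, 30}
(d) Iteration t = 3: µ1 = 3, cluster {2, 3, 4, 10}; µ2 = 18, cluster {11, 12, 20, 25, 30}
(e) Iteration t = 4: µ1 = 4.75, cluster {2, 3, 4, 10, 11, 12}; µ2 = 19.60, cluster {20, 25, 30}
(f) Iteration t = 5 (converged): µ1 = 7, cluster {2, 3, 4, 10, 11, 12}; µ2 = 25, cluster {20, 25, 30}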
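The one-dimensional example above can be replayed with a few lines of code. This is a minimal sketch of the usual assign-then-update loop, not necessarily the exact procedure on the slide; it assumes that a point equidistant from two means goes to the lower-indexed one (which is what places 3 in the first cluster at t = 1), and the function name is illustrative.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Alternate assignment and mean update on 1-D data until the means stop changing."""
    centroids = list(centroids)
    history = []  # (means at start of iteration, clusters assigned in that iteration)
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for x in points:
            # min() returns the first index on ties, i.e. the lower-indexed mean.
            nearest = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
            clusters[nearest].append(x)
        history.append((list(centroids), clusters))
        # An empty cluster keeps its previous mean.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters, history


data = [2, 3, 4, 10, 11, 12, 20, 25, 30]
final_centroids, final_clusters, history = kmeans_1d(data, [2, 4])
for t, (means, clusters) in enumerate(history, start=1):
    print(f"t = {t}: means = {means}, clusters = {clusters}")
```

Starting from other initial means can lead to a different partition, since this procedure only finds a local optimum.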
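Figure: Agglomerative (AGNES) and divisive (DIANA) hierarchical clustering of the objects a, b, c, d, e. AGNES runs from Step 0 to Step 4, merging the single objects through the clusters ab, de, and cde into abcde. DIANA runs through the same steps in the opposite direction, splitting abcde back into single objects.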
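Distance measures between clusters C_i and C_j, where n_i and n_j are the cluster sizes:
\[ \mathrm{dist}_{\min}(C_i, C_j) = \min_{p \in C_i,\, q \in C_j} \{ |p - q| \} \]
\[ \mathrm{dist}_{\max}(C_i, C_j) = \max_{p \in C_i,\, q \in C_j} \{ |p - q| \} \]
\[ \mathrm{dist}_{\mathrm{avg}}(C_i, C_j) = \frac{1}{n_i n_j} \sum_{p \in C_i,\, q \in C_j} |p - q| \]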
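To make the role of the inter-cluster distance concrete, here is a small sketch of bottom-up (AGNES-style) merging on one-dimensional data with a pluggable linkage function. It is a naive illustration under stated assumptions: points are plain numbers, ties are broken by the first pair found, and merging stops at a requested number of clusters. The function names are illustrative.

```python
def single_link(ci, cj):
    """Minimum distance between any two members."""
    return min(abs(p - q) for p in ci for q in cj)


def complete_link(ci, cj):
    """Maximum distance between any two members."""
    return max(abs(p - q) for p in ci for q in cj)


def average_link(ci, cj):
    """Mean distance over all cross-cluster pairs."""
    return sum(abs(p - q) for p in ci for q in cj) / (len(ci) * len(cj))


def agglomerate(points, k, linkage):
    """Repeatedly merge the two closest clusters until only k remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None  # (distance, index a, index b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


data = [2, 3, 4, 10, 11, 12, 20, 25, 30]
by_single = agglomerate(data, 3, single_link)
by_average = agglomerate(data, 3, average_link)
print(by_single)
print(by_average)
```

Different linkages can disagree on the same data; with `complete_link` the tie-breaking rule above changes the three-cluster result for this dataset.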
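Gaussian mixture distribution:
\[ p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k) \]
The mixing coefficients satisfy \sum_{k} \pi_k = 1 and 0 \le \pi_k \le 1.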
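Introduce a K-dimensional binary latent variable z in which exactly one element equals 1, so that z_k \in \{0, 1\} and \sum_{k} z_k = 1, with p(z_k = 1) = \pi_k. Then
\[ p(z) = \prod_{k=1}^{K} \pi_k^{z_k} \]
\[ p(x \mid z) = \prod_{k=1}^{K} \mathcal{N}(x \mid \mu_k, \Sigma_k)^{z_k} \]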
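Marginal distribution of x and the responsibility of component k:
\[ p(x) = \sum_{z} p(z) \, p(x \mid z) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k) \]
\[ \gamma(z_k) \equiv p(z_k = 1 \mid x) = \frac{p(z_k = 1) \, p(x \mid z_k = 1)}{\sum_{j=1}^{K} p(z_j = 1) \, p(x \mid z_j = 1)} = \frac{\pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j)} \]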
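Log likelihood of the data set X = \{x_1, \ldots, x_N\}:
\[ \ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right\} \]
Setting the derivative with respect to \mu_k to zero:
\[ 0 = \sum_{n=1}^{N} \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)} \, \Sigma_k^{-1} (x_n - \mu_k) \]
which gives
\[ \mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, x_n \]
\[ N_k = \sum_{n=1}^{N} \gamma(z_{nk}) \]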
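Setting the derivative with respect to \Sigma_k to zero gives, with N_k = \sum_{n=1}^{N} \gamma(z_{nk}),
\[ \Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, (x_n - \mu_k)(x_n - \mu_k)^{T} \]
Finally, we maximize with respect to the mixing coefficients \pi_k, subject to the constraint \sum_{k=1}^{K} \pi_k = 1. This can be achieved using a Lagrange multiplier and maximizing
\[ \ln p(X \mid \pi, \mu, \Sigma) + \lambda \left( \sum_{k=1}^{K} \pi_k - 1 \right) \]
which gives
\[ 0 = \sum_{n=1}^{N} \frac{\mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)} + \lambda \]
If we multiply both sides by \pi_k and sum over k, making use of the constraint \sum_{k=1}^{K} \pi_k = 1, we find \lambda = -N. Using this to eliminate \lambda and rearranging we obtain
\[ \pi_k = \frac{N_k}{N} \]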
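1. Initialize the means \mu_k, covariances \Sigma_k, and mixing coefficients \pi_k, and evaluate the initial value of the log likelihood.
2. E step: evaluate the responsibilities using the current parameter values
\[ \gamma(z_{nk}) = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)} \]
3. M step: re-estimate the parameters using the current responsibilities
\[ \mu_k^{\mathrm{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, x_n \]
\[ \Sigma_k^{\mathrm{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, (x_n - \mu_k^{\mathrm{new}})(x_n - \mu_k^{\mathrm{new}})^{T} \]
\[ \pi_k^{\mathrm{new}} = \frac{N_k}{N} \]
where N_k = \sum_{n=1}^{N} \gamma(z_{nk}).
4. Evaluate the log likelihood
\[ \ln p(X \mid \mu, \Sigma, \pi) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right\} \]
and check for convergence of either the parameters or the log likelihood. If the convergence criterion is not satisfied, return to step 2.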
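The four EM steps can be sketched for the simplest case of one-dimensional data, where each covariance reduces to a scalar variance. This is a minimal illustration under stated assumptions, not a production implementation: it has no safeguard against a variance collapsing to zero, the stopping rule compares successive log-likelihood values against a tolerance chosen here, and the function names, sample data, and initial values are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(xs, weights, means, variances):
    """Sum over points of the log of the mixture density."""
    return sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)))
        for x in xs
    )


def em_gmm_1d(xs, weights, means, variances, max_iter=200, tol=1e-8):
    weights, means, variances = list(weights), list(means), list(variances)
    n, k = len(xs), len(means)
    trace = [log_likelihood(xs, weights, means, variances)]  # step 1
    for _ in range(max_iter):
        # Step 2 (E step): responsibilities gamma[n][c].
        gamma = []
        for x in xs:
            joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            gamma.append([j / total for j in joint])
        # Step 3 (M step): re-estimate means, variances, and weights.
        for c in range(k):
            n_c = sum(g[c] for g in gamma)
            means[c] = sum(g[c] * x for g, x in zip(gamma, xs)) / n_c
            variances[c] = sum(g[c] * (x - means[c]) ** 2 for g, x in zip(gamma, xs)) / n_c
            weights[c] = n_c / n
        # Step 4: evaluate the log likelihood and test for convergence.
        trace.append(log_likelihood(xs, weights, means, variances))
        if trace[-1] - trace[-2] < tol:
            break
    return weights, means, variances, trace


# Hypothetical data: two tight groups around 1 and 5.
xs = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, variances, trace = em_gmm_1d(xs, [0.5, 0.5], [0.0, 6.0], [1.0, 1.0])
print(weights, means, variances)
```

The recorded `trace` is useful for checking that the log likelihood does not decrease from one iteration to the next.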
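Let N_{ij} be the number of objects in cluster i that belong to class j, and let N_i = \sum_{j=1}^{C} N_{ij} be the total number of objects in cluster i. With p_{ij} = N_{ij} / N_i, the purity of cluster i and the overall purity of a clustering are
\[ p_i = \max_{j} p_{ij} \]
\[ \mathrm{purity} = \sum_{i} \frac{N_i}{N} \, p_i \]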
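Purity can be computed directly from two label lists. The sketch below assumes the usual definition, in which each cluster is scored by the share of its most frequent class and the scores are weighted by cluster size, so the overall value reduces to the sum of per-cluster majority counts divided by N. The function name and the sample labels are illustrative.

```python
from collections import Counter


def purity(cluster_labels, class_labels):
    """Sum of per-cluster majority-class counts divided by the number of objects."""
    n = len(cluster_labels)
    counts_per_cluster = {}
    for cluster, cls in zip(cluster_labels, class_labels):
        counts_per_cluster.setdefault(cluster, Counter())[cls] += 1
    # (N_i / N) * max_j (N_ij / N_i) simplifies to max_j N_ij / N.
    return sum(max(counts.values()) for counts in counts_per_cluster.values()) / n


# Hypothetical clustering of 17 objects into 3 clusters with classes A, B, C.
cluster_labels = [0] * 6 + [1] * 6 + [2] * 5
class_labels = list("AAAAAB") + list("BBBBAC") + list("CCCAA")
score = purity(cluster_labels, class_labels)
print(score)
```

Purity rises trivially as the number of clusters grows (one object per cluster gives purity 1), so it is best compared across clusterings with the same number of clusters.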
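Let p_{UV}(i, j) = |u_i \cap v_j| / N be the probability that a randomly chosen object belongs to cluster u_i in U and cluster v_j in V. Let p_U(i) = |u_i| / N be the probability that a randomly chosen object belongs to cluster u_i in U, and define p_V(j) = |v_j| / N similarly. The mutual information between the two clusterings is
\[ I(U, V) = \sum_{i=1}^{R} \sum_{j=1}^{C} p_{UV}(i, j) \log \frac{p_{UV}(i, j)}{p_U(i) \, p_V(j)} \]
and the normalized mutual information is
\[ \mathrm{NMI}(U, V) = \frac{I(U, V)}{\frac{1}{2} [H(U) + H(V)]} \]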
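The normalized mutual information can be computed from two label lists by counting co-occurrences. The sketch below uses natural logarithms, which cancel in the ratio, and skips empty cells of the joint distribution, following the convention 0 log 0 = 0. Raising an error when both clusterings consist of a single cluster is a choice made here, since the ratio is 0/0 in that case. The function names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy H of the empirical label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(u, v):
    """I(U, V) from the empirical joint distribution of two labelings."""
    n = len(u)
    joint = Counter(zip(u, v))       # |u_i intersect v_j|, only non-empty cells
    size_u, size_v = Counter(u), Counter(v)
    return sum(
        (n_ij / n) * math.log((n_ij / n) / ((size_u[i] / n) * (size_v[j] / n)))
        for (i, j), n_ij in joint.items()
    )


def nmi(u, v):
    """I(U, V) divided by the average of H(U) and H(V)."""
    denominator = 0.5 * (entropy(u) + entropy(v))
    if denominator == 0.0:
        raise ValueError("NMI is undefined when both clusterings have one cluster")
    return mutual_information(u, v) / denominator


same_up_to_renaming = nmi([0, 0, 1, 1], [1, 1, 0, 0])
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
print(same_up_to_renaming, independent)
```

Identical partitions score 1 regardless of how the clusters are numbered, and statistically independent partitions score 0.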