Clustering: Models and Algorithms
Shikui Tu
2019-02-28
Outline
• Clustering – K-means clustering, hierarchical clustering
• Adaptive learning (online learning) – CL, FSCL, RPCL
• Gaussian Mixture Models (GMM)
• Expectation-Maximization (EM) for maximum likelihood
What is clustering?
Example: six malignant tumors (melanoma). (Science, Vol. 352, Issue 6282, 8 April 2016)
How to represent a cluster
• Represent each cluster by a single representative point: its center µ (the mean of the points in the cluster).
How to define error?
Squared distance: for a point x_t, the error of representing it by a center µ is ||µ − x_t||².
For the whole cluster, the total error is ||µ − x_1||² + ||µ − x_2||² + ||µ − x_3||².
We look for the center µ that minimizes this total error.
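The point of the squared-error criterion above is that the sample mean minimizes it. A minimal numeric sketch (the toy points are illustrative, not from the slides):

```python
import numpy as np

def total_sq_error(mu, X):
    """Sum of squared Euclidean distances from center mu to each row of X."""
    return float(np.sum((X - mu) ** 2))

X = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]])
mean = X.mean(axis=0)   # the sample mean, here [1.0, 1.0]

# The mean achieves a lower error (8.0) than, say, the origin (14.0):
assert total_sq_error(mean, X) < total_sq_error(np.zeros(2), X)
```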
Matrix derivatives
http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/3274/pdf/imm3274.pdf
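One matrix-derivative identity used in deriving the mean as the minimizer is d/dµ ||µ − x||² = 2(µ − x). A quick numerical check with finite differences (toy vectors, my own illustration):

```python
import numpy as np

def sq_dist_grad(mu, x):
    """Analytic gradient of ||mu - x||^2 with respect to mu."""
    return 2.0 * (mu - x)

def numeric_grad(f, mu, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at mu."""
    g = np.zeros_like(mu)
    for i in range(mu.size):
        d = np.zeros_like(mu)
        d[i] = eps
        g[i] = (f(mu + d) - f(mu - d)) / (2 * eps)
    return g

mu = np.array([1.0, -2.0])
x = np.array([0.5, 0.5])
f = lambda m: np.sum((m - x) ** 2)
assert np.allclose(sq_dist_grad(mu, x), numeric_grad(f, mu), atol=1e-5)
```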
Clustering the data
We have the following data. We want to cluster the data into two clusters (red and blue). How?
Minimize the sum of squared distances
J = Σ_{n=1..N} Σ_{k=1..K} r_nk ||x_n − µ_k||²
r_nk = 1 if and only if data point x_n is assigned to cluster k; otherwise r_nk = 0.
Here k = 1, 2 (K = 2 clusters) and n = 1, …, N (N: the total number of points).
For example, if x_n is assigned to cluster 1, then r_n1 = 1 and r_n2 = 0.
We need to calculate {r_nk} and {µ_k}.
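The objective J above can be computed directly from the points, centers, and one-hot assignments. A small sketch with toy 1-D numbers of my own:

```python
import numpy as np

def kmeans_objective(X, mu, r):
    """J = sum_n sum_k r_nk * ||x_n - mu_k||^2.
    X: (N,d) points, mu: (K,d) centers, r: (N,K) one-hot assignments."""
    # Pairwise squared distances, shape (N, K).
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    return float((r * d2).sum())

X = np.array([[0.0], [1.0], [10.0]])
mu = np.array([[0.5], [10.0]])
r = np.array([[1, 0], [1, 0], [0, 1]])  # first two points -> cluster 1
# J = 0.25 + 0.25 + 0.0 = 0.5
```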
If we know r_n1, r_n2 for all n = 1, …, N
Since the points have been assigned to cluster 1 or cluster 2, we calculate
µ_1 = mean of the points in cluster 1
µ_2 = mean of the points in cluster 2
Or formally: µ_k = (Σ_n r_nk x_n) / (Σ_n r_nk).
We call it the M step.
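The M-step formula µ_k = (Σ_n r_nk x_n) / (Σ_n r_nk) in code, on toy 1-D data of my own:

```python
import numpy as np

def m_step(X, r):
    """Each center becomes the mean of its assigned points.
    X: (N,d) points, r: (N,K) one-hot assignments -> (K,d) centers."""
    counts = r.sum(axis=0)              # number of points per cluster
    return (r.T @ X) / counts[:, None]  # weighted sums divided by counts

X = np.array([[0.0], [2.0], [10.0]])
r = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
mu = m_step(X, r)  # cluster means: [[1.0], [10.0]]
```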
If we know µ_1, µ_2
We should assign point x_n to cluster 1 if ||x_n − µ_1||² < ||x_n − µ_2||²; then r_n1 = 1, r_n2 = 0.
Or formally: r_nk = 1 if k = argmin_j ||x_n − µ_j||², and r_nk = 0 otherwise.
We call it the E step.
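The E step above in code: each point gets a one-hot assignment to its nearest center (toy data of my own):

```python
import numpy as np

def e_step(X, mu):
    """Return a one-hot assignment matrix r of shape (N, K):
    r_nk = 1 iff k = argmin_j ||x_n - mu_j||^2."""
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (N, K)
    r = np.zeros_like(d2)
    r[np.arange(X.shape[0]), d2.argmin(axis=1)] = 1.0
    return r

X = np.array([[0.0], [1.0], [9.0]])
mu = np.array([[0.0], [10.0]])
r = e_step(X, mu)  # points 0, 1 -> cluster 1; point 9 -> cluster 2
```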
Initialization: pick initial centers µ_1, µ_2.
E step: given µ_1, µ_2, calculate r_n1, r_n2 for all n = 1, …, N.
Assign each point to the nearest cluster; the line of equal distance to the two centers separates the two regions.
M step: given r_n1, r_n2, calculate µ_1, µ_2.
Compute the mean of the points in each cluster.
The E step and M step are then repeated alternately for further iterations.
Initialization → E step → M step → E step → M step → …
Convergence: if J does not change, or {µ_1, µ_2} do not change, then the algorithm has converged.
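The whole loop above (initialize, then alternate E and M steps until J stops changing) can be sketched as follows; the random initialization from data points and the toy data are illustrative assumptions:

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Alternate E and M steps until the objective J stops changing."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)]  # init: K data points
    prev_J = np.inf
    for _ in range(n_iter):
        # E step: assign each point to its nearest center.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        J = float(d2[np.arange(len(X)), labels].sum())
        # M step: each center becomes the mean of its assigned points.
        for k in range(K):
            if np.any(labels == k):
                mu[k] = X[labels == k].mean(axis=0)
        if J == prev_J:  # converged: J did not change
            break
        prev_J = J
    return mu, labels, J

# Two well-separated toy clusters at 0 and 10.
X = np.vstack([np.zeros((5, 2)), np.ones((5, 2)) * 10])
mu, labels, J = kmeans(X, K=2)
```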
K-means algorithm
• Initialize the cluster centers µ_1, …, µ_K.
• Repeat:
– Assign each data point to its nearest center µ_i.
– Update each µ_i to the mean of the points assigned to it.
• Stop when the assignments (or the centers) no longer change.
Basic ingredients • Model or structure • Objective function • Algorithm • Convergence
Questions for the K-means algorithm
• Does it find the global optimum of J?
– No; it finds a local optimum near the initialization.
• If Euclidean distance is not good for some data, do we have other choices?
• Can we assign each data point to the clusters probabilistically?
• If K (the total number of clusters) is unknown, can we estimate it from the data?
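One way to preview the "probabilistic assignment" question above: soften the hard argmin into responsibilities p_nk ∝ exp(−||x_n − µ_k||² / (2σ²)). This anticipates a GMM with equal spherical components; the scale σ is an assumed parameter, not from the slides:

```python
import numpy as np

def soft_assign(X, mu, sigma=1.0):
    """Probabilistic assignments: softmax of negative scaled squared distances."""
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (N, K)
    logits = -d2 / (2 * sigma ** 2)
    p = np.exp(logits - logits.max(axis=1, keepdims=True))    # stable softmax
    return p / p.sum(axis=1, keepdims=True)

X = np.array([[0.0], [5.0]])
mu = np.array([[0.0], [10.0]])
p = soft_assign(X, mu, sigma=2.0)
# The midpoint x = 5 is split 50/50; x = 0 leans strongly to the first cluster.
```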
Outline
• Clustering – K-means clustering, hierarchical clustering
• Adaptive learning (online learning) – CL, FSCL, RPCL
• Gaussian Mixture Models (GMM)
• Expectation-Maximization (EM) for maximum likelihood
Hierarchical Clustering
• K-means clustering requires:
– K
– positions of initial centers
– a distance measure between points (e.g. Euclidean distance)
• Hierarchical clustering requires a measure of distance between groups of data points.
Adapted from Blei, D. Hierarchical Clustering [PowerPoint slides]. www.cs.princeton.edu/courses/archive/spr08/cos424/slides/clustering-2.pdf
Hierarchical Clustering
• Agglomerative clustering: a very simple procedure:
– Assign each data point to its own group.
– Repeat: find the two closest groups and merge them into one.
– Stop when all the data points are merged into a single cluster.
Adapted from Blei, D. Hierarchical Clustering [PowerPoint slides]. www.cs.princeton.edu/courses/archive/spr08/cos424/slides/clustering-2.pdf
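The agglomerative procedure above as a minimal sketch. Single linkage is an assumed choice here (the slide leaves the group distance open), and the stopping count and toy data are my own:

```python
import numpy as np

def agglomerate(X, target_k=1):
    """Merge the two closest groups until target_k groups remain.
    Returns a list of groups, each a list of point indices."""
    groups = [[i] for i in range(len(X))]  # each point starts as its own group

    def dist(A, B):  # single-linkage distance between two groups
        return min(np.linalg.norm(X[a] - X[b]) for a in A for b in B)

    while len(groups) > target_k:
        # Find the closest pair of groups and merge them.
        pairs = [(dist(groups[i], groups[j]), i, j)
                 for i in range(len(groups)) for j in range(i + 1, len(groups))]
        _, i, j = min(pairs)
        groups[i] = groups[i] + groups[j]
        del groups[j]
    return groups

X = np.array([[0.0], [0.1], [5.0], [5.1]])
groups = agglomerate(X, target_k=2)  # -> {0, 1} and {2, 3}
```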
Distance Measure
• Distance between data points a and b: d(a, b)
• Distance between groups A and B:
– Single linkage: d(A, B) = min_{a∈A, b∈B} d(a, b)
– Complete linkage: d(A, B) = max_{a∈A, b∈B} d(a, b)
– Average linkage: d(A, B) = (1 / (|A|·|B|)) Σ_{a∈A, b∈B} d(a, b)
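The three linkage rules above, written out for checking on toy 1-D groups of my own:

```python
def pair_dists(A, B):
    """All pairwise distances between 1-D groups A and B."""
    return [abs(a - b) for a in A for b in B]

def single_linkage(A, B):
    return min(pair_dists(A, B))

def complete_linkage(A, B):
    return max(pair_dists(A, B))

def average_linkage(A, B):
    return sum(pair_dists(A, B)) / (len(A) * len(B))

A, B = [0.0, 1.0], [4.0, 6.0]
# single: |1-4| = 3, complete: |0-6| = 6, average: (4+6+3+5)/4 = 4.5
```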
Dendrogram (the vertical axis shows the merge distance).
Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). "Data Clustering: A Review". ACM Computing Surveys, 31(3), 264-323.
Outline
• Clustering – K-means clustering, hierarchical clustering
• Adaptive learning (online learning) – CL, FSCL, RPCL
• Gaussian Mixture Models (GMM)
• Expectation-Maximization (EM) for maximum likelihood
From batch to adaptive
• Batch: we are given all the data points at once.
• Adaptive: data points come one by one: x_1, x_2, …, x_N.
Competitive learning
• Data points come one by one: x_1, x_2, …, x_N.
• For each incoming point, the centers compete: the nearest center wins and is moved toward the point (winner-take-all).
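An online competitive-learning step can be sketched as below; the learning rate eta and toy numbers are assumed hyperparameters, not from the slides:

```python
import numpy as np

def cl_update(mu, x, eta=0.1):
    """Winner-take-all update: move only the nearest center toward x."""
    c = int(np.argmin(((mu - x) ** 2).sum(axis=1)))  # winner index
    mu[c] = mu[c] + eta * (x - mu[c])
    return mu, c

mu = np.array([[0.0, 0.0], [10.0, 10.0]])
mu, winner = cl_update(mu, np.array([1.0, 1.0]))
# winner is center 0; it moves from (0, 0) toward (1, 1), to (0.1, 0.1)
```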
When starting with "bad initializations", some centers may never win and remain stuck far from the data (dead units).
A four-cluster case
Frequency sensitive competitive learning (FSCL) [Ahalt et al., 1990]
The idea is to penalize the frequent winners: the winner is chosen by a frequency-weighted distance, e.g. argmin_j n_j ||x_t − µ_j||², where n_j counts how many times center j has won so far.
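A sketch of the frequency-sensitive selection: each center's distance is weighted by its win count, so frequent winners are handicapped. This follows the common n_j·d² form; the exact weighting in Ahalt et al. (1990) may differ in detail, and the toy numbers are my own:

```python
import numpy as np

def fscl_winner(mu, x, wins):
    """Pick argmin_j wins[j] * ||x - mu_j||^2, then update the win count."""
    d2 = ((mu - x) ** 2).sum(axis=1)
    c = int(np.argmin(wins * d2))
    wins[c] += 1
    return c

mu = np.array([[0.0], [3.0]])
wins = np.array([10.0, 1.0])  # center 0 has won often
c = fscl_winner(mu, np.array([1.0]), wins)
# plain distance favors center 0 (1 < 4), but 10*1 > 1*4, so center 1 wins
```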
FSCL is not good when there are extra centers
When K is pre-assigned to 5, the frequency-sensitive mechanism also drags the extra center into the data, disturbing the correct locations of the others.
Rival penalized competitive learning (RPCL) (Xu, Krzyzak, & Oja, 1992, 1993)
RPCL differs from FSCL by implementing p_{j,t} as follows: p_{j,t} = 1 if j is the winner c, p_{j,t} = −γ if j is the rival r (the second winner), and p_{j,t} = 0 otherwise, where γ approximately takes a number between 0.05 and 0.1 for controlling the penalizing strength.
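A sketch of the rival-penalized step: the winner moves toward x while the rival (second-nearest center) is pushed away with a small de-learning rate γ. The learning rates and toy numbers are assumptions for illustration; see Xu, Krzyzak, & Oja (1993) for the full algorithm:

```python
import numpy as np

def rpcl_update(mu, x, eta=0.1, gamma=0.05):
    """Move the winner toward x; push the rival away from x."""
    d2 = ((mu - x) ** 2).sum(axis=1)
    order = np.argsort(d2)
    c, r = int(order[0]), int(order[1])  # winner and rival
    mu[c] = mu[c] + eta * (x - mu[c])    # learn
    mu[r] = mu[r] - gamma * (x - mu[r])  # de-learn: rival is penalized
    return mu, c, r

mu = np.array([[0.0], [2.0], [10.0]])
mu, c, r = rpcl_update(mu, np.array([0.5]))
# winner 0 moves to 0.05; rival 1 is pushed away from 0.5, to 2.075
```

Over many such steps, rivals that keep losing are driven away from the data, which is how RPCL expels extra centers (next slide).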
The rival-penalized mechanism drives the extra agents far away.
Thank you!