Clustering Algorithms
Dalya Baron (Tel Aviv University), XXX Winter School, November 2018


SLIDE 1

Clustering Algorithms

Dalya Baron (Tel Aviv University) XXX Winter School, November 2018

SLIDE 2

Clustering

[Scatter plot: Feature 1 vs. Feature 2]

SLIDE 3

Clustering

[Scatter plot: Feature 1 vs. Feature 2, with cluster #1 and cluster #2 marked]

SLIDE 4

Clustering

[Scatter plot: Feature 1 vs. Feature 2, with cluster #1 and cluster #2 marked]

Why should we look for clusters?

SLIDE 5

Clustering

SLIDE 6

K-means

Input: measured features, and the number of clusters, k. The algorithm will classify all the objects in the sample into k clusters.

[Scatter plot: Feature 1 vs. Feature 2]

SLIDE 7

K-means

(I) The algorithm randomly places k points that represent the cluster centroids. It then performs several iterations, in each of which: (II) it associates each object with a single cluster, according to the object's distance from each cluster centroid; (III) it recalculates each cluster centroid from the objects associated with it.

[Scatter plot: Feature 1 vs. Feature 2]

SLIDE 8

K-means


[Scatter plot: Feature 1 vs. Feature 2] Two centroids are randomly placed.

SLIDE 9

K-means


[Scatter plot: Feature 1 vs. Feature 2] The objects are associated with the closest cluster centroid (Euclidean distance).

SLIDE 10

K-means


[Scatter plot: Feature 1 vs. Feature 2] New cluster centroids are computed as the average location of the cluster members.

SLIDE 11

K-means


[Scatter plot: Feature 1 vs. Feature 2] The objects are again associated with the closest cluster centroid (Euclidean distance).

SLIDE 12

K-means


[Scatter plot: Feature 1 vs. Feature 2] The process stops when the objects associated with each cluster no longer change.
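The loop described on these slides can be sketched in a few lines of NumPy (a minimal illustration, not a reference implementation; the function name and synthetic data are hypothetical):

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Minimal K-means sketch: (I) random centroids drawn from the data,
    then iterate (II) assignment and (III) centroid updates until the
    cluster memberships stop changing."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # (I)
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # (II) associate each object with the nearest centroid (Euclidean)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # memberships unchanged: converged
        labels = new_labels
        # (III) recompute each centroid as the mean of its members
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids
```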

SLIDE 13

The anatomy of K-means

Internal choices and/or internal cost function: (I) Initial centroids are randomly selected from the set of examples. (II) The global cost function that is minimized by K-means:

J = Σⱼ Σ_{xᵢ ∈ Cⱼ} ‖xᵢ − μⱼ‖²

where ‖·‖ is the Euclidean distance, μⱼ is the centroid of cluster Cⱼ, and the inner sum runs over the cluster members xᵢ.
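As a sanity check, the cost function above can be evaluated directly for a given assignment (a small sketch; the toy data and the function name are illustrative):

```python
import numpy as np

def kmeans_cost(X, labels, centroids):
    """Global K-means cost J: the sum of squared Euclidean distances
    between every object and its cluster centroid."""
    return float(sum(np.sum((X[labels == j] - mu) ** 2)
                     for j, mu in enumerate(centroids)))

# four points in two tight pairs, centroids at the pair midpoints:
# every point sits exactly 1 unit from its centroid, so J = 4 * 1^2
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[0.0, 1.0], [10.0, 1.0]])
```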

SLIDE 14

The anatomy of K-means

[Two K-means solutions for k=3, from two different random placements of the initial centroids; different initializations can converge to different clusterings]

slide-15
SLIDE 15

The anatomy of K-means

Input dataset: a list of objects with measured features. For which datasets should we use K-means?

[Two example datasets: Feature 1 vs. Feature 2]

SLIDE 16
SLIDE 17

The anatomy of K-means

Input dataset: a list of objects with measured features. What happens when we have an outlier in the dataset?

[Scatter plot: Feature 1 vs. Feature 2, with an outlier marked]
SLIDE 18

The anatomy of K-means

[K-means output for the dataset with the outlier]
SLIDE 19

The anatomy of K-means

Input dataset: a list of objects with measured features. What happens when the features have different physical units?

[Side-by-side: the input dataset and the K-means output]

SLIDE 20

The anatomy of K-means

[Side-by-side: the input dataset and the K-means output]

How can we avoid this?
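One common remedy is to standardize each feature to zero mean and unit variance before clustering, so that no physical unit dominates the Euclidean distance (a minimal NumPy sketch; scikit-learn's StandardScaler performs the same transformation):

```python
import numpy as np

def standardize(X):
    """Z-score each feature so that units no longer matter:
    subtract the mean, then divide by the standard deviation."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant features
    return (X - mu) / sigma
```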

SLIDE 21

The anatomy of K-means

Hyper-parameters: the number of clusters, k. Can we find the optimal k using the cost function?

[K-means results for k=2, k=3, and k=5]

SLIDE 22

The anatomy of K-means

[Plot: minimal value of the cost function vs. number of clusters, k; the curve flattens at the “elbow”]
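The elbow search can be sketched with SciPy's `kmeans`, whose returned distortion (mean distance to the nearest centroid) plays the role of the cost function; the three-blob dataset here is synthetic, for illustration only:

```python
import numpy as np
from scipy.cluster.vq import kmeans

rng = np.random.default_rng(0)
# synthetic dataset with three well-separated clusters
X = np.vstack([rng.normal(c, 0.3, size=(100, 2))
               for c in [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]])

costs = []
for k in range(1, 8):
    _, distortion = kmeans(X, k, seed=0)  # mean distance to the nearest centroid
    costs.append(distortion)
# the cost always decreases with k; the drop flattens sharply after the
# "elbow", here near the true number of clusters, k = 3
```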

SLIDE 23

Questions?

SLIDE 24

Hierarchical Clustering

Correa-Gallego+ 2016

Or: how to visualize complicated similarity measures.
SLIDE 25

Hierarchical Clustering

Input: measured features, or a distance matrix that represents the pair-wise distances between the objects. Also, we must specify a linkage method. Initialization: each object is a cluster of size 1.

[Scatter plot: Feature 1 vs. Feature 2]

SLIDE 26

Hierarchical Clustering

Next: the algorithm merges the two closest clusters into a single cluster. Then, it re-calculates the distance of the newly-formed cluster to all the rest.

[Scatter plot: Feature 1 vs. Feature 2]

SLIDE 27

Hierarchical Clustering

[The two closest clusters have been merged; a dendrogram now records the merges, with distance on the vertical axis]

SLIDE 28

Hierarchical Clustering

[The merging continues: the two closest clusters combine, and the dendrogram grows by one merge]

SLIDE 29

Hierarchical Clustering

[The merging continues: the two closest clusters combine, and the dendrogram grows by one merge]

SLIDE 30

Hierarchical Clustering

[The merging continues: the two closest clusters combine, and the dendrogram grows by one merge]

SLIDE 31

Hierarchical Clustering

[The merging continues: the two closest clusters combine, and the dendrogram grows by one merge]

SLIDE 32

Hierarchical Clustering

[The merging continues: the two closest clusters combine, and the dendrogram grows by one merge]

SLIDE 33

Hierarchical Clustering

[The merging continues: the two closest clusters combine, and the dendrogram grows by one merge]

SLIDE 34

Hierarchical Clustering

The process stops when all the objects are merged into a single cluster.

[Scatter plot: Feature 1 vs. Feature 2, alongside the completed dendrogram]
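The whole agglomerative procedure is one call to SciPy's `linkage`, which records the n−1 merges; `dendrogram` then draws them (a small sketch on synthetic two-blob data):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(0)
# synthetic dataset: two well-separated groups of objects
X = np.vstack([rng.normal(0.0, 0.2, size=(20, 2)),
               rng.normal(3.0, 0.2, size=(20, 2))])

# start from 40 singleton clusters and repeatedly merge the two closest;
# each row of Z is one merge: (cluster i, cluster j, distance, new size)
Z = linkage(X, method='average')
# dendrogram(Z) would plot the merge tree (requires matplotlib)
```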

SLIDE 35

The anatomy of Hierarchical Clustering

Internal choices and/or internal cost function: the linkage method defines the distance between two newly-formed clusters. Methods include: single (minimal), complete (maximal), average, etc.

[Scatter plot and dendrogram, single linkage; distance on the vertical axis]
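The effect of the linkage choice shows up already on a three-point toy example (points at 0, 1, and 3 on a line): after {0} and {1} merge, the distance from that pair to the remaining point depends on the method:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

X = np.array([[0.0], [1.0], [3.0]])  # pairwise distances: 1, 2, 3

# the last row of each linkage matrix holds the final merge distance
d_single   = linkage(X, method='single')[-1, 2]    # min(2, 3) = 2.0
d_complete = linkage(X, method='complete')[-1, 2]  # max(2, 3) = 3.0
d_average  = linkage(X, method='average')[-1, 2]   # (2 + 3)/2 = 2.5
```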

SLIDE 36

The anatomy of Hierarchical Clustering

[Scatter plot and dendrogram, complete linkage; distance on the vertical axis]
SLIDE 37

The anatomy of Hierarchical Clustering

[Scatter plot and dendrogram, average linkage; distance on the vertical axis]
SLIDE 38

The anatomy of Hierarchical Clustering

Hyper-parameters: clusters are defined beneath a distance threshold d. Alternatively, we can select the threshold d that corresponds to a desired number of clusters, k.

[Scatter plot and dendrogram, with the threshold d marked]

SLIDE 39

The anatomy of Hierarchical Clustering

[Scatter plot and dendrogram, with the threshold d marked]

SLIDE 40

The anatomy of Hierarchical Clustering

Hyper-parameters: clusters are defined beneath a distance threshold d. Alternatively, we can select the threshold d that corresponds to a desired number of clusters, k. We can use the resulting dendrogram to choose a “good” threshold:

[Dendrogram; distance on the vertical axis]
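Both ways of cutting the dendrogram are one call to SciPy's `fcluster` (a sketch on synthetic two-blob data; the threshold value is illustrative):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.2, size=(15, 2)),
               rng.normal(4.0, 0.2, size=(15, 2))])
Z = linkage(X, method='average')

# clusters are the subtrees that lie entirely beneath the threshold d...
labels_d = fcluster(Z, t=2.0, criterion='distance')
# ...or pick the threshold that yields exactly k clusters
labels_k = fcluster(Z, t=2, criterion='maxclust')
```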

SLIDE 41

The anatomy of Hierarchical Clustering

[Dendrogram; distance on the vertical axis]

SLIDE 42

The anatomy of Hierarchical Clustering

Input dataset: can either be a list of objects with measured properties, or a distance matrix that represents pair-wise distances between objects. What happens if we have an outlier in the dataset?

SLIDE 43

The anatomy of Hierarchical Clustering

Input dataset: can either be a list of objects with measured properties, or a distance matrix that represents pair-wise distances between objects. What happens if the dataset does not have clear clusters?

[Dendrogram; distance on the vertical axis]

SLIDE 44

The anatomy of Hierarchical Clustering

Input dataset: can either be a list of objects with measured properties, or a distance matrix that represents pair-wise distances between objects. Different linkage methods are helpful with different datasets.

[Clustering results using single, complete, and average linkage]

SLIDE 45

Hierarchical Clustering in Astronomy

“Statistics, Data Mining, and Machine Learning in Astronomy”, by Ivezic, Connolly, VanderPlas, and Gray (2013).

SLIDE 46

Visualizing similarity matrices with Hierarchical Clustering

Input: 10,000 emission line spectra, covering the wavelength range 300-700 nm. There are ~90 emission lines in each spectrum, with an average SNR of 2-4.

[Example spectra: normalized flux vs. wavelength (nm)]

SLIDE 47

Visualizing similarity matrices with Hierarchical Clustering

We compute a correlation matrix of all the observed wavelengths.

[Correlation matrix: wavelength (nm) vs. wavelength (nm), color-coded by the correlation coefficient]

SLIDE 48

Visualizing similarity matrices with Hierarchical Clustering

We convert the correlation matrix to a distance matrix, and build a dendrogram.

SLIDE 49

Visualizing similarity matrices with Hierarchical Clustering

We reorder the correlation matrix (the wavelengths) according to the resulting dendrogram.

[The reordered correlation matrix; both wavelength axes follow the dendrogram leaf order]
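The reordering trick can be sketched as follows, under the common choice distance = 1 − correlation; the tiny synthetic "spectra" stand in for the real dataset:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
# hypothetical stand-in for the spectra: 200 objects x 6 "wavelengths",
# where wavelengths {0, 2, 4} and {1, 3, 5} vary together
base = rng.normal(size=(200, 2))
spectra = base[:, [0, 1, 0, 1, 0, 1]] + 0.3 * rng.normal(size=(200, 6))

corr = np.corrcoef(spectra, rowvar=False)   # wavelength-by-wavelength
dist = 1.0 - corr                           # correlation -> distance
np.fill_diagonal(dist, 0.0)

# cluster the wavelengths and reorder the matrix by the dendrogram leaves
Z = linkage(squareform(dist, checks=False), method='average')
order = leaves_list(Z)
reordered = corr[np.ix_(order, order)]      # correlated wavelengths end up adjacent
```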

SLIDE 50

Visualizing similarity matrices with Hierarchical Clustering

de Souza et al. 2015

SLIDE 51

Questions?

SLIDE 52

Gaussian Mixture Models

See: http://scikit-learn.org/stable/auto_examples/mixture/plot_gmm_covariances.html#sphx-glr-auto-examples-mixture-plot-gmm-covariances-py
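Following that scikit-learn example, a minimal usage sketch (the two-blob data is synthetic; unlike K-means, the fitted model also returns soft, probabilistic memberships):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(100, 2)),
               rng.normal(4.0, 0.3, size=(100, 2))])

# fit a mixture of two Gaussians, each with its own full covariance matrix
gmm = GaussianMixture(n_components=2, covariance_type='full',
                      random_state=0).fit(X)
labels = gmm.predict(X)        # hard assignments, as in K-means
probs = gmm.predict_proba(X)   # soft per-component membership probabilities
```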

SLIDE 53

Questions?