SLIDE 1

Multidimensional Scaling

Applied Multivariate Statistics – Spring 2012

SLIDE 2

Outline

Fundamental Idea
Classical Multidimensional Scaling
Non-metric Multidimensional Scaling

Appl. Multivariate Statistics - Spring 2012

SLIDE 3

Basic Idea

How can a high-dimensional point cloud be represented in two dimensions?

SLIDE 4

Idea 1: Projection

SLIDE 5

Idea 2: Squeeze onto a table

Close points stay close

SLIDE 6

Which idea is better?

SLIDE 7

Idea of MDS

Represent a high-dimensional point cloud in a few (usually 2) dimensions, keeping the distances between points similar

Classical/Metric MDS: Use a clever projection

R: cmdscale

Non-metric MDS: Squeeze the data onto a table

R: isoMDS

SLIDE 8

Classical MDS

Problem: Given Euclidean distances among points, recover the positions of the points!

Example: Road distances between 21 European cities (almost Euclidean, but not quite)

…

SLIDE 9

Classical MDS

First try:

SLIDE 10

Classical MDS

Flip axes:

Points can only be identified up to

shift
rotation
reflection
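A quick numerical check of this point: rotating, reflecting and shifting a configuration leaves every pairwise distance unchanged, so distances alone cannot distinguish such configurations. This is a minimal NumPy sketch; the helper `pairwise_distances`, the angle and the shift are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))                   # five points in the plane

theta = 0.7                                   # arbitrary rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
F = np.array([[1.0, 0.0], [0.0, -1.0]])       # reflection across the x-axis
shift = np.array([3.0, -1.0])

Y = X @ R.T @ F.T + shift                     # rotate, reflect, then shift

# The configurations differ, but all pairwise distances agree
same = np.allclose(pairwise_distances(X), pairwise_distances(Y))
print(same)
```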

SLIDE 11

Classical MDS

Another example: Air pollution in US cities
The ranges of manu and popul are much bigger than the range of wind

Need to standardize to give every variable equal weight
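To see why standardizing matters, here is a small sketch on made-up numbers (not the actual air pollution data) whose columns merely imitate the very different ranges of manu, popul and wind. Each column is centered and divided by its standard deviation (ddof=1, as R's `scale()` does) before distances are computed; `wind_share` is a hypothetical helper reporting how much of a squared distance comes from the wind column.

```python
import numpy as np

# Made-up data for three cities; columns imitate manu, popul (large ranges)
# and wind (small range). These are NOT values from the real data set.
data = np.array([[ 400.0,  600.0,  9.0],
                 [3300.0, 3400.0, 10.0],
                 [ 100.0,  200.0,  7.0]])

# Standardize each column: subtract its mean, divide by its standard deviation
z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

def wind_share(X, i, j, col=2):
    """Fraction of the squared distance between rows i and j contributed by column `col`."""
    sq = (X[i] - X[j]) ** 2
    return sq[col] / sq.sum()

raw_share = wind_share(data, 0, 1)   # wind is practically invisible on the raw scale
std_share = wind_share(z, 0, 1)      # after standardizing, wind has a visible influence
print(f"{raw_share:.1e}  {std_share:.3f}")
```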

SLIDE 12

Classical MDS

SLIDE 13

Classical MDS: Theory

Input: Euclidean distances between $n$ objects in $p$ dimensions

Output: Positions of the points, up to rotation, reflection and shift
Two steps:
Compute the inner products matrix $B$ from the distances
Compute the positions from $B$

SLIDE 14

Classical MDS: Theory – Step 1

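Inner products matrix: $B = XX^T$, i.e. $b_{ij} = x_i^T x_j$
Connect to distance: $d_{ij}^2 = b_{ii} + b_{jj} - 2b_{ij}$
Center the points ($\sum_i x_i = 0$) to remove the ambiguity caused by shift invariance
Invert the relationship ("doubly centered"):

$b_{ij} = -\frac{1}{2}\left(d_{ij}^2 - d_{i\cdot}^2 - d_{\cdot j}^2 + d_{\cdot\cdot}^2\right)$, where a dot denotes averaging over that index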
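The inversion formula can be checked numerically: double centering the squared distances of random points reproduces the inner products of the centered points. This is a minimal NumPy sketch with an illustrative random configuration; it also uses the equivalent matrix form $B = -\frac{1}{2} J D^{(2)} J$ with centering matrix $J = I - \frac{1}{n}\mathbf{1}\mathbf{1}^T$, a standard identity the slide does not state explicitly.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))        # n = 6 points in p = 3 dimensions
Xc = X - X.mean(axis=0)            # centered points

# Squared Euclidean distances d_ij^2
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)

# b_ij = -1/2 (d_ij^2 - d_i.^2 - d_.j^2 + d_..^2); a dot means "average over that index"
row_mean = D2.mean(axis=1, keepdims=True)   # d_i.^2
col_mean = D2.mean(axis=0, keepdims=True)   # d_.j^2
grand_mean = D2.mean()                      # d_..^2
B = -0.5 * (D2 - row_mean - col_mean + grand_mean)

# Same computation in matrix form with the centering matrix J
n = D2.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B_matrix = -0.5 * J @ D2 @ J

print(np.allclose(B, Xc @ Xc.T), np.allclose(B, B_matrix))
```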

SLIDE 15

Classical MDS: Theory – Step 2

Since $B = XX^T$, we need the "square root" of $B$
$B$ is a symmetric, positive semi-definite $n \times n$ matrix
Thus, $B$ can be diagonalized: $B = V \Lambda V^T$

$\Lambda$ is a diagonal matrix with $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$ on the diagonal ("eigenvalues"); $V$ contains the corresponding normalized eigenvectors as columns

Some eigenvalues will be zero; drop them: $B = V_1 \Lambda_1 V_1^T$
Take the "square root": $X = V_1 \Lambda_1^{1/2}$
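Step 2 as a minimal NumPy sketch, assuming $B$ is (numerically) positive semi-definite. The function name `positions_from_inner_products` and the relative tolerance that decides which eigenvalues count as zero are illustrative choices, not part of the slides. `numpy.linalg.eigh` returns eigenvalues in ascending order, so they are re-sorted to match $\lambda_1 \ge \dots \ge \lambda_n$.

```python
import numpy as np

def positions_from_inner_products(B, rel_tol=1e-9):
    """Return X with X @ X.T == B (up to rounding) for a symmetric PSD matrix B."""
    eigvals, eigvecs = np.linalg.eigh(B)              # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]                 # lambda_1 >= lambda_2 >= ...
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > rel_tol * np.abs(eigvals).max()  # drop (numerically) zero eigenvalues
    V1, lam1 = eigvecs[:, keep], eigvals[keep]
    return V1 * np.sqrt(lam1)                         # X = V_1 Lambda_1^{1/2}, column-wise scaling

# Usage: inner products of 8 centered points in 2 dimensions
rng = np.random.default_rng(2)
X = rng.normal(size=(8, 2))
Xc = X - X.mean(axis=0)
B = Xc @ Xc.T
Y = positions_from_inner_products(B)
print(Y.shape)
```

The recovered `Y` generally differs from `Xc` by a rotation or reflection, which is exactly the ambiguity noted for classical MDS.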

SLIDE 16

Classical MDS: Low-dim representation

Keep only the few (e.g. 2) largest eigenvalues and the corresponding eigenvectors

The resulting $X$ is the low-dimensional representation we were looking for

Goodness of fit (GOF) if we reduce to $m$ dimensions (should be at least 0.8):

$GOF = \frac{\sum_{i=1}^m \lambda_i}{\sum_{i=1}^n \lambda_i}$

Finds the "optimal" low-dim representation: minimizes

$S = \sum_{i=1}^n \sum_{j=1}^n \left(d_{ij}^2 - (d_{ij}^{(m)})^2\right)$, where $d_{ij}^{(m)}$ is the distance in the $m$-dimensional representation
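Putting both steps together with the truncation to $m$ dimensions gives a compact sketch of classical MDS and the GOF above. It mirrors what the R function cmdscale computes but is not its implementation; `classical_mds` is a hypothetical name, and tiny negative eigenvalues caused by rounding are clipped at zero before taking square roots.

```python
import numpy as np

def classical_mds(D, m=2):
    """Classical MDS sketch for an n x n matrix D of Euclidean distances."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J                       # step 1: doubly centered inner products
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1]                 # lambda_1 >= lambda_2 >= ...
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    X_m = eigvecs[:, :m] * np.sqrt(np.maximum(eigvals[:m], 0.0))  # keep the m largest
    gof = eigvals[:m].sum() / eigvals.sum()           # GOF for Euclidean input
    return X_m, gof

# Usage: 20 points in 3 dimensions with little spread along the third axis
rng = np.random.default_rng(3)
P = rng.normal(size=(20, 3)) * np.array([5.0, 3.0, 0.5])
D = np.sqrt(((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1))
X2, gof = classical_mds(D, m=2)
D2d = np.sqrt(((X2[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1))
print(round(gof, 3))
```

For Euclidean input this amounts to a projection, so no distance in the two-dimensional representation exceeds the original one; every term of $S$ is therefore non-negative.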

SLIDE 17

Classical MDS: Pros and Cons

+ Optimal for Euclidean input data
+ Still optimal if B has only non-negative eigenvalues (positive semi-definite)
+ Very fast
- No guarantees if B has negative eigenvalues

However, in practice it is still used in that case. New measures for goodness of fit:

$GOF = \frac{\sum_{i=1}^m |\lambda_i|}{\sum_{i=1}^n |\lambda_i|}$

$GOF = \frac{\sum_{i=1}^m \lambda_i^2}{\sum_{i=1}^n \lambda_i^2}$

$GOF = \frac{\sum_{i=1}^m \max(0, \lambda_i)}{\sum_{i=1}^n \max(0, \lambda_i)}$

The variants with $|\lambda_i|$ and $\max(0, \lambda_i)$ in the denominator are used in the R function "cmdscale"
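The three GOF variants can be compared on a tiny non-Euclidean example: three objects whose dissimilarities violate the triangle inequality ($1 + 1 < 3$), so the doubly centered matrix $B$ must have a negative eigenvalue. A minimal NumPy sketch; `gof_variants` is a hypothetical helper. cmdscale's documentation uses the plain sum $\sum_{i=1}^m \lambda_i$ in the numerator, which coincides with the variants here whenever the leading eigenvalues are positive.

```python
import numpy as np

def gof_variants(eigvals, m):
    """GOF of an m-dimensional representation when eigenvalues may be negative."""
    lam = np.sort(eigvals)[::-1]                      # lambda_1 >= ... >= lambda_n
    top = lam[:m]
    return {
        "abs": np.abs(top).sum() / np.abs(lam).sum(),
        "squared": (top ** 2).sum() / (lam ** 2).sum(),
        "positive": np.maximum(top, 0).sum() / np.maximum(lam, 0).sum(),
    }

# Dissimilarities violating the triangle inequality: d_12 = d_23 = 1, d_13 = 3
D = np.array([[0.0, 1.0, 3.0],
              [1.0, 0.0, 1.0],
              [3.0, 1.0, 0.0]])
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
eigvals = np.linalg.eigvalsh(B)                       # approximately -5/6, 0 and 4.5
g = gof_variants(eigvals, m=1)
print({k: round(float(v), 4) for k, v in g.items()})
```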

SLIDE 18

Non-metric MDS: Idea

Sometimes there is no strict metric on the original points
Example: How much do you like the portraits?

(1: Not at all, 10: Very much)

Ratings for three portraits: 2, 6, 9 OR 1, 5, 10??

SLIDE 19

Non-metric MDS: Idea

Absolute values are not that meaningful

Ranking is important
Non-metric MDS finds a low-dimensional representation that respects the ranking of distances

SLIDE 20

Non-metric MDS: Theory

$\delta_{ij}$ is the true dissimilarity, $d_{ij}$ is the distance in the representation
Minimize STRESS ($\theta$ is an increasing function):

$S = \frac{\sum_{i<j} \left(\theta(\delta_{ij}) - d_{ij}\right)^2}{\sum_{i<j} d_{ij}^2}$

Optimize over both the positions of the points and $\theta$
$\hat{d}_{ij} = \theta(\delta_{ij})$ is called the "disparity"
Solved numerically (isotonic regression); classical MDS as starting value; very time consuming
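For a fixed configuration, the best increasing $\theta$ comes from isotonic regression of the distances $d_{ij}$ on the order of the dissimilarities $\delta_{ij}$. The sketch below does this with the pool-adjacent-violators algorithm and then evaluates STRESS as defined above. It covers only the $\theta$ step (not the search over point positions), breaks ties in $\delta_{ij}$ by a stable sort, and its function names are hypothetical. MASS::isoMDS reports its stress in percent, so its numbers are on a different scale.

```python
import numpy as np

def isotonic_increasing(y):
    """Pool-adjacent-violators: least-squares non-decreasing fit to the sequence y."""
    blocks = []                                   # each block: [sum of values, count]
    for v in y:
        blocks.append([float(v), 1])
        # merge with the previous block while the block means decrease
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return np.concatenate([np.full(c, s / c) for s, c in blocks])

def stress(delta, d):
    """STRESS of distances d given dissimilarities delta (both over the pairs i < j)."""
    order = np.argsort(delta, kind="stable")      # rank the pairs by dissimilarity
    d_hat = np.empty(len(d))
    d_hat[order] = isotonic_increasing(d[order])  # disparities theta(delta_ij)
    return np.sum((d_hat - d) ** 2) / np.sum(d ** 2), d_hat

# Usage: delta_1 < delta_2 < delta_3, but d_1 > d_2 in the current representation
delta = np.array([1.0, 2.0, 3.0])
d = np.array([2.0, 1.0, 3.0])
S, d_hat = stress(delta, d)
print(S, d_hat)
```

The two pairs in the wrong order are pooled to their mean (1.5), which is the closest increasing fit in the least-squares sense.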

SLIDE 21

Non-metric MDS: Example for intuition (only)

[Figure: true points A, B, C in high-dimensional space with $d_{AB} < d_{BC} < d_{AC}$; candidate representation with pairwise distances 3, 2 and 5]

STRESS = 19.7

Compute best representation

SLIDE 22

Non-metric MDS: Example for intuition (only)

[Figure: true points A, B, C in high-dimensional space with $d_{AB} < d_{BC} < d_{AC}$; candidate representation with pairwise distances 2.7, 2 and 4.8]

STRESS = 20.1

Compute best representation

SLIDE 23

Non-metric MDS: Example for intuition (only)

[Figure: true points A, B, C in high-dimensional space with $d_{AB} < d_{BC} < d_{AC}$; candidate representation with pairwise distances 2.9, 2 and 5.2]

STRESS = 18.9

Compute best representation

Stop when the minimal STRESS is found. The final representation has the distances $d_{AB} = 2$, $d_{BC} = 2.9$, $d_{AC} = 5.2$.

SLIDE 24

Non-metric MDS: Pros and Cons

+ Fulfills a clear objective without many assumptions (minimize STRESS)
+ Results don't change with rescaling or monotonic variable transformation
+ Works even if you only have rank information
- Slow in large problems
- Usually only a local (not global) optimum is found
- Only gets the ranks of distances right
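The second and third advantages can be illustrated directly: non-metric MDS only uses the ordering of the dissimilarities, and rescaling, an increasing transformation such as the logarithm, or replacing the values by their ranks all leave that ordering unchanged. A minimal sketch on hypothetical dissimilarities for five pairs; the STRESS objective is therefore the same for every variant, and only a starting configuration computed from the transformed values (such as classical MDS) could differ.

```python
import numpy as np

delta = np.array([0.3, 1.2, 0.7, 2.5, 1.9])        # hypothetical dissimilarities, 5 pairs

variants = {
    "rescaled": 10 * delta,                        # change of units
    "log": np.log(delta),                          # increasing transformation
    "ranks": np.argsort(np.argsort(delta)) + 1,    # rank information only (1 = most similar)
}

# True if a variant orders the pairs exactly like the original dissimilarities
checks = {name: np.array_equal(np.argsort(v, kind="stable"), np.argsort(delta, kind="stable"))
          for name, v in variants.items()}
print(checks)
```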

SLIDE 25

Non-metric MDS: Example

Do people in the same party vote alike?
Agreement of 15 congressmen in 19 votes

…

SLIDE 26

Non-metric MDS: Example

SLIDE 27

Concepts to know

Classical MDS:
  Finds a low-dim projection that respects distances
  Optimal for Euclidean distances
  No clear guarantees for other distances
  Fast
Non-metric MDS:
  Squeezes data points onto a table
  Respects only the rankings of distances
  (Locally) solves a clear objective
  Slow

SLIDE 28

R commands to know

cmdscale: included in the standard R distribution (package "stats")
isoMDS: from package "MASS"
