Linear Manifold Clustering
Robert Haralick and Rave Harpaz
Outline

- Background
- The linear manifold cluster model
- The linear manifold clustering algorithm
- Linear manifold modeling
- Linear manifold subspace correlation clustering
- Conclusion
Background
Clustering is the process of classifying a collection of patterns into classes, called clusters, so that the patterns within a cluster are "similar" to one another, yet "dissimilar" to patterns in other clusters. Every clustering technique makes implicit assumptions about:

- the shape of the clusters
- the similarity criterion
- the grouping technique
Cluster Models
[Figure: cluster shapes found in a database: hyper-spherical, hyper-ellipsoidal, arbitrarily shaped, linear, and nonlinear clusters]
K-Means Hyper-Spherical Clusters
1. Choose K points at random to be the cluster centers.
2. Assign each point to its closest cluster center.
3. Make the new cluster centers be the cluster means.
4. Iterate steps 2-3 until the assignments stop changing.
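A minimal NumPy sketch of these four steps (our own illustrative implementation, not the authors' code; it assumes no cluster empties out during iteration):

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Minimal K-means for an (N, d) data array X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), K, replace=False)]    # 1. random centers
    for _ in range(n_iter):
        # 2. assign each point to its closest cluster center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. new centers are the cluster means
        #    (assumes every cluster keeps at least one point)
        new_centers = np.array([X[labels == k].mean(axis=0) for k in range(K)])
        if np.allclose(new_centers, centers):            # 4. iterate until stable
            break
        centers = new_centers
    return labels, centers
```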
K-Means Clusters
Subspace Clustering
Definition
Subspace clustering produces clusters which are compact on a subset of dimensions aligned with the coordinate axes and not compact on the orthogonal complement of those dimensions.
[Figure: a cluster shown in the full (x, y, z) space and in its x-z subspace projection]

Subspace clustering handles:
- high-dimensional data
- irrelevant features
Pattern and Correlation Clustering
[Figure: parallel-coordinate view of patterns across eight features]

Object similarity is no longer measured by physical distance, but by the behavior patterns objects manifest or the magnitude of the correlations they induce.

Problem statement: identify groups of points that exhibit coherent behavior patterns across a subset of the measurement features.
Pattern and Correlation Clustering - Applications
- Gene-expression microarray analysis: identify groups of genes that exhibit similar expression patterns under some subset of conditions, from which gene function or regulatory mechanisms may be inferred.
- Collaborative filtering / recommendation systems: identify sets of customers/users with similar interest patterns so that future interests can be predicted and proper recommendations made.
- Dimensionality reduction by correlation.
- Finance: identify groups of stocks that show similar price fluctuations over a certain time period.
Linear Manifold Clusters
Definition
$L$ is a linear manifold of a vector space $V$ if and only if, for some subspace $S$ of $V$ and translation $t \in V$,

$$L = \{x \in V \mid x = t + s \text{ for some } s \in S\}.$$

The dimension of $L$ is the dimension of $S$, and if the dimension of $L$ is one less than the dimension of $V$, then $L$ is called a hyperplane. In other words, a linear manifold is a subspace that may have been shifted away from the origin; a subspace is a linear manifold that contains the origin.
Dense Linear Manifold Clusters
[Figure: three dense linear manifold clusters C1, C2, C3 in 3-D]
The Linear Manifold Cluster Model
The linear manifold cluster model has the following properties:

- The points in each cluster are embedded in a lower-dimensional linear manifold.
- The intrinsic dimensionality of the cluster is the dimensionality of the linear manifold.
- The manifold is arbitrarily oriented.
- The points in the cluster induce a correlation among two or more attributes (or linear combinations of attributes) of the data.
- In the orthogonal complement space to the manifold, the points form a compact, densely populated region, which can be used to cluster the data.
Comment
Classical clustering algorithms such as K-means assume that each cluster is associated with a zero-dimensional manifold (its center), and therefore omit the possibility that a cluster may have a non-zero-dimensional linear manifold associated with it.
The Range Space of a Matrix
Suppose $B$ is a matrix with columns $b_1, b_2, \ldots, b_N$, and $x = (x_1, x_2, \ldots, x_N)'$ is a vector. Let $y = Bx$. Then

$$y = Bx = \sum_{n=1}^{N} x_n b_n,$$

so $y$ is a linear combination of the columns of $B$.
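A quick numerical check of this identity (our own snippet):

```python
import numpy as np

B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 3.0]])                  # columns b1, b2
x = np.array([4.0, 5.0])
y = B @ x                                   # y = B x
assert np.allclose(y, 4.0 * B[:, 0] + 5.0 * B[:, 1])  # y = sum_n x_n b_n
```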
The Linear Manifold Cluster Model
Each point $x$ in a $k$-dimensional linear manifold cluster is modeled by

$$x = \mu + B\phi + \bar{B}\epsilon$$

where

- $x$: $d \times 1$ random vector
- $\mu$: $d \times 1$ translation vector in $\mathbb{R}^d$
- $B$: $d \times k$ matrix
- $\bar{B}$: $d \times (d - k)$ matrix, with $B'\bar{B} = 0$
- $\phi$: $k \times 1$ random vector $\sim U(-R, R)$
- $\epsilon$: $(d - k) \times 1$ random vector $\sim N(0, \Sigma)$, with $|\Sigma|$ small

[Figure: a cluster spread along manifold basis vectors $b_1, b_2$, with small noise along the normal direction $b_3$, translated by $\mu$]
Linear Manifold Cluster Model

$$E[x] = E[\mu + B\phi + \bar{B}\epsilon] = E[\mu] + B\,E[\phi] + \bar{B}\,E[\epsilon] = \mu,$$

since $E[\phi] = 0$ and $E[\epsilon] = 0$.
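As a concrete illustration, here is a NumPy sketch that generates points from this model (our own hypothetical generator; the name sample_lm_cluster and the parameter choices are ours):

```python
import numpy as np

def sample_lm_cluster(n, d, k, R=10.0, noise=0.05, seed=0):
    """Sample n points from x = mu + B*phi + Bbar*eps.

    B spans a random k-D manifold and Bbar its orthogonal complement;
    phi ~ U(-R, R)^k spreads points along the manifold, and
    eps ~ N(0, noise^2 I) adds small off-manifold noise.
    """
    rng = np.random.default_rng(seed)
    # Random orthonormal basis of R^d via QR; split into B and Bbar.
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    B, Bbar = Q[:, :k], Q[:, k:]           # B' Bbar = 0 by construction
    mu = rng.uniform(-R, R, size=d)        # translation of the manifold
    phi = rng.uniform(-R, R, size=(n, k))
    eps = rng.normal(0.0, noise, size=(n, d - k))
    return mu + phi @ B.T + eps @ Bbar.T   # (n, d) array of points
```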
Orthogonal Projection
Definition
Let $V$ be a vector space and $W$ any subspace of $V$. Represent a vector $v \in V$ as $v = w + w^{\perp}$, where $w \in W$ and $w^{\perp} \in W^{\perp}$. Then $w$ is called the orthogonal projection of $v$ onto $W$, and $w^{\perp}$ is the orthogonal projection of $v$ onto $W^{\perp}$.
Theorem
Let $V$ be a vector space and $W$ any subspace of $V$. Let $B$ be a matrix whose columns constitute an orthonormal basis of $W$, and let $v \in V$ satisfy $v = w + w^{\perp}$, where $w \in W$ and $w^{\perp} \in W^{\perp}$. Then

$$w = BB'v.$$
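A small numerical illustration of the theorem (our own snippet):

```python
import numpy as np

# Orthonormal basis of the x-y plane inside R^3.
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
v = np.array([3.0, 4.0, 5.0])
w = B @ B.T @ v                    # orthogonal projection of v onto W
assert np.allclose(w, [3.0, 4.0, 0.0])
```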
Singular Value Decomposition
Definition
The singular value decomposition of a real matrix $X_{N \times K}$ is the factoring of $X$ as

$$X_{N \times K} = U_{N \times N}\,\Lambda_{N \times K}\,V'_{K \times K}$$

where $UU' = I$, $VV' = I$, and $\Lambda$ is rectangular diagonal.
Thin Singular Value Decomposition
Definition
The thin singular value decomposition of a real matrix $X_{N \times K}$, $K < N$, is the factoring of $X$ as

$$X_{N \times K} = U_K\,\Lambda_K\,V'$$

where $U_K$ is $N \times K$ with $U_K'U_K = I_{K \times K}$, $V$ is $K \times K$ with $VV' = I_{K \times K}$, and $\Lambda_K$ is $K \times K$ diagonal.
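In NumPy, the thin SVD is what np.linalg.svd returns with full_matrices=False; a quick check of the identities above (our own snippet):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))                 # N = 100, K = 5, K < N

U, s, Vt = np.linalg.svd(X, full_matrices=False)  # thin SVD: U is 100 x 5
assert np.allclose(U.T @ U, np.eye(5))            # U_K' U_K = I
assert np.allclose(Vt @ Vt.T, np.eye(5))          # V V' = I
assert np.allclose(U @ np.diag(s) @ Vt, X)        # X = U_K Lambda_K V'
```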
Orthonormal Basis of Subspace
Theorem
Let $X_{N \times K}$ have columns which span a $K$-dimensional subspace $W$, and let the thin singular value decomposition of $X$ be

$$X_{N \times K} = U_K\,\Lambda_K\,V'.$$

Then $U_K U_K' X = X$.

Proof.

$$U_K U_K' X = U_K U_K' (U_K \Lambda_K V') = U_K (U_K' U_K) \Lambda_K V' = U_K \Lambda_K V' = X.$$
Distance To Linear Manifold
Theorem
Let a linear manifold $L$ be represented by $L = \{z \mid z = \mu + B\phi\}$, where $\mu$ is a vector that translates the origin to the manifold and the columns of $B$ are orthonormal. Then the Euclidean distance of $x$ to $L$ is given by

$$\rho(x, L) = \|(I - BB')(x - \mu)\|.$$

Proof.

- $BB'$ is the orthogonal projection operator onto the subspace spanned by the columns of $B$.
- $I - BB'$ is the orthogonal projection operator onto the orthogonal complement of the subspace spanned by the columns of $B$.
- $(I - BB')(x - \mu)$ is the projection of $x - \mu$ onto the orthogonal complement of the linear manifold $L$.
- $\|(I - BB')(x - \mu)\|$ is therefore the distance of $x$ to the manifold $L$.
Distance To Linear Manifold
Proposition
Let $B$ be a matrix whose columns are orthonormal. Then

$$\|(I - BB')y\|^2 = \|y\|^2 - \|B'y\|^2.$$

Proof.

$$
\begin{aligned}
\|(I - BB')y\|^2 &= \|y - BB'y\|^2 \\
&= (y - BB'y)'(y - BB'y) \\
&= y'y - 2y'BB'y + y'(BB')(BB')y \\
&= y'y - 2y'BB'y + y'(B(B'B)B')y \\
&= y'y - 2y'BB'y + y'(BB')y \\
&= y'y - y'BB'y \\
&= \|y\|^2 - \|B'y\|^2.
\end{aligned}
$$
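This proposition gives an efficient way to evaluate $\rho(x, L)$; a NumPy sketch (our own function, not the authors' code):

```python
import numpy as np

def dist_to_manifold(X, mu, B):
    """Distance of each row of X (n, d) to the manifold {mu + B phi}.

    B (d, k) has orthonormal columns.  Uses the proposition
    ||(I - BB')y||^2 = ||y||^2 - ||B'y||^2  with y = x - mu,
    so the d x d projector I - BB' is never formed.
    """
    Y = X - mu
    sq = (Y ** 2).sum(axis=1) - ((Y @ B) ** 2).sum(axis=1)
    return np.sqrt(np.maximum(sq, 0.0))   # clip tiny negatives from round-off
```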
The Linear Manifold Clustering Algorithm
[Figure: clusters C1, C2, C3 and the distance histogram with its threshold]

Outline: a stochastic model-fitting technique.

1. Sample trial linear manifolds of various dimensions.
2. Compute distance histograms of the data to each trial manifold.
3. Of all the manifolds sampled, select the one whose associated histogram shows the best separation between a mode near zero and the rest of the data.
4. Partition the data based on the best separation.
5. Repeat the procedure on each block of the partitioned data.
How Are Trial Linear Manifolds Sampled?
To construct a $k$-dimensional linear manifold we need to sample $k + 1$ points.

Constructing a 2-D linear manifold $x = \mu + B\phi$ from sampled points $x_0, x_1, x_2$:

$$\mu = x_0, \qquad X = (x_1 - x_0,\; x_2 - x_0) = U_K \Lambda_K V', \qquad B = U_K.$$
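A NumPy sketch of this sampling step (our reading of the slide; the function name is ours):

```python
import numpy as np

def sample_trial_manifold(data, k, rng):
    """Sample k+1 points and return (mu, B) for a k-D trial manifold."""
    idx = rng.choice(len(data), k + 1, replace=False)
    x0, rest = data[idx[0]], data[idx[1:]]
    # Columns of X span the manifold's direction subspace.
    X = (rest - x0).T                             # d x k
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return x0, U                                  # mu = x0, B = U_K (orthonormal)
```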
Selecting the best trial manifold/best separation
[Figure: three trial manifolds and their distance histograms]

To compute a separation score we first need to identify the two classes, or distributions, involved. This problem is cast as a histogram-thresholding problem: the left and right sides of the histogram are estimated parametrically.
Bayesian Minimum Error Thresholding
[Figure: classification error for a mixture of two Gaussians, with threshold T]

Find the threshold $T$ to minimize

$$P(\text{error}; T) = \int_{x > T} p(x \mid c_1)\,P(c_1)\,dx + \int_{x \le T} p(x \mid c_2)\,P(c_2)\,dx$$

where

$$p(x \mid c_1) = \frac{1}{\sqrt{2\pi}\,\sigma_1}\,e^{-\frac{1}{2}\left(\frac{x - \mu_1}{\sigma_1}\right)^2}, \qquad p(x \mid c_2) = \frac{1}{\sqrt{2\pi}\,\sigma_2}\,e^{-\frac{1}{2}\left(\frac{x - \mu_2}{\sigma_2}\right)^2}.$$
Kittler and Illingworth Minimum Error Thresholding (1986)
Let the range of the distance to the manifold be divided into $N$ bins, each of width $\Delta$. For any threshold $T = n\Delta$, define

$$p_1(T) = \sum_{i=0}^{n} h(i\Delta) \qquad p_2(T) = \sum_{i=n+1}^{N-1} h(i\Delta)$$

$$\mu_1(T) = \frac{\sum_{i=0}^{n} i\Delta\, h(i\Delta)}{p_1(T)} \qquad \mu_2(T) = \frac{\sum_{i=n+1}^{N-1} i\Delta\, h(i\Delta)}{p_2(T)}$$

$$\sigma_1^2(T) = \frac{\sum_{i=0}^{n} (i\Delta - \mu_1(T))^2\, h(i\Delta)}{p_1(T)} \qquad \sigma_2^2(T) = \frac{\sum_{i=n+1}^{N-1} (i\Delta - \mu_2(T))^2\, h(i\Delta)}{p_2(T)}$$

For any distance $\delta$ to the manifold, define $P(\delta \mid 1) = P(\delta \mid \mu_1, \sigma_1)$ and $P(\delta \mid 2) = P(\delta \mid \mu_2, \sigma_2)$.
Kittler and Illingworth Minimum Error
For quantized distance $m\Delta$, the probability of correct classification is

$$P_c(m\Delta, T) = \begin{cases} P(m\Delta \mid \mu_1(T), \sigma_1(T))\,P_1(T) & m\Delta \le T \\[4pt] P(m\Delta \mid \mu_2(T), \sigma_2(T))\,P_2(T) & m\Delta > T \end{cases} = \begin{cases} \dfrac{1}{\sqrt{2\pi}\,\sigma_1(T)}\, e^{-\frac{1}{2}\left(\frac{m\Delta - \mu_1(T)}{\sigma_1(T)}\right)^2} P_1(T) & m\Delta \le T \\[8pt] \dfrac{1}{\sqrt{2\pi}\,\sigma_2(T)}\, e^{-\frac{1}{2}\left(\frac{m\Delta - \mu_2(T)}{\sigma_2(T)}\right)^2} P_2(T) & m\Delta > T \end{cases}$$

$$-2\log P_c(m\Delta, T, 1) = \log(2\pi) + 2\log\sigma_1(T) + \left(\frac{m\Delta - \mu_1(T)}{\sigma_1(T)}\right)^2 - 2\log P_1(T), \quad m\Delta \le T$$

$$-2\log P_c(m\Delta, T, 2) = \log(2\pi) + 2\log\sigma_2(T) + \left(\frac{m\Delta - \mu_2(T)}{\sigma_2(T)}\right)^2 - 2\log P_2(T), \quad m\Delta > T$$
Kittler and Illingworth Minimum Error
Find the $T$ to minimize

$$
\begin{aligned}
J(T) &= \sum_{m=0}^{n}\left[2\log\sigma_1(T) + \left(\frac{m\Delta - \mu_1(T)}{\sigma_1(T)}\right)^2 - 2\log P_1(T)\right] h(m\Delta) \\
&\quad + \sum_{m=n+1}^{N-1}\left[2\log\sigma_2(T) + \left(\frac{m\Delta - \mu_2(T)}{\sigma_2(T)}\right)^2 - 2\log P_2(T)\right] h(m\Delta) \\
&= 2\log\sigma_1(T)\sum_{m=0}^{n} h(m\Delta) + \frac{1}{\sigma_1^2(T)}\sum_{m=0}^{n}(m\Delta - \mu_1(T))^2\, h(m\Delta) - 2\log P_1(T)\sum_{m=0}^{n} h(m\Delta) \\
&\quad + 2\log\sigma_2(T)\sum_{m=n+1}^{N-1} h(m\Delta) + \frac{1}{\sigma_2^2(T)}\sum_{m=n+1}^{N-1}(m\Delta - \mu_2(T))^2\, h(m\Delta) - 2\log P_2(T)\sum_{m=n+1}^{N-1} h(m\Delta) \\
&= 2P_1(T)\log\sigma_1(T) + P_1(T) - 2P_1(T)\log P_1(T) + 2P_2(T)\log\sigma_2(T) + P_2(T) - 2P_2(T)\log P_2(T) \\
&= 1 + 2\left[P_1(T)\log\sigma_1(T) + P_2(T)\log\sigma_2(T) - P_1(T)\log P_1(T) - P_2(T)\log P_2(T)\right]
\end{aligned}
$$
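A compact NumPy sketch of minimizing this criterion over a histogram (our own implementation of the final expression for J(T); distances are measured in bin widths, i.e. Δ = 1):

```python
import numpy as np

def kittler_illingworth(h):
    """Return (bin index, J value) minimizing J(T) for histogram counts h."""
    h = h / h.sum()                        # normalize to a probability mass
    bins = np.arange(len(h), dtype=float)  # distances in units of Delta = 1
    best_J, best_t = np.inf, None
    for n in range(1, len(h) - 1):         # candidate thresholds T = n * Delta
        p1, p2 = h[:n + 1].sum(), h[n + 1:].sum()
        if p1 <= 0 or p2 <= 0:
            continue
        mu1 = (bins[:n + 1] * h[:n + 1]).sum() / p1
        mu2 = (bins[n + 1:] * h[n + 1:]).sum() / p2
        v1 = ((bins[:n + 1] - mu1) ** 2 * h[:n + 1]).sum() / p1   # sigma_1^2
        v2 = ((bins[n + 1:] - mu2) ** 2 * h[n + 1:]).sum() / p2   # sigma_2^2
        if v1 <= 0 or v2 <= 0:             # degenerate one-bin side; skip
            continue
        J = 1 + 2 * (p1 * 0.5 * np.log(v1) + p2 * 0.5 * np.log(v2)
                     - p1 * np.log(p1) - p2 * np.log(p2))
        if J < best_J:
            best_J, best_t = J, n
    return best_t, best_J
```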
Selecting The Best Trial Manifold
[Figure: classification error for a mixture of two Gaussians with threshold T; the curve J(t), whose depth is measured between the minimum τ and the neighboring maximum τ'; example distance histograms]

Find $T$ to maximize

$$\text{discriminability}(T) = \frac{(\mu_1(T) - \mu_2(T))^2}{\sigma_1^2(T) + \sigma_2^2(T)}\,\bigl[J(T') - J(T)\bigr]$$

where $J(T') - J(T)$ is the depth of the minimum of the criterion $J$ at $T$ relative to the neighboring local maximum $T'$.
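Continuing the sketch, a scoring helper that multiplies discriminability by the depth of the J minimum (our reading of the slide; the helper and its arguments are ours):

```python
import numpy as np

def separation_score(h, t, J_t, J_neighbor):
    """Score threshold bin t: discriminability times the depth of J.

    h is the distance histogram, J_t the criterion value at t, and
    J_neighbor the value at the nearby local maximum tau'.  Assumes
    both sides of the histogram at t are non-empty.
    """
    h = h / h.sum()
    bins = np.arange(len(h), dtype=float)
    p1, p2 = h[:t + 1].sum(), h[t + 1:].sum()
    mu1 = (bins[:t + 1] * h[:t + 1]).sum() / p1
    mu2 = (bins[t + 1:] * h[t + 1:]).sum() / p2
    v1 = ((bins[:t + 1] - mu1) ** 2 * h[:t + 1]).sum() / p1
    v2 = ((bins[t + 1:] - mu2) ** 2 * h[t + 1:]).sum() / p2
    disc = (mu1 - mu2) ** 2 / (v1 + v2)    # discriminability
    depth = J_neighbor - J_t               # depth of the J(T) minimum
    return disc * depth
```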
Probability All Draws From Same Cluster
Suppose there are $C$ clusters, each of about the same size. What is the probability $p$ that in $k + 1$ random draws, all will be from the same cluster?

$$p = C^{-k}$$
Number of Trials
Each trial consists of $k + 1$ draws, and $p$ is the probability that in $k + 1$ random draws all will be from the same cluster. What is the probability that in $S$ trials, each of $k + 1$ draws, at least one trial will have all of its draws from the same cluster?

$$P(\text{Success}) = 1 - (1 - p)^S$$

We want $P(\text{Success}) \ge 1 - \epsilon$:

$$1 - (1 - p)^S \ge 1 - \epsilon \;\Rightarrow\; (1 - p)^S \le \epsilon \;\Rightarrow\; S \log(1 - p) \le \log \epsilon \;\Rightarrow\; S \ge \frac{\log \epsilon}{\log(1 - p)}$$
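A small helper to compute the required number of trials (our own):

```python
import math

def num_trials(C, k, eps=0.01):
    """Trials needed so that, with probability >= 1 - eps, at least one
    trial draws all k+1 points from one of C equal-sized clusters."""
    p = C ** float(-k)                    # p = C^(-k)
    return math.ceil(math.log(eps) / math.log(1.0 - p))

# Example: C = 5 clusters, k = 2 (three draws per trial)
# num_trials(5, 2) -> 113
```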
A Run of the Algorithm

[Figure: successive iterations on the three clusters C1, C2, C3; at each iteration the distance histogram is thresholded and the data are partitioned]
Empirical Evaluation - Accuracy
Each algorithm column gives accuracy and running time (h:mm:ss).

dataset   size    clusters  dim  LM dim   LMCLUS          ORCLUS           DBSCAN           HPPC
D1        3000    3         4    2-3      0.95  0:00:08   0.80  0:00:22    0.34  0:00:09    0.72  0:00:51
D2        3000    3         20   13-17    0.98  0:00:33   0.59  0:02:18    0.65  0:00:36    0.97  0:01:39
D3        30000   4         30   1-4      1.00  0:15:38   0.65  1:05:30    1.00  1:31:52    0.99  0:01:32
D4        6000    3         30   4-12     0.99  0:09:22   0.98  0:08:20    0.66  0:03:49    0.97  0:00:12
D5        4000    3         100  2-3      1.00  0:00:20   0.88  0:54:30    0.65  0:05:24    0.99  0:03:54
D6        90000   3         10   1-2      0.99  0:00:29   1.00  0:29:02    0.67  4:58:49    1.00  0:01:23
D7        5000    4         10   2-6      0.99  0:02:05   0.99  0:02:41    0.74  0:00:54    0.96  0:00:35
D8        10000   5         50   1-4      0.99  0:01:42   0.63  1:33:52    1.00  0:17:00    0.99  0:03:43
D9        80000   8         30   2-7      0.99  3:12:46   0.96  13:30:30   1.00  10:51:15   0.99  0:04:57
D10       5000    5         3    1-2      0.86  0:00:48   0.68  0:00:45    0.59  0:00:05    0.78  0:00:33
⋆D11      1500    3         3    1        0.98  0:00:01   0.99  0:00:10    0.43  0:00:02    0.33  0:00:52
⋆D12      1500    3         3    2        0.97  0:00:02   0.99  0:00:11    0.34  0:00:02    0.33  0:00:26
⋆D13      1500    3         7    3        0.97  0:00:05   0.99  0:00:17    0.33  0:00:04    0.33  0:00:34
⋆D14      5000    5         20   4        0.99  0:05:46   1.00  0:10:42    0.21  0:01:39    0.20  0:01:30
⋆D15      4000    4         50   3        0.99  0:09:14   1.00  0:25:52    0.25  0:02:34    0.25  0:03:20
Summary

alg      # data sets with accuracy ≥ 0.85   able to cluster ⋆   time rank
LMCLUS   15                                 +                   1.5
ORCLUS   10                                 +                   10
DBSCAN   3                                  -                   9
HPPC     8                                  -                   1
Efficiency and Scalability
[Figure: scalability, time (seconds) vs. number of points, for LMCLUS, ORCLUS, and DBSCAN]

[Figure: scalability, time (seconds) vs. number of dimensions, for LMCLUS, ORCLUS, and DBSCAN]
alg      complexity
LMCLUS   O(N²K²L³d)
ORCLUS   O(K³ + KNd + K²d³)
DBSCAN   O(N²d)
HPPC     O(Nd)
Handwritten Digit Recognition 3823 × 64 (UCI MLR)
Each digit is a 32 × 32 bitmap represented as a feature vector $x_i = (x_{i1}, x_{i2}, \ldots, x_{i64})$.

alg      even digits accuracy   odd digits accuracy
LMCLUS   0.99                   0.87
ORCLUS   0.85                   0.83
DBSCAN   0.82                   0.58
HPPC     0.50                   0.93

even: 5 clusters; odd: 7 clusters (digits 1 and 9 split into two clusters each); LM dim: 1-2.
E3D Point Cloud Segmentation (ALPHATECH Inc.)
Time Series Clustering 600 × 60 (UCI KDD Archive)
Output (600 series, 100 per true class: normal, cyclic, increasing trend, decreasing trend, upward shift, downward shift):

cluster   composition (counts by true class)   total
C1        57                                   57
C2        80 + 1                               81
C3        43 + 99                              142
C4        20 + 98                              118
C5        99                                   99
C6        41                                   41
C7        23                                   23
C8        1 + 36 + 1 + 1                       39
total                                          600

alg      accuracy
LMCLUS   0.89
ORCLUS   0.50
DBSCAN   0.68
HPPC     0.64
Publications
1. Linear manifold clustering in high dimensional spaces by stochastic search (with Robert Haralick). Pattern Recognition (2007), vol. 40(10), pp. 2672-2684.
2. Linear Manifold Correlation Clustering (with Robert Haralick). Invited paper, International Journal of Information Technology and Intelligent Computing (2007), vol. 2, no. 2.
3. Mining Subspace Correlations (with Robert Haralick). In Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining (CIDM 2007), pp. 335-342.
4. Exploiting the Geometry of Gene Expression Patterns for Unsupervised Learning (with Robert Haralick). In Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006), vol. 2, pp. 670-674.
5. Linear Manifold Clustering (with Robert Haralick). In Proceedings of the International Conference on Data Mining and Machine Learning (MLDM 2005), Lecture Notes in Computer Science, Springer Verlag, LNAI 3587, pp. 132-141.
6. Linear Manifold Embedding of Pattern Clusters (with Robert Haralick). DIMACS Workshop on Detecting and Processing Regularities in High Throughput Biological Data, 2005.