SLIDE 1

Graph Refinement for Clustering

Zhenyue Zhang

Zhejiang University

Joint work with Limin Li, Jiayun Mao, Zheng Zhai

MLA 2017 · Beijing Jiaotong University

SLIDE 2

Graphs

The roles of graphs

  • feature selection, dimensionality reduction
  • clustering
  • smart messaging, as in Allo (Google)
  • a lot of applications ...
SLIDE 3

Many graph-based methods suffer from graph noise because of

  • incorrect connections or weights,
  • missing information
  • noisy data if graphs are constructed from data points
  • unsuitable measurement used for graph construction
  • conflicting information from multi-view data sets, view distortion
  • different magnitude, neighborhoods, distribution, and noise process
  • different view-specific graphs
SLIDE 4

Graph modification

  • data cleaning
  • graph approximation in a special form
  • graph fusion for multi-view learning
  • graph coarsening
SLIDE 5

We will talk about three issues in graph modification:

  • Uniform feature selection/projection for multi-view data
    • UMDS for multiple dissimilarity matrices/kernels
    • UCA for multiple similarity matrices
  • Uniform neighborhood graph from multi-view data
    • Construct a uniform sparse graph for all views
    • Modify view-specific graphs for multi-view learning methods
  • Graph refinement
    • Improve methods in manifold learning (LLE, LE, LPP), subspace learning (SSC, LRR), and multi-view learning (CRSC, MKkC)

SLIDE 6

Part I: Uniform Feature Selection

SLIDE 7

Multi-view observations (column vectors) $x_i^v \in \mathcal{X}^v \subset \mathbb{R}^{d_v}$, $i = 1, \cdots, n$, $v = 1, \cdots, m$

  • Webpages: contents or hyperlinks of contents
  • Multiple-language environment: documents in multiple languages
  • Images: pixels or text captions (labels)
  • Publications: contents (key words), and citations
  • Gene representations: gene sequences, expressions in different cellular environments, or the status of its somatic mutation in different tumors
  • ...
SLIDE 8

View distortion

Question:

  • 1. Can we simulate view distortion in terms of latent "uniform true features"?
  • 2. How to retrieve the features from multiple noisy graphs approximately?

SLIDE 9

Given observed vectors $x_i^v \in \mathcal{X}^v \subset \mathbb{R}^{d_v}$ in view v, we model the view distortion as a nonlinear mapping of the noisy features,
$$x_i^v = f_v(y_i, \epsilon_i^v), \quad i = 1, \cdots, n,$$

  • $f_v$: view-specific distortion function
  • $\{y_i\}$: uniform latent features in a low-dimensional space
  • $\{\epsilon_i^v\}$: view-specific noise vectors

SLIDE 10

A simple form
$$x_i^v = g_v(G_v y_i + \epsilon_i^v), \quad y_i \in \mathbb{R}^d, \qquad \text{or} \qquad x_i^v = (\phi_v \circ g_v)(G_v y_i + \epsilon_i^v)$$
$g_v$: a nonlinear mapping, $G_v$: an affine transformation

Figure: Left: intact 3D samples $\{y_j\}$. The right three: $\{x_j^v\}$ with $x_j^v = \exp(G_v y_j + \epsilon_j)$, $G_v = DQ_v^T$

SLIDE 11

Two models

Model I: UMDS for multiple squared dissimilarity matrices $\{D_v\}$:
$$\min_{YY^T=I,\ \{W_v\}} \sum_v \|A_v - Y^T W_v Y\|_F^2 \,/\, \|A_v\|_F^2 \qquad (1)$$
The input matrices $\{A_v\}$ could be

  • $A_v = -\frac{1}{2} H D_v H$ for a squared dissimilarity matrix $D_v$ in view v, where $H = I - \frac{1}{n} ee^T$
  • $A_v = H K_v H$ for a kernel $K_v$
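Both inputs are instances of classical double centering; a minimal numpy sketch (the function name and interface are ours, not from the talk):

```python
import numpy as np

def umds_input(D=None, K=None):
    """Build one UMDS input A_v: -1/2 H D_v H for a squared dissimilarity
    matrix D_v, or H K_v H for a kernel K_v, with H = I - (1/n) e e^T.
    A sketch of the double-centering step on slide 11."""
    M = D if D is not None else K
    n = M.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    A = H @ M @ H
    return -0.5 * A if D is not None else A
```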
SLIDE 12

Model II: UCA for multiple similarity matrices $\{S_v\}$:
$$\max_{U^TU=I} \sum_v \tau_v^{-2} \left\|U^T S_v U\right\|_F^2 \qquad (2)$$
where $S_v$ is a view-specific similarity matrix in view v. Basic idea:

  • $S_v = \tau_v(S + B_v)$, $B_v$: view-deviation
  • Factorization: $S = UDU^T$, $B_v = (U, U_\perp)\begin{pmatrix} B_{11}^v & B_{12}^v \\ B_{21}^v & B_{22}^v \end{pmatrix}(U, U_\perp)^T$
  • minimize the deviation blocks $B_{12}^v$, $B_{21}^v$, and $B_{22}^v$


SLIDE 14

Equivalences

The uniform model:
$$\max_{U^TU=I} \sum_v \left\|U^T A_v U\right\|_F^2 \qquad (3)$$
KKT condition (first-order necessary condition): U solves the nonlinear eigenvalue problem with $C(U) = \sum_v A_v U U^T A_v$,
$$C(U)\,U = U\Lambda, \qquad U^TU = I.$$
It can be solved by

  • Eigen-subspace iteration (small scale), or
  • Subspace extension (large scale)
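For the small-scale case, the eigen-subspace iteration can be written in a few lines of numpy. This is our illustrative reading of the iteration (names, initialization, and stopping rule are assumptions), not the authors' code:

```python
import numpy as np

def eigen_subspace_iteration(As, d, max_iter=200, tol=1e-9, seed=0):
    """Sketch of the eigen-subspace iteration for the uniform model (3):
    form C(U) = sum_v A_v U U^T A_v and replace U by the top-d
    eigenvectors of C(U), until f(U) = sum_v ||U^T A_v U||_F^2
    stabilizes. The A_v are assumed symmetric."""
    n = As[0].shape[0]
    rng = np.random.default_rng(seed)
    U = np.linalg.qr(rng.standard_normal((n, d)))[0]   # random orthonormal start
    f_prev = -np.inf
    for _ in range(max_iter):
        C = sum(A @ U @ (U.T @ A) for A in As)         # C(U) = sum_v A_v U U^T A_v
        U = np.linalg.eigh(C)[1][:, -d:]               # top-d eigen-subspace of C(U)
        f = sum(np.linalg.norm(U.T @ A @ U) ** 2 for A in As)
        if abs(f - f_prev) < tol * max(1.0, abs(f)):   # {f(U_l)} converges (slide 17)
            break
        f_prev = f
    return U
```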
SLIDE 17

Convergence: (a) $\{f(U_\ell)\}$ is convergent. (b) Any accumulation point $U_*$ satisfies $C_*U_* = U_*\Lambda_*$. (c) If $\lambda_d(C_*) > \lambda_{d+1}(C_*)$, then $\{P_{\ell+1} - P_\ell\}$ tends to zero. (d) $P_\ell \to P_*$ if $\{P_\ell\}$ has an isolated accumulation point $P_*$.

SLIDE 18

Synthetic data

$$x_j^v = \exp\!\left(DQ_v^T y_j + \epsilon_j^v\right), \quad y_j \in \mathbb{R}^3,$$

  • Each $Q_v$ has two orthonormal columns
  • $D = \mathrm{diag}(1, s)$ with $s \in (0, 1)$: measuring singularity
  • $\epsilon_j^v \sim N(0, \sigma)$, σ: noise level
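A sketch of this synthetic-data generator under the stated assumptions (the interface is ours):

```python
import numpy as np

def synthetic_views(Y, m=3, s=0.5, sigma=0.1, seed=0):
    """Generate the slide's synthetic data x_j^v = exp(D Q_v^T y_j + eps_j^v):
    Y is a (3, n) matrix of latent features, each Q_v has two orthonormal
    columns, D = diag(1, s), and eps ~ N(0, sigma)."""
    rng = np.random.default_rng(seed)
    D = np.diag([1.0, s])
    views = []
    for _ in range(m):
        Q = np.linalg.qr(rng.standard_normal((Y.shape[0], 2)))[0]  # orthonormal columns
        noise = sigma * rng.standard_normal((2, Y.shape[1]))
        views.append(np.exp(D @ Q.T @ Y + noise))
    return views
```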

SLIDE 19

Figure: Clustering accuracy of MDS vs. UMDS, against the singularity s (panels d = 4, 6 at noise levels σ ≈ 0.084, 0.137) and against the noise level σ (panels d = 4, 6 at s = 0, 0.8).

SLIDE 20

Real-world data

  • News stories in six topics from BBC, Reuters, and Guardian
  • Reuters Multilingual data: documents over 6 categories written in English, French, German, Spanish, and Italian
  • UCIDigit: handwritten digits in Fourier coefficients, profile correlations, and averages of local pixels in a 2 × 3 window
  • Webpages on Cornell, Texas, Washington, Wisconsin in content or link, across course, project, student, faculty or staff
  • BBCnews on business, entertainment, politics, sport, tech; BBCsports on athletics, cricket, football, rugby, or tennis
  • Cora: research papers (absence/presence or link) in 7 classes
SLIDE 21

Table: Scale of the real-world data sets

Data set     # samples     # features (view1 / view2 / view3 / view4 / view5)   # clusters
BBC-2V       2,012         6,838 / 6,790                                        5
BBC-3V       1,207         5,470 / 5,550 / 5,482                                5
BBCsp-2V     544           3,183 / 3,203                                        5
BBCsp-3V     282           2,582 / 2,545 / 2,464                                5
Digit1       2,000         76 / 216                                             10
Digit2       2,000         76 / 240                                             10
Digit3       2,000         76 / 216 / 240                                       10
Reuters1     1,200         13,211 / 11,665 / 10,033                             6
Reuters2     1,200         13,211 / 8,395 / 7,116                               6
Reuters3/4   1,200/6,000   13,211 / 11,665 / 10,033 / 8,395 / 7,116             6
Cornell      91            1,703 / 195                                          4
Texas        102           1,702 / 187                                          4
Washington   156           1,703 / 230                                          4
Wisconsin    179           1,703 / 256                                          4

SLIDE 22

Clustering accuracy

Data set     OKkC         MKkC         LMKkC        UMDS         UCA          SNF          LMF          NMFce        CRSCp        CRSCc
             ACC   NMI    ACC   NMI    ACC   NMI    ACC   NMI    ACC   NMI    ACC   NMI    ACC   NMI    ACC   NMI    ACC   NMI    ACC   NMI
BBC-2V       0.910 0.760  0.916 0.772  0.917 0.775  0.911 0.760  0.917 0.776  0.894 0.739  0.886 0.723  0.895 0.728  0.901 0.744  0.906 0.754
BBC-3V       0.896 0.753  0.891 0.742  0.889 0.741  0.896 0.755  0.916 0.782  0.903 0.757  0.890 0.746  0.825 0.615  0.878 0.718  0.876 0.712
BBCsp-2V     0.704 0.681  0.825 0.661  0.825 0.661  0.706 0.683  0.829 0.668  0.865 0.764  0.825 0.671  0.718 0.542  0.814 0.626  0.709 0.533
BBCsp-3V     0.750 0.706  0.755 0.675  0.751 0.668  0.748 0.695  0.755 0.660  0.776 0.708  0.549 0.372  0.698 0.506  0.705 0.609  0.684 0.563
Digit1       0.873 0.785  0.725 0.711  0.695 0.677  0.895 0.819  0.916 0.852  0.792 0.754  0.867 0.781  —     —      0.876 0.787  0.873 0.787
Digit2       0.886 0.812  0.876 0.802  0.887 0.812  0.910 0.841  0.919 0.851  0.795 0.757  0.885 0.812  —     —      0.901 0.823  0.882 0.801
Digit3       0.892 0.806  0.727 0.713  0.697 0.672  0.912 0.840  0.922 0.863  0.817 0.786  0.895 0.817  —     —      0.770 0.752  0.878 0.794
Reuters1     0.502 0.368  0.486 0.356  0.486 0.356  0.500 0.367  0.485 0.355  0.515 0.369  0.458 0.341  0.483 0.354  0.449 0.321  0.425 0.306
Reuters2     0.494 0.363  0.497 0.368  0.513 0.383  0.503 0.361  0.506 0.374  0.505 0.373  0.432 0.368  0.625 0.455  0.446 0.303  0.447 0.299
Reuters3     0.501 0.367  0.493 0.364  0.490 0.363  0.498 0.362  0.489 0.359  0.525 0.395  0.464 0.347  0.545 0.361  0.461 0.328  0.442 0.318
Reuters4     0.482 0.328  0.480 0.339  —     —      0.489 0.337  0.461 0.312  0.477 0.333  0.447 0.326  —     —      0.448 0.296  0.445 0.304
Cornell      0.547 0.258  0.473 0.198  0.431 0.105  0.663 0.387  0.442 0.147  0.515 0.209  0.452 0.146  0.357 0.035  0.442 0.117  0.494 0.209
Texas        0.546 0.353  0.642 0.393  0.482 0.341  0.669 0.461  0.446 0.237  0.598 0.344  0.491 0.349  0.375 0.076  0.633 0.332  0.571 0.301
Washington   0.660 0.399  0.660 0.401  0.621 0.400  0.666 0.420  0.628 0.352  0.551 0.279  0.673 0.407  0.378 0.067  0.583 0.384  0.628 0.358
Wisconsin    0.544 0.413  0.597 0.434  0.569 0.433  0.687 0.488  0.731 0.479  0.603 0.439  0.549 0.397  0.307 0.011  0.430 0.608  0.625 0.467

SLIDE 23

CPU time

Data set     OKkC     MKkC   LMKkC   UMDS   UCA    SNF      LMF      NMFce   CRSCp   CRSCc
BBC-2V       140.99   2.90   106.97  1.31   0.75   32.24    62.80    203.68  16.01   1.95
BBC-3V       48.86    1.25   64.06   0.91   0.46   10.89    33.80    158.81  11.88   2.88
BBCsp-2V     11.86    0.54   7.49    1.38   0.10   1.48     6.05     82.98   1.85    0.25
BBCsp-3V     5.48     0.92   3.22    0.85   0.06   0.75     6.25     38.39   2.01    1.44
Digit1       254.25   42.11  451.53  2.35   1.86   37.84    88.98    —       20.29   25.06
Digit2       255.08   21.49  397.28  2.56   1.34   32.60    75.35    —       21.14   22.89
Digit3       276.42   47.36  848.15  2.35   2.03   49.49    88.83    —       33.78   33.08
Reuters1     57.58    3.06   49.92   1.07   0.46   10.60    32.12    164.17  12.48   2.88
Reuters2     51.21    3.15   65.62   5.53   0.46   10.75    32.43    157.43  11.99   1.33
Reuters3     57.25    3.06   232.81  1.48   0.65   15.84    42.56    172.56  24.86   1.88
Reuters4     3194.11  55.65  —       36.65  21.51  1653.26  1198.62  —       450.20  52.12
Cornell      6.14     0.28   1.73    0.35   0.04   0.35     3.79     23.05   0.56    0.74
Texas        7.49     1.05   2.08    0.44   0.04   0.44     4.20     23.54   1.53    2.22
Washington   7.20     0.97   1.45    0.36   0.03   0.48     5.25     24.82   1.60    2.88
Wisconsin    8.90     0.63   2.56    0.31   0.03   0.47     5.79     31.31   1.64    2.33

SLIDE 24

Part II: Learning Uniform Neighborhood Graph

SLIDE 25

Class-inconsistency of neighbors

Figure: Average percentages of neighbors from different digital manifolds in each k-nearest-neighbor set (k = 30) on Digit3 (y-axis: percentage of class-inconsistent neighbors; legend: view 1, view 2, view 3).

SLIDE 26

Let $N_j^v$ be a neighborhood of point j in view v.

  • union neighborhood: $N_j^U = N_j^1 \cup \cdots \cup N_j^m$
  • joint neighborhood: $N_j^J = N_j^1 \cap \cdots \cap N_j^m$
  • class-consistent neighborhood $N_j^C$ of $N_j^U$

Table: The 200 smallest $\{N_j^J\}$, $\{N_j^U\}$, and $\{N_j^C\}$ on Digit3.

Neighborhood size k           10     20     30     40     50
Average size of $\{N_j^J\}$   0.49   1.09   1.83
Average size of $\{N_j^U\}$   24.7   49.0   73.1   97.0   120.8
Average size of $\{N_j^C\}$   21.3   34.5   43.0   50.1   57.4


SLIDE 28

Question: Can we determine uniform neighborhoods containing class-consistent neighbors?

SLIDE 29

Joint sparse neighborhoods

Let $X_j^v = X^v(:, N_j^v)$, and represent $x_j^v$ by its neighbors as $x_j^v \approx X_j^v w_j^v$:
$$\min_{W_j}\ \lambda\|W_j\|_{2,1} + \frac{1}{2}\sum_{v=1}^m a_j^v \left\|x_j^v - X_j^v w_j^v\right\|_2^2, \quad \text{s.t. } e^Tw_j^v = 1,\ \forall v, \qquad (4)$$

  • $W_j = [w_j^1, \cdots, w_j^m]$, $\|W_j\|_{2,1} = \sum_i \|(w_{ij}^1, \cdots, w_{ij}^m)\|_2$
  • λ is a trade-off parameter to tune the joint sparsity of $W_j$
  • $\{a_j^v\}$ are normalization constants, $a_j^v = (1 + |N_j^U|)\big/\big(\|x_j^v\|_2^2 + \sum_{i\in N_j^U} \|x_i^v\|_2^2\big)$, or $a_j^v = 1/(\|x_j^v\|_2^2 + \mu)$
  • It is solvable by the ADMM method
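The key ADMM ingredient for the ℓ2,1 term in (4) is row-wise group soft-thresholding, a proximal operator with a closed form. A minimal sketch (the full ADMM splitting around it is omitted):

```python
import numpy as np

def prox_l21(W, t):
    """Proximal operator of t * ||W||_{2,1}: each row of W holds the
    weights of one neighbor across all m views, and is shrunk toward
    zero jointly, so a neighbor is dropped in all views at once."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return scale * W   # rows with norm <= t become exactly zero
```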
SLIDE 30

The JSN graph

Connection between nodes i and j:

  • $w_{ij}^v$: affinity relation of neighbor i to the center node j in view v
  • $\omega_{ij} = \|(w_{ij}^1, \cdots, w_{ij}^m)\|$: the total affinity of neighbor i to the center j

JSN graph $S = (s_{ij})$, $s_{ij} = \|(\omega_{ij}, \omega_{ji})\|$.
SLIDE 31

Advantage: the JSN graph has stronger class-consistent connections, measured by
$$\theta_j = \frac{\sum_{i\in \bar N_j^C} |s_{ij}|}{\sum_i |s_{ij}|},$$
where $\bar N_j^C$ is the index set of class-consistent neighbors of node j.

Data set  Digit1  Digit2  Digit3  Animal  COIL-20  Reuters  BBCn-2V  BBCn-3V  BBCs-2V  BBCs-3V  Cora
JSN*      0.847   0.896   0.891   0.505   0.969    0.525    0.788    0.732    0.740    0.677    0.623
GS**      0.517   0.659   0.576   0.401   0.842    0.510    0.705    0.679    0.703    0.669    0.527

* using the uniform setting λ = 0.5 and the neighborhood size k = 30 for all data sets.
** using the best settings of its two parameters among multiple tested values for each data set.

SLIDE 32

Applications

  • I. Multi-view manifold clustering
    • Spectral method on the JSN graph
  • II. Modify view-specific graphs to improve multi-view methods
    • Filter partial edges of each view-specific graph
    • Weight view-specific graphs
SLIDE 33

Multi-view manifold clustering

Table: NMI of GS and JSN under each norm choice for $s_{ij}$ (outer blocks) and $\omega_{ij}$ (inner columns); OG = original graphs.

Norm for s_ij:  ∥·∥2                                       ∥·∥1                                       ∥·∥∞
Norm for ω_ij:  ∥·∥2       ∥·∥1       ∥·∥∞       OG        ∥·∥2       ∥·∥1       ∥·∥∞       OG        ∥·∥2       ∥·∥1       ∥·∥∞       OG
Algorithm       GS   JSN   GS   JSN   GS   JSN             GS   JSN   GS   JSN   GS   JSN             GS   JSN   GS   JSN   GS   JSN
Digit1          0.79 0.84  0.80 0.84  0.80 0.84  0.78      0.80 0.85  0.80 0.86  0.80 0.84  0.79      0.79 0.84  0.79 0.84  0.80 0.84  0.78
Digit2          0.85 0.95  0.85 0.95  0.85 0.95  0.69      0.85 0.95  0.86 0.95  0.85 0.95  0.70      0.84 0.94  0.85 0.94  0.84 0.94  0.69
Digit3          0.86 0.94  0.89 0.94  0.87 0.94  0.78      0.87 0.94  0.89 0.94  0.87 0.94  0.79      0.86 0.94  0.88 0.94  0.86 0.94  0.78
Animal          0.31 0.33  0.33 0.34  0.31 0.33  0.19      0.31 0.33  0.32 0.33  0.31 0.33  0.19      0.31 0.33  0.33 0.33  0.31 0.33  0.18
COIL-20         0.87 0.92  0.86 0.91  0.85 0.92  0.87      0.84 0.94  0.84 0.94  0.87 0.92  0.86      0.87 0.90  0.86 0.92  0.84 0.91  0.86
Reuters         0.35 0.43  0.33 0.43  0.36 0.45  0.35      0.30 0.44  0.31 0.45  0.34 0.44  0.35      0.37 0.43  0.36 0.43  0.36 0.44  0.37
BBCn-2V         0.73 0.83  0.71 0.82  0.81 0.84  0.77      0.47 0.83  0.42 0.69  0.72 0.84  0.77      0.81 0.84  0.78 0.83  0.83 0.84  0.76
BBCn-3V         0.72 0.74  0.43 0.71  0.82 0.83  0.76      0.41 0.63  0.32 0.62  0.72 0.79  0.76      0.81 0.81  0.73 0.76  0.83 0.82  0.76
BBCs-2V         0.60 0.76  0.54 0.76  0.61 0.82  0.73      0.50 0.75  0.47 0.76  0.61 0.82  0.73      0.60 0.83  0.59 0.78  0.70 0.83  0.74
BBCs-3V         0.78 0.72  0.64 0.72  0.76 0.72  0.69      0.61 0.73  0.40 0.72  0.77 0.73  0.69      0.78 0.71  0.79 0.72  0.76 0.79  0.70
Cora            0.47 0.51  0.47 0.52  0.52 0.52  0.17      0.46 0.49  0.34 0.44  0.48 0.52  0.17      0.52 0.52  0.48 0.52  0.53 0.52  0.16

SLIDE 34

Graph modification

Graph-filtering: Given view-specific graphs $\{G_v\}$ with adjacency weights $\{e_{ij}^v\}$ of edges, we modify $\{G_v\}$ to $\{\bar G_v\}$ with adjacency weights $\{\bar e_{ij}^v\}$:
$$\bar e_{ij}^v = n_{ij}\,e_{ij}^v = \begin{cases} e_{ij}^v, & j \in N_i^S \ \text{or}\ i \in N_j^S \ \text{or}\ i = j; \\ 0, & \text{otherwise}, \end{cases}$$
Each $\bar G_v$ is as sparse as the learned JSN graph.
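A sketch of the filtering rule, under our assumption that $N_i^S$ consists of the k strongest JSN connections of node i:

```python
import numpy as np

def filter_view_graph(Ev, S, k=30):
    """Graph-filtering sketch: keep edge weight e_ij^v only when j lies
    in the JSN neighborhood N_i^S, or i in N_j^S, or i = j."""
    n = S.shape[0]
    nbr = np.zeros((n, n), dtype=bool)
    rows = np.arange(n)[:, None]
    nbr[rows, np.argsort(S, axis=1)[:, -k:]] = True   # j in N_i^S
    keep = nbr | nbr.T | np.eye(n, dtype=bool)        # or i in N_j^S, or i = j
    return np.where(keep, Ev, 0.0)
```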

SLIDE 35

Table: Clustering accuracy of algorithms on the original graphs (orig.) or the graphs filtered by the JSN network (modi.), k = 30.

           NMI                                                                ACC
           CentroidSC   PairwiseSC   MKkC         UCA          SNF           CentroidSC   PairwiseSC   MKkC         UCA          SNF
Data set   orig. modi.  orig. modi.  orig. modi.  orig. modi.  orig. modi.   orig. modi.  orig. modi.  orig. modi.  orig. modi.  orig. modi.
Digit1     0.79  0.83   0.78  0.85   0.71  0.82   0.86  0.86   0.72  0.83    0.87  0.90   0.86  0.91   0.73  0.85   0.93  0.92   0.77  0.83
Digit2     0.81  0.90   0.78  0.91   0.80  0.89   0.85  0.91   0.75  0.87    0.89  0.95   0.88  0.96   0.88  0.95   0.91  0.96   0.78  0.84
Digit3     0.83  0.91   0.82  0.91   0.72  0.89   0.87  0.91   0.75  0.92    0.91  0.95   0.90  0.95   0.73  0.94   0.93  0.96   0.79  0.96
Animal     0.30  0.34   0.31  0.34   0.27  0.30   0.23  0.34   0.20  0.31    0.42  0.56   0.50  0.56   0.50  0.59   0.45  0.57   0.36  0.47
COIL-20    0.79  0.82   0.80  0.82   0.80  0.81   0.85  0.82   0.75  0.84    0.71  0.73   0.71  0.72   0.69  0.74   0.75  0.76   0.66  0.76
Reuters    0.35  0.40   0.35  0.41   0.36  0.39   0.36  0.39   0.38  0.39    0.48  0.53   0.48  0.54   0.49  0.59   0.49  0.53   0.52  0.52
BBCn-2V    0.77  0.81   0.77  0.81   0.77  0.78   0.78  0.80   0.74  0.76    0.91  0.93   0.92  0.93   0.92  0.92   0.92  0.93   0.90  0.91
BBCn-3V    0.73  0.79   0.74  0.80   0.74  0.78   0.75  0.79   0.75  0.76    0.88  0.92   0.89  0.93   0.89  0.91   0.89  0.92   0.90  0.90
BBCs-2V    0.58  0.80   0.53  0.80   0.66  0.76   0.66  0.81   0.82  0.81    0.74  0.92   0.65  0.92   0.83  0.90   0.83  0.93   0.93  0.93
BBCs-3V    0.63  0.70   0.66  0.71   0.68  0.72   0.68  0.72   0.68  0.69    0.72  0.78   0.75  0.78   0.76  0.77   0.76  0.77   0.76  0.77
Cora       0.19  0.48   0.19  0.49   0.17  0.44   0.17  0.47   0.33  0.53    0.38  0.68   0.37  0.69   0.36  0.61   0.37  0.65   0.48  0.71

SLIDE 36

Graph-weighting: Given view-specific graphs $\{G_v\}$ with adjacency weights $\{e_{ij}^v\}$ of edges, we weight them to $\{\tilde G_v\}$ using the JSN graph $S = (s_{ij})$:
$$\tilde e_{ij}^v = \begin{cases} s_{ij}\,e_{ij}^v, & i \ne j; \\ e_{ij}^v, & i = j. \end{cases}$$
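The weighting rule in a few lines (a sketch):

```python
import numpy as np

def weight_view_graph(Ev, S):
    """Graph-weighting sketch: scale every off-diagonal weight by the
    JSN graph, e~_ij^v = s_ij * e_ij^v, leaving the diagonal unchanged."""
    out = S * Ev                          # elementwise s_ij * e_ij^v
    np.fill_diagonal(out, np.diag(Ev))    # keep e_ii^v on the diagonal
    return out
```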

SLIDE 37

Table: Clustering accuracy of algorithms on the weighted kernels or similarities using the GS weights (GSW) or JSN weights (JSW), k = 30.

           NMI                                                                      ACC
           CentroidSC  PairwiseSC  MKkC       UCA        SNF        MMCJSN         CentroidSC  PairwiseSC  MKkC       UCA        SNF        MMCJSN
Data set   GSW  JSW    GSW  JSW    GSW  JSW   GSW  JSW   GSW  JSW                  GSW  JSW    GSW  JSW    GSW  JSW   GSW  JSW   GSW  JSW
Digit1     0.81 0.83   0.83 0.83   0.81 0.82  0.84 0.84  0.82 0.84  0.80           0.81 0.82   0.82 0.82   0.80 0.83  0.86 0.86  0.81 0.82  0.82
Digit2     0.85 0.94   0.86 0.94   0.84 0.92  0.86 0.93  0.89 0.95  0.94           0.86 0.97   0.87 0.97   0.85 0.97  0.86 0.96  0.84 0.98  0.97
Digit3     0.88 0.93   0.89 0.93   0.86 0.92  0.91 0.94  0.89 0.95  0.93           0.94 0.96   0.94 0.96   0.88 0.96  0.96 0.98  0.84 0.97  0.96
Animal     0.32 0.35   0.34 0.35   0.32 0.34  0.32 0.34  0.31 0.33  0.33           0.41 0.45   0.43 0.47   0.58 0.58  0.49 0.62  0.42 0.50  0.52
COIL-20    0.85 0.90   0.87 0.90   0.79 0.77  0.89 0.91  0.91 0.94  0.89           0.70 0.79   0.72 0.79   0.58 0.67  0.75 0.81  0.80 0.83  0.76
Reuters    0.38 0.41   0.38 0.42   0.23 0.39  0.32 0.41  0.43 0.43  0.44           0.51 0.54   0.52 0.54   0.34 0.53  0.47 0.53  0.55 0.55  0.55
BBCn-2V    0.65 0.82   0.65 0.83   0.82 0.80  0.83 0.82  0.84 0.84  0.84           0.77 0.94   0.77 0.94   0.94 0.93  0.94 0.94  0.95 0.95  0.95
BBCn-3V    0.71 0.70   0.72 0.70   0.82 0.81  0.82 0.81  0.83 0.83  0.82           0.87 0.86   0.87 0.86   0.94 0.93  0.94 0.94  0.94 0.94  0.94
BBCs-2V    0.63 0.77   0.64 0.77   0.77 0.81  0.77 0.79  0.81 0.81  0.83           0.72 0.90   0.73 0.90   0.92 0.93  0.89 0.91  0.89 0.88  0.94
BBCs-3V    0.75 0.79   0.75 0.80   0.65 0.74  0.64 0.71  0.80 0.80  0.75           0.87 0.90   0.88 0.91   0.74 0.77  0.74 0.78  0.82 0.83  0.85
Cora       0.46 0.51   0.46 0.51   0.49 0.51  0.51 0.53  0.52 0.49  0.52           0.66 0.67   0.66 0.68   0.61 0.68  0.64 0.65  0.69 0.64  0.68

SLIDE 38

Part III: Graph Refinement

Question: Can we refine a given graph to highlight the latent group structure?

SLIDE 39

An ideal graph G = (g_ij) for clustering should be

  • structurally sparse: few connections between different groups
  • nonnegative and positive semidefinite: a similarity measurement
  • low-rank: a small number of groups

SLIDE 40

Constructed graphs may be

  • approximately low-rank but generally dense (similarity-based)
  • sparse but group-number rank may be lost (neighborhood-based)
SLIDE 41

Existing approaches for graph modification

  • SPLR (Richard/Savalle/Vayatis, 2012): implicit strategies for graph denoising
    • ℓ1-norm based minimization for sparsity
    • nuclear norm for approximate low-rankness
  • DCD (Yang/Corander/Oja, 2016): explicit forms for graph approximation
    • doubly stochastic structure: $sP^TP$ with P a probability matrix, or $P^TD^{-1}P$

Our approach: explicit sparsity, and semi-explicit restriction on the group-number rank and positive semi-definiteness


SLIDE 43

SLSA: Simultaneously Low-rank and Sparse Approximation,
$$\min_{Z,U} \left\{ f(Z; A) + \theta\|Z - UU^T\|_F^2 \right\}, \quad \text{s.t. } \|Z_{\mathrm{off}}\|_0 \le \eta,\ Z \ge 0,\ U^TU = I_K. \qquad (5)$$

  • the input symmetric matrix A should be rescaled as $\frac{\sqrt{K}}{\|A\|_F}A$
  • $f(Z; A)$ is a loss function of the approximation of Z to A
  • $Z_{\mathrm{off}} = Z - \mathrm{diag}(z_{11}, \cdots, z_{nn})$, K: number of groups
  • η: an estimated sparsity, for example, $\eta = 2\lfloor(\rho n^2/K - n)/2\rfloor$
SLIDE 44

Example: A = S + E

  • symmetric $S = \mathrm{diag}(S_1, S_2, S_3)$ with block orders 20, 15, 20
  • $(S_k)_{ii} = 1$; $(S_k)_{ij}$, $i \ne j$, are chosen from [0.1, 1] uniformly
  • E: symmetric, non-zero entries outside the diagonal blocks only
  • three kinds of distributions of E, rescaled to have entries in [0, 1]:
    (1) absolute value of the standard normal distribution,
    (2) Poisson distribution $\frac{\lambda^k}{k!}e^{-\lambda}$, with λ = 1, and
    (3) sparse uniform distribution on [0.9, 1] with 20% nonzero entries whose positions are randomly chosen.
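A sketch that generates such a test matrix under the stated distributions (the function name, seed handling, and rescaling details are our assumptions):

```python
import numpy as np

def example_matrix(kind='normal', seed=0):
    """Build A = S + E as on this slide: S = diag(S1, S2, S3) with block
    orders 20, 15, 20, unit diagonal, off-diagonal block entries uniform
    in [0.1, 1]; E is symmetric and supported outside the diagonal blocks."""
    rng = np.random.default_rng(seed)
    orders = [20, 15, 20]
    n = sum(orders)
    S = np.zeros((n, n))
    pos = 0
    for b in orders:
        T = np.triu(rng.uniform(0.1, 1.0, (b, b)), 1)
        S[pos:pos + b, pos:pos + b] = T + T.T + np.eye(b)  # symmetric block
        pos += b
    if kind == 'normal':
        E = np.abs(rng.standard_normal((n, n)))
    elif kind == 'poisson':
        E = rng.poisson(1.0, (n, n)).astype(float)         # lambda = 1
    else:  # sparse uniform on [0.9, 1] with 20% nonzeros
        E = rng.uniform(0.9, 1.0, (n, n)) * (rng.random((n, n)) < 0.2)
    E = np.triu(E, 1)
    E = (E + E.T) / max(E.max(), 1e-12)                    # symmetric, entries in [0, 1]
    return S + E * (S == 0)                                # noise outside diagonal blocks
```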

SLIDE 45

Figure: Comparison of SLSA using different loss functions $f_F = \|Z - A\|_F^2$ or $f_1 = \|Z - A\|_1$ on the three matrices (left): the first and last iteration solutions for $f_F$ (middle) or $f_1$ (right). noise $= \|E\|_F/\|S\|_F$

SLIDE 46

Algorithm

Let $F(Z, U) = f(Z; A) + \theta\|Z - UU^T\|_F^2$. Alternately update U and Z as
$$U_k = \arg\min_U F(Z_{k-1}, U), \quad \text{s.t. } U^TU = I_K; \qquad (6)$$
$$Z_k = \arg\min_Z F(Z, U_k), \quad \text{s.t. } Z \ge 0,\ \|Z_{\mathrm{off}}\|_0 \le \eta. \qquad (7)$$
Initially set $Z_0 = A$.

  • Closed forms exist for the solutions of both subproblems.
  • Low cost:
    • For (6), subspace updating using QR decomposition, $4Kn^2 + \eta Kn$ each iteration.
    • For (7), $(2K + 9 + \log n)n^2$.
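A heavily simplified numpy sketch of the alternation with the Frobenius loss. The talk's subproblems have closed-form solutions; the projection step below (clip negatives, keep the η largest off-diagonal entries) is our stand-in for the exact Z-step, not the authors' formula:

```python
import numpy as np

def slsa(A, K, eta, theta=1.0, max_iter=50):
    """SLSA sketch for f(Z; A) = ||Z - A||_F^2.
    U-step: top-K eigenvectors of Z.  Z-step: the unconstrained
    minimizer (A + theta U U^T)/(1 + theta), projected heuristically
    onto {Z >= 0, ||Z_off||_0 <= eta}."""
    n = A.shape[0]
    Z = A.copy()
    for _ in range(max_iter):
        U = np.linalg.eigh(Z)[1][:, -K:]               # top-K eigen-subspace of Z
        Z = (A + theta * (U @ U.T)) / (1.0 + theta)    # minimizes F(Z, U) unconstrained
        Z = np.maximum(Z, 0.0)                         # Z >= 0
        off = Z * (1 - np.eye(n))
        if eta < n * n - n:                            # enforce ||Z_off||_0 <= eta
            cut = np.sort(off, axis=None)[-(eta + 1)]
            Z[(off <= cut) & ~np.eye(n, dtype=bool)] = 0.0
    return Z, U
```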
SLIDE 47

Convergence

Theorem 1

  • The sequence $\{F(Z_k, U_k)\}$ converges decreasingly.
  • If any accumulation point $Z_*$ of $\{Z_k\}$ has different K-th and (K+1)-st largest eigenvalues, then $U_kU_k^T - U_{k+1}U_{k+1}^T \to 0$.

Theorem 2

An accumulation point $(Z_*, U_*)$ of $\{(Z_k, U_{k+1})\}$ satisfies the KKT condition
$$0 \in \partial f(Z) + 2\theta(Z - UU^T), \qquad ZU = U\,\mathrm{diag}\big(\lambda_1(Z), \cdots, \lambda_K(Z)\big),$$
if $\lambda_K(Z_*) > \lambda_{K+1}(Z_*)$.

Theorem 3

If $\lambda_K(Z_*) > \lambda_{K+1}(Z_*)$ for any accumulation point $Z_*$ of $\{Z_k\}$ and $\{U_kU_k^T\}$ has an isolated accumulation point, then $\{U_kU_k^T\}$ and $\{Z_k\}$ converge.

SLIDE 48

Comparison with SPLR/DCD

Synthetic graph with three groups, generated as
$$A = S + t\,\frac{\|S\|_F}{\|E\|_F}E, \qquad S = \mathrm{diag}(S_1, S_2, S_3)$$

  • each $S_i$: fully connected symmetric subgraph of a group of 100 members with entries randomly chosen from (0, 1), $\mathrm{diag}(S_i) = I$
  • E: symmetric and sparse, having 20% nonzero entries in off-diagonal blocks whose positions are randomly chosen
  • t: the signal-noise ratio (SNR).
SLIDE 49

Measurements for the quality of refinement:
$$c_b(Z) = \frac{\|Z_{\Omega^c}\|_F}{\|S\|_F}, \quad \Omega^c:\ \text{set of index pairs between groups}$$
$$c_w(Z) = \frac{\|Z_{\Omega}\|_F}{\|S\|_F}, \quad \Omega:\ \text{set of index pairs within groups}$$
$$c_m(Z) = \sum_{(i,j)\in\Omega}\Big(z_{ij} - \frac{(\sum_\ell z_{i\ell})(\sum_\ell z_{j\ell})}{\sum_{ij} z_{ij}}\Big)\Big/\sum_{ij} z_{ij}$$
$$c_r(Z) = \frac{\sigma_{K+1}(Z)}{\frac{1}{K}\big(\sigma_1(Z)+\cdots+\sigma_K(Z)\big)}$$

  • The solution Z is rescaled to be αZ with $\alpha = \arg\min_\alpha \|S - \alpha Z\|_F$.
  • A larger generalized modularity $c_m(Z)$ means a stronger connection within groups if $c_w(Z)$ is still large.
  • A smaller $c_r(Z)$ means a better approximation of the group-number rank.
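These measures are direct to compute given ground-truth group labels; a sketch (the `labels` argument is our assumed input):

```python
import numpy as np

def refinement_scores(Z, S, labels, K):
    """Compute c_b, c_w, c_m, c_r from this slide."""
    within = labels[:, None] == labels[None, :]          # Omega: pairs within groups
    alpha = np.sum(S * Z) / np.sum(Z * Z)                # argmin_a ||S - a Z||_F
    Z = alpha * Z
    c_b = np.linalg.norm(Z[~within]) / np.linalg.norm(S)
    c_w = np.linalg.norm(Z[within]) / np.linalg.norm(S)
    total, row = Z.sum(), Z.sum(axis=1)
    c_m = np.sum((Z - np.outer(row, row) / total)[within]) / total
    sig = np.linalg.svd(Z, compute_uv=False)             # singular values, descending
    c_r = sig[K] / sig[:K].mean()
    return c_b, c_w, c_m, c_r
```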
SLIDE 50

Table: Comparison of SPLR and SLSA

       SPLR (λ2 = 0.01λ1)                    SLSA (ρ = 1)
SNR    λ1    c_b   c_w   c_m   c_r           θ    c_b   c_w   c_m   c_r
0.50   0.1   0.22  0.91  0.48  0.08          1    0.11  0.97  0.65  0.06
       0.3   0.02  0.92  0.65  0.00          3    0.00  0.96  0.67  0.02
       0.5   0.00  0.93  0.67  0.00          5    0.00  0.95  0.67  0.01
       0.7   0.00  0.92  0.67  0.00          7    0.00  0.94  0.67  0.01
1.00   0.1   0.45  0.69  0.31  0.16          1    0.32  0.77  0.50  0.09
       0.3   0.22  0.87  0.45  0.01          3    0.00  0.95  0.67  0.02
       0.5   0.04  0.92  0.64  0.02          5    0.00  0.94  0.67  0.01
       0.7   0.00  0.13  0.42  0.57          7    0.00  0.94  0.67  0.01
1.50   0.1   0.48  0.45  0.19  0.25          1    0.37  0.64  0.42  0.12
       0.3   0.40  0.70  0.25  0.05          3    0.12  0.91  0.64  0.03
       0.5   0.31  0.74  0.43  0.08          5    0.03  0.94  0.66  0.01
       0.7   NaN   NaN   NaN   NaN           7    0.02  0.93  0.66  0.01

$\eta = 2\lfloor(\rho n^2/K - n)/2\rfloor$ in SLSA.

SLIDE 51

Table: DCD results

SNR   c_b      c_w      c_m
0.5   0.1910   0.8812   0.4405
1.0   0.3326   0.7769   0.2993
1.5   0.4158   0.6459   0.1993

SLIDE 52

Real-world data

Table: Comparisons of spectral clustering on real-world data sets.

                                     AC                                      Purity
Graph   Name       m      n     ng   F.G.   NbSp   DCD    SPLR   SLSA       F.G.   NbSp   DCD    SPLR   SLSA
Dense   Wine       12     178   3    0.815  0.837  0.674  0.815  0.871      0.815  0.837  0.674  0.815  0.871
        Ecoli      7      336   8    0.548  0.562  0.524  0.610  0.741      0.804  0.848  0.765  0.801  0.807
        Forest     27     523   4    0.744  0.753  0.713  0.748  0.769      0.744  0.763  0.713  0.748  0.769
        Glass      9      214   6    0.556  0.528  0.542  0.561  0.598      0.650  0.650  0.654  0.659  0.645
        Vertebral  6      310   3    0.748  0.726  0.532  0.752  0.758      0.752  0.774  0.726  0.758  0.761
        Orl        10304  400   40   0.818  0.797  0.795  0.840  0.828      0.838  0.805  0.825  0.855  0.835
Sparse  Mnist      784    1000  10   0.537  0.700  0.704  0.737  0.779      0.582  0.743  0.747  0.749  0.779
        Coil20     16384  1440  20   0.611  0.829  0.830  0.840  0.842      0.626  0.878  0.878  0.886  0.878
        Satimage   36     4435  6    0.589  0.644  0.641  —      0.698      0.630  0.645  0.683  —      0.712
        Gisette    5000   7000  2    0.677  0.917  0.931  —      0.930      0.677  0.917  0.931  —      0.930

  • Gaussian graph $A = \big(\exp(-\|x_i - x_j\|^2/\sigma)\big)$
  • $\mathrm{AC} = \frac{1}{n}\max_\pi \sum_i n(C_i, C^*_{\pi(i)})$
  • $\mathrm{Purity} = \frac{1}{n}\sum_k \max_j n(C_k, C^*_j)$
  • F.G. = 'full graph', NbSp = 'neighborhood sparse graph'
SLIDE 53

Applications

Applications of SLSA:

  • subspace learning
    • SSC (subspace clustering)
    • LRR (low-rank representation of subspaces)
  • nonlinear manifold dimensionality reduction
    • LLE: locally linear embedding
    • LE: Laplacian eigenmap
    • LPP: the linear version of LE
  • multi-view learning for clustering
    • CRSC: co-regularized spectral clustering
    • MKkC: multiple kernel k-means clustering
SLIDE 54

Subspace learning

Refine the connection graph for SSC or the similarity S for LRR:

  • SSC learns a connection graph G
  • LRR learns a similarity matrix S
  • Subspaces are segmented via spectral clustering on G or S.

The SLSA can improve the sparsity and group-number rank of G or S.

  • sparsity: percentage of small entries,
    $\rho(\tau) = \max\big\{|\Omega|/n^2 : \sum_{(i,j)\in\Omega} |a_{ij}| \le \tau \max_{ij}|a_{ij}|\big\}$
  • group-number rank: $\zeta = K\sigma_{K+1}/(\sigma_1 + \cdots + \sigma_K)$.

SLIDE 56

Table: Average values of $\rho = \rho(10^{-3})$ and ζ of SSC, LRR, and their SLSA refinements on three representative examples with 642, 636, 635 face images of 10 individuals, respectively

             SSC                                             LRR
λ            9.00    10.60   12.20   13.80   15.40   17.00   0.60    0.84    1.08    1.32    1.56    1.80
ρ  Original  0.9898  0.9880  0.9857  0.9828  0.9795  0.9760  0.7142  0.7733  0.8077  0.8308  0.8477  0.8607
   SLSA      0.9518  0.9513  0.9512  0.9512  0.9512  0.9512  0.9505  0.9505  0.9505  0.9506  0.9506  0.9506
ζ  Original  0.9781  0.9701  0.9673  0.9671  0.9675  0.9643  0.9369  0.9420  0.9560  0.9690  0.9804  0.9881
   SLSA      0.7097  0.7102  0.7104  0.7152  0.7216  0.7288  0.7565  0.7438  0.7413  0.7386  0.7344  0.7293

SLIDE 57

Manifold learning

Refine the weights of LLE for low-dimensional embedding:

  • LLE
    • Estimate weights $\{w_{ij}\}$: $x_j \approx \sum_{i\in N_j} w_{ij}x_i$
    • Low-dimensional embedding: $\min_{YY^T=I_d} \sum_j \|y_j - \sum_{i\in N_j} w_{ij}y_i\|_2^2$.
  • Improve the LLE embedding
    • SLSA refines $A = |W| + |W|^T$ to get S
    • normalize S to $\hat S = (\hat s_{ij})$ with $\hat s_{ij} = s_{ij}/\sum_{\ell\ne j} s_{\ell j}$
    • embedding: $\min_{YY^T=I_d} \sum_j \|y_j - \sum_{i\in N_j} \hat s_{ij}y_i\|_2^2$.

SLSA can decrease the dispersion of each group and increase the separation of different groups:

  • dispersion of each group: $g_d(C_k) = \frac{1}{|C_k|}\sum_{i\in C_k}\|y_i - c_k\|_2$
  • separation from other groups: $g_s(C_k) = \min_{i\ne k}\|c_i - c_k\|_2$
  • $c_k$: class center, $c_k = \frac{1}{|C_k|}\sum_{i\in C_k} y_i$
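The refinement pipeline around LLE, as a sketch reusing the slsa() function sketched earlier (not the authors' implementation):

```python
import numpy as np

def refine_lle_weights(W, K, eta, theta=1.0):
    """Refine LLE weights as on this slide: SLSA on A = |W| + |W|^T,
    then column-normalize, hat_s_ij = s_ij / sum_{l != j} s_lj, and use
    hat_S in place of W in the embedding step."""
    A = np.abs(W) + np.abs(W).T
    S, _ = slsa(A, K, eta, theta)
    denom = S.sum(axis=0) - np.diag(S)      # sum_{l != j} s_lj for each column j
    S_hat = S / np.maximum(denom, 1e-12)
    np.fill_diagonal(S_hat, 0.0)
    return S_hat
```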


SLIDE 59

Table: Group dispersion/separation and clustering accuracy of the LLE embedding and the one modified by SLSA

n.s.  Method         ∥I−Ŝ∥0/∥I−W∥0        Digit: 1    2     3     4     5     6     7     8     9     10     AC
10    LLE            1.00            gd   0.005  0.037 0.025 0.032 0.029 0.041 0.058 0.093 0.013 0.027  0.797
                                     gs   0.019  0.105 0.114 0.111 0.114 0.104 0.110 0.111 0.019 0.103
      SLSA-Modified  1.82            gd   0.006  0.029 0.029 0.034 0.015 0.031 0.098 0.095 0.012 0.021  0.821
                                     gs   0.025  0.108 0.112 0.110 0.114 0.109 0.112 0.115 0.025 0.104
                     3.64            gd   0.007  0.021 0.018 0.016 0.013 0.029 0.099 0.095 0.013 0.024  0.828
                                     gs   0.028  0.108 0.113 0.113 0.114 0.112 0.112 0.115 0.028 0.104
                     5.45            gd   0.015  0.023 0.011 0.011 0.012 0.029 0.098 0.024 0.029 0.026  0.958
                                     gs   0.132  0.132 0.132 0.132 0.133 0.132 0.135 0.131 0.131 0.131
20    LLE            1.00            gd   0.015  0.041 0.023 0.032 0.066 0.064 0.103 0.085 0.019 0.032  0.701
                                     gs   0.045  0.094 0.088 0.077 0.103 0.065 0.092 0.068 0.045 0.068
      SLSA-Modified  0.95            gd   0.008  0.028 0.030 0.028 0.018 0.028 0.061 0.025 0.011 0.019  0.863
                                     gs   0.039  0.103 0.106 0.105 0.110 0.101 0.102 0.108 0.039 0.102
                     1.43            gd   0.015  0.030 0.036 0.028 0.018 0.031 0.030 0.094 0.027 0.029  0.954
                                     gs   0.131  0.131 0.130 0.130 0.132 0.130 0.134 0.131 0.130 0.130
                     2.38            gd   0.012  0.028 0.032 0.025 0.016 0.030 0.026 0.093 0.028 0.018  0.953
                                     gs   0.132  0.132 0.132 0.131 0.133 0.131 0.134 0.132 0.132 0.132

(n.s. = neighborhood size)

SLIDE 60

Refine the adjacency matrix W = (w_ij) for LE or LPP:

  • LE: $\min_Y \sum_{ij} w_{ij}\|y_i - y_j\|_2^2$, s.t. $YDY^T = I_d$, $YDe = 0$
  • LPP: $\min_P \sum_{ij} w_{ij}\|Px_i - Px_j\|_2^2$, s.t. $PXD(PX)^T = I_d$

SLSA can significantly improve the low-dimensional embeddings of LE/LPP for clustering via refining the adjacency matrix.

SLIDE 61

Table: AC of LE/LPP and their SLSA refinement (handwritten data)

LE (k = 10): AC = 0.809
θ \ ρ   0.2     0.3     0.4     0.5     0.6     0.7     0.8
0.5     0.953   0.954   0.953   0.951   0.952   0.949   0.949
1.0     0.951   0.951   0.951   0.952   0.950   0.951   0.949
1.5     0.948   0.949   0.950   0.950   0.948   0.947   0.944
2.0     0.947   0.947   0.948   0.949   0.950   0.946   0.943
2.5     0.945   0.946   0.948   0.948   0.950   0.945   0.941
3.0     0.943   0.946   0.947   0.949   0.950   0.945   0.939

LPP (k = 30): AC = 0.830
θ \ ρ   1.0     1.1     1.2     1.3     1.4     1.5     1.6
0.5     0.920   0.921   0.923   0.921   0.920   0.921   0.920
1.0     0.924   0.925   0.928   0.927   0.924   0.926   0.923
1.5     0.927   0.926   0.930   0.929   0.926   0.927   0.925
2.0     0.929   0.931   0.935   0.929   0.928   0.927   0.925
2.5     0.929   0.931   0.935   0.928   0.930   0.927   0.925
3.0     0.930   0.933   0.936   0.930   0.929   0.926   0.926

SLIDE 62

Multi-view learning

Refine multi-view graphs $\{S_v\}$ for CRSC or kernels $\{K_v\}$ for MKkC:

  • CRSC: Given multiple similarity matrices $\{S_v\}$,
    $$\min_{U^TU=I,\ U_v^TU_v=I} \sum_v \left\{ \mathrm{tr}\left(U_v^T L_v U_v\right) + \lambda_v \|U_vU_v^T - UU^T\|_F^2 \right\}$$
    • $L_v = D_v^{-1/2}(D_v - S_v)D_v^{-1/2}$, the normalized Laplacian.
  • MKkC: Given multiple kernel matrices $\{K_v\}$,
    • Combine the multiple kernels $\{K_v\}$: $K(\theta) = \sum_v \theta_v^2 K_v$, $\sum_v \theta_v = 1$
    • Apply the spectral method on $K(\theta)$: $\max_{\theta,\,U\in\mathcal{U}} \mathrm{tr}\big(U^TK(\theta)U\big) - \mathrm{tr}\big(K(\theta)\big)$

SLSA can improve the graph fusion of CRSC and MKkC.
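The MKkC-style kernel fusion itself is one line; a sketch (refining each $K_v$ with SLSA beforehand is the improvement the talk reports):

```python
import numpy as np

def combine_kernels(Ks, theta):
    """Kernel fusion K(theta) = sum_v theta_v^2 K_v with sum_v theta_v = 1."""
    theta = np.asarray(theta, dtype=float)
    theta = theta / theta.sum()            # enforce sum_v theta_v = 1
    return sum(t ** 2 * K for t, K in zip(theta, Ks))
```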

SLIDE 63

Table: AC values of CRSC/MKkC and their SLSA refinements (Twitter data)

Data set: politics-ie
CRSC: 0.948 (λ = 2)
SLSA improv.   θ \ ρ   1.0     1.2     1.4     1.6
               0.02    0.951   0.950   0.951   0.953
               0.04    0.951   0.951   0.948   0.951
               0.06    0.951   0.948   0.945   0.950
MKkC: 0.778
SLSA improv.   θ \ ρ   1.3     1.4     1.5     1.6
               0.7     0.933   0.905   0.905   0.914
               1.1     0.934   0.945   0.945   0.948
               1.5     0.942   0.948   0.948   0.945

Data set: olympics
CRSC: 0.890 (λ = 1)
SLSA improv.   θ \ ρ   1.8     2.0     2.2     2.4
               0.1     0.923   0.918   0.929   0.911
               0.3     0.933   0.919   0.935   0.916
               0.5     0.933   0.934   0.933   0.935
MKkC: 0.851
SLSA improv.   θ \ ρ   0.9     1.1     1.3     1.5
               0.1     0.920   0.933   0.943   0.923
               0.3     0.916   0.944   0.943   0.945
               0.5     0.928   0.938   0.956   0.914

Data set: football
CRSC: 0.863 (λ = 2)
SLSA improv.   θ \ ρ   0.6     0.7     0.8     0.9
               1.5     0.893   0.899   0.911   0.896
               1.7     0.896   0.899   0.901   0.915
               1.9     0.895   0.901   0.891   0.904
MKkC: 0.834
SLSA improv.   θ \ ρ   0.8     1.0     1.2     1.4
               0.3     0.889   0.887   0.891   0.891
               0.4     0.856   0.891   0.893   0.892
               0.5     0.858   0.891   0.895   0.891

SLIDE 64

Thank You!