SLIDE 1

Graph Refinement for Clustering

Zhenyue Zhang

Zhejiang University

Joint work with Limin Li, Jiayun Mao, Zheng Zhai

MLA 2017 · Beijing Jiaotong University

SLIDE 2

Graphs

The roles of graphs

  • feature selection, dimensionality reduction
  • clustering
  • smart messaging, as in Allo (Google)
  • a lot of applications ...
SLIDE 3

Many graph-based methods suffer from graph noise because of

  • incorrect connections or weights,
  • missing information
  • noisy data if graphs are constructed from data points
  • unsuitable measurement used for graph construction
  • conflicting information from multi-view data sets, view distortion
  • different magnitude, neighborhoods, distribution, and noise process
  • different view-specific graphs
SLIDE 4

Graph modification

  • data cleaning
  • graph approximation in a special form
  • graph fusion for multi-view learning
  • graph coarsening
SLIDE 5

We will talk about three issues in graph modification:

  • Uniform feature selection/projection for multi-view data
    • UMDS for multiple dissimilarity matrices/kernels
    • UCA for multiple similarity matrices
  • Uniform neighborhood graph from multi-view data
    • Construct a uniform sparse graph for all views
    • Modify view-specific graphs for multi-view learning methods
  • Graph refinement
    • Improve methods in manifold learning (LLE, LE, LPP), subspace learning (SSC, LRR), and multi-view learning (CRSC, MKkC)

SLIDE 6

Part I: Uniform Feature Selection

SLIDE 7

Multi-view observations (column vectors) $x_i^v \in \mathcal{X}^v \subset \mathbb{R}^{d_v}$, $i = 1, \cdots, n$, $v = 1, \cdots, m$

  • Webpages: contents or hyperlinks of contents
  • Multiple-language environment: documents in multiple languages
  • Images: pixels or text captions (labels)
  • Publications: contents (key words), and citations
  • Gene representations: gene sequences, expressions in different cellular environments, or the status of its somatic mutation in different tumors
  • ...
SLIDE 8

View distortion

Question:

  • 1. Can we simulate view distortion in terms of latent "uniform true features"?
  • 2. How to retrieve the features from multiple noisy graphs approximately?

SLIDE 9

Given observed vectors $x_i^v \in \mathcal{X}^v \subset \mathbb{R}^{d_v}$ in view v, we model the view distortion as a nonlinear mapping of the noisy features,
$$x_i^v = f_v(y_i, \epsilon_i^v), \quad i = 1, \cdots, n,$$

  • $f_v$: view-specific distortion function
  • $\{y_i\}$: uniform latent features in a low-dimensional space
  • $\{\epsilon_i^v\}$: view-specific noise vectors

SLIDE 10

A simple form
$$x_i^v = g_v(G_v y_i + \epsilon_i^v), \quad y_i \in \mathbb{R}^d, \qquad \text{or} \qquad x_i^v = (\phi_v \circ g_v)(G_v y_i + \epsilon_i^v)$$
$g_v$: a nonlinear mapping, $G_v$: an affine transformation

Figure: Left: intact 3D samples $\{y_j\}$. The right three: $\{x_j^v\}$ with $x_j^v = \exp(G_v y_j + \epsilon_j)$, $G_v = DQ_v^T$

SLIDE 11

Two models

Model I: UMDS for multiple squared dissimilarity matrices $\{D_v\}$:
$$\min_{YY^T=I,\ \{W_v\}} \sum_v \|A_v - Y^T W_v Y\|_F^2 \,/\, \|A_v\|_F^2 \qquad (1)$$
The input matrices $\{A_v\}$ could be

  • $A_v = -\frac{1}{2} H D_v H$ for a squared dissimilarity matrix $D_v$ in view v, where $H = I - \frac{1}{n} ee^T$
  • $A_v = H K_v H$ for a kernel $K_v$
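Both inputs are instances of classical double centering; a minimal numpy sketch (the function name and interface are ours, not from the talk):

```python
import numpy as np

def umds_input(D=None, K=None):
    """Build one UMDS input A_v: -1/2 H D_v H for a squared dissimilarity
    matrix D_v, or H K_v H for a kernel K_v, with H = I - (1/n) e e^T.
    A sketch of the double-centering step on slide 11."""
    M = D if D is not None else K
    n = M.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    A = H @ M @ H
    return -0.5 * A if D is not None else A
```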
SLIDE 12

Model II: UCA for multiple similarity matrices $\{S_v\}$:
$$\max_{U^TU=I} \sum_v \tau_v^{-2} \left\|U^T S_v U\right\|_F^2 \qquad (2)$$
where $S_v$ is a view-specific similarity matrix in view v. Basic idea:

  • $S_v = \tau_v(S + B_v)$, $B_v$: view-deviation
  • Factorization: $S = UDU^T$, $B_v = (U, U_\perp)\begin{pmatrix} B_{11}^v & B_{12}^v \\ B_{21}^v & B_{22}^v \end{pmatrix}(U, U_\perp)^T$
  • minimize the deviation blocks $B_{12}^v$, $B_{21}^v$, and $B_{22}^v$


SLIDE 14

Equivalences

The uniform model:
$$\max_{U^TU=I} \sum_v \left\|U^T A_v U\right\|_F^2 \qquad (3)$$
KKT condition (first-order necessary condition): U solves the nonlinear eigenvalue problem with $C(U) = \sum_v A_v U U^T A_v$,
$$C(U)\,U = U\Lambda, \qquad U^TU = I.$$
It can be solved by

  • Eigen-subspace iteration (small scale), or
  • Subspace extension (large scale)
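For the small-scale case, the eigen-subspace iteration can be written in a few lines of numpy. This is our illustrative reading of the iteration (names, initialization, and stopping rule are assumptions), not the authors' code:

```python
import numpy as np

def eigen_subspace_iteration(As, d, max_iter=200, tol=1e-9, seed=0):
    """Sketch of the eigen-subspace iteration for the uniform model (3):
    form C(U) = sum_v A_v U U^T A_v and replace U by the top-d
    eigenvectors of C(U), until f(U) = sum_v ||U^T A_v U||_F^2
    stabilizes. The A_v are assumed symmetric."""
    n = As[0].shape[0]
    rng = np.random.default_rng(seed)
    U = np.linalg.qr(rng.standard_normal((n, d)))[0]   # random orthonormal start
    f_prev = -np.inf
    for _ in range(max_iter):
        C = sum(A @ U @ (U.T @ A) for A in As)         # C(U) = sum_v A_v U U^T A_v
        U = np.linalg.eigh(C)[1][:, -d:]               # top-d eigen-subspace of C(U)
        f = sum(np.linalg.norm(U.T @ A @ U) ** 2 for A in As)
        if abs(f - f_prev) < tol * max(1.0, abs(f)):   # {f(U_l)} converges (slide 17)
            break
        f_prev = f
    return U
```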
SLIDE 17

Convergence: (a) $\{f(U_\ell)\}$ is convergent. (b) Any accumulation point $U_*$ satisfies $C_*U_* = U_*\Lambda_*$. (c) If $\lambda_d(C_*) > \lambda_{d+1}(C_*)$, then $\{P_{\ell+1} - P_\ell\}$ tends to zero. (d) $P_\ell \to P_*$ if $\{P_\ell\}$ has an isolated accumulation point $P_*$.

SLIDE 18

Synthetic data

$$x_j^v = \exp\!\left(DQ_v^T y_j + \epsilon_j^v\right), \quad y_j \in \mathbb{R}^3,$$

  • Each $Q_v$ has two orthonormal columns
  • $D = \mathrm{diag}(1, s)$ with $s \in (0, 1)$: measuring singularity
  • $\epsilon_j^v \sim N(0, \sigma)$, σ: noise level
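A sketch of this synthetic-data generator under the stated assumptions (the interface is ours):

```python
import numpy as np

def synthetic_views(Y, m=3, s=0.5, sigma=0.1, seed=0):
    """Generate the slide's synthetic data x_j^v = exp(D Q_v^T y_j + eps_j^v):
    Y is a (3, n) matrix of latent features, each Q_v has two orthonormal
    columns, D = diag(1, s), and eps ~ N(0, sigma)."""
    rng = np.random.default_rng(seed)
    D = np.diag([1.0, s])
    views = []
    for _ in range(m):
        Q = np.linalg.qr(rng.standard_normal((Y.shape[0], 2)))[0]  # orthonormal columns
        noise = sigma * rng.standard_normal((2, Y.shape[1]))
        views.append(np.exp(D @ Q.T @ Y + noise))
    return views
```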

SLIDE 19

Figure: Clustering accuracy of MDS vs. UMDS, against the singularity s (panels d = 4, 6 at noise levels σ ≈ 0.084, 0.137) and against the noise level σ (panels d = 4, 6 at s = 0, 0.8).

SLIDE 20

Real-world data

  • News stories in six topics from BBC, Reuters, and Guardian
  • Reuters Multilingual data: documents over 6 categories written in English, French, German, Spanish, and Italian
  • UCIDigit: handwritten digits in Fourier coefficients, profile correlations, and averages of local pixels in a 2 × 3 window
  • Webpages on Cornell, Texas, Washington, Wisconsin in content or link, across course, project, student, faculty or staff
  • BBCnews on business, entertainment, politics, sport, tech; BBCsports on athletics, cricket, football, rugby, or tennis
  • Cora: research papers (absence/presence or link) in 7 classes
SLIDE 21

Table: Scale of the real-world data sets

Data set     # samples     # features (view1 / view2 / view3 / view4 / view5)   # clusters
BBC-2V       2,012         6,838 / 6,790                                        5
BBC-3V       1,207         5,470 / 5,550 / 5,482                                5
BBCsp-2V     544           3,183 / 3,203                                        5
BBCsp-3V     282           2,582 / 2,545 / 2,464                                5
Digit1       2,000         76 / 216                                             10
Digit2       2,000         76 / 240                                             10
Digit3       2,000         76 / 216 / 240                                       10
Reuters1     1,200         13,211 / 11,665 / 10,033                             6
Reuters2     1,200         13,211 / 8,395 / 7,116                               6
Reuters3/4   1,200/6,000   13,211 / 11,665 / 10,033 / 8,395 / 7,116             6
Cornell      91            1,703 / 195                                          4
Texas        102           1,702 / 187                                          4
Washington   156           1,703 / 230                                          4
Wisconsin    179           1,703 / 256                                          4

SLIDE 22

Clustering accuracy

Data set     OKkC         MKkC         LMKkC        UMDS         UCA          SNF          LMF          NMFce        CRSCp        CRSCc
             ACC   NMI    ACC   NMI    ACC   NMI    ACC   NMI    ACC   NMI    ACC   NMI    ACC   NMI    ACC   NMI    ACC   NMI    ACC   NMI
BBC-2V       0.910 0.760  0.916 0.772  0.917 0.775  0.911 0.760  0.917 0.776  0.894 0.739  0.886 0.723  0.895 0.728  0.901 0.744  0.906 0.754
BBC-3V       0.896 0.753  0.891 0.742  0.889 0.741  0.896 0.755  0.916 0.782  0.903 0.757  0.890 0.746  0.825 0.615  0.878 0.718  0.876 0.712
BBCsp-2V     0.704 0.681  0.825 0.661  0.825 0.661  0.706 0.683  0.829 0.668  0.865 0.764  0.825 0.671  0.718 0.542  0.814 0.626  0.709 0.533
BBCsp-3V     0.750 0.706  0.755 0.675  0.751 0.668  0.748 0.695  0.755 0.660  0.776 0.708  0.549 0.372  0.698 0.506  0.705 0.609  0.684 0.563
Digit1       0.873 0.785  0.725 0.711  0.695 0.677  0.895 0.819  0.916 0.852  0.792 0.754  0.867 0.781  —     —      0.876 0.787  0.873 0.787
Digit2       0.886 0.812  0.876 0.802  0.887 0.812  0.910 0.841  0.919 0.851  0.795 0.757  0.885 0.812  —     —      0.901 0.823  0.882 0.801
Digit3       0.892 0.806  0.727 0.713  0.697 0.672  0.912 0.840  0.922 0.863  0.817 0.786  0.895 0.817  —     —      0.770 0.752  0.878 0.794
Reuters1     0.502 0.368  0.486 0.356  0.486 0.356  0.500 0.367  0.485 0.355  0.515 0.369  0.458 0.341  0.483 0.354  0.449 0.321  0.425 0.306
Reuters2     0.494 0.363  0.497 0.368  0.513 0.383  0.503 0.361  0.506 0.374  0.505 0.373  0.432 0.368  0.625 0.455  0.446 0.303  0.447 0.299
Reuters3     0.501 0.367  0.493 0.364  0.490 0.363  0.498 0.362  0.489 0.359  0.525 0.395  0.464 0.347  0.545 0.361  0.461 0.328  0.442 0.318
Reuters4     0.482 0.328  0.480 0.339  —     —      0.489 0.337  0.461 0.312  0.477 0.333  0.447 0.326  —     —      0.448 0.296  0.445 0.304
Cornell      0.547 0.258  0.473 0.198  0.431 0.105  0.663 0.387  0.442 0.147  0.515 0.209  0.452 0.146  0.357 0.035  0.442 0.117  0.494 0.209
Texas        0.546 0.353  0.642 0.393  0.482 0.341  0.669 0.461  0.446 0.237  0.598 0.344  0.491 0.349  0.375 0.076  0.633 0.332  0.571 0.301
Washington   0.660 0.399  0.660 0.401  0.621 0.400  0.666 0.420  0.628 0.352  0.551 0.279  0.673 0.407  0.378 0.067  0.583 0.384  0.628 0.358
Wisconsin    0.544 0.413  0.597 0.434  0.569 0.433  0.687 0.488  0.731 0.479  0.603 0.439  0.549 0.397  0.307 0.011  0.430 0.608  0.625 0.467

SLIDE 23

CPU time

Data set     OKkC     MKkC   LMKkC   UMDS   UCA    SNF      LMF      NMFce   CRSCp   CRSCc
BBC-2V       140.99   2.90   106.97  1.31   0.75   32.24    62.80    203.68  16.01   1.95
BBC-3V       48.86    1.25   64.06   0.91   0.46   10.89    33.80    158.81  11.88   2.88
BBCsp-2V     11.86    0.54   7.49    1.38   0.10   1.48     6.05     82.98   1.85    0.25
BBCsp-3V     5.48     0.92   3.22    0.85   0.06   0.75     6.25     38.39   2.01    1.44
Digit1       254.25   42.11  451.53  2.35   1.86   37.84    88.98    —       20.29   25.06
Digit2       255.08   21.49  397.28  2.56   1.34   32.60    75.35    —       21.14   22.89
Digit3       276.42   47.36  848.15  2.35   2.03   49.49    88.83    —       33.78   33.08
Reuters1     57.58    3.06   49.92   1.07   0.46   10.60    32.12    164.17  12.48   2.88
Reuters2     51.21    3.15   65.62   5.53   0.46   10.75    32.43    157.43  11.99   1.33
Reuters3     57.25    3.06   232.81  1.48   0.65   15.84    42.56    172.56  24.86   1.88
Reuters4     3194.11  55.65  —       36.65  21.51  1653.26  1198.62  —       450.20  52.12
Cornell      6.14     0.28   1.73    0.35   0.04   0.35     3.79     23.05   0.56    0.74
Texas        7.49     1.05   2.08    0.44   0.04   0.44     4.20     23.54   1.53    2.22
Washington   7.20     0.97   1.45    0.36   0.03   0.48     5.25     24.82   1.60    2.88
Wisconsin    8.90     0.63   2.56    0.31   0.03   0.47     5.79     31.31   1.64    2.33

SLIDE 24

Part II: Learning Uniform Neighborhood Graph

SLIDE 25

Class-inconsistency of neighbors

Figure: Average percentages of neighbors from different digital manifolds in each k-nearest-neighbor set (k = 30) on Digit3 (y-axis: percentage of class-inconsistent neighbors; legend: view 1, view 2, view 3).

SLIDE 26

Let $N_j^v$ be a neighborhood of point j in view v.

  • union neighborhood: $N_j^U = N_j^1 \cup \cdots \cup N_j^m$
  • joint neighborhood: $N_j^J = N_j^1 \cap \cdots \cap N_j^m$
  • class-consistent neighborhood $N_j^C$ of $N_j^U$

Table: The 200 smallest $\{N_j^J\}$, $\{N_j^U\}$, and $\{N_j^C\}$ on Digit3.

Neighborhood size k           10     20     30     40     50
Average size of $\{N_j^J\}$   0.49   1.09   1.83
Average size of $\{N_j^U\}$   24.7   49.0   73.1   97.0   120.8
Average size of $\{N_j^C\}$   21.3   34.5   43.0   50.1   57.4


SLIDE 28

Question: Can we determine uniform neighborhoods containing class-consistent neighbors?

SLIDE 29

Joint sparse neighborhoods

Let $X_j^v = X^v(:, N_j^v)$, and represent $x_j^v$ by its neighbors as $x_j^v \approx X_j^v w_j^v$:
$$\min_{W_j}\ \lambda\|W_j\|_{2,1} + \frac{1}{2}\sum_{v=1}^m a_j^v \left\|x_j^v - X_j^v w_j^v\right\|_2^2, \quad \text{s.t. } e^Tw_j^v = 1,\ \forall v, \qquad (4)$$

  • $W_j = [w_j^1, \cdots, w_j^m]$, $\|W_j\|_{2,1} = \sum_i \|(w_{ij}^1, \cdots, w_{ij}^m)\|_2$
  • λ is a trade-off parameter to tune the joint sparsity of $W_j$
  • $\{a_j^v\}$ are normalization constants, $a_j^v = (1 + |N_j^U|)\big/\big(\|x_j^v\|_2^2 + \sum_{i\in N_j^U} \|x_i^v\|_2^2\big)$, or $a_j^v = 1/(\|x_j^v\|_2^2 + \mu)$
  • It is solvable by the ADMM method
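The key ADMM ingredient for the ℓ2,1 term in (4) is row-wise group soft-thresholding, a proximal operator with a closed form. A minimal sketch (the full ADMM splitting around it is omitted):

```python
import numpy as np

def prox_l21(W, t):
    """Proximal operator of t * ||W||_{2,1}: each row of W holds the
    weights of one neighbor across all m views, and is shrunk toward
    zero jointly, so a neighbor is dropped in all views at once."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return scale * W   # rows with norm <= t become exactly zero
```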
SLIDE 30

The JSN graph

Connection between nodes i and j:

  • $w_{ij}^v$: affinity relation of neighbor i to the center node j in view v
  • $\omega_{ij} = \|(w_{ij}^1, \cdots, w_{ij}^m)\|$: the total affinity of neighbor i to the center j

JSN graph $S = (s_{ij})$, $s_{ij} = \|(\omega_{ij}, \omega_{ji})\|$.
SLIDE 31

Advantage: the JSN graph has stronger class-consistent connections, measured by
$$\theta_j = \frac{\sum_{i\in \bar N_j^C} |s_{ij}|}{\sum_i |s_{ij}|},$$
where $\bar N_j^C$ is the index set of class-consistent neighbors of node j.

Data set  Digit1  Digit2  Digit3  Animal  COIL-20  Reuters  BBCn-2V  BBCn-3V  BBCs-2V  BBCs-3V  Cora
JSN*      0.847   0.896   0.891   0.505   0.969    0.525    0.788    0.732    0.740    0.677    0.623
GS**      0.517   0.659   0.576   0.401   0.842    0.510    0.705    0.679    0.703    0.669    0.527

* using the uniform setting λ = 0.5 and the neighborhood size k = 30 for all data sets.
** using the best settings of its two parameters among multiple tested values for each data set.

SLIDE 32

Applications

  • I. Multi-view manifold clustering
    • Spectral method on the JSN graph
  • II. Modify view-specific graphs to improve multi-view methods
    • Filter partial edges of each view-specific graph
    • Weight view-specific graphs
SLIDE 33

Multi-view manifold clustering

Table: NMI of GS and JSN under each norm choice for $s_{ij}$ (outer blocks) and $\omega_{ij}$ (inner columns); OG = original graphs.

Norm for s_ij:  ∥·∥2                                       ∥·∥1                                       ∥·∥∞
Norm for ω_ij:  ∥·∥2       ∥·∥1       ∥·∥∞       OG        ∥·∥2       ∥·∥1       ∥·∥∞       OG        ∥·∥2       ∥·∥1       ∥·∥∞       OG
Algorithm       GS   JSN   GS   JSN   GS   JSN             GS   JSN   GS   JSN   GS   JSN             GS   JSN   GS   JSN   GS   JSN
Digit1          0.79 0.84  0.80 0.84  0.80 0.84  0.78      0.80 0.85  0.80 0.86  0.80 0.84  0.79      0.79 0.84  0.79 0.84  0.80 0.84  0.78
Digit2          0.85 0.95  0.85 0.95  0.85 0.95  0.69      0.85 0.95  0.86 0.95  0.85 0.95  0.70      0.84 0.94  0.85 0.94  0.84 0.94  0.69
Digit3          0.86 0.94  0.89 0.94  0.87 0.94  0.78      0.87 0.94  0.89 0.94  0.87 0.94  0.79      0.86 0.94  0.88 0.94  0.86 0.94  0.78
Animal          0.31 0.33  0.33 0.34  0.31 0.33  0.19      0.31 0.33  0.32 0.33  0.31 0.33  0.19      0.31 0.33  0.33 0.33  0.31 0.33  0.18
COIL-20         0.87 0.92  0.86 0.91  0.85 0.92  0.87      0.84 0.94  0.84 0.94  0.87 0.92  0.86      0.87 0.90  0.86 0.92  0.84 0.91  0.86
Reuters         0.35 0.43  0.33 0.43  0.36 0.45  0.35      0.30 0.44  0.31 0.45  0.34 0.44  0.35      0.37 0.43  0.36 0.43  0.36 0.44  0.37
BBCn-2V         0.73 0.83  0.71 0.82  0.81 0.84  0.77      0.47 0.83  0.42 0.69  0.72 0.84  0.77      0.81 0.84  0.78 0.83  0.83 0.84  0.76
BBCn-3V         0.72 0.74  0.43 0.71  0.82 0.83  0.76      0.41 0.63  0.32 0.62  0.72 0.79  0.76      0.81 0.81  0.73 0.76  0.83 0.82  0.76
BBCs-2V         0.60 0.76  0.54 0.76  0.61 0.82  0.73      0.50 0.75  0.47 0.76  0.61 0.82  0.73      0.60 0.83  0.59 0.78  0.70 0.83  0.74
BBCs-3V         0.78 0.72  0.64 0.72  0.76 0.72  0.69      0.61 0.73  0.40 0.72  0.77 0.73  0.69      0.78 0.71  0.79 0.72  0.76 0.79  0.70
Cora            0.47 0.51  0.47 0.52  0.52 0.52  0.17      0.46 0.49  0.34 0.44  0.48 0.52  0.17      0.52 0.52  0.48 0.52  0.53 0.52  0.16

SLIDE 34

Graph modification

Graph-filtering: Given view-specific graphs $\{G_v\}$ with adjacency weights $\{e_{ij}^v\}$ of edges, we modify $\{G_v\}$ to $\{\bar G_v\}$ with adjacency weights $\{\bar e_{ij}^v\}$:
$$\bar e_{ij}^v = n_{ij}\,e_{ij}^v = \begin{cases} e_{ij}^v, & j \in N_i^S \ \text{or}\ i \in N_j^S \ \text{or}\ i = j; \\ 0, & \text{otherwise}, \end{cases}$$
Each $\bar G_v$ is as sparse as the learned JSN graph.
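A sketch of the filtering rule, under our assumption that $N_i^S$ consists of the k strongest JSN connections of node i:

```python
import numpy as np

def filter_view_graph(Ev, S, k=30):
    """Graph-filtering sketch: keep edge weight e_ij^v only when j lies
    in the JSN neighborhood N_i^S, or i in N_j^S, or i = j."""
    n = S.shape[0]
    nbr = np.zeros((n, n), dtype=bool)
    rows = np.arange(n)[:, None]
    nbr[rows, np.argsort(S, axis=1)[:, -k:]] = True   # j in N_i^S
    keep = nbr | nbr.T | np.eye(n, dtype=bool)        # or i in N_j^S, or i = j
    return np.where(keep, Ev, 0.0)
```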

SLIDE 35

Table: Clustering accuracy of algorithms on the original graphs (orig.) or the graphs filtered by the JSN network (modi.), k = 30.

           NMI                                                                ACC
           CentroidSC   PairwiseSC   MKkC         UCA          SNF           CentroidSC   PairwiseSC   MKkC         UCA          SNF
Data set   orig. modi.  orig. modi.  orig. modi.  orig. modi.  orig. modi.   orig. modi.  orig. modi.  orig. modi.  orig. modi.  orig. modi.
Digit1     0.79  0.83   0.78  0.85   0.71  0.82   0.86  0.86   0.72  0.83    0.87  0.90   0.86  0.91   0.73  0.85   0.93  0.92   0.77  0.83
Digit2     0.81  0.90   0.78  0.91   0.80  0.89   0.85  0.91   0.75  0.87    0.89  0.95   0.88  0.96   0.88  0.95   0.91  0.96   0.78  0.84
Digit3     0.83  0.91   0.82  0.91   0.72  0.89   0.87  0.91   0.75  0.92    0.91  0.95   0.90  0.95   0.73  0.94   0.93  0.96   0.79  0.96
Animal     0.30  0.34   0.31  0.34   0.27  0.30   0.23  0.34   0.20  0.31    0.42  0.56   0.50  0.56   0.50  0.59   0.45  0.57   0.36  0.47
COIL-20    0.79  0.82   0.80  0.82   0.80  0.81   0.85  0.82   0.75  0.84    0.71  0.73   0.71  0.72   0.69  0.74   0.75  0.76   0.66  0.76
Reuters    0.35  0.40   0.35  0.41   0.36  0.39   0.36  0.39   0.38  0.39    0.48  0.53   0.48  0.54   0.49  0.59   0.49  0.53   0.52  0.52
BBCn-2V    0.77  0.81   0.77  0.81   0.77  0.78   0.78  0.80   0.74  0.76    0.91  0.93   0.92  0.93   0.92  0.92   0.92  0.93   0.90  0.91
BBCn-3V    0.73  0.79   0.74  0.80   0.74  0.78   0.75  0.79   0.75  0.76    0.88  0.92   0.89  0.93   0.89  0.91   0.89  0.92   0.90  0.90
BBCs-2V    0.58  0.80   0.53  0.80   0.66  0.76   0.66  0.81   0.82  0.81    0.74  0.92   0.65  0.92   0.83  0.90   0.83  0.93   0.93  0.93
BBCs-3V    0.63  0.70   0.66  0.71   0.68  0.72   0.68  0.72   0.68  0.69    0.72  0.78   0.75  0.78   0.76  0.77   0.76  0.77   0.76  0.77
Cora       0.19  0.48   0.19  0.49   0.17  0.44   0.17  0.47   0.33  0.53    0.38  0.68   0.37  0.69   0.36  0.61   0.37  0.65   0.48  0.71

SLIDE 36

Graph-weighting: Given view-specific graphs $\{G_v\}$ with adjacency weights $\{e_{ij}^v\}$ of edges, we weight them to $\{\tilde G_v\}$ using the JSN graph $S = (s_{ij})$:
$$\tilde e_{ij}^v = \begin{cases} s_{ij}\,e_{ij}^v, & i \ne j; \\ e_{ij}^v, & i = j. \end{cases}$$
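The weighting rule in a few lines (a sketch):

```python
import numpy as np

def weight_view_graph(Ev, S):
    """Graph-weighting sketch: scale every off-diagonal weight by the
    JSN graph, e~_ij^v = s_ij * e_ij^v, leaving the diagonal unchanged."""
    out = S * Ev                          # elementwise s_ij * e_ij^v
    np.fill_diagonal(out, np.diag(Ev))    # keep e_ii^v on the diagonal
    return out
```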

SLIDE 37

Table: Clustering accuracy of algorithms on the weighted kernels or similarities using the GS weights (GSW) or JSN weights (JSW), k = 30.

           NMI                                                                      ACC
           CentroidSC  PairwiseSC  MKkC       UCA        SNF        MMCJSN         CentroidSC  PairwiseSC  MKkC       UCA        SNF        MMCJSN
Data set   GSW  JSW    GSW  JSW    GSW  JSW   GSW  JSW   GSW  JSW                  GSW  JSW    GSW  JSW    GSW  JSW   GSW  JSW   GSW  JSW
Digit1     0.81 0.83   0.83 0.83   0.81 0.82  0.84 0.84  0.82 0.84  0.80           0.81 0.82   0.82 0.82   0.80 0.83  0.86 0.86  0.81 0.82  0.82
Digit2     0.85 0.94   0.86 0.94   0.84 0.92  0.86 0.93  0.89 0.95  0.94           0.86 0.97   0.87 0.97   0.85 0.97  0.86 0.96  0.84 0.98  0.97
Digit3     0.88 0.93   0.89 0.93   0.86 0.92  0.91 0.94  0.89 0.95  0.93           0.94 0.96   0.94 0.96   0.88 0.96  0.96 0.98  0.84 0.97  0.96
Animal     0.32 0.35   0.34 0.35   0.32 0.34  0.32 0.34  0.31 0.33  0.33           0.41 0.45   0.43 0.47   0.58 0.58  0.49 0.62  0.42 0.50  0.52
COIL-20    0.85 0.90   0.87 0.90   0.79 0.77  0.89 0.91  0.91 0.94  0.89           0.70 0.79   0.72 0.79   0.58 0.67  0.75 0.81  0.80 0.83  0.76
Reuters    0.38 0.41   0.38 0.42   0.23 0.39  0.32 0.41  0.43 0.43  0.44           0.51 0.54   0.52 0.54   0.34 0.53  0.47 0.53  0.55 0.55  0.55
BBCn-2V    0.65 0.82   0.65 0.83   0.82 0.80  0.83 0.82  0.84 0.84  0.84           0.77 0.94   0.77 0.94   0.94 0.93  0.94 0.94  0.95 0.95  0.95
BBCn-3V    0.71 0.70   0.72 0.70   0.82 0.81  0.82 0.81  0.83 0.83  0.82           0.87 0.86   0.87 0.86   0.94 0.93  0.94 0.94  0.94 0.94  0.94
BBCs-2V    0.63 0.77   0.64 0.77   0.77 0.81  0.77 0.79  0.81 0.81  0.83           0.72 0.90   0.73 0.90   0.92 0.93  0.89 0.91  0.89 0.88  0.94
BBCs-3V    0.75 0.79   0.75 0.80   0.65 0.74  0.64 0.71  0.80 0.80  0.75           0.87 0.90   0.88 0.91   0.74 0.77  0.74 0.78  0.82 0.83  0.85
Cora       0.46 0.51   0.46 0.51   0.49 0.51  0.51 0.53  0.52 0.49  0.52           0.66 0.67   0.66 0.68   0.61 0.68  0.64 0.65  0.69 0.64  0.68

SLIDE 38

Part III: Graph Refinement

Question: Can we refine a given graph to highlight the latent group structure?

SLIDE 39

An ideal graph G = (g_ij) for clustering should be

  • structurally sparse: few connections between different groups
  • nonnegative and positive semidefinite: a similarity measurement
  • low-rank: a small number of groups

SLIDE 40

Constructed graphs may be

  • approximately low-rank but generally dense (similarity-based)
  • sparse but group-number rank may be lost (neighborhood-based)
SLIDE 41

Existing approaches for graph modification

  • SPLR (Richard/Savalle/Vayatis, 2012): implicit strategies for graph denoising
    • ℓ1-norm based minimization for sparsity
    • nuclear norm for approximate low-rankness
  • DCD (Yang/Corander/Oja, 2016): explicit forms for graph approximation
    • doubly stochastic structure: $sP^TP$ with P a probability matrix, or $P^TD^{-1}P$

Our approach: explicit sparsity, and semi-explicit restriction on the group-number rank and positive semi-definiteness


SLIDE 43

SLSA: Simultaneously Low-rank and Sparse Approximation,
$$\min_{Z,U} \left\{ f(Z; A) + \theta\|Z - UU^T\|_F^2 \right\}, \quad \text{s.t. } \|Z_{\mathrm{off}}\|_0 \le \eta,\ Z \ge 0,\ U^TU = I_K. \qquad (5)$$

  • the input symmetric matrix A should be rescaled as $\frac{\sqrt{K}}{\|A\|_F}A$
  • $f(Z; A)$ is a loss function of the approximation of Z to A
  • $Z_{\mathrm{off}} = Z - \mathrm{diag}(z_{11}, \cdots, z_{nn})$, K: number of groups
  • η: an estimated sparsity, for example, $\eta = 2\lfloor(\rho n^2/K - n)/2\rfloor$
SLIDE 44

Example: A = S + E

  • symmetric $S = \mathrm{diag}(S_1, S_2, S_3)$ with block orders 20, 15, 20
  • $(S_k)_{ii} = 1$; $(S_k)_{ij}$, $i \ne j$, are chosen from [0.1, 1] uniformly
  • E: symmetric, non-zero entries outside the diagonal blocks only
  • three kinds of distributions of E, rescaled to have entries in [0, 1]:
    (1) absolute value of the standard normal distribution,
    (2) Poisson distribution $\frac{\lambda^k}{k!}e^{-\lambda}$, with λ = 1, and
    (3) sparse uniform distribution on [0.9, 1] with 20% nonzero entries whose positions are randomly chosen.
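A sketch that generates such a test matrix under the stated distributions (the function name, seed handling, and rescaling details are our assumptions):

```python
import numpy as np

def example_matrix(kind='normal', seed=0):
    """Build A = S + E as on this slide: S = diag(S1, S2, S3) with block
    orders 20, 15, 20, unit diagonal, off-diagonal block entries uniform
    in [0.1, 1]; E is symmetric and supported outside the diagonal blocks."""
    rng = np.random.default_rng(seed)
    orders = [20, 15, 20]
    n = sum(orders)
    S = np.zeros((n, n))
    pos = 0
    for b in orders:
        T = np.triu(rng.uniform(0.1, 1.0, (b, b)), 1)
        S[pos:pos + b, pos:pos + b] = T + T.T + np.eye(b)  # symmetric block
        pos += b
    if kind == 'normal':
        E = np.abs(rng.standard_normal((n, n)))
    elif kind == 'poisson':
        E = rng.poisson(1.0, (n, n)).astype(float)         # lambda = 1
    else:  # sparse uniform on [0.9, 1] with 20% nonzeros
        E = rng.uniform(0.9, 1.0, (n, n)) * (rng.random((n, n)) < 0.2)
    E = np.triu(E, 1)
    E = (E + E.T) / max(E.max(), 1e-12)                    # symmetric, entries in [0, 1]
    return S + E * (S == 0)                                # noise outside diagonal blocks
```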

SLIDE 45

Figure: Comparison of SLSA using different loss functions $f_F = \|Z - A\|_F^2$ or $f_1 = \|Z - A\|_1$ on the three matrices (left): the first and last iteration solutions for $f_F$ (middle) or $f_1$ (right). noise $= \|E\|_F/\|S\|_F$

SLIDE 46

Algorithm

Let $F(Z, U) = f(Z; A) + \theta\|Z - UU^T\|_F^2$. Alternately update U and Z as
$$U_k = \arg\min_U F(Z_{k-1}, U), \quad \text{s.t. } U^TU = I_K; \qquad (6)$$
$$Z_k = \arg\min_Z F(Z, U_k), \quad \text{s.t. } Z \ge 0,\ \|Z_{\mathrm{off}}\|_0 \le \eta. \qquad (7)$$
Initially set $Z_0 = A$.

  • Closed forms exist for the solutions of both subproblems.
  • Low cost:
    • For (6), subspace updating using QR decomposition, $4Kn^2 + \eta Kn$ each iteration.
    • For (7), $(2K + 9 + \log n)n^2$.
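A heavily simplified numpy sketch of the alternation with the Frobenius loss. The talk's subproblems have closed-form solutions; the projection step below (clip negatives, keep the η largest off-diagonal entries) is our stand-in for the exact Z-step, not the authors' formula:

```python
import numpy as np

def slsa(A, K, eta, theta=1.0, max_iter=50):
    """SLSA sketch for f(Z; A) = ||Z - A||_F^2.
    U-step: top-K eigenvectors of Z.  Z-step: the unconstrained
    minimizer (A + theta U U^T)/(1 + theta), projected heuristically
    onto {Z >= 0, ||Z_off||_0 <= eta}."""
    n = A.shape[0]
    Z = A.copy()
    for _ in range(max_iter):
        U = np.linalg.eigh(Z)[1][:, -K:]               # top-K eigen-subspace of Z
        Z = (A + theta * (U @ U.T)) / (1.0 + theta)    # minimizes F(Z, U) unconstrained
        Z = np.maximum(Z, 0.0)                         # Z >= 0
        off = Z * (1 - np.eye(n))
        if eta < n * n - n:                            # enforce ||Z_off||_0 <= eta
            cut = np.sort(off, axis=None)[-(eta + 1)]
            Z[(off <= cut) & ~np.eye(n, dtype=bool)] = 0.0
    return Z, U
```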
SLIDE 47

Convergence

Theorem 1

  • The sequence $\{F(Z_k, U_k)\}$ converges decreasingly.
  • If any accumulation point $Z_*$ of $\{Z_k\}$ has different K-th and (K+1)-st largest eigenvalues, then $U_kU_k^T - U_{k+1}U_{k+1}^T \to 0$.

Theorem 2

An accumulation point $(Z_*, U_*)$ of $\{(Z_k, U_{k+1})\}$ satisfies the KKT condition
$$0 \in \partial f(Z) + 2\theta(Z - UU^T), \qquad ZU = U\,\mathrm{diag}\big(\lambda_1(Z), \cdots, \lambda_K(Z)\big),$$
if $\lambda_K(Z_*) > \lambda_{K+1}(Z_*)$.

Theorem 3

If $\lambda_K(Z_*) > \lambda_{K+1}(Z_*)$ for any accumulation point $Z_*$ of $\{Z_k\}$ and $\{U_kU_k^T\}$ has an isolated accumulation point, then $\{U_kU_k^T\}$ and $\{Z_k\}$ converge.

SLIDE 48

Comparison with SPLR/DCD

Synthetic graph with three groups, generated as
$$A = S + t\,\frac{\|S\|_F}{\|E\|_F}E, \qquad S = \mathrm{diag}(S_1, S_2, S_3)$$

  • each $S_i$: fully connected symmetric subgraph of a group of 100 members with entries randomly chosen from (0, 1), $\mathrm{diag}(S_i) = I$
  • E: symmetric and sparse, having 20% nonzero entries in off-diagonal blocks whose positions are randomly chosen
  • t: the signal-noise ratio (SNR).
SLIDE 49

Measurements for the quality of refinement:
$$c_b(Z) = \frac{\|Z_{\Omega^c}\|_F}{\|S\|_F}, \quad \Omega^c:\ \text{set of index pairs between groups}$$
$$c_w(Z) = \frac{\|Z_{\Omega}\|_F}{\|S\|_F}, \quad \Omega:\ \text{set of index pairs within groups}$$
$$c_m(Z) = \sum_{(i,j)\in\Omega}\Big(z_{ij} - \frac{(\sum_\ell z_{i\ell})(\sum_\ell z_{j\ell})}{\sum_{ij} z_{ij}}\Big)\Big/\sum_{ij} z_{ij}$$
$$c_r(Z) = \frac{\sigma_{K+1}(Z)}{\frac{1}{K}\big(\sigma_1(Z)+\cdots+\sigma_K(Z)\big)}$$

  • The solution Z is rescaled to be αZ with $\alpha = \arg\min_\alpha \|S - \alpha Z\|_F$.
  • A larger generalized modularity $c_m(Z)$ means a stronger connection within groups if $c_w(Z)$ is still large.
  • A smaller $c_r(Z)$ means a better approximation of the group-number rank.
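These measures are direct to compute given ground-truth group labels; a sketch (the `labels` argument is our assumed input):

```python
import numpy as np

def refinement_scores(Z, S, labels, K):
    """Compute c_b, c_w, c_m, c_r from this slide."""
    within = labels[:, None] == labels[None, :]          # Omega: pairs within groups
    alpha = np.sum(S * Z) / np.sum(Z * Z)                # argmin_a ||S - a Z||_F
    Z = alpha * Z
    c_b = np.linalg.norm(Z[~within]) / np.linalg.norm(S)
    c_w = np.linalg.norm(Z[within]) / np.linalg.norm(S)
    total, row = Z.sum(), Z.sum(axis=1)
    c_m = np.sum((Z - np.outer(row, row) / total)[within]) / total
    sig = np.linalg.svd(Z, compute_uv=False)             # singular values, descending
    c_r = sig[K] / sig[:K].mean()
    return c_b, c_w, c_m, c_r
```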
SLIDE 50

Table: Comparison of SPLR and SLSA

       SPLR (λ2 = 0.01λ1)                    SLSA (ρ = 1)
SNR    λ1    c_b   c_w   c_m   c_r           θ    c_b   c_w   c_m   c_r
0.50   0.1   0.22  0.91  0.48  0.08          1    0.11  0.97  0.65  0.06
       0.3   0.02  0.92  0.65  0.00          3    0.00  0.96  0.67  0.02
       0.5   0.00  0.93  0.67  0.00          5    0.00  0.95  0.67  0.01
       0.7   0.00  0.92  0.67  0.00          7    0.00  0.94  0.67  0.01
1.00   0.1   0.45  0.69  0.31  0.16          1    0.32  0.77  0.50  0.09
       0.3   0.22  0.87  0.45  0.01          3    0.00  0.95  0.67  0.02
       0.5   0.04  0.92  0.64  0.02          5    0.00  0.94  0.67  0.01
       0.7   0.00  0.13  0.42  0.57          7    0.00  0.94  0.67  0.01
1.50   0.1   0.48  0.45  0.19  0.25          1    0.37  0.64  0.42  0.12
       0.3   0.40  0.70  0.25  0.05          3    0.12  0.91  0.64  0.03
       0.5   0.31  0.74  0.43  0.08          5    0.03  0.94  0.66  0.01
       0.7   NaN   NaN   NaN   NaN           7    0.02  0.93  0.66  0.01

$\eta = 2\lfloor(\rho n^2/K - n)/2\rfloor$ in SLSA.

SLIDE 51

Table: DCD results

SNR   c_b      c_w      c_m
0.5   0.1910   0.8812   0.4405
1.0   0.3326   0.7769   0.2993
1.5   0.4158   0.6459   0.1993

SLIDE 52

Real-world data

Table: Comparisons of spectral clustering on real-world data sets.

                                     AC                                      Purity
Graph   Name       m      n     ng   F.G.   NbSp   DCD    SPLR   SLSA       F.G.   NbSp   DCD    SPLR   SLSA
Dense   Wine       12     178   3    0.815  0.837  0.674  0.815  0.871      0.815  0.837  0.674  0.815  0.871
        Ecoli      7      336   8    0.548  0.562  0.524  0.610  0.741      0.804  0.848  0.765  0.801  0.807
        Forest     27     523   4    0.744  0.753  0.713  0.748  0.769      0.744  0.763  0.713  0.748  0.769
        Glass      9      214   6    0.556  0.528  0.542  0.561  0.598      0.650  0.650  0.654  0.659  0.645
        Vertebral  6      310   3    0.748  0.726  0.532  0.752  0.758      0.752  0.774  0.726  0.758  0.761
        Orl        10304  400   40   0.818  0.797  0.795  0.840  0.828      0.838  0.805  0.825  0.855  0.835
Sparse  Mnist      784    1000  10   0.537  0.700  0.704  0.737  0.779      0.582  0.743  0.747  0.749  0.779
        Coil20     16384  1440  20   0.611  0.829  0.830  0.840  0.842      0.626  0.878  0.878  0.886  0.878
        Satimage   36     4435  6    0.589  0.644  0.641  —      0.698      0.630  0.645  0.683  —      0.712
        Gisette    5000   7000  2    0.677  0.917  0.931  —      0.930      0.677  0.917  0.931  —      0.930

  • Gaussian graph $A = \big(\exp(-\|x_i - x_j\|^2/\sigma)\big)$
  • $\mathrm{AC} = \frac{1}{n}\max_\pi \sum_i n(C_i, C^*_{\pi(i)})$
  • $\mathrm{Purity} = \frac{1}{n}\sum_k \max_j n(C_k, C^*_j)$
  • F.G. = 'full graph', NbSp = 'neighborhood sparse graph'
SLIDE 53

Applications

Applications of SLSA:

  • subspace learning
    • SSC (subspace clustering)
    • LRR (low-rank representation of subspaces)
  • nonlinear manifold dimensionality reduction
    • LLE: locally linear embedding
    • LE: Laplacian eigenmap
    • LPP: the linear version of LE
  • multi-view learning for clustering
    • CRSC: co-regularized spectral clustering
    • MKkC: multiple kernel k-means clustering
SLIDE 54

Subspace learning

Refine the connection graph for SSC or the similarity S for LRR:

  • SSC learns a connection graph G
  • LRR learns a similarity matrix S
  • Subspaces are segmented via spectral clustering on G or S.

The SLSA can improve the sparsity and group-number rank of G or S.

  • sparsity: percentage of small entries,
    $\rho(\tau) = \max\big\{|\Omega|/n^2 : \sum_{(i,j)\in\Omega} |a_{ij}| \le \tau \max_{ij}|a_{ij}|\big\}$
  • group-number rank: $\zeta = K\sigma_{K+1}/(\sigma_1 + \cdots + \sigma_K)$.

SLIDE 56

Table: Average values of $\rho = \rho(10^{-3})$ and ζ of SSC, LRR, and their SLSA refinements on three representative examples with 642, 636, 635 face images of 10 individuals, respectively

             SSC                                             LRR
λ            9.00    10.60   12.20   13.80   15.40   17.00   0.60    0.84    1.08    1.32    1.56    1.80
ρ  Original  0.9898  0.9880  0.9857  0.9828  0.9795  0.9760  0.7142  0.7733  0.8077  0.8308  0.8477  0.8607
   SLSA      0.9518  0.9513  0.9512  0.9512  0.9512  0.9512  0.9505  0.9505  0.9505  0.9506  0.9506  0.9506
ζ  Original  0.9781  0.9701  0.9673  0.9671  0.9675  0.9643  0.9369  0.9420  0.9560  0.9690  0.9804  0.9881
   SLSA      0.7097  0.7102  0.7104  0.7152  0.7216  0.7288  0.7565  0.7438  0.7413  0.7386  0.7344  0.7293

SLIDE 57

Manifold learning

Refine the weights of LLE for low-dimensional embedding:

  • LLE
    • Estimate weights $\{w_{ij}\}$: $x_j \approx \sum_{i\in N_j} w_{ij}x_i$
    • Low-dimensional embedding: $\min_{YY^T=I_d} \sum_j \|y_j - \sum_{i\in N_j} w_{ij}y_i\|_2^2$.
  • Improve the LLE embedding
    • SLSA refines $A = |W| + |W|^T$ to get S
    • normalize S to $\hat S = (\hat s_{ij})$ with $\hat s_{ij} = s_{ij}/\sum_{\ell\ne j} s_{\ell j}$
    • embedding: $\min_{YY^T=I_d} \sum_j \|y_j - \sum_{i\in N_j} \hat s_{ij}y_i\|_2^2$.

SLSA can decrease the dispersion of each group and increase the separation of different groups:

  • dispersion of each group: $g_d(C_k) = \frac{1}{|C_k|}\sum_{i\in C_k}\|y_i - c_k\|_2$
  • separation from other groups: $g_s(C_k) = \min_{i\ne k}\|c_i - c_k\|_2$
  • $c_k$: class center, $c_k = \frac{1}{|C_k|}\sum_{i\in C_k} y_i$
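The refinement pipeline around LLE, as a sketch reusing the slsa() function sketched earlier (not the authors' implementation):

```python
import numpy as np

def refine_lle_weights(W, K, eta, theta=1.0):
    """Refine LLE weights as on this slide: SLSA on A = |W| + |W|^T,
    then column-normalize, hat_s_ij = s_ij / sum_{l != j} s_lj, and use
    hat_S in place of W in the embedding step."""
    A = np.abs(W) + np.abs(W).T
    S, _ = slsa(A, K, eta, theta)
    denom = S.sum(axis=0) - np.diag(S)      # sum_{l != j} s_lj for each column j
    S_hat = S / np.maximum(denom, 1e-12)
    np.fill_diagonal(S_hat, 0.0)
    return S_hat
```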


SLIDE 59

Table: Group dispersion/separation and clustering accuracy of the LLE embedding and the one modified by SLSA

n.s.  Method         ∥I−Ŝ∥0/∥I−W∥0        Digit: 1    2     3     4     5     6     7     8     9     10     AC
10    LLE            1.00            gd   0.005  0.037 0.025 0.032 0.029 0.041 0.058 0.093 0.013 0.027  0.797
                                     gs   0.019  0.105 0.114 0.111 0.114 0.104 0.110 0.111 0.019 0.103
      SLSA-Modified  1.82            gd   0.006  0.029 0.029 0.034 0.015 0.031 0.098 0.095 0.012 0.021  0.821
                                     gs   0.025  0.108 0.112 0.110 0.114 0.109 0.112 0.115 0.025 0.104
                     3.64            gd   0.007  0.021 0.018 0.016 0.013 0.029 0.099 0.095 0.013 0.024  0.828
                                     gs   0.028  0.108 0.113 0.113 0.114 0.112 0.112 0.115 0.028 0.104
                     5.45            gd   0.015  0.023 0.011 0.011 0.012 0.029 0.098 0.024 0.029 0.026  0.958
                                     gs   0.132  0.132 0.132 0.132 0.133 0.132 0.135 0.131 0.131 0.131
20    LLE            1.00            gd   0.015  0.041 0.023 0.032 0.066 0.064 0.103 0.085 0.019 0.032  0.701
                                     gs   0.045  0.094 0.088 0.077 0.103 0.065 0.092 0.068 0.045 0.068
      SLSA-Modified  0.95            gd   0.008  0.028 0.030 0.028 0.018 0.028 0.061 0.025 0.011 0.019  0.863
                                     gs   0.039  0.103 0.106 0.105 0.110 0.101 0.102 0.108 0.039 0.102
                     1.43            gd   0.015  0.030 0.036 0.028 0.018 0.031 0.030 0.094 0.027 0.029  0.954
                                     gs   0.131  0.131 0.130 0.130 0.132 0.130 0.134 0.131 0.130 0.130
                     2.38            gd   0.012  0.028 0.032 0.025 0.016 0.030 0.026 0.093 0.028 0.018  0.953
                                     gs   0.132  0.132 0.132 0.131 0.133 0.131 0.134 0.132 0.132 0.132

(n.s. = neighborhood size)

SLIDE 60

Refine the adjacency matrix W = (w_ij) for LE or LPP:

  • LE: $\min_Y \sum_{ij} w_{ij}\|y_i - y_j\|_2^2$, s.t. $YDY^T = I_d$, $YDe = 0$
  • LPP: $\min_P \sum_{ij} w_{ij}\|Px_i - Px_j\|_2^2$, s.t. $PXD(PX)^T = I_d$

SLSA can significantly improve the low-dimensional embeddings of LE/LPP for clustering via refining the adjacency matrix.

SLIDE 61

Table: AC of LE/LPP and their SLSA refinement (handwritten data)

LE (k = 10): AC = 0.809
θ \ ρ   0.2     0.3     0.4     0.5     0.6     0.7     0.8
0.5     0.953   0.954   0.953   0.951   0.952   0.949   0.949
1.0     0.951   0.951   0.951   0.952   0.950   0.951   0.949
1.5     0.948   0.949   0.950   0.950   0.948   0.947   0.944
2.0     0.947   0.947   0.948   0.949   0.950   0.946   0.943
2.5     0.945   0.946   0.948   0.948   0.950   0.945   0.941
3.0     0.943   0.946   0.947   0.949   0.950   0.945   0.939

LPP (k = 30): AC = 0.830
θ \ ρ   1.0     1.1     1.2     1.3     1.4     1.5     1.6
0.5     0.920   0.921   0.923   0.921   0.920   0.921   0.920
1.0     0.924   0.925   0.928   0.927   0.924   0.926   0.923
1.5     0.927   0.926   0.930   0.929   0.926   0.927   0.925
2.0     0.929   0.931   0.935   0.929   0.928   0.927   0.925
2.5     0.929   0.931   0.935   0.928   0.930   0.927   0.925
3.0     0.930   0.933   0.936   0.930   0.929   0.926   0.926

SLIDE 62

Multi-view learning

Refine multi-view graphs $\{S_v\}$ for CRSC or kernels $\{K_v\}$ for MKkC:

  • CRSC: Given multiple similarity matrices $\{S_v\}$,
    $$\min_{U^TU=I,\ U_v^TU_v=I} \sum_v \left\{ \mathrm{tr}\left(U_v^T L_v U_v\right) + \lambda_v \|U_vU_v^T - UU^T\|_F^2 \right\}$$
    • $L_v = D_v^{-1/2}(D_v - S_v)D_v^{-1/2}$, the normalized Laplacian.
  • MKkC: Given multiple kernel matrices $\{K_v\}$,
    • Combine the multiple kernels $\{K_v\}$: $K(\theta) = \sum_v \theta_v^2 K_v$, $\sum_v \theta_v = 1$
    • Apply the spectral method on $K(\theta)$: $\max_{\theta,\,U\in\mathcal{U}} \mathrm{tr}\big(U^TK(\theta)U\big) - \mathrm{tr}\big(K(\theta)\big)$

SLSA can improve the graph fusion of CRSC and MKkC.
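The MKkC-style kernel fusion itself is one line; a sketch (refining each $K_v$ with SLSA beforehand is the improvement the talk reports):

```python
import numpy as np

def combine_kernels(Ks, theta):
    """Kernel fusion K(theta) = sum_v theta_v^2 K_v with sum_v theta_v = 1."""
    theta = np.asarray(theta, dtype=float)
    theta = theta / theta.sum()            # enforce sum_v theta_v = 1
    return sum(t ** 2 * K for t, K in zip(theta, Ks))
```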

SLIDE 63

Table: AC values of CRSC/MKkC and their SLSA refinements (Twitter data)

Data set: politics-ie
CRSC: 0.948 (λ = 2)
SLSA improv.   θ \ ρ   1.0     1.2     1.4     1.6
               0.02    0.951   0.950   0.951   0.953
               0.04    0.951   0.951   0.948   0.951
               0.06    0.951   0.948   0.945   0.950
MKkC: 0.778
SLSA improv.   θ \ ρ   1.3     1.4     1.5     1.6
               0.7     0.933   0.905   0.905   0.914
               1.1     0.934   0.945   0.945   0.948
               1.5     0.942   0.948   0.948   0.945

Data set: olympics
CRSC: 0.890 (λ = 1)
SLSA improv.   θ \ ρ   1.8     2.0     2.2     2.4
               0.1     0.923   0.918   0.929   0.911
               0.3     0.933   0.919   0.935   0.916
               0.5     0.933   0.934   0.933   0.935
MKkC: 0.851
SLSA improv.   θ \ ρ   0.9     1.1     1.3     1.5
               0.1     0.920   0.933   0.943   0.923
               0.3     0.916   0.944   0.943   0.945
               0.5     0.928   0.938   0.956   0.914

Data set: football
CRSC: 0.863 (λ = 2)
SLSA improv.   θ \ ρ   0.6     0.7     0.8     0.9
               1.5     0.893   0.899   0.911   0.896
               1.7     0.896   0.899   0.901   0.915
               1.9     0.895   0.901   0.891   0.904
MKkC: 0.834
SLSA improv.   θ \ ρ   0.8     1.0     1.2     1.4
               0.3     0.889   0.887   0.891   0.891
               0.4     0.856   0.891   0.893   0.892
               0.5     0.858   0.891   0.895   0.891

SLIDE 64

Thank You!