

SLIDE 1

Co-manifold learning with missing data

Gal Mishne, Eric C. Chi and Ronald R. Coifman

Department of Mathematics, Yale University Department of Statistics, North Carolina State University

June 12, 2019

Gal Mishne (Yale) Co-Manifold Learning June 12, 2019 1 / 14

SLIDE 2

The Biclustering Problem

Task

Given a data matrix X ∈ R^{n×p}, find subgroups of rows and columns that go together.

  • Text mining: similar documents share a small set of highly correlated words.
  • Collaborative filtering: like-minded customers share similar preferences for a subset of products.
  • Cancer genomics: subtypes of cancerous tumors share similar molecular profiles over a subset of genes.

SLIDE 3

Cancer Genomics

  • Lung cancer is heterogeneous at the molecular level.
  • Which genes are driving lung cancer? These genes are potential drug targets.
  • Collect gene expression data.

SLIDE 4

Simple Solution: Cluster Dendrogram

Each dendrogram is constructed independently of multiscale structure in the other dimension.

SLIDE 5

From Co-clustering to Co-Manifold Learning

“I would add that in many real-world applications there is no ‘true’ fixed number of biclusters, i.e. the truth is a bit more continuous...” –Anonymous Referee 2

[Figures: Clustered Dendrogram; New Row Coordinate System and New Column Coordinate System, each plotted over Intrinsic Coordinate 1 and Intrinsic Coordinate 2]

SLIDE 6

What if data matrices are not completely observed?

Missing data scenario:

  • Complete data: X ∈ R^{n×p}
  • We only get to observe the entries indexed by Θ ⊂ {1, . . . , n} × {1, . . . , p}.
  • Possibly by design: too expensive to collect / measure all np possible entries.
  • Goal: recover row and column coordinate systems, not necessarily complete the missing data.


Example: rows are indexed by points y_i on a helix, columns by points z_j on a 2D plane, and

X[i, j] = ||y_i − z_j||^2

P_Θ(X)[i, j] = X[i, j] if (i, j) ∈ Θ, and 0 otherwise
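This setup can be sketched in numpy. The helix and plane below are built ad hoc for illustration (the slide does not give the exact construction), and the 70% observation rate is an arbitrary assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 80

# Row points y_i on a helix in R^3 (illustrative parameterization).
t = np.linspace(0, 4 * np.pi, n)
y = np.column_stack([np.cos(t), np.sin(t), t / (4 * np.pi)])

# Column points z_j on a 2D plane embedded in R^3.
u = rng.uniform(-1, 1, size=(p, 2))
z = np.column_stack([u, np.zeros(p)])

# X[i, j] = ||y_i - z_j||^2: squared distance between every row and column point.
X = ((y[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)

# Theta: observe each entry independently with probability 0.7
# (e.g. by design, when measuring all n*p entries is too expensive).
theta = rng.random((n, p)) < 0.7

# Projection operator P_Theta: keep observed entries, zero out the rest.
def P(mask, M):
    return np.where(mask, M, 0.0)

X_obs = P(theta, X)
```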

SLIDE 7

Co-Manifold Learning

1. Solve the co-clustering problem with missing data at multiple row and column scales
2. Build multiscale row and column metrics
3. Calculate non-linear embeddings

SLIDE 8

Step 1: Co-clustering an Incomplete Data Matrix

min_U F(U) = (1/2) ||P_Ω(X − U)||_F^2 + γ_c Σ_{i<j} Ω(||U_·i − U_·j||_2) + γ_r Σ_{k<l} Ω(||U_k· − U_l·||_2)

[Plot: a folded concave penalty Ω(t)]

Folded concave penalty ⇒ less bias towards 0
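The objective can be written down directly. Here is a hedged numpy sketch that uses the MCP as one concrete folded concave choice for Ω; the slides leave Ω generic, so this choice (and the γ, λ defaults) is an assumption:

```python
import numpy as np
from itertools import combinations

def mcp(t, gamma=2.0, lam=1.0):
    # Minimax concave penalty: one folded concave option for Omega.
    t = np.asarray(t, dtype=float)
    return np.where(t <= gamma * lam,
                    lam * t - t**2 / (2 * gamma),
                    gamma * lam**2 / 2)

def F(U, X, mask, gamma_r, gamma_c, omega=mcp):
    # Data fit on observed entries plus folded concave fusion penalties
    # on all column pairs and all row pairs.
    fit = 0.5 * np.sum((mask * (X - U)) ** 2)
    col = sum(omega(np.linalg.norm(U[:, i] - U[:, j]))
              for i, j in combinations(range(U.shape[1]), 2))
    row = sum(omega(np.linalg.norm(U[k] - U[l]))
              for k, l in combinations(range(U.shape[0]), 2))
    return fit + gamma_c * col + gamma_r * row
```

Since Ω(0) = 0, a U whose rows and columns are already fused incurs no penalty, and the objective reduces to the data-fit term.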

SLIDE 9

Step 1: Majorization-Minimization (MM)

G(U | V) = (1/2) ||X̃ − U||_F^2 + γ_c Σ_{i<j} w̃_{c,ij} ||U_·i − U_·j||_2 + γ_r Σ_{k<l} w̃_{r,kl} ||U_k· − U_l·||_2 + c

where X̃ = P_Ω(X) + P_{Ω^c}(V), w̃_{c,ij} = Ω′(||V_·i − V_·j||_2) and w̃_{r,kl} = Ω′(||V_k· − V_l·||_2).

Can be solved with Convex Biclustering [Chi et al. 2017].
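The setup of one MM surrogate can be sketched as follows: fill the unobserved entries with the current iterate V and compute the reweighted fusion weights from V. The function name is illustrative, and the MCP derivative stands in for Ω′ (an assumption); each iteration would then hand X̃ and the weights to a convex biclustering solver [Chi et al. 2017]:

```python
import numpy as np

def mcp_deriv(t, gamma=2.0, lam=1.0):
    # Derivative Omega'(t) of the MCP (an assumed folded concave penalty).
    return np.maximum(lam - t / gamma, 0.0)

def mm_surrogate_inputs(X, mask, V, deriv=mcp_deriv):
    # X_tilde = P_Omega(X) + P_Omega^c(V): observed entries from X,
    # unobserved entries filled in from the current iterate V.
    X_tilde = np.where(mask, X, V)
    n, p = X.shape
    # Weights from the penalty derivative at the current column/row gaps.
    w_col = {(i, j): deriv(np.linalg.norm(V[:, i] - V[:, j]))
             for i in range(p) for j in range(i + 1, p)}
    w_row = {(k, l): deriv(np.linalg.norm(V[k] - V[l]))
             for k in range(n) for l in range(k + 1, n)}
    return X_tilde, w_col, w_row
```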

SLIDE 10

Step 1: Majorization-Minimization (MM)

Majorization:

G(U | V) = (1/2) ||X̃ − U||_F^2 + γ_c Σ_{i<j} w̃_{c,ij} ||U_·i − U_·j||_2 + γ_r Σ_{k<l} w̃_{r,kl} ||U_k· − U_l·||_2 + c

F(U) = G(U | U) and F(U) ≤ G(U | V) for all U

MM: solve a sequence of Convex Biclustering problems

U^{t+1} = argmin_U G(U | U^t)

Proposition

Under suitable regularity conditions, the sequence {U^t} generated by Algorithm 1 has at least one limit point, and all limit points are d-stationary points of the problem of minimizing F(U).

SLIDE 11

Step 1: Smoothing Rows and Columns at Different Scales

SLIDE 12

Co-Manifold Learning

1. Solve the co-clustering problem with missing data at multiple row and column scales
2. Build multiscale row and column metrics
3. Calculate non-linear embeddings

SLIDE 13

Step 2: Multiscale metric

Intuition: if a pair of rows is close over multiple scales, their distance should be small; if a pair of rows is far apart over multiple scales, their distance should be large.

Step 1: Fill in X over multiple (γ_r, γ_c) scales:

X̃^{(r,c)} = P_Θ(X) + P_{Θ^c}(U(γ_r, γ_c))

Step 2: Take a weighted combination of the pairwise distances over all scales:

d(X_i·, X_j·) = Σ_{r,c} (γ_r γ_c)^α ||X̃^{(r,c)}_i· − X̃^{(r,c)}_j·||_2

α is tunable to emphasize local versus global structure.
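Given the filled-in matrices at each scale, the multiscale row distance is a direct sum. A minimal sketch, where the function and argument names are illustrative rather than taken from the paper's code:

```python
import numpy as np

def multiscale_row_distance(i, j, completions, scales, alpha=0.5):
    # d(X_i, X_j) = sum over scales of (gamma_r * gamma_c)^alpha
    #               * || Xtilde^(r,c)_i - Xtilde^(r,c)_j ||_2.
    # `completions` is a list of filled-in matrices, aligned with `scales`,
    # a list of (gamma_r, gamma_c) pairs.
    total = 0.0
    for (gr, gc), X_rc in zip(scales, completions):
        total += (gr * gc) ** alpha * np.linalg.norm(X_rc[i] - X_rc[j])
    return total
```

With a single scale and γ_r = γ_c = 1 this reduces to the ordinary Euclidean row distance, so the scale weights only matter when several completions are combined.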

SLIDE 14

Co-Manifold Learning

1. Solve the co-clustering problem with missing data at multiple row and column scales
2. Build multiscale row and column metrics
3. Calculate non-linear embeddings

SLIDE 15

Step 3: Spectral Embedding

Example: Diffusion Map (Coifman & Lafon, 2006)

  • Construct an affinity matrix A[i, j] = exp{−d^2(X_i·, X_j·)/σ^2}
  • Compute the row-stochastic matrix P = D^{−1}A, where D[i, i] = Σ_j A[i, j]
  • Eigendecomposition of P: keep the first d eigenvalues and eigenvectors
  • The mapping Ψ embeds the rows into the Euclidean space R^d:

Ψ : X_i· → (λ_1 ψ_1(i), λ_2 ψ_2(i), . . . , λ_d ψ_d(i))^T
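A compact numpy sketch of these steps, starting from a precomputed pairwise distance matrix. Dropping the trivial top eigenpair (eigenvalue 1, constant eigenvector of a row-stochastic matrix) follows the standard diffusion maps recipe; this is not code from the paper:

```python
import numpy as np

def diffusion_map(dist, d=2, sigma=1.0):
    # Gaussian affinity from pairwise distances: A[i,j] = exp(-dist^2 / sigma^2).
    A = np.exp(-dist**2 / sigma**2)
    # Row-stochastic normalization P = D^{-1} A.
    P = A / A.sum(axis=1, keepdims=True)
    # Eigendecomposition; P is similar to a symmetric matrix, so its
    # eigenvalues are real and we can safely take real parts.
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    vals, vecs = vals.real[order], vecs.real[:, order]
    # Skip the trivial eigenpair, keep the next d: Psi(i) = (lambda_k psi_k(i)).
    return vals[1:d + 1] * vecs[:, 1:d + 1]
```

Each row of the returned array is the diffusion coordinate Ψ(X_i·) in R^d.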

SLIDE 16

Some Examples


[Figure panels: Nonlinear Uncoupled, Nonlinear Uncoupled, Linear Coupled, Nonlinear Coupled]

SLIDE 17

Some Examples


Quantitative evaluation via clustering on the Lung500 and Linkage datasets.

[Plots: ARI (0.2–1) versus percentage of missing values (10–90%) for Co-manifold, DM-missing, NLPCA, FRPCAG =1, and FRPCAG =100]