

SLIDE 1

Robust Spectral Inference for Joint Stochastic Matrix Factorization

Kun Dong

Cornell University

October 20, 2016

  • K. Dong (Cornell University)

Robust Spectral Inference for Joint Stochastic Matrix Factorization October 20, 2016 1 / 17

SLIDE 2

Introduction


Topic Modeling

  • Idea: represent documents as combinations of topics.
  • Advantages:
    • Low-dimensional representation of documents
    • Uncovers hidden structure in large collections
  • Applications:
    • Summarizing documents with their topics
    • Clustering documents by similarity in topics
SLIDE 3

Joint Stochastic Matrix Factorization


Co-occurrence Matrix

  • The relationships between words can be more revealing than the words themselves.

C ≈ B A Bᵀ

  • C ∈ R^(n×n), the word-word matrix: C_ij = p(X1 = i, X2 = j)
  • A ∈ R^(k×k), the topic-topic matrix: A_kℓ = p(Z1 = k, Z2 = ℓ)
  • B ∈ R^(n×k), the word-topic matrix: B_ik = p(X = i | Z = k)
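As a quick sanity check on this factorization, the sketch below (toy sizes, synthetic values; nothing here comes from the paper) builds B and A as the probability tables above and confirms that C = B A Bᵀ is itself a joint distribution:

```python
import numpy as np

# Toy construction of C = B A Bᵀ (n = 6 words, k = 2 topics).
rng = np.random.default_rng(0)
n, k = 6, 2

B = rng.random((n, k))
B /= B.sum(axis=0)            # column k holds the distribution p(X = i | Z = k)

M = rng.random((k, k))
A = M @ M.T                   # symmetric with nonnegative entries
A /= A.sum()                  # entries are p(Z1, Z2); total mass 1

C = B @ A @ B.T               # C_ij = p(X1 = i, X2 = j)
# C inherits symmetry and nonnegativity, and its entries sum to 1
# because every column of B sums to 1 (so 1ᵀB = 1ᵀ and 1ᵀC1 = 1ᵀA1 = 1).
```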
SLIDE 4

What We Observe


SLIDE 5

Anchor Word


  • Separability: the word-topic matrix B is p-separable if for each topic k there is some word i such that B_ik ≥ p and B_iℓ = 0 for all ℓ ≠ k.
  • Every topic k has an anchor word i exclusive to it.
  • Documents containing anchor word i must contain topic k.

SLIDE 6

Anchor Word Algorithm


  • Under this assumption, Arora et al. (2013) showed that the Anchor Word algorithm computes this decomposition in polynomial time.
  • It uses QR with row pivoting after a random projection of C, choosing the points farthest away from each other.
  • However, it fails to produce a doubly nonnegative topic-topic matrix.
  • It tends to choose rare words as anchors and generate less meaningful topics.

SLIDE 7

Probabilistic Structure


  • We view the m-th document with nm words as nm(nm − 1) ordered word pairs.
  • Generate a distribution A over pairs of topics with parameter α.
  • Sample two topics (z1, z2) ∼ A.
  • Sample the actual word pair (x1, x2) ∼ (B_z1, B_z2).
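The pairwise generative process above can be sketched as follows. `sample_pair` and the toy A and B are illustrative, not code or values from the paper:

```python
import numpy as np

def sample_pair(A, B, rng):
    """Sample one word pair: draw (z1, z2) from the topic-pair
    distribution A, then x1 ~ B[:, z1] and x2 ~ B[:, z2]."""
    k, n = A.shape[0], B.shape[0]
    flat = rng.choice(k * k, p=A.ravel())   # index of (z1, z2), row-major
    z1, z2 = divmod(flat, k)
    x1 = rng.choice(n, p=B[:, z1])
    x2 = rng.choice(n, p=B[:, z2])
    return x1, x2

rng = np.random.default_rng(0)
A = np.array([[0.4, 0.1],
              [0.1, 0.4]])                  # p(Z1, Z2): symmetric, sums to 1
B = np.array([[0.7, 0.1],
              [0.2, 0.2],
              [0.1, 0.7]])                  # p(X | Z): columns sum to 1
pairs = [sample_pair(A, B, rng) for _ in range(10000)]
# The empirical pair frequencies approximate C = B A Bᵀ.
```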
SLIDE 8

Statistical Structure


  • Let f(α) be a distribution over topic distributions.
  • Documents are M i.i.d. samples {W1, · · · , WM} ∼ f(α).
  • Let the posterior topic-topic matrix A*_M = (1/M) Σ_{m=1}^M Wm Wmᵀ and the expectation A* = E[Wm Wmᵀ]. Then A*_M → A* as M → ∞.
  • Let the posterior word-word matrix C*_m = B Wm Wmᵀ Bᵀ and C* = (1/M) Σ_{m=1}^M C*_m.
  • Let C be the noisy observation over all samples. Then C → E[C] = C* = B A*_M Bᵀ → B A* Bᵀ.
  • A*_M, A* ∈ DNN_K and C* ∈ DNN_N (the doubly nonnegative matrices of size K and N).

SLIDE 9

Generating Co-occurrence C


  • Let Hm be the vector of word counts for the m-th document and Wm be the latent topic distribution.
  • Let pm = B Wm, and assume Hm ∼ Multinomial(nm, pm).
  • E[Hm] = nm pm = nm B Wm and Cov(Hm) = nm (diag(pm) − pm pmᵀ).
  • Let the co-occurrence Cm = (Hm Hmᵀ − diag(Hm)) / (nm (nm − 1)).
  • E[Cm | Wm] = C*_m, so E[C | W] = C*.
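A minimal sketch of this single-document estimator, assuming only a word-count vector Hm (the function name and toy counts are illustrative):

```python
import numpy as np

# Co-occurrence estimator Cm = (Hm Hmᵀ − diag(Hm)) / (nm (nm − 1)):
# subtracting diag(Hm) removes self-pairs, and the denominator is the
# number of ordered word pairs in the document.
def cooccurrence(H):
    H = np.asarray(H, dtype=float)
    n_m = H.sum()
    return (np.outer(H, H) - np.diag(H)) / (n_m * (n_m - 1))

H = np.array([3, 1, 0, 2])        # toy counts: nm = 6 words
C_m = cooccurrence(H)             # entries sum to 1 over ordered pairs
```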

SLIDE 10

Rectifying Co-occurrence C


  • In reality, C can still mismatch C* because of model-assumption violations and limited data.
  • We can rectify C into a low-rank, doubly nonnegative, joint-stochastic matrix by alternating projection (Dykstra's algorithm):
  • PSD_K(C) = U Λ+_K Uᵀ (keep the top K nonnegative eigenvalues)
  • NOR_N(C) = C + ((1 − Σ_{i,j} C_ij) / N²) 11ᵀ (shift entries so the total mass is 1)
  • NN_N(C) = max{C, 0} (entrywise nonnegativity)
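The three projections can be sketched as below. This is plain cyclic alternating projection; the Dykstra variant referenced on the slide adds correction terms that this sketch omits, and all function names are illustrative:

```python
import numpy as np

def proj_psd_rank_k(C, k):
    """PSD_K: symmetrize, then keep the top-k nonnegative eigenpairs."""
    vals, vecs = np.linalg.eigh((C + C.T) / 2)
    idx = np.argsort(vals)[::-1][:k]
    vals_k = np.maximum(vals[idx], 0)
    return (vecs[:, idx] * vals_k) @ vecs[:, idx].T

def proj_normalize(C):
    """NOR_N: shift every entry equally so the total mass is 1."""
    return C + (1 - C.sum()) / C.size      # C.size == N^2

def proj_nonneg(C):
    """NN_N: clip entries at zero."""
    return np.maximum(C, 0)

def rectify(C, k, iters=50):
    """Cycle through the projections until C is (approximately)
    rank-k PSD, joint-stochastic, and nonnegative."""
    for _ in range(iters):
        C = proj_psd_rank_k(C, k)
        C = proj_normalize(C)
        C = proj_nonneg(C)
    return C

# Demo: rectify a noisy version of a true low-rank joint-stochastic matrix.
rng = np.random.default_rng(1)
B = rng.random((20, 3)); B /= B.sum(axis=0)          # word-topic columns
M = rng.random((3, 3)); A = M @ M.T; A /= A.sum()    # PSD, total mass 1
C_noisy = B @ A @ B.T + 0.001 * rng.standard_normal((20, 20))
R = rectify(C_noisy, k=3)    # nonnegative, total mass ≈ 1, rank ≤ 3
```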
SLIDE 11

Finding Anchor Words


  • Use a column-pivoted QR algorithm to greedily find topics farthest away from each other.
  • Exploit sparsity and avoid using a random projection.
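The selection step can be sketched with SciPy's column-pivoted QR; `find_anchors` is an illustrative name, and pivoting runs on the transpose so that picking columns of C̄ᵀ picks rows of C̄:

```python
import numpy as np
from scipy.linalg import qr

def find_anchors(C_bar, k):
    """Greedily pick k rows of C_bar that are farthest apart:
    column-pivoted QR repeatedly selects the column (here, row of
    C_bar) with the largest residual norm."""
    _, _, piv = qr(C_bar.T, mode='economic', pivoting=True)
    return piv[:k]

# Toy demo: rows 0, 4, 7 are simplex vertices; the rest are mixtures.
X = np.array([
    [1.0, 0.0, 0.0],    # vertex (row 0)
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.1, 0.2, 0.7],
    [0.0, 1.0, 0.0],    # vertex (row 4)
    [0.3, 0.3, 0.4],
    [0.25, 0.5, 0.25],
    [0.0, 0.0, 1.0],    # vertex (row 7)
])
anchors = find_anchors(X, 3)   # selects the three vertex rows
```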
SLIDE 12

Recovering Word-Topic Matrix B


  • If we row-normalize C to get C̄, then C̄_ij = p(w2 = j | w1 = i).
  • Under the separability assumption, for an anchor word sk:
    C̄_{sk, j} = Σ_{k′} p(z1 = k′ | w1 = sk) p(w2 = j | z1 = k′) = p(w2 = j | z1 = k)
  • Every row of C̄ lies in the convex hull of the anchor rows C̄_{sk}:
    C̄_ij = Σ_k p(z1 = k | w1 = i) p(w2 = j | z1 = k) = Σ_k Q_ik C̄_{sk, j}
  • Find Q_ik through NNLS and infer B_ik with Bayes' rule.
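A sketch of this recovery step, assuming an exact model and known anchor indices; `recover_B` and its internals are illustrative names, not the paper's code:

```python
import numpy as np
from scipy.optimize import nnls

def recover_B(C, anchors):
    """Recover the word-topic matrix B from a co-occurrence matrix C
    and anchor row indices, via NNLS plus Bayes' rule."""
    p_w = C.sum(axis=1)                 # p(w1 = i): row marginals of C
    C_bar = C / p_w[:, None]            # row-normalize: p(w2 = j | w1 = i)
    S = C_bar[anchors]                  # anchor rows = p(w2 = j | z1 = k)
    Q = np.zeros((C.shape[0], len(anchors)))
    for i in range(C.shape[0]):
        q, _ = nnls(S.T, C_bar[i])      # row i as a combination of anchor rows
        Q[i] = q / q.sum()              # Q_ik = p(z1 = k | w1 = i)
    B = Q * p_w[:, None]                # Bayes: p(w, z) = p(z | w) p(w)
    return B / B.sum(axis=0)            # column-normalize to p(w | z)
```

On exact data the NNLS fit is exact, so the recovered columns match B up to the anchor ordering passed in.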
SLIDE 13

Example of Recovered Topics


SLIDE 14

Recovering Topic-Topic Matrix A


SLIDE 15

Conclusion


  • This algorithm can handle noisy co-occurrence via rectification.
  • It produces quality anchor words and topics, even when the sample size is small.
  • It preserves the structure of the decomposition under our assumptions.

SLIDE 16

Citation

Sanjeev Arora, Rong Ge, Yonatan Halpern, David M. Mimno, Ankur Moitra, David Sontag, Yichen Wu, and Michael Zhu. A practical algorithm for topic modeling with provable guarantees. In International Conference on Machine Learning, 2013.

Moontae Lee, David Bindel, and David Mimno. Robust spectral inference for joint stochastic matrix factorization. In Advances in Neural Information Processing Systems, pages 2710–2718, 2015.


SLIDE 17

Thank you!
