Part III: Latent Tree Models. Le Song, ICML 2012 Tutorial on Spectral Algorithms for Latent Variable Models.

SLIDE 1

Spectral Algorithms for Latent Variable Models Part III: Latent Tree Models

ICML 2012 Tutorial on Spectral Algorithms for Latent Variable Models, Edinburgh, UK

Joint work with Mariya Ishteva, Ankur Parikh, Eric Xing, Byron Boots, Geoff Gordon, Alex Smola and Kenji Fukumizu

Le Song

SLIDE 2

Latent Tree Graphical Models

• Graphical model: nodes represent variables, edges represent conditional independence relations.
• Latent tree graphical models: latent and observed variables are arranged in a tree structure.
• Many real world applications, e.g., time-series prediction, topic modeling.

[Figure: a latent tree with observed variables Y1, ..., Y6 and latent variables Y7, ..., Y10, and a hidden Markov model with observed variables Y1, ..., Y6 and a latent chain Y7, ..., Y12.]

SLIDE 3

Scope of This Tutorial

Estimating the marginal probability of the observed variables
• Spectral HMMs (Hsu et al. COLT'09)
• Kernel spectral HMMs (Song et al. ICML'10)
• Spectral latent trees (Parikh et al. ICML'11, Song et al. NIPS'11)
• Spectral dimensionality reduction for HMMs (Foster et al., arXiv)
• More recent: Cohen et al. ACL'12, Balle et al. ICML'12

Estimating latent parameters
• PCA approach (Mossel & Roch AOAP'06)
• PCA and SVD approaches (Anandkumar et al. COLT'12, arXiv)

Estimating the structure of latent variable models
• Recursive grouping (Choi et al. JMLR'11)
• Spectral short quartet (Anandkumar et al. NIPS'11)

SLIDE 4

Challenge of Estimating the Marginal of the Observed Variables

Exponential number of entries in Q(Y1, Y2, ..., Y6):
• For discrete variables taking o possible values, Q has O(o^6) entries!

The latent tree reduces the number of parameters:

Q(Y1, ..., Y6) = Σ_{y7,y8,y9,y10} Q(Y1, ..., Y6, y7, ..., y10)
             = Σ_{y7,y8,y9,y10} Q(y10) Q(y7|y10) Q(Y1|y7) Q(Y2|y7) Q(y8|y10) Q(Y3|y8) Q(Y4|y8) Q(y9|y10) Q(Y5|y9) Q(Y6|y9)

• The root factor Q(Y10) has O(o) parameters; each conditional such as Q(Y7|Y10) has O(o^2).
• The latent tree needs only O(9o^2) parameters in total, a significant saving over O(o^6)!

[Figure: the latent tree over Y1, ..., Y10 with one factor attached to each node and edge.]

SLIDE 5

EM Algorithm for Parameter Estimation

We do not observe the latent variables, yet we need to estimate the corresponding parameters, e.g., Q(Y7|Y10) and Q(Y1|Y7).
Expectation maximization: maximize the likelihood of the observations,

max Π_{j=1}^{n} Q(y1^(j), ..., y6^(j))

Drawbacks: local maxima, slow convergence, difficult to analyze.

[Figure: n training samples (y1^(j), ..., y6^(j)), j = 1, ..., n, observed at the leaves of the latent tree over Y1, ..., Y10.]

Goal of spectral algorithms: estimate the marginal in a local-minimum-free fashion.

SLIDE 6

Key Features of Spectral Algorithms

Represent the joint probability table of the observed variables with a low rank factorization, without ever forming the joint table in the computation!
  • E.g., Q_{{1,...,e};{e+1,...,2e}} = Reshape(Q(Y1, ..., Y2e), {1,...,e}) is a matrix of size o^e × o^e.
  • Represent it by low rank factors to avoid exponential blowup.
  • Use clever decomposition techniques to avoid directly using all entries of the table.
  • Use singular value decomposition.
SLIDE 7

Tensor View of the Marginal Probability

Marginal probability table 𝓣 = Q(Y1, Y2, ..., Y6):
• Each discrete variable takes o possible values {1, ..., o}.
• 𝓣 is a 6-way table, or 6th order tensor; each dimension is labeled by one variable.
• The value of a variable is the index into the corresponding dimension, so 6 indexes are needed to access a single entry.
• Q(Y1 = 1, Y2 = 4, ..., Y6 = 3) is the entry 𝓣[1, 4, ..., 3].

Running examples: [Figure: the latent tree over Y1, ..., Y10 and the hidden Markov model over Y1, ..., Y12.]

SLIDE 8

Reshaping a Tensor into Matrices

Reshape(𝓣, 𝒟): the multi-index over the dimensions in 𝒟 is mapped into the row index, and the remaining dimensions into the column index.
  • E.g., let 𝓣 = Q(Y1, Y2, Y3) be a 3rd order tensor with o = 3.
  • Q_{{2};{1,3}} = Reshape(𝓣, {2}) turns the dimension of Y2 into the row index, and the multi-index over (Y1, Y3) into the column index.

[Figure: the 3 × 3 × 3 tensor 𝓣 sliced along the dimension of Y3 (slices Y3 = 1, 2, 3); the slices laid side by side form the 3 × 9 matrix with rows indexed by Y2 and columns by (Y1, Y3).]

SLIDE 9

Reshaping a 6th Order Tensor

Q_{{1,2,3};{4,5,6}} = Reshape(Q(Y1, ..., Y6), {1,2,3})

  • Each entry of the reshaped matrix is the probability of a unique assignment to Y1, ..., Y6, e.g., Q(2,3,1,2,1,2).

[Figure: the o^3 × o^3 matrix with rows indexed by the multi-index over (Y1, Y2, Y3) and columns by the multi-index over (Y4, Y5, Y6).]

SLIDE 10

Reshaping According to the Latent Tree Structure

For the marginal 𝓣 = Q(Y1, Y2, ..., Y6) of a latent tree model, reshape it according to the edges in the tree:
• Q_{{1};{2,3,4,5,6}} = Reshape(𝓣, {1})
• Q_{{1,2};{3,4,5,6}} = Reshape(𝓣, {1,2})
• Q_{{1,2,3,4};{5,6}} = Reshape(𝓣, {1,2,3,4})

[Figure: each reshaping corresponds to cutting an edge of the latent tree over Y1, ..., Y10, splitting the observed leaves into a row group and a column group.]

SLIDE 11

Low Rank Structure after Reshaping

The size of Q_{{1,2};{3,4,5,6}} is o^2 × o^4, but its rank is just o:

Q(Y1, Y2, ..., Y6) = Σ_{y7,y10} Q(y7, y10) Q(Y1, Y2 | y7) Q(Y3, Y4, Y5, Y6 | y10)

Use matrix multiplications to express the summation over Y7 and Y10:

Q_{{1,2};{3,4,5,6}} = Q_{{1,2}|{7}} Q_{{7};{10}} (Q_{{3,4,5,6}|{10}})^⊤

where
Q_{{1,2}|{7}} := Reshape(Q(Y1, Y2 | Y7), {1,2})
Q_{{3,4,5,6}|{10}} := Reshape(Q(Y3, Y4, Y5, Y6 | Y10), {3,4,5,6})

[Figure: the o^2 × o^4 matrix Q_{{1,2};{3,4,5,6}} factors as (o^2 × o)(o × o)(o × o^4).]

SLIDE 12

Low Rank Structure of the Latent Tree Model

Q_{{3,4};{1,2,5,6}} = Q_{{3,4}|{8}} Q_{{8};{10}} (Q_{{1,2,5,6}|{10}})^⊤    (o^2 × o^4 factors as (o^2 × o)(o × o)(o × o^4))

Q_{{1};{2,3,4,5,6}} = Q_{{1}|{7}} diag(Q(Y7)) (Q_{{2,3,4,5,6}|{7}})^⊤    (o × o^5 factors as (o × o)(o × o)(o × o^5))

All these reshapings are low rank, with rank o.

[Figure: the latent tree over Y1, ..., Y10 with the corresponding edge cuts highlighted.]

SLIDE 13

Low Rank Structure of Hidden Markov Models

Q_{{1,2};{3,4,5,6}} = Q_{{1,2}|{8}} Q_{{8};{9}} (Q_{{3,4,5,6}|{9}})^⊤    (o^2 × o^4 factors as (o^2 × o)(o × o)(o × o^4))

Q_{{1,2,3};{4,5,6}} = Q_{{1,2,3}|{9}} Q_{{9};{10}} (Q_{{4,5,6}|{10}})^⊤    (o^3 × o^3 factors as (o^3 × o)(o × o)(o × o^3))

[Figure: the hidden Markov model with observed Y1, ..., Y6 and latent chain Y7, ..., Y12; each reshaping cuts one edge of the chain.]

SLIDE 14

Key Features of Spectral Algorithms

Represent the joint probability table of the observed variables with a low rank factorization, without ever forming the joint table in the computation!
  • E.g., Q_{{1,...,e};{e+1,...,2e}} = Reshape(Q(Y1, ..., Y2e), {1,...,e}) is a matrix of size o^e × o^e.
  • Represent it by low rank factors to avoid exponential blowup.
  • Use clever decomposition techniques to avoid directly using all entries of the table.
  • Use singular value decomposition.
SLIDE 15

Key Theorem

• Q will be the reshaped joint probability table.
• B and C will be marginalization operators.
• Theorem 1 will be applied recursively.
• It recovers several existing spectral algorithms as special cases.

Theorem 1: Let Q be of size n × o with rank l, B of size o × l with rank l, and C of size l × n with rank l. If CQB is invertible, then Q = QB (CQB)^{-1} CQ.

SLIDE 16

Marginalization Operators B and C

Computing the marginal probability of a subset of variables can be expressed as a matrix product:

Q(Y1, Y2, Y3, Y4) = Σ_{y5,y6} Q(Y1, Y2, Y3, Y4, y5, y6)

Q_{{1,2,3};{4}} = Q_{{1,2,3};{4,5,6}} B, where B = 1_o ⊗ 1_o ⊗ I_o

[Figure: B is the o^3 × o matrix formed as the Kronecker product of two all-ones vectors 1_o (each o × 1) and the o × o identity I_o; multiplying by B sums out Y5 and Y6.]

SLIDE 17

Zoom into the Marginalization Operation

[Figure: multiplying the 27 × 27 matrix Q_{{1,2,3};{4,5,6}} by 1_3 ⊗ 1_3 ⊗ I_3 sums out Y5 and Y6, leaving the 27 × 3 matrix Q_{{1,2,3};{4}}.]

SLIDE 18

Applying Theorem 1 to the Latent Tree Model

Let
• Q = Q_{{1,2};{3,4,5,6}}
• B = 1_o ⊗ 1_o ⊗ 1_o ⊗ I_o
• C = (I_o ⊗ 1_o)^⊤

Then
• Q_{{1,2};{3,4,5,6}} B = Q_{{1,2};{3}}
• C Q_{{1,2};{3,4,5,6}} = Q_{{2};{3,4,5,6}}
• C Q_{{1,2};{3,4,5,6}} B = Q_{{2};{3}}

Finally, using Q = QB (CQB)^{-1} CQ:

Q_{{1,2};{3,4,5,6}} = Q_{{1,2};{3}} (Q_{{2};{3}})^{-1} Q_{{2};{3,4,5,6}}

[Figure: the latent tree over Y1, ..., Y10 with the cut corresponding to Q_{{1,2};{3,4,5,6}}.]

SLIDE 19

Latent Tree Decomposition

Q_{{1,2};{3,4,5,6}} = Q_{{1,2};{3}} (Q_{{2};{3}})^{-1} Q_{{2};{3,4,5,6}}

Decompose: [Figure: the full tree marginal Q_{{1,2};{3,4,5,6}} splits into three pieces: Q_{{1,2};{3}} supported on the subtree around Y1, Y2, Y3; the small inverse (Q_{{2};{3}})^{-1}; and Q_{{2};{3,4,5,6}} supported on the subtree around Y3, ..., Y6.]
SLIDE 20

Applying Theorem 1 to Hidden Markov Models

Let
• Q = Q_{{1,2,3};{4,5,6}}
• B = 1_o ⊗ 1_o ⊗ I_o
• C = (I_o ⊗ 1_o ⊗ 1_o)^⊤

Then
• Q_{{1,2,3};{4,5,6}} B = Q_{{1,2,3};{4}}
• C Q_{{1,2,3};{4,5,6}} = Q_{{3};{4,5,6}}
• C Q_{{1,2,3};{4,5,6}} B = Q_{{3};{4}}

Finally, using Q = QB (CQB)^{-1} CQ:

Q_{{1,2,3};{4,5,6}} = Q_{{1,2,3};{4}} (Q_{{3};{4}})^{-1} Q_{{3};{4,5,6}}

[Figure: the hidden Markov model over Y1, ..., Y12 with the cut corresponding to Q_{{1,2,3};{4,5,6}}.]

SLIDE 21

Hidden Markov Model Decomposition

Q_{{1,2,3};{4,5,6}} = Q_{{1,2,3};{4}} (Q_{{3};{4}})^{-1} Q_{{3};{4,5,6}}

Decompose: [Figure: the HMM marginal Q_{{1,2,3};{4,5,6}} splits into Q_{{1,2,3};{4}}, (Q_{{3};{4}})^{-1}, and Q_{{3};{4,5,6}}, each supported on a sub-chain of the model.]

SLIDE 22

Recursive Decomposition of the Latent Tree

• Q_{{1,2};{3,4,5,6}} = Q_{{1,2};{3}} (Q_{{2};{3}})^{-1} Q_{{2};{3,4,5,6}}
• Reshape Q_{{2};{3,4,5,6}} into Q_{{2,3,4};{5,6}}, then decompose again: Q_{{2,3,4};{5,6}} = Q_{{2,3,4};{5}} (Q_{{4};{5}})^{-1} Q_{{4};{5,6}}
• Reshape Q_{{2,3,4};{5}} into Q_{{3,4};{2,5}} and decompose: Q_{{3,4};{2,5}} = Q_{{3,4};{2}} (Q_{{4};{2}})^{-1} Q_{{4};{2,5}}

[Figure: at each step the corresponding subtree of the latent tree over Y1, ..., Y10 is highlighted.]

SLIDE 23

Recursive Decomposition of the HMM

• Q_{{1,2,3};{4,5,6}} = Q_{{1,2,3};{4}} (Q_{{3};{4}})^{-1} Q_{{3};{4,5,6}}
• Reshape Q_{{1,2,3};{4}} into Q_{{1,2};{3,4}} and decompose: Q_{{1,2};{3,4}} = Q_{{1,2};{3}} (Q_{{2};{3}})^{-1} Q_{{2};{3,4}}
• Reshape Q_{{3};{4,5,6}} into Q_{{3,4};{5,6}} and decompose: Q_{{3,4};{5,6}} = Q_{{3,4};{5}} (Q_{{4};{5}})^{-1} Q_{{4};{5,6}}

[Figure: each factor is supported on a sub-chain of the HMM over Y1, ..., Y12.]

SLIDE 24

One Entry of the Joint Probability Table of the HMM

Fix some observations:
• Fixing Y3 = y3, Q_{{2};y3;{4}} := Q(Y2, y3, Y4) is a matrix.
• Fixing Y1 = y1 and Y2 = y2, Q_{y1;y2;{3}} := Q(y1, y2, Y3) is a vector.

Q(y1, y2, y3, y4, y5, y6) = Q_{y1;y2;{3}} (Q_{{2};{3}})^{-1} Q_{{2};y3;{4}} (Q_{{3};{4}})^{-1} Q_{{3};y4;{5}} (Q_{{4};{5}})^{-1} Q_{{4};y5;y6}

[Figure: the factors of the chain, each supported on a short window of the HMM.]

SLIDE 25

Connection to Foster et al.

Q(y1, y2, y3, y4, y5, y6) = Q_{y1;y2;{3}} (Q_{{2};{3}})^{-1} Q_{{2};y3;{4}} (Q_{{3};{4}})^{-1} Q_{{3};y4;{5}} (Q_{{4};{5}})^{-1} Q_{{4};y5;y6}

Introduce a variable Y0 into Q_{y1;y2;{3}}:

Q_{y1;y2;{3}} = 1^⊤ Q_{{0};y1;{2}} (Q_{{1};{2}})^{-1} Q_{{1};y2;{3}}
             = 1^⊤ Q_{{0};{1}} (Q_{{0};{1}})^{-1} Q_{{0};y1;{2}} (Q_{{1};{2}})^{-1} Q_{{1};y2;{3}}
             = (Q_{{1}})^⊤ (Q_{{0};{1}})^{-1} Q_{{0};y1;{2}} (Q_{{1};{2}})^{-1} Q_{{1};y2;{3}}

Do similar things to Q_{{4};y5;y6}. Assuming the chain is time homogeneous,

(Q_{{0};{1}})^{-1} = (Q_{{1};{2}})^{-1},    Q_{{1,2};{3}} = Q_{{2,3};{4}}

[Figure: the HMM chain extended with an extra node Y0 at the front.]

SLIDE 26

What if the Number of Hidden States l < the Number of Observed States o?

Let Q = Q_{{1,2,3};{4,5,6}}, B = 1_o ⊗ 1_o ⊗ I_o, C = (I_o ⊗ 1_o ⊗ 1_o)^⊤, and use Q = QB (CQB)^{-1} CQ:

Q_{{1,2,3};{4,5,6}} = Q_{{1,2,3};{4}} (Q_{{3};{4}})^{-1} Q_{{3};{4,5,6}}

But Q_{{3};{4}}, of size o × o, has rank only l and is not invertible!

Singular value decomposition: Q_{{3};{4}} = V_l Σ_l W_l^⊤

Solution: use a further projection such that CQB is invertible. Let B = (1_o ⊗ 1_o ⊗ I_o) W_l and C = V_l^⊤ (I_o ⊗ 1_o ⊗ 1_o)^⊤. Then

Q_{{1,2,3};{4,5,6}} = Q_{{1,2,3};{4}} W_l (V_l^⊤ Q_{{3};{4}} W_l)^{-1} V_l^⊤ Q_{{3};{4,5,6}}

SLIDE 27

Connection to Hsu et al.

Two equivalent forms of applying the further projections V_l and W_l, where Q_{{3};{4}} = V_l Σ_l W_l^⊤ is the thin SVD:

(V_l^⊤ Q_{{3};{4}} W_l)^{-1} V_l^⊤ Q_{{3};{4,5,6}} = (Q_{{3};{4}} W_l)^† Q_{{3};{4,5,6}}

Applying the projected inverses throughout the chain:

Q(y1, ..., y6) = Q_{y1;y2;{3}} W_l (Q_{{2};{3}} W_l)^† Q_{{2};y3;{4}} W_l (Q_{{3};{4}} W_l)^† Q_{{3};y4;{5}} W_l (Q_{{4};{5}} W_l)^† Q_{{4};y5;y6}

This has the observable operator form of Hsu et al.: b_1^⊤ C_{y1} ⋯ C_{y6} b_∞

SLIDE 28

Proof of Theorem 1

SVD: Q = V_l Σ_l W_l^⊤ + V_⊥ 0 W_⊥^⊤

Assume
• B = (W_l, W_⊥) (D; E), with D of size l × l and invertible
• C = (F, G) (V_l, V_⊥)^⊤, with F of size l × l and invertible

Plug the above B and C into QB (CQB)^{-1} CQ: then QB = V_l Σ_l D, CQ = F Σ_l W_l^⊤ and CQB = F Σ_l D, so

QB (CQB)^{-1} CQ = V_l Σ_l D (F Σ_l D)^{-1} F Σ_l W_l^⊤ = V_l Σ_l W_l^⊤ = Q

Theorem 1: Let Q be a rank l matrix of size n × o, B a rank l matrix of size o × l, and C a rank l matrix of size l × n; then Q = QB (CQB)^{-1} CQ whenever CQB is invertible.

SLIDE 29

Finite Sample Estimator

Given n iid samples, estimate the pairwise and triplet marginals directly. Using the one-of-o encoding, e.g., φ(y = 1) = (1, 0, ..., 0)^⊤ and φ(y = 2) = (0, 1, 0, ..., 0)^⊤:

Q̂_{{1};{2};{3}} = (1/n) Σ_{j=1}^{n} φ(y1^(j)) ⊗ φ(y2^(j)) ⊗ φ(y3^(j))

Q̂_{{1};{2}} = (1/n) Σ_{j=1}^{n} φ(y1^(j)) φ(y2^(j))^⊤

Q̂_{{1}} = (1/n) Σ_{j=1}^{n} φ(y1^(j))

[Figure: the n training samples (y1^(j), ..., y6^(j)), j = 1, ..., n, observed at the leaves of the latent tree over Y1, ..., Y10.]

SLIDE 30

Sample Complexity Analysis

Error in the estimates Q̂_{{1};{2};{3}} and Q̂_{{1};{2}} propagates through the recursive decomposition:

Q(y1, y2, y3, y4, y5, y6) = Q_{y1;y2;{3}} (Q_{{2};{3}})^{-1} Q_{{2};y3;{4}} (Q_{{3};{4}})^{-1} Q_{{3};y4;{5}} (Q_{{4};{5}})^{-1} Q_{{4};y5;y6}

The error depends on the smallest singular value of the inverted terms, e.g., Q_{{1};{2}}.

Spectral algorithms:
• Use SVD for the further projection.
• The error depends on the singular values.

SLIDE 31

Synthetic Data

[Figure: experimental results on synthetic data.]

SLIDE 32

Stock Trend Prediction Data

• 59 stocks, 6800 samples; learn the latent structure first and then estimate the marginal.
• MAP prediction task: ŷ_j = argmax Q(Y_j | y1, y2, ..., y_{j−1}) (query the j-th variable).
• Absolute error |ŷ_j − y_j^⋆|.
• Also compared with the Chow-Liu tree (a fully observed model).

[Figure: prediction error comparison.]

SLIDE 33

Non-discrete, Non-Gaussian Case

The previous approach is all about discrete variables. Real world data can be continuous, with multimodal distributions and other rich statistical features. Replace discrete probabilities by kernel embeddings of distributions:
• Kernel k(y, y′) = ⟨φ(y), φ(y′)⟩, e.g., exp(−s ‖y − y′‖^2)
• Expected feature of a distribution: μ_Y = E_Y[φ(Y)] (can be an infinite dimensional feature)
• The one-of-o feature of the discrete case is a special case.

[Figure: example densities Q(Y), Q(Y, Z) and Q(Y, Z, W) with multimodal shapes.]

SLIDE 34

Kernel Embedding and Covariance Operator

μ_Y := E_Y[φ(Y)]
C_YZ := E_YZ[φ(Y) ⊗ φ(Z)]
C_YZW := E_YZW[φ(Y) ⊗ φ(Z) ⊗ φ(W)]

Discrete vs. kernel embedding:
• Q(Y): o × 1 vs. ∞ × 1
• Q(Y, Z): o × o vs. ∞ × ∞
• Q(Y, Z, W): o × o × o vs. ∞ × ∞ × ∞
SLIDE 35

Kernel Embedding with Finite Samples

Embed the joint distribution Q(Y, Z) in the joint feature space:

C_YZ = E_YZ[φ(Y) ⊗ φ(Z)] ≈ Ĉ_YZ = (1/n) Σ_{j=1}^{n} φ(y_j) ⊗ φ(z_j)

Use the finite sample mean to approximate the expectation, then apply the recursive low rank decomposition.
SLIDE 36

How to Deal with Infinite Features?

Kernel trick: never explicitly compute the features; always turn the computation into inner products k(y, y′) = ⟨φ(y), φ(y′)⟩.

E.g., kernel singular value decomposition of

Ĉ_YZ = (1/n) Σ_{j=1}^{n} φ(y_j) ⊗ φ(z_j) = V Σ W^⊤

• Run kernel principal component analysis on Ĉ_YZ Ĉ_YZ^⊤.
• Each left singular vector lies in the span of the data: v = Σ_{j=1}^{n} β_j φ(y_j).
• Solve a generalized eigenvalue problem K G K β = λ K β, with kernel matrices K_jk = k(y_j, y_k) and G_jk = k(z_j, z_k).
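To keep the check finite, the sketch below uses the one-of-o encoding as the feature map, so the "kernel" is just an indicator and the embedding Ĉ can be formed explicitly; the setup and data are my own toy choice. The nonzero generalized eigenvalues of (KGK, K) coincide with the eigenvalues of GK, which are n^2 times the squared singular values of Ĉ:

```python
import numpy as np
rng = np.random.default_rng(7)
o, n = 4, 50
y, z = rng.integers(o, size=n), rng.integers(o, size=n)

K = (y[:, None] == y[None, :]).astype(float)   # K_jk = k(y_j, y_k), indicator kernel
G = (z[:, None] == z[None, :]).astype(float)   # G_jk = k(z_j, z_k)

# Nonzero generalized eigenvalues of K G K beta = lam K beta equal eig(G K).
lam = np.sort(np.linalg.eigvals(G @ K).real)[::-1]

eye = np.eye(o)
C_hat = eye[y].T @ eye[z] / n                  # the embedding, explicit here
sig = np.linalg.svd(C_hat, compute_uv=False)
assert np.allclose(lam[:o], (n * sig) ** 2)    # lam_i = (n * sigma_i)^2
```

With a genuinely infinite feature space the same eigenvalue problem is solved using only the n × n kernel matrices, never the features themselves.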

SLIDE 37

Video and Slot Car Sensor Prediction

[Figure: experimental results on video and slot car sensor data.]

SLIDE 38

Demographic Feature Prediction

• 50 variables, 1400 samples; learn the latent structure first and then run the spectral algorithm.
• Compared to a Gaussian latent variable model and a Gaussian copula model (NPN); absolute error |ŷ − y^⋆|.

[Figure: prediction error comparison.]

SLIDE 39

Summary and Future Directions

Spectral algorithms are a consequence of the low rank structure of latent variable models:
• Q = QB (CQB)^{-1} CQ, applied recursively
• Better low rank approximations?
• What if the latent variable model is the wrong model?

Estimating latent parameters
• PCA approach (Mossel & Roch AOAP'06)
• PCA and SVD approaches (Anandkumar et al. COLT'12, arXiv)

Estimating the structure of latent variable models
• Recursive grouping (Choi et al. JMLR'11)
• Spectral short quartet (Anandkumar et al. NIPS'11)

SLIDE 40

Questions?

Thanks