The Kikuchi Hierarchy and Tensor PCA

Alex Wein (Courant Institute, NYU)

Joint work with: Ahmed El Alaoui (Stanford) and Cris Moore (Santa Fe Institute)


Statistical Physics of Inference

◮ High-dimensional inference problems: compressed sensing, community detection, spiked Wigner/Wishart, sparse PCA, planted clique, group synchronization, ...

◮ Connection to statistical physics: the posterior distribution is a Gibbs/Boltzmann distribution

◮ Algorithms: belief propagation (BP) [Pearl ’86], approximate message passing (AMP) [Donoho-Maleki-Montanari ’09]

◮ Known/believed to be optimal in many settings

◮ Sharp results: exact MMSE, phase transitions

◮ Evidence for computational hardness: failure of BP/AMP, free energy barriers [Decelle-Krzakala-Moore-Zdeborová ’11, Lesieur-Krzakala-Zdeborová ’15]

This theory has been hugely successful at precisely understanding the statistical and computational limits of many problems.


Sum-of-Squares (SoS) Hierarchy

A competing theory: the sum-of-squares (SoS) hierarchy [Parrilo ’00, Lasserre ’01]

◮ Systematic way to obtain convex relaxations of polynomial optimization problems

◮ The degree-d relaxation can be solved in n^{O(d)} time

◮ Higher degree gives more powerful algorithms

◮ State-of-the-art algorithms for many statistical problems: tensor decomposition, tensor completion, planted sparse vector, dictionary learning, refuting random CSPs, mixtures of Gaussians, ...

◮ Evidence for computational hardness: SoS lower bounds

Meta-question: can we unify the statistical physics and SoS approaches?

This talk: a case study on tensor PCA, a problem where statistical physics and SoS disagree (!!!)


Tensor PCA (Principal Component Analysis)

Definition (Spiked Tensor Model [Richard-Montanari ’14])

◮ x ∈ {±1}^n – signal
◮ p ∈ {2, 3, 4, . . .} – tensor order
◮ For each subset U ⊆ [n] of size |U| = p, observe Y_U = λ ∏_{i∈U} x_i + N(0, 1)
◮ λ ≥ 0 – signal-to-noise parameter

Goal: given {Y_U}, recover x (with high probability as n → ∞)

◮ “For every p variables, get a noisy observation of their parity”
◮ In tensor notation: Y = λ x^{⊗p} + Z, where Z is symmetric noise
◮ The case p = 2 is the spiked Wigner matrix model Y = λ xx^⊤ + Z
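To make the model concrete, here is a minimal sampling sketch. The function name and the subset-keyed dictionary representation are my own choices for illustration, not from the talk.

```python
# Minimal sketch of the spiked tensor model: Y_U = λ·∏_{i∈U} x_i + N(0,1),
# one observation per p-subset U of [n], stored in a dict keyed by U.
import itertools
import numpy as np

def spiked_tensor(n, p, lam, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.choice([-1, 1], size=n)  # hidden signal in {±1}^n
    Y = {U: lam * np.prod(x[list(U)]) + rng.standard_normal()
         for U in itertools.combinations(range(n), p)}
    return x, Y

x, Y = spiked_tensor(n=10, p=4, lam=2.0)
```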

Algorithms for Tensor PCA

Maximum likelihood estimation (MLE):

    Pr[x | Y] ∝ exp( λ Σ_{|U|=p} Y_U ∏_{i∈U} x_i ) = exp( λ ⟨Y, x^{⊗p}⟩ )

◮ MLE: x̂ = argmax_{v ∈ {±1}^n} ⟨Y, v^{⊗p}⟩
◮ Succeeds when λ ≳ n^{(1−p)/2} [Richard-Montanari ’14]
◮ Statistically optimal (up to constant factors in λ)
◮ Problem: requires exponential time 2^n
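A brute-force sketch of the MLE, feasible only for tiny n since it enumerates all 2^n sign vectors; `spiked_tensor` is the hypothetical helper sketched above, and the function name here is mine.

```python
# Exhaustive MLE: x̂ = argmax over v in {±1}^n of Σ_{|U|=p} Y_U·∏_{i∈U} v_i.
import itertools
import numpy as np

def mle_bruteforce(Y, n):
    best_v, best_val = None, -np.inf
    for signs in itertools.product([-1, 1], repeat=n):  # 2^n candidates
        v = np.array(signs)
        val = sum(Y_U * np.prod(v[list(U)]) for U, Y_U in Y.items())
        if val > best_val:
            best_v, best_val = v, val
    return best_v  # recovers x up to a global sign flip when p is even
```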

Algorithms for Tensor PCA

Local algorithms: keep track of a “guess” v ∈ R^n and locally maximize the log-likelihood L(v) = ⟨Y, v^{⊗p}⟩

◮ Gradient descent [Ben Arous-Gheissari-Jagannath ’18]
◮ Tensor power iteration [Richard-Montanari ’14] (sketched below)
◮ Langevin dynamics [Ben Arous-Gheissari-Jagannath ’18]
◮ Approximate message passing (AMP) [Richard-Montanari ’14]

These only succeed when λ ≫ n^{−1/2}

◮ Recall: MLE works for λ ∼ n^{(1−p)/2}
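As an illustration of one such local algorithm, here is a hedged sketch of tensor power iteration in the subset notation above (helper names are mine): each coordinate is updated with leave-one-out products, which is proportional to the gradient of L(v).

```python
# Tensor power iteration: v_i ← Σ_{U∋i} Y_U·∏_{j∈U\{i}} v_j, then renormalize.
import numpy as np

def tensor_power_iteration(Y, n, iters=50, seed=1):
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        g = np.zeros(n)
        for U, Y_U in Y.items():
            idx = list(U)
            vals = v[idx]
            for k, i in enumerate(idx):
                g[i] += Y_U * np.prod(np.delete(vals, k))  # leave-one-out product
        v = g / np.linalg.norm(g)
    return np.sign(v)  # estimate of x (up to global sign for even p)
```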


Algorithms for Tensor PCA

Sum-of-squares (SoS) and spectral methods:

◮ SoS semidefinite program [Hopkins-Shi-Steurer ’15]
◮ Spectral SoS [Hopkins-Shi-Steurer ’15, Hopkins-Schramm-Shi-Steurer ’15]
◮ Tensor unfolding [Richard-Montanari ’14, Hopkins-Shi-Steurer ’15]

These are poly-time and succeed when λ ≫ n^{−p/4}

SoS lower bounds suggest no poly-time algorithm when λ ≪ n^{−p/4} [Hopkins-Shi-Steurer ’15, Hopkins-Kothari-Potechin-Raghavendra-Schramm-Steurer ’17]

[Figure: the λ axis, divided into an “impossible” regime below the MLE threshold n^{(1−p)/2}, a “hard” regime up to the SoS threshold n^{−p/4}, and the threshold n^{−1/2} for local algorithms]

Local algorithms (gradient descent, AMP, ...) are suboptimal when p ≥ 3


Subexponential-Time Algorithms

Subexponential time: 2^{n^δ} for δ ∈ (0, 1)

Tensor PCA has a smooth tradeoff between runtime and statistical power: for δ ∈ (0, 1), there is a 2^{n^δ}-time algorithm for λ ∼ n^{−p/4 + δ(1/2 − p/4)} [Raghavendra-Rao-Schramm ’16, Bhattiprolu-Guruswami-Lee ’16]

Interpolates between SoS and MLE (endpoint check below):

◮ δ = 0 ⇒ poly-time algorithm for λ ∼ n^{−p/4}
◮ δ = 1 ⇒ 2^n-time algorithm for λ ∼ n^{(1−p)/2}

[Figure: the λ axis again, with the “impossible” and “hard” regimes and the thresholds n^{(1−p)/2} (MLE), n^{−p/4} (SoS), n^{−1/2} (Local)]

In contrast, some problems have a sharp threshold

◮ E.g., λ > 1 is nearly-linear time; λ < 1 needs time 2^n

For “soft” thresholds (like tensor PCA): BP/AMP can’t be optimal
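The endpoint check referenced above is just arithmetic on the exponent (my verification, not on the slides):

```latex
\delta = 0:\quad -\tfrac{p}{4} + 0\cdot\bigl(\tfrac12 - \tfrac{p}{4}\bigr) = -\tfrac{p}{4}
  \qquad \text{(SoS threshold } \lambda \sim n^{-p/4}\text{)}

\delta = 1:\quad -\tfrac{p}{4} + \tfrac12 - \tfrac{p}{4} = \tfrac{1-p}{2}
  \qquad \text{(MLE threshold } \lambda \sim n^{(1-p)/2}\text{)}
```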


Aside: Low-Degree Likelihood Ratio

Recall: there is a 2^{n^δ}-time algorithm for λ ∼ n^{−p/4 + δ(1/2 − p/4)}

Evidence that this tradeoff is optimal: the low-degree likelihood ratio

◮ A relatively simple calculation that predicts the computational complexity of high-dimensional inference problems
◮ Arose from the study of SoS lower bounds and pseudo-calibration [Barak-Hopkins-Kelner-Kothari-Moitra-Potechin ’16, Hopkins-Steurer ’17, Hopkins-Kothari-Potechin-Raghavendra-Schramm-Steurer ’17, Hopkins PhD thesis ’18]
◮ Idea: look for a low-degree polynomial (of Y) that distinguishes P (spiked tensor) from Q (pure noise):

    max_{f : deg f ≤ D}  E_{Y∼P}[f(Y)] / √(E_{Y∼Q}[f(Y)²])  =  O(1) ⇒ “hard”,  ω(1) ⇒ “easy”

◮ Take degree-D polynomials as a proxy for n^{Θ̃(D)}-time algorithms

For more, see the survey Kunisky-W.-Bandeira, “Notes on Computational Hardness of Hypothesis Testing: Predictions using the Low-Degree Likelihood Ratio”, arXiv:1907.11636
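One step the slides leave implicit (standard in the survey cited above, stated here for completeness): the optimizing polynomial is the projection of the likelihood ratio onto low-degree polynomials, so the criterion reduces to computing a single norm.

```latex
\max_{\deg f \le D}\;
\frac{\mathbb{E}_{Y\sim P}[f(Y)]}{\sqrt{\mathbb{E}_{Y\sim Q}[f(Y)^2]}}
\;=\; \bigl\| L^{\le D} \bigr\|,
\qquad L := \frac{dP}{dQ},
```

where L^{≤D} is the orthogonal projection of L onto polynomials of degree at most D, and the norm is taken in L²(Q).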


Our Contributions

◮ We give a hierarchy of increasingly powerful BP/AMP-type algorithms: level ℓ requires n^{O(ℓ)} time
  ◮ Analogous to the SoS hierarchy
◮ We prove that these algorithms match the performance of SoS
  ◮ Both at the poly-time threshold and along the subexponential-time tradeoff
◮ This refines and “redeems” the statistical physics approach to algorithm design
◮ Our algorithms and analysis are simpler than prior work
◮ This talk: even-order tensors only
◮ Similar results for refuting random XOR formulas


Motivating the Algorithm: Belief Propagation / AMP

General setup: unknown signal x ∈ {±1}^n, observed data Y

Want to understand the posterior Pr[x | Y]

Find the distribution µ over {±1}^n minimizing the free energy F(µ) = E(µ) − S(µ)

◮ “Energy” and “entropy” terms
◮ The unique minimizer is Pr[x | Y] (see the variational identity below)

Problem: describing µ requires exponentially many parameters

BP/AMP: just keep track of the marginals m_i = E[x_i] and minimize a proxy, the Bethe free energy B(m)

◮ Locally minimize B(m) via iterative updates
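Why the unique minimizer is the posterior, spelled out (the standard Gibbs variational principle; writing the posterior as Pr[x | Y] = e^{−H(x)}/Z is my notation, not the talk's):

```latex
F(\mu) \;=\; \underbrace{\mathbb{E}_{x\sim\mu}[H(x)]}_{E(\mu)}
        \;-\; \underbrace{\bigl(-\textstyle\sum_x \mu(x)\log\mu(x)\bigr)}_{S(\mu)}
      \;=\; D_{\mathrm{KL}}\bigl(\mu \,\big\|\, \Pr[\,\cdot \mid Y\,]\bigr) \;-\; \log Z,
```

and the KL divergence is nonnegative, vanishing only at µ = Pr[· | Y].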


Generalized BP and Kikuchi Free Energy

Recall: BP/AMP keeps track of the marginals m_i = E[x_i] and minimizes the Bethe free energy B(m)

Natural higher-order variant:

◮ Keep track of m_i = E[x_i], m_ij = E[x_i x_j], . . . (up to degree ℓ)
◮ Minimize the Kikuchi free energy K_ℓ(m) [Kikuchi ’51]

Various ways to locally minimize the Kikuchi free energy:

◮ Gradient descent
◮ Generalized belief propagation (GBP) [Yedidia-Freeman-Weiss ’03]
◮ We will use a spectral method based on the Kikuchi Hessian


The Kikuchi Hessian

Bethe Hessian approach [Saade-Krzakala-Zdeborová ’14]

◮ Recall: we want to minimize B(m) with respect to m = {m_i}
◮ There is a trivial “uninformative” stationary point m* where ∇B(m*) = 0
◮ Bethe Hessian matrix: H_ij = ∂²B/(∂m_i ∂m_j) |_{m=m*}
◮ Algorithm: compute the bottom eigenvector of H
◮ Why: it is the best direction of local improvement
◮ A spectral method with performance essentially as good as BP for community detection

Our approach: the Kikuchi Hessian

◮ Bottom eigenvector of the Hessian of K_ℓ(m) with respect to the moments m = {m_i, m_ij, . . .}


The Algorithm

Definition (Symmetric Difference Matrix)

Input: an order-p tensor Y = (Y_U)_{|U|=p} (with p even) and an integer ℓ in the range p/2 ≤ ℓ ≤ n − p/2. Define the (n choose ℓ) × (n choose ℓ) matrix M, indexed by ℓ-subsets of [n]:

    M_{S,T} = Y_{S△T} if |S △ T| = p, and 0 otherwise.

◮ This is (approximately) a submatrix of the Kikuchi Hessian
◮ Algorithm: compute the leading eigenvalue/eigenvector of M (sketched below)
◮ Runtime: n^{O(ℓ)}
◮ The case ℓ = p/2 is “tensor unfolding,” which is poly-time and succeeds up to the SoS threshold
◮ Taking ℓ = n^δ gives an algorithm with runtime n^{O(n^δ)} = 2^{n^{δ+o(1)}}
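A dense-matrix sketch of the construction and the spectral step (function name mine; only practical for small n and ℓ, since the matrix has (n choose ℓ) rows):

```python
# Build M_{S,T} = Y_{S△T}·✶[|S△T| = p] over ℓ-subsets of [n], then take the
# leading eigenvector. The entry v_S estimates x_S = ∏_{i∈S} x_i.
import itertools
import numpy as np

def symmetric_difference_matrix(Y, n, p, ell):
    subsets = list(itertools.combinations(range(n), ell))
    m = len(subsets)
    M = np.zeros((m, m))
    for i, S in enumerate(subsets):
        for j, T in enumerate(subsets):
            D = tuple(sorted(set(S) ^ set(T)))  # symmetric difference S△T
            if len(D) == p:
                M[i, j] = Y[D]
    return M, subsets

# Usage: M, subsets = symmetric_difference_matrix(Y, n=10, p=4, ell=3)
#        evals, evecs = np.linalg.eigh(M)   # M is symmetric
#        v = evecs[:, -1]                   # leading eigenvector
```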


Intuition for Symmetric Difference Matrix

Recall: M_{S,T} = ✶[|S△T| = p] · Y_{S△T}, where |S| = |T| = ℓ

Compute the top eigenvector via power iteration: v ← Mv

◮ v ∈ R^{(n choose ℓ)}, where v_S is an estimate of x_S := ∏_{i∈S} x_i

Expanding the update v ← Mv coordinate-wise:

    v_S ← Σ_{T : |S△T| = p} Y_{S△T} v_T

◮ Recall: Y_{S△T} is a noisy measurement of x_{S△T}
◮ So Y_{S△T} v_T is T’s opinion about x_S (since x_{S△T} · x_T = x_S for x ∈ {±1}^n)

This is a message-passing algorithm among sets of size ℓ (sketched below)
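A power-iteration sketch of this message-passing view, reusing the hypothetical `symmetric_difference_matrix` helper from above:

```python
# Power iteration v ← Mv: each ℓ-set S aggregates the opinions Y_{S△T}·v_T of
# the sets T with |S△T| = p, and the vector is then renormalized.
import numpy as np

def power_iteration(M, iters=200, seed=2):
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(M.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = M @ v                  # v_S ← Σ_{T:|S△T|=p} Y_{S△T} v_T
        v /= np.linalg.norm(v)     # keep the iterate on the unit sphere
    return v                       # ≈ top eigenvector (largest |eigenvalue|)
```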


Analysis

Simplest statistical task: detection

◮ Distinguish between λ = λ̄ (spiked tensor) and λ = 0 (pure noise)

Algorithm: given Y, build the matrix M_{S,T} = ✶[|S△T| = p] Y_{S△T} and threshold its maximum eigenvalue

Key step: bound the spectral norm ‖M‖ when Y has i.i.d. N(0, 1) entries

Theorem (Matrix Chernoff Bound [Oliveira ’10, Tropp ’10])

Let M = Σ_i z_i A_i, where z_i ∼ N(0, 1) independently and {A_i} is a finite sequence of fixed symmetric d × d matrices. Then, for all t ≥ 0,

    P(‖M‖ ≥ t) ≤ 2d e^{−t²/(2σ²)}, where σ² = ‖Σ_i A_i²‖.

In our case, Σ_i A_i² is a multiple of the identity (a numerical check follows below)
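A hedged numerical check of that last claim (the decomposition and the binomial count below are my arithmetic, not copied from the slides): write M = Σ_U Y_U·A_U with (A_U)_{S,T} = ✶[S△T = U], and verify that Σ_U A_U² is a multiple of the identity, so σ² is immediate.

```python
# Verify Σ_U A_U² = c·I for the symmetric difference construction (small case).
import itertools
import numpy as np

n, p, ell = 8, 4, 3
subsets = list(itertools.combinations(range(n), ell))
total = np.zeros((len(subsets), len(subsets)))
for U in itertools.combinations(range(n), p):
    A = np.array([[1.0 if tuple(sorted(set(S) ^ set(T))) == U else 0.0
                   for T in subsets] for S in subsets])
    total += A @ A
# Each A_U² is diagonal; summing over U puts the same count on every diagonal
# entry, namely C(ell, p/2)·C(n−ell, p/2) = C(3,2)·C(5,2) = 30 here.
assert np.allclose(total, total[0, 0] * np.eye(len(subsets)))
print(total[0, 0])  # -> 30.0
```

So σ² reduces to a binomial count, which is what keeps the matrix Chernoff analysis short.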


Comparison to Prior Work

SoS approach: given a noise tensor Y, we want to certify (prove) an upper bound on the tensor injective norm ‖Y‖_inj := max_{‖x‖=1} |⟨Y, x^{⊗p}⟩|

Spectral certification: find an n^ℓ × n^ℓ matrix M such that (x^{⊗ℓ})^⊤ M (x^{⊗ℓ}) = ⟨Y, x^{⊗p}⟩^{2ℓ/p}, and so ‖Y‖_inj ≤ ‖M‖^{p/(2ℓ)}

◮ Each entry of M is a degree-(2ℓ/p) polynomial in Y
◮ Analysis: trace moment method (complicated) [Raghavendra-Rao-Schramm ’16, Bhattiprolu-Guruswami-Lee ’16]

Our method: instead find M (the symmetric difference matrix) such that (x^{⊗ℓ})^⊤ M (x^{⊗ℓ}) = ⟨Y, x^{⊗p}⟩ ‖x‖^{2ℓ−p}, and so ‖Y‖_inj ≤ ‖M‖

◮ Each entry of M is a degree-1 polynomial in Y
◮ Analysis: matrix Chernoff bound (much simpler); the norm bound is spelled out below
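Spelling out how the quadratic-form identity yields the certificate (standard reasoning, my arithmetic): for any unit vector x,

```latex
\bigl|\langle Y, x^{\otimes p}\rangle\bigr|
 \;=\; \bigl|(x^{\otimes \ell})^{\top} M\,(x^{\otimes \ell})\bigr|
 \;\le\; \|M\|\,\|x^{\otimes \ell}\|^{2}
 \;=\; \|M\|\,\|x\|^{2\ell}
 \;=\; \|M\|,
```

and maximizing over unit x gives ‖Y‖_inj ≤ ‖M‖.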


Related Work

◮ [Hastings ’19, “Classical and Quantum Algorithms for Tensor PCA”]
  ◮ A similar construction (the symmetric difference matrix) with a different motivation: quantum, as the Hamiltonian of a system of bosons

◮ [Biroli-Cammarota-Ricci-Tersenghi ’19, “How to iron out rough landscapes and get optimal performances”]
  ◮ A different form of “redemption” for local algorithms: replicated gradient descent


Summary

◮ Local algorithms are suboptimal for tensor PCA
  ◮ E.g., gradient descent, AMP
  ◮ They keep track of an n-dimensional state
  ◮ Nearly-linear runtime

◮ Why suboptimal?
  ◮ Soft threshold: the optimal algorithm cannot run in nearly-linear time
  ◮ For p-way data, do we need a p-way algorithm?

◮ “Redemption” for local algorithms and AMP
  ◮ A hierarchy of message-passing algorithms: symmetric difference matrices
  ◮ Keep track of beliefs about higher-order correlations
  ◮ Minimize the Kikuchi free energy
  ◮ Matches SoS (conjectured optimal)
  ◮ The proof is much simpler than prior work

◮ Future directions
  ◮ Unify statistical physics and SoS?
  ◮ Systematically obtain optimal spectral methods in general?

Thanks!