An introduction to chaining, and applications to sublinear algorithms (PowerPoint presentation transcript)
SLIDE 1

An introduction to chaining, and applications to sublinear algorithms

Jelani Nelson

Harvard

August 28, 2015

SLIDES 2-5

What's this talk about?

Given a collection of random variables $X_1, X_2, \ldots$, we would like to say that $\max_i X_i$ is small with high probability. (Happens all over computer science, e.g. the "Chernion" (Chernoff + Union) bound.)

Today's topic: Beating the Union Bound

Disclaimer: This is an educational talk, about ideas which aren't mine.

SLIDES 6-10

A first example

  • $T \subset B_{\ell_2^n}$ (the unit ball of $\ell_2^n$)
  • Random variables $(Z_x)_{x\in T}$, where $Z_x = \langle g, x\rangle$ for a vector $g$ with i.i.d. $N(0,1)$ entries
  • Define the gaussian mean width $g(T) = \mathbb{E}_g \sup_{x\in T} Z_x$
  • How can we bound $g(T)$?
  • This talk: four progressively tighter ways to bound $g(T)$, then applications of the techniques to some TCS problems
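The quantity $g(T)$ is easy to probe numerically. The sketch below (not from the talk; the choice of $T$ as random unit vectors is my own illustration) estimates $g(T)$ by Monte Carlo, using numpy:

```python
import numpy as np

rng = np.random.default_rng(0)
n, N, trials = 50, 1000, 2000

# T: a finite set of N random unit vectors, so T sits inside the unit
# ball of l2^n as on the slide.
T = rng.standard_normal((N, n))
T /= np.linalg.norm(T, axis=1, keepdims=True)

# Monte Carlo estimate of g(T) = E_g sup_{x in T} <g, x>: draw many g's,
# take the sup over T for each draw, and average.
g = rng.standard_normal((trials, n))
est = (g @ T.T).max(axis=1).mean()

# For comparison: sqrt(2 log N), the union-bound prediction derived on
# the upcoming slides.
print(est, np.sqrt(2 * np.log(N)))
```

The estimate lands a bit below $\sqrt{2\log N}$, consistent with the union bound of the next slides being valid but not tight.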

SLIDES 11-15

Gaussian mean width bound 1: union bound

  • $g(T) = \mathbb{E}\sup_{x\in T} Z_x = \mathbb{E}\sup_{x\in T} \langle g, x\rangle$
  • Each $Z_x$ is a gaussian with variance at most one, so $P(Z_x > u) \le e^{-u^2/2}$

$$\mathbb{E}\sup_{x\in T} Z_x = \int_0^\infty P\Big(\sup_{x\in T} Z_x > u\Big)\,du = \int_0^{u_*} \underbrace{P\Big(\sup_{x\in T} Z_x > u\Big)}_{\le\, 1}\,du + \int_{u_*}^\infty \underbrace{P\Big(\sup_{x\in T} Z_x > u\Big)}_{\le\, |T|\cdot e^{-u^2/2}\ \text{(union bound)}}\,du \le u_* + |T|\cdot e^{-u_*^2/2} \lesssim \sqrt{\log |T|} \qquad \Big(\text{set } u_* = \sqrt{2\log |T|}\Big)$$
SLIDES 16-19

Gaussian mean width bound 2: ε-net

  • $g(T) = \mathbb{E}\sup_{x\in T} \langle g, x\rangle$
  • Let $S_\varepsilon$ be an ε-net of $(T, \ell_2)$
  • $\langle g, x\rangle = \langle g, x'\rangle + \langle g, x - x'\rangle$, where $x' = \operatorname{argmin}_{y\in S_\varepsilon} \|x - y\|_2$

$$g(T) \le g(S_\varepsilon) + \mathbb{E}_g \sup_{x\in T} \underbrace{\langle g, x - x'\rangle}_{\le\, \varepsilon\cdot\|g\|_2} \lesssim \log^{1/2} |S_\varepsilon| + \varepsilon\,\big(\mathbb{E}_g \|g\|_2^2\big)^{1/2} = \underbrace{\log^{1/2} N(T, \ell_2, \varepsilon)}_{N\,=\,\text{smallest }\varepsilon\text{-net size}} + \varepsilon\sqrt{n}$$

  • Choose ε to optimize the bound; can never be worse than the last slide (which amounts to choosing ε = 0)
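The ε-net bound can be evaluated mechanically for a finite set: build a net greedily, then optimize $\sqrt{2\log|S_\varepsilon|} + \varepsilon\sqrt{n}$ over ε. A sketch (not from the talk; the greedy net and the ε grid are my own choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 30, 500

# Finite T inside the unit ball: random unit vectors.
T = rng.standard_normal((N, n))
T /= np.linalg.norm(T, axis=1, keepdims=True)

def greedy_net(points, eps):
    """Greedy eps-net: keep a point if every chosen center is > eps away."""
    centers = []
    for p in points:
        if not centers or min(np.linalg.norm(p - c) for c in centers) > eps:
            centers.append(p)
    return centers

# Monte Carlo estimate of g(T).
g = rng.standard_normal((2000, n))
g_T = (g @ T.T).max(axis=1).mean()

# The net bound sqrt(2 log |S_eps|) + eps * sqrt(n), optimized over a grid.
best = min(
    np.sqrt(2 * np.log(len(greedy_net(T, eps)))) + eps * np.sqrt(n)
    for eps in [0.05, 0.1, 0.25, 0.5]
)
print(g_T, best)
```

For this nearly-orthogonal $T$ the net barely compresses anything, so the bound is only modestly better than the plain union bound; the point is just the mechanics of trading $|S_\varepsilon|$ against $\varepsilon\sqrt{n}$.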

SLIDES 20-24

Gaussian mean width bound 3: ε-net sequence

  • $S_k$ is a $(1/2^k)$-net of $T$ for each $k \ge 0$; $\pi_k x$ is the closest point in $S_k$ to $x \in T$, and $\Delta_k x = \pi_k x - \pi_{k-1} x$
  • wlog $|T| < \infty$ (else apply this slide to an ε-net of $T$ for ε small)
  • $\langle g, x\rangle = \langle g, \pi_0 x\rangle + \sum_{k=1}^\infty \langle g, \Delta_k x\rangle$
  • $g(T) \le \mathbb{E}_g \sup_{x\in T} \langle g, \pi_0 x\rangle + \sum_{k=1}^\infty \mathbb{E}_g \sup_{x\in T} \langle g, \Delta_k x\rangle$
  • $|\{\Delta_k x : x \in T\}| \le N(T, \ell_2, 1/2^k) \cdot N(T, \ell_2, 1/2^{k-1}) \le \big(N(T, \ell_2, 1/2^k)\big)^2$

$$g(T) \lesssim \sum_{k=1}^\infty \frac{1}{2^k} \cdot \log^{1/2} N(T, \ell_2, 1/2^k) \lesssim \int_0^\infty \log^{1/2} N(T, \ell_2, u)\,du \qquad \text{(Dudley's theorem)}$$
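For a finite set, the Dudley sum can be computed by building greedy nets at each scale $2^{-k}$. A sketch (not from the talk; greedy nets only upper-bound the true covering numbers, and the absolute constant hidden in the $\lesssim$ is ignored here):

```python
import numpy as np

rng = np.random.default_rng(2)
n, N = 20, 200

T = rng.standard_normal((N, n))
T /= np.linalg.norm(T, axis=1, keepdims=True)

def covering_number(points, eps):
    """Greedy net size: a crude upper bound on N(T, l2, eps)."""
    centers = []
    for p in points:
        if not centers or min(np.linalg.norm(p - c) for c in centers) > eps:
            centers.append(p)
    return len(centers)

# Dudley-style sum  sum_k 2^{-k} sqrt(log N(T, l2, 2^{-k})).  For finite T
# the covering number caps at |T|, so the tail is geometric and truncating
# at k = 20 loses essentially nothing.
dudley = sum(
    2.0**-k * np.sqrt(np.log(covering_number(T, 2.0**-k)))
    for k in range(21)
)

# Monte Carlo estimate of g(T) for comparison; Dudley's theorem bounds it
# up to the absolute constant the slides' "lesssim" hides.
g = rng.standard_normal((2000, n))
g_T = (g @ T.T).max(axis=1).mean()
print(dudley, g_T)
```

The factor 2 in the check below is an empirical cushion, not the theorem's constant.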

SLIDES 25-28

Gaussian mean width bound 4: generic chaining

  • Again, wlog $|T| < \infty$. Define $T_0 \subseteq T_1 \subseteq \cdots \subseteq T_{k_*} = T$ with $|T_0| = 1$ and $|T_k| \le 2^{2^k}$ (call such a sequence "admissible")
  • Exercise: show Dudley's theorem is equivalent to

$$g(T) \lesssim \inf_{\{T_k\}\ \text{admissible}} \sum_{k=1}^\infty 2^{k/2} \cdot \sup_{x\in T} d_{\ell_2}(x, T_k)$$

(should pick $T_k$ to be the best $\varepsilon = \varepsilon(k)$ net of size $2^{2^k}$)

  • Fernique'76*: can pull the $\sup_x$ outside the sum:

$$g(T) \lesssim \inf_{\{T_k\}} \sup_{x\in T} \sum_{k=1}^\infty 2^{k/2} \cdot d_{\ell_2}(x, T_k) \overset{\mathrm{def}}{=} \gamma_2(T, \ell_2)$$

* An equivalent upper bound was proven by Fernique (who minimized some integral over all measures on $T$), but it was reformulated in terms of admissible sequences by Talagrand.

SLIDES 29-31

Gaussian mean width bound 4: generic chaining

Proof of Fernique's bound

$$g(T) \le \mathbb{E}_g \sup_{x\in T} \langle g, \pi_0 x\rangle + \mathbb{E}_g \sup_{x\in T} \sum_{k=1}^\infty \underbrace{\langle g, \Delta_k x\rangle}_{Y_k} \qquad \text{(from before)}$$

  • $\forall t$: $P\big(Y_k > t\cdot 2^{k/2}\|\Delta_k x\|_2\big) \le e^{-t^2 2^k/2}$ (gaussian decay)
  • $P\big(\exists x, k:\ Y_k > t\cdot 2^{k/2}\|\Delta_k x\|_2\big) \le \sum_k \big(2^{2^k}\big)^2 e^{-t^2 2^k/2}$

$$\mathbb{E}_g \sup_{x\in T} \sum_k Y_k = \int_0^\infty P\Big(\sup_{x\in T} \sum_k Y_k > u\Big)\,du$$

SLIDES 32-36

Gaussian mean width bound 4: generic chaining

$$\mathbb{E}_g \sup_{x\in T} \sum_k Y_k = \int_0^\infty P\Big(\sup_{x\in T} \sum_k Y_k > u\Big)\,du = \gamma_2(T, \ell_2) \cdot \int_0^\infty P\Big(\sup_{x\in T} \sum_k Y_k > t \sup_{x\in T} \sum_k 2^{k/2}\|\Delta_k x\|_2\Big)\,dt$$

(change of variables: $u = t \sup_{x\in T} \sum_k 2^{k/2}\|\Delta_k x\|_2 \simeq t\,\gamma_2(T, \ell_2)$)

$$\le \gamma_2(T, \ell_2) \cdot \Big[2 + \int_2^\infty \sum_{k=1}^\infty \big(2^{2^k}\big)^2 e^{-t^2 2^k/2}\,dt\Big] \simeq \gamma_2(T, \ell_2)$$

  • Conclusion: $g(T) \lesssim \gamma_2(T, \ell_2)$
  • Talagrand: $g(T) \simeq \gamma_2(T, \ell_2)$ (won't show today) ("Majorizing measures theorem")

SLIDES 37-41

Are these bounds really different?

  • $\gamma_2(T, \ell_2)$: $\inf_{\{T_k\}} \sup_{x\in T} \sum_{k=1}^\infty 2^{k/2} \cdot d_{\ell_2}(x, T_k)$
  • Dudley: $\inf_{\{T_k\}} \sum_{k=1}^\infty 2^{k/2} \cdot \sup_{x\in T} d_{\ell_2}(x, T_k) \simeq \int_0^\infty \log^{1/2} N(T, \ell_2, u)\,du$
  • Dudley is not optimal: take $T = B_{\ell_1^n}$
  • $\sup_{x\in B_{\ell_1^n}} \langle g, x\rangle = \|g\|_\infty$, so $g(T) \simeq \sqrt{\log n}$
  • Exercise: come up with an admissible $\{T_k\}$ yielding $\gamma_2 \lesssim \sqrt{\log n}$ (one must exist by majorizing measures)
  • Dudley: $\log N(B_{\ell_1^n}, \ell_2, u) \simeq (1/u^2)\log n$ for $u$ not too small (consider just covering the $(1/u^2)$-sparse vectors with $u^2$ in each coordinate). Dudley can only give $g(B_{\ell_1^n}) \lesssim \log^{3/2} n$.
  • The simple vanilla ε-net argument gives $g(B_{\ell_1^n}) \lesssim \operatorname{poly}(n)$.
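The claim $g(B_{\ell_1^n}) \simeq \sqrt{\log n}$ is easy to see numerically, since the supremum over the $\ell_1$ ball is exactly $\|g\|_\infty$. A quick check (not from the talk; the trial counts are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)

# sup over the l1 ball of <g, x> equals ||g||_inf, so the gaussian mean
# width of B_{l1^n} can be estimated directly; it should track
# sqrt(2 log n) as n grows.
ratios = []
for n in [100, 1000, 10000]:
    g = rng.standard_normal((500, n))
    est = np.abs(g).max(axis=1).mean()
    ratios.append(est / np.sqrt(2 * np.log(n)))
    print(n, est, np.sqrt(2 * np.log(n)))
```

The ratio to $\sqrt{2\log n}$ stays near 1, well below the $\log^{3/2} n$ growth that Dudley's bound would suggest for this set.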

SLIDES 42-43

High probability

  • So far we have just talked about $g(T) = \mathbb{E}_g \sup_{x\in T} Z_x$. But what if we want to know that $\sup_{x\in T} Z_x$ is small with high probability, not just in expectation?
  • Usual approach: bound $\mathbb{E}_g \sup_{x\in T} Z_x^p$ for large $p$ and apply Markov ("moment method"). Can bound moments using chaining too; see (Dirksen'13).

SLIDE 44

Applications in computer science

  • Fast RIP matrices (Candès, Tao'06), (Rudelson, Vershynin'06), (Cheraghchi, Guruswami, Velingker'13), (N., Price, Wootters'14), (Bourgain'14), (Haviv, Regev'15)
  • Fast JL (Ailon, Liberty'11), (Krahmer, Ward'11), (Bourgain, Dirksen, N.'15), (Oymak, Recht, Soltanolkotabi'15)
  • Instance-wise JL bounds (Gordon'88), (Klartag, Mendelson'05), (Mendelson, Pajor, Tomczak-Jaegermann'07), (Dirksen'14)
  • Approximate nearest neighbor (Indyk, Naor'07)
  • Deterministic algorithm to estimate graph cover time (Ding, Lee, Peres'11)
  • List-decodability of random codes (Wootters'13), (Rudra, Wootters'14)
  • . . .
SLIDES 45-46

A chaining result for quadratic forms

Theorem (Krahmer, Mendelson, Rauhut'14). Let $\mathcal{A} \subset \mathbb{R}^{n\times n}$ be a family of matrices, and let $\sigma_1, \ldots, \sigma_n$ be independent subgaussians. Then

$$\mathbb{E} \sup_{A\in\mathcal{A}} \big|\, \|A\sigma\|_2^2 - \mathbb{E}_\sigma \|A\sigma\|_2^2 \,\big| \lesssim \gamma_2^2(\mathcal{A}, \|\cdot\|_{\ell_2\to\ell_2}) + \gamma_2(\mathcal{A}, \|\cdot\|_{\ell_2\to\ell_2}) \cdot \Delta_F(\mathcal{A}) + \Delta_{\ell_2\to\ell_2}(\mathcal{A}) \cdot \Delta_F(\mathcal{A})$$

($\Delta_X$ is the diameter of $\mathcal{A}$ under the $X$-norm.)

Won't show the proof today, but it is similar to bounding $g(T)$ (with some extra tricks). See http://people.seas.harvard.edu/~minilek/madalgo2015/, Lecture 3.

SLIDE 47

Instance-wise bounds for JL

Corollary (Gordon'88; Klartag, Mendelson'05; Mendelson, Pajor, Tomczak-Jaegermann'07; Dirksen'14). For $T \subseteq S^{n-1}$ and $0 < \varepsilon < 1/2$, let $\Pi \in \mathbb{R}^{m\times n}$ have independent subgaussian entries with mean zero and variance $1/m$, for $m \gtrsim (g^2(T)+1)/\varepsilon^2$. Then

$$\mathbb{E}_\Pi \sup_{x\in T} \big|\, \|\Pi x\|_2^2 - 1 \,\big| < \varepsilon.$$

SLIDES 48-53

Instance-wise bounds for JL

Proof of Gordon's theorem

  • For $x \in T$ let $A_x$ denote the $m \times mn$ block-diagonal matrix

$$A_x = \frac{1}{\sqrt{m}} \cdot \begin{pmatrix} x_1 \cdots x_n & & \\ & \ddots & \\ & & x_1 \cdots x_n \end{pmatrix}$$

(each of the $m$ rows carries a copy of $x^\top$ in its own block of $n$ columns).

  • Then $\|\Pi x\|_2^2 = \|A_x \sigma\|_2^2$, where $\sigma$ is formed by concatenating the rows of $\Pi$ (multiplied by $\sqrt{m}$).
  • $\|A_x - A_y\| = \|A_{x-y}\| = (1/\sqrt{m}) \cdot \|x - y\|_2$, so $\gamma_2(\mathcal{A}_T, \|\cdot\|_{\ell_2\to\ell_2}) = (1/\sqrt{m}) \cdot \gamma_2(T, \ell_2) \simeq g(T)/\sqrt{m}$
  • $\Delta_F(\mathcal{A}_T) = 1$, $\Delta_{\ell_2\to\ell_2}(\mathcal{A}_T) = 1/\sqrt{m}$
  • Thus $\mathbb{E}_\Pi \sup_{x\in T} |\|\Pi x\|_2^2 - 1| \lesssim g^2(T)/m + g(T)/\sqrt{m} + 1/\sqrt{m}$
  • Set $m \gtrsim (g^2(T)+1)/\varepsilon^2$
SLIDES 54-55

Consequences of Gordon's theorem

$m \gtrsim (g^2(T)+1)/\varepsilon^2$

  • $|T| < \infty$: $g^2(T) \lesssim \log |T|$ (JL)
  • $T$ a $d$-dim subspace: $g^2(T) \simeq d$ (subspace embeddings)
  • $T$ all $k$-sparse vectors: $g^2(T) \simeq k\log(n/k)$ (RIP)
  • more applications to constrained least squares, manifold learning, model-based compressed sensing, . . . (see (Dirksen'14) and (Bourgain, Dirksen, N.'15))
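The finite-$|T|$ case above is easy to test end to end: draw a gaussian $\Pi$ with variance-$1/m$ entries and check the distortion over $T$. A sketch (not from the talk; the constant 20 below is an arbitrary cushion, since the $\gtrsim$ in the corollary hides a constant):

```python
import numpy as np

rng = np.random.default_rng(4)
n, N, eps = 500, 50, 0.25

# Finite T on the unit sphere, so g^2(T) is on the order of log N and the
# corollary suggests m of order log(N)/eps^2 rows.
T = rng.standard_normal((N, n))
T /= np.linalg.norm(T, axis=1, keepdims=True)

m = int(20 * np.log(N) / eps**2)
Pi = rng.standard_normal((m, n)) / np.sqrt(m)  # mean-0, variance-1/m entries

# sup_{x in T} | ||Pi x||_2^2 - 1 |
distortion = np.abs(np.linalg.norm(Pi @ T.T, axis=0) ** 2 - 1).max()
print(m, distortion)
```

With these parameters the worst-case distortion over all 50 points comfortably clears the target $\varepsilon$, despite $m \ll n$.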

SLIDE 56

Chaining isn’t just for gaussians

slide-57
SLIDE 57

Chaining without gaussians: RIP (Rudelson, Vershynin’06)

“Restricted isometry property” useful in compressed sensing. T = {x : x0 ≤ k, x2 = 1}.

Theorem (Cand` es-Tao’06, Donoho’06, Cand´ es’08)

If Π satisfies (ε∗, k)-RIP for ε∗ < √ 2 − 1 then there is a linear program which, given Πx and Π as input, recovers ˜ x in polynomial time such that x − ˜ x2 ≤ O(1/

√ k) · miny0≤k x − y1.

slide-58
SLIDE 58

Chaining without gaussians: RIP (Rudelson, Vershynin’06)

“Restricted isometry property” useful in compressed sensing. T = {x : x0 ≤ k, x2 = 1}.

Theorem (Cand` es-Tao’06, Donoho’06, Cand´ es’08)

If Π satisfies (ε∗, k)-RIP for ε∗ < √ 2 − 1 then there is a linear program which, given Πx and Π as input, recovers ˜ x in polynomial time such that x − ˜ x2 ≤ O(1/

√ k) · miny0≤k x − y1.

Of interest to show sampling rows of discrete Fourier matrix is RIP

SLIDES 59-60

Chaining without gaussians: RIP (Rudelson, Vershynin'06)

  • (Unnormalized) Fourier matrix $F$, with rows $z_1^*, \ldots, z_n^*$
  • $\delta_1, \ldots, \delta_n$ independent Bernoulli with expectation $m/n$
  • Want

$$\mathbb{E}_\delta \sup_{\substack{T\subset[n]\\ |T|\le k}} \Big\| I_T - \frac{1}{m}\sum_{i=1}^n \delta_i z_i^{(T)} z_i^{(T)*} \Big\| < \varepsilon$$

($z_i^{(T)}$ denotes the restriction of $z_i$ to the coordinates in $T$.)
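This quantity can be spot-checked directly for small parameters. A sketch (not from the talk; the parameter values and the restriction to a few hundred supports are my own choices for speed):

```python
import numpy as np
from itertools import combinations, islice

rng = np.random.default_rng(5)
n, k, m = 64, 3, 48

# Unnormalized DFT matrix with rows z_1^*, ..., z_n^*.  Its columns are
# orthogonal with squared norm n, so E[(1/m) sum_i delta_i z_i^(T) z_i^(T)*]
# equals I_T when the delta_i are Bernoulli(m/n).
r, c = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
F = np.exp(-2j * np.pi * r * c / n)

delta = rng.random(n) < m / n   # keep each row independently w.p. m/n
sampled = F[delta]

# Spot-check ||I_T - (1/m) sum delta_i z_i^(T) z_i^(T)*|| on a few hundred
# supports T of size k (all ~40k supports would also be feasible, just slow).
worst = 0.0
for T in islice(combinations(range(n), k), 200):
    Z = sampled[:, list(T)]                   # sampled rows restricted to T
    M = np.eye(k) - (Z.conj().T @ Z) / m
    worst = max(worst, np.linalg.norm(M, 2))
print(worst)
```

Even with only about $m = 48$ of $n = 64$ rows, the deviation stays well below the trivial bound, which is the qualitative behavior the chaining argument quantifies.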

SLIDES 61-65

Chaining without gaussians: RIP (Rudelson, Vershynin'06)

$$\mathrm{LHS} = \mathbb{E}_\delta \sup_{\substack{T\subset[n]\\ |T|\le k}} \Big\| I_T - \frac{1}{m}\sum_{i=1}^n \delta_i z_i^{(T)} z_i^{(T)*} \Big\| = \mathbb{E}_\delta \sup_T \Big\| \mathbb{E}_{\delta'}\Big[\frac{1}{m}\sum_{i=1}^n \delta'_i z_i^{(T)} z_i^{(T)*}\Big] - \frac{1}{m}\sum_{i=1}^n \delta_i z_i^{(T)} z_i^{(T)*} \Big\|$$

$$\le \frac{1}{m}\,\mathbb{E}_{\delta,\delta'} \sup_T \Big\| \sum_{i=1}^n (\delta'_i - \delta_i)\, z_i^{(T)} z_i^{(T)*} \Big\| \qquad \text{(Jensen)}$$

$$= \sqrt{\frac{\pi}{2}} \cdot \frac{1}{m}\,\mathbb{E}_{\delta,\delta',\sigma} \sup_T \Big\| \mathbb{E}_g \sum_{i=1}^n |g_i|\,\sigma_i (\delta'_i - \delta_i)\, z_i^{(T)} z_i^{(T)*} \Big\|$$

$$\lesssim \sqrt{2\pi} \cdot \frac{1}{m}\,\mathbb{E}_{\delta,g} \sup_T \Big\| \sum_{i=1}^n g_i \delta_i\, z_i^{(T)} z_i^{(T)*} \Big\| \qquad \text{(Jensen + triangle inequality)}$$

$$\simeq \frac{1}{m}\,\mathbb{E}_\delta\, \mathbb{E}_g \sup_{x\in B_2^{n,k}} \Big| \sum_{i=1}^n g_i \delta_i\, |\langle z_i, x\rangle|^2 \Big| \qquad \text{(gaussian mean width!)}$$

(Here the $\sigma_i$ are independent Rademacher signs and $B_2^{n,k}$ denotes the $k$-sparse vectors in the unit $\ell_2$ ball.)

SLIDE 66

The End

SLIDE 67

June 22nd+23rd: workshop on concentration of measure / chaining at Harvard, after STOC’16. Details+website forthcoming.