SLIDE 1

When Fourier SIIRVs: Fourier-Based Testing for Families of Distributions

Clément Canonne,1 Ilias Diakonikolas,2 and Alistair Stewart2
March 19, 2018

1Stanford University, 2University of Southern California

SLIDE 2

background, context, and motivation

SLIDE 3

property testing

Sublinear-time, approximate, randomized decision algorithms that make local queries to their input.
∙ Big Dataset: too big
∙ Expensive access: pricey data
∙ “Model selection”: many options
∙ Good Enough: a priori knowledge
Need to infer information – one bit – from the data: quickly, or with very few lookups.


SLIDE 12

property testing

Figure: Property Testing: Inside the yolk, or outside the egg.

SLIDE 13

property testing

Introduced by [RS96, GGR98] – has been a very active area since.
∙ Known space (e.g., {0, 1}^N)
∙ Property P ⊆ {0, 1}^N
∙ Oracle access to unknown x ∈ {0, 1}^N
∙ Proximity parameter ε ∈ (0, 1]
Must decide: x ∈ P vs. dist(x, P) > ε (has the property, or is ε-far from it).
Many variants and subareas, with a plethora of results (see e.g. [Ron08, Ron10, Gol10, Gol17, BY17]).

SLIDE 14

distribution testing

Now, our “big object” is a probability distribution over a (discrete) domain Ω (here,* |Ω| = n).
∙ instead of queries: samples
∙ instead of Hamming distance: total variation
∙ instead of functions/graphs/strings: distributions
Focus on the sample complexity, with efficiency as an ancillary (yet important) goal.


SLIDE 21

background

Over the past 15 years, many results on many properties:
∙ Uniformity [GR00, BFR+00, Pan08, DGPP16]
∙ Identity [BFF+01, VV17, BCG17]
∙ Equivalence [BFR+00, Val11, CDVV14]
∙ Independence [BFF+01, LRR13, DK16, ADK15]
∙ Monotonicity [BKR04, BFRV11, CDGR16, ADK15]
∙ Poisson Binomial Distributions [AD15, CDGR16]
∙ Log-concavity [CDGR16, ADK15]
∙ and more… [Rub12, Can15]
Much has been done; and yet…


SLIDE 32

one ring to rule them all?

Techniques
Most algorithms and results are somewhat ad hoc, and property-specific. Can we… design general algorithms and approaches that apply to many testing problems at once?


SLIDE 34

and in the darkness test them

General Trend
In learning: [CDSS13, CDSS14, CDSX14, ADLS17] and recently… in testing: [Val11, VV11, CDGR16, ADK15, DK16, BCG17]


SLIDE 37

outline of the talk

∙ Notation, Preliminaries
∙ Overall Goal, Restated
∙ The shape restrictions approach [CDGR16]
∙ The Fourier approach [CDS17]

SLIDE 38

some notation

SLIDE 39

glossary

∙ Probability distributions over [n] := {1, . . . , n}:
∆([n]) = { p: [n] → [0, 1] : ∑_{i=1}^{n} p(i) = 1 }
∙ Property (or class) of distributions over [n]: P ⊆ ∆([n])
∙ Total variation distance (statistical distance, ℓ1 distance):
dTV(p, q) = sup_{S⊆Ω} (p(S) − q(S)) = (1/2) ∑_{x∈Ω} |p(x) − q(x)| ∈ [0, 1]
Domain size n ∈ N is big (“goes to ∞”). Proximity parameter ε ∈ (0, 1] is small. Lowercase Greek letters are in (0, 1]. Asymptotics Õ, Ω̃, Θ̃ hide logarithmic factors.*

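The total variation distance above is directly computable from two explicit probability vectors; a minimal sketch (function name is ours, not from the talk):

```python
def tv_distance(p, q):
    """Total variation distance between two distributions over [n],
    given as equal-length lists of probabilities:
    dTV(p, q) = (1/2) * sum_x |p(x) - q(x)|, which lies in [0, 1]."""
    assert len(p) == len(q)
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

# Example: a fair coin vs. a 3/4-biased coin are at distance
# 0.5 * (|0.5 - 0.75| + |0.5 - 0.25|) = 0.25.
```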

SLIDE 46

round up the usual suspects

∙ Poisson Binomial Distribution (PBD): X = ∑_{j=1}^{n} Xj, with X1, . . . , Xn ∈ {0, 1} independent.
∙ k-Sum of Independent Integer Random Variables (k-SIIRV): X = ∑_{j=1}^{n} Xj, with X1, . . . , Xn ∈ {0, 1, . . . , k − 1} independent.
∙ Poisson Multinomial Distribution (PMD): X = ∑_{j=1}^{n} Xj, with X1, . . . , Xn ∈ {e1, . . . , ek} independent.
∙ (Discrete) Log-Concave: p(k)² ≥ p(k − 1)p(k + 1), and supported on an interval.

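A k-SIIRV is easy to simulate from its definition as a sum of independent summands; a minimal illustrative sketch (names and input format are ours):

```python
import random

def sample_siirv(probs):
    """Draw one sample X = X_1 + ... + X_n from a k-SIIRV.
    `probs` has n rows; row j is the distribution of X_j over
    {0, 1, ..., k-1} (each row sums to 1; summands are independent)."""
    total = 0
    for row in probs:
        # Draw X_j from its own distribution and accumulate.
        total += random.choices(range(len(row)), weights=row, k=1)[0]
    return total

# A PBD is the special case k = 2: each row is [1 - p_j, p_j].
```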

SLIDE 50

but… will we ever learn?

SLIDE 51

testing by learning

Trivial baseline in property testing: “you can learn, so you can test.”
(i) Learn p̂ without assumptions, using a learner for ∆([n]).
(ii) Check if dTV(p̂, P) ≤ ε/3. (Computational)
Yes, but… (i) has sample complexity Θ(n/ε²).


SLIDE 55

testing by learning

“Folklore” baseline in property testing: “if you can learn, you can test.”
(i) Learn p̂ as if p ∈ P, using a learner for P.
(ii) Test dTV(p̂, p) ≤ ε/3 vs. dTV(p̂, p) ≥ 2ε/3.
(iii) Check if dTV(p̂, P) ≤ ε/3. (Computational)
The triangle inequality does the rest.


SLIDE 60

testing by learning?

“Folklore” baseline in property testing: “if you can learn, you can test.”
(i) Learn p̂ as if p ∈ P, using a learner for P.
(ii) Test if dTV(p̂, p) ≤ ε/3 vs. dTV(p̂, p) ≥ 2ε/3.
(iii) Check if dTV(p̂, P) ≤ ε/3. (Computational)
Not quite. Step (ii) is fine for functions. But for distributions? It requires Ω(n/ log n) samples [VV11, JYW17].
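The trivial “learn, then test” baseline can be sketched end to end. Here `dist_to_class` is a hypothetical stand-in for the property-specific, purely computational step that evaluates dTV(p̂, P); everything else follows the two steps above.

```python
from collections import Counter

def learn_then_test(samples, n, eps, dist_to_class):
    """Trivial baseline: build the empirical distribution p_hat from
    the samples (Theta(n/eps^2) of them suffice to learn to TV eps/3),
    then accept iff dTV(p_hat, P) <= eps/3. `dist_to_class(p_hat)`
    computes dTV(p_hat, P) and is assumed given."""
    counts = Counter(samples)
    p_hat = [counts.get(i, 0) / len(samples) for i in range(1, n + 1)]
    return dist_to_class(p_hat) <= eps / 3
```

For instance, taking P = {uniform over [n]} makes `dist_to_class` a one-line TV computation against the uniform vector.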

SLIDE 61

unified approaches: leveraging structure

SLIDE 62

swiss army knives

What we want
General algorithms applying to all (or many) distribution testing problems.
Theorem (Wishful)
Let P be a class of distributions that all exhibit some “nice structure.” If P can be tested with q queries, algorithm T can too, with “roughly” q queries as well.
More formally, we want:
Goal
Design general-purpose testing algorithms that, when applied to a property P, have (tight, or at least reasonable) sample complexity q(ε, τ) as long as P satisfies some structural assumption Sτ parameterized by τ.


SLIDE 66

swiss army knives: shape restrictions

Structural assumption Sτ: every distribution in P is well-approximated (in a specific ℓ2-type sense) by a piecewise-constant distribution with LP(τ) pieces.

Theorem ([CDGR16])
There exists an algorithm which, given sampling access to an unknown distribution p over [n] and parameter ε ∈ (0, 1], can distinguish with probability 2/3 between (a) p ∈ P versus (b) dTV(p, P) > ε, with Õ(√n LP(ε)/ε³ + LP(ε)/ε²) samples.

slide-67
SLIDE 67

swiss army knives: shape restrictions

Outline: Abstracting ideas from [BKR04] (for monotonicity):

  • 1. decomposition step: recursively build a partition Π of [n] in

O(LP(ε)) intervals s.t. p is roughly uniform on each piece. If successful, then p will be close to its “flattening” q on Π; if not, we have proof that p / ∈ P and we can reject.

  • 2. approximation step: learn q. Can be done with few samples

since Π has few intervals.

  • 3. projection step: (computational) verify that dTV(q, P) < O(ε).

Applications ∙ monotonicity ∙ unimodality ∙ k-modality ∙ k-histograms ∙ log-concavity ∙ Poisson Binomial ∙ Monotone Hazard Rate …

21

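The “flattening” in step 1 is just averaging p over each interval of Π; a minimal sketch, with the interval partition assumed given (names are ours):

```python
def flatten(p, partition):
    """Flattening of a distribution p over [n] w.r.t. a partition of
    [n] into intervals: within each interval, spread the interval's
    total mass uniformly. `partition` lists (start, end) index pairs
    (0-based, end exclusive) that cover 0..len(p)."""
    q = [0.0] * len(p)
    for start, end in partition:
        mass = sum(p[start:end])  # total probability on this interval
        for i in range(start, end):
            q[i] = mass / (end - start)
    return q
```

If p really is roughly uniform on each piece, q is close to p in TV; otherwise the decomposition step already produced a witness for rejection.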

SLIDE 69

that’s great! but…

Figure: A 3-SIIRV (for n = 100). Like all of us, it has ups and downs.

SLIDE 70

swiss army knives: fourier sparsity

Structural assumption Sτ: every distribution in P has sparse Fourier transform and sparse effective support: ∃ MP(τ), SP(τ) s.t. ∀p ∈ P, ∃ Ip ⊆ [n] with |Ip| ≤ MP(τ) and
∥p̂ 1_{SP(ε)ᶜ}∥2 ≤ O(ε) (Fourier mass outside SP(ε)), ∥p 1_{Ipᶜ}∥1 ≤ O(ε) (probability mass outside Ip).

Theorem ([CDS17])
There exists an algorithm which, given sampling access to an unknown distribution p over [n] and parameter ε ∈ (0, 1], can distinguish with probability 2/3 between (a) p ∈ P versus (b) dTV(p, P) > ε, with Õ(√(|SP(ε)| MP(ε))/ε² + |SP(ε)|/ε²) samples.

SLIDE 71

swiss army knives: fourier sparsity

Outline:
1. effective support test: take samples to identify a candidate Ip, and check |Ip| ≤ M(ε).
2. Fourier effective support test: invoke a Fourier sparsity subroutine to check that ∥p̂ 1_{SP(ε)ᶜ}∥2 ≤ O(ε) (if so, learn q, the inverse Fourier transform of p̂ 1_{SP(ε)}).
3. projection step: (computational) verify that dTV(q, P) < O(ε).

Applications ∙ k-SIIRVs ∙ Poisson Binomial ∙ Poisson Multinomial ∙ log-concavity

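Step 2 hinges on the Fourier mass of p outside the candidate frequency set. As a toy illustration only (the actual [CDS17] subroutine works from samples and avoids computing every coefficient), here is a direct computation of that quantity for an explicit distribution; all names are ours:

```python
import cmath

def fourier_mass_outside(p, S):
    """Given an explicit distribution p over Z_M (a list of length M)
    and a set S of frequencies, return the squared l2 Fourier mass
    outside S: sum over xi not in S of |p_hat(xi)|^2, where
    p_hat(xi) = sum_x p(x) * exp(2*pi*i*xi*x/M)."""
    M = len(p)
    mass = 0.0
    for xi in range(M):
        if xi in S:
            continue
        coef = sum(p[x] * cmath.exp(2j * cmath.pi * xi * x / M)
                   for x in range(M))
        mass += abs(coef) ** 2
    return mass
```

The uniform distribution has all its Fourier mass on the zero frequency, while a point mass spreads equally over every frequency, the two extremes of sparsity.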

SLIDE 73

in more detail

SLIDE 74

fourier sparsity: the guiding example

Theorem (Testing SIIRVs)
There exists an algorithm that, given k, n ∈ N, ε ∈ (0, 1], and sample access to p ∈ ∆(N), tests the class of k-SIIRVs with O(k n^{1/4} log^{1/4}(1/ε)/ε² + k² log²(k/ε)/ε²) samples from p, and runs in time n · (k/ε)^{O(k log(k/ε))}.
First non-trivial tester for SIIRVs. Near-optimal for constant k: lower bound of Ω(k^{1/2} n^{1/4}/ε²) [CDGR16].


SLIDE 78

fourier sparsity: the guiding example

k-SIIRVs…
∙ are very badly approximated by histograms
∙ have sparse effective support
∙ have nicely bounded ℓ2 norm
∙ have very nice Fourier spectrum


SLIDE 82

fourier sparsity (the fine print)

Theorem (General Testing Statement)
Let P ⊆ ∆(N) be a property satisfying the following: ∃ S: (0, 1] → 2^N, M: (0, 1] → N, and qI: (0, 1] → N s.t. for all ε ∈ (0, 1],
1. Fourier sparsity: ∀p ∈ P, the Fourier transform (modulo M(ε)) of p is concentrated on S(ε): namely, ∥p̂ 1_{S(ε)ᶜ}∥2² ≤ O(ε²).
2. Support sparsity: ∀p ∈ P, ∃ interval I ⊆ N with |I| ≤ M(ε) such that (i) p is concentrated on I: p(I) ≥ 1 − O(ε), and (ii) I can be identified w.h.p. with qI(ε) samples.
3. Projection: there is a procedure ProjectP which, on input ε and the explicit description of h ∈ ∆(N), runs in time T(ε) and distinguishes between dTV(h, P) ≤ 2ε/5 and dTV(h, P) > ε/2.
4. (Optional) ℓ2-norm bound: ∃ b ∈ (0, 1] s.t. ∥p∥2² ≤ b, ∀p ∈ P.
Then ∃ a tester for P with sample complexity m = O(√(|S(ε)| M(ε))/ε² + |S(ε)|/ε² + qI(ε)) (if (4) holds, the first term can be replaced, giving O(√b M(ε)/ε² + |S(ε)|/ε² + qI(ε))), and running in time O(m |S| + T(ε)). Further, when the algorithm accepts, it also learns p: i.e., it outputs a hypothesis h s.t. dTV(p, h) ≤ ε.

SLIDE 83

Require: sample access to a distribution p ∈ ∆(N), parameter ε ∈ (0, 1], b ∈ (0, 1], functions S: (0, 1] → 2^N, M: (0, 1] → N, qI: (0, 1] → N, and procedure ProjectP
1: Effective Support
2:   Take qI(ε) samples to identify a “candidate set” I. ▷ Works w.h.p. if p ∈ P.
3:   Take O(1/ε) samples to distinguish b/w p(I) ≥ 1 − ε/5 and p(I) < 1 − ε/4. ▷ Correct w.h.p.
4:   if |I| > M(ε) or we detected that p(I) < 1 − ε/4 then
5:     return reject
6:   end if
7:
8: Fourier Effective Support
9:   Simulating sample access to p′ = p mod M(ε), call TestFourierSupport on p′ with parameters M(ε), ε/(5√M(ε)), b, and S(ε).
10:  if TestFourierSupport returned reject then
11:    return reject
12:  end if
13:  Let ĥ = (ĥ(ξ))_{ξ∈S(ε)} be the Fourier coefficients it outputs, and h their inverse Fourier transform (modulo M(ε)). ▷ Do not actually compute h here.
14:
15: Projection Step
16:  Call ProjectP on parameters ε and h, and return accept if it does, reject otherwise.
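As a reading aid, the control flow of the pseudocode above translates directly into code. Everything here is a schematic stand-in: the three helpers passed in (`find_support`, `test_fourier_support`, `project`) represent the real subroutines of the slide, and `mass_mostly_in` is our own crude sketch of line 3, with an arbitrary sample-count constant.

```python
def fourier_family_tester(sample, eps, M, S, b,
                          find_support, test_fourier_support, project):
    """Skeleton of the tester above. `sample()` draws from p; the
    three helper callables stand in for the named subroutines."""
    # 1. Effective support: find a candidate set I and sanity-check it.
    I = find_support()                       # uses q_I(eps) samples
    if len(I) > M or not mass_mostly_in(sample, I, eps):
        return "reject"
    # 2. Fourier effective support: test concentration on S modulo M;
    #    on success we receive Fourier coefficients h_hat on S.
    ok, h_hat = test_fourier_support(lambda: sample() % M, S, b)
    if not ok:
        return "reject"
    # 3. Projection: purely computational closeness check to P.
    return "accept" if project(h_hat) else "reject"

def mass_mostly_in(sample, I, eps, c=10):
    """Distinguish p(I) >= 1 - eps/5 from p(I) < 1 - eps/4 using
    O(1/eps) samples (constant c is arbitrary in this sketch)."""
    m = int(c / eps) + 1
    hits = sum(sample() in I for _ in range(m))
    return hits / m >= 1 - 0.225 * eps  # midpoint of the two thresholds
```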

SLIDE 84

fourier sparsity: the guiding example

With this in hand… The testing result for k-SIIRVs immediately follows. (Modulo one little lie.)
Other results… For PBDs (k = 2) and PMDs (multidimensional) as well, the second with the suitable generalization of the discrete Fourier transform.

slide-87
SLIDE 87

fourier sparsity (the main tool)

Theorem (Testing Fourier Sparsity)

Given parameters M ≥ 1, ε, b ∈ (0, 1], subset S ⊆ [M] and sample access to q ∈ ∆([M]), TestFourierSupport either rejects or outputs Fourier coefficients h′ = ( h′(ξ))ξ∈S s.t., w.h.p., all the following holds.

  • 1. if ∥q∥2

2 > 2b, then it rejects;

  • 2. if ∥q∥2

2 ≤ 2b and ∀q∗ : [M] → R with

q∗ supported entirely on S, ∥q − q∗∥2 > ε, then it rejects;

  • 3. if ∥q∥2

2 ≤ b and ∃q∗ : [M] → R with

q∗ supported entirely on S s.t. ∥q − q∗∥2 ≤ ε

2 , then it

does not reject;

  • 4. if it does not reject, then ∥

q1S − h′∥2 ≤ O(ε √ M) and the inverse Fourier transform (modulo M) h′ of the Fourier coefficients it outputs satisfies ∥q − h′∥2 ≤ O(ε). Moreover, it takes m = O ( √

b ε2 + |S| Mε2 +

√ M ) samples from q, and runs in time O(m |S|).

31

slide-88
SLIDE 88

fourier sparsity (the main tool)

Idea

Consider the Fourier coefficients of the empirical distribution (from few samples).

Second idea

Do not consider these coefficients directly (expensive, time-wise). Instead, rely on (the analysis of) an ℓ2 identity tester [CDVV14] + Plancherel to get guarantees on the Fourier coefficients.

32
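The Plancherel step can be checked numerically: with the transform convention p̂(ξ) = Σₓ p(x)e^(−2πiξx/M) over Z_M, one has ∥p − q∥₂² = (1/M)·Σ_ξ |p̂(ξ) − q̂(ξ)|², so ℓ2 guarantees on the distributions translate directly into guarantees on the Fourier coefficients. A quick sketch (`dft` is a helper name, not from the talk):

```python
import cmath

def dft(p, M):
    # p_hat(xi) = sum_x p(x) e^{-2*pi*i*xi*x/M}, for xi = 0, ..., M-1.
    return [sum(p[x] * cmath.exp(-2j * cmath.pi * xi * x / M) for x in range(M))
            for xi in range(M)]

# Plancherel over Z_M: ||p - q||_2^2 == (1/M) * sum_xi |p_hat(xi) - q_hat(xi)|^2.
M = 8
p = [1 / M] * M                               # uniform distribution on Z_8
q = [0.2, 0.1, 0.1, 0.1, 0.1, 0.1, 0.2, 0.1]  # a small perturbation of it
lhs = sum((a - b) ** 2 for a, b in zip(p, q))
rhs = sum(abs(a - b) ** 2 for a, b in zip(dft(p, M), dft(q, M))) / M
```

Here `lhs` and `rhs` agree up to floating-point error, which is the identity the [CDVV14]-based analysis leans on.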

slide-91
SLIDE 91
  open questions, and questions.
slide-92
SLIDE 92
  open questions

∙ More applications: what is your favorite property? ∙ Uncertainty Principle: what about this √(|S(ε)| M(ε)) term? ∙ Fourier works: what about other bases?

34

slide-95
SLIDE 95

Thank You.

35

slide-96
SLIDE 96

Jayadev Acharya and Constantinos Daskalakis. Testing Poisson Binomial Distributions. In Proceedings of SODA, pages 1829–1840, 2015. Jayadev Acharya, Constantinos Daskalakis, and Gautam C. Kamath. Optimal Testing for Properties of Distributions. In C. Cortes, N.D. Lawrence, D.D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 3577–3598. Curran Associates, Inc., 2015. Jayadev Acharya, Ilias Diakonikolas, Jerry Zheng Li, and Ludwig Schmidt. Sample-optimal density estimation in nearly-linear time. In Proceedings of SODA, pages 1278–1289. SIAM, 2017. Eric Blais, Clément L. Canonne, and Tom Gur. Distribution testing lower bounds via reductions from communication complexity. In Computational Complexity Conference, volume 79 of LIPIcs, pages 28:1–28:40. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2017. Tuğkan Batu, Eldar Fischer, Lance Fortnow, Ravi Kumar, Ronitt Rubinfeld, and Patrick White. Testing random variables for independence and identity. In Proceedings of FOCS, pages 442–451, 2001.


slide-97
SLIDE 97

Tuğkan Batu, Lance Fortnow, Ronitt Rubinfeld, Warren D. Smith, and Patrick White. Testing that distributions are close. In Proceedings of FOCS, pages 189–197, 2000. Arnab Bhattacharyya, Eldar Fischer, Ronitt Rubinfeld, and Paul Valiant. Testing monotonicity of distributions over general partial orders. In Proceedings of ITCS, pages 239–252, 2011. Tuğkan Batu, Ravi Kumar, and Ronitt Rubinfeld. Sublinear algorithms for testing monotone and unimodal distributions. In Proceedings of STOC, pages 381–390, New York, NY, USA, 2004. ACM. Arnab Bhattacharyya and Yuichi Yoshida. Property Testing. Forthcoming, 2017. Clément L. Canonne. A Survey on Distribution Testing: your data is Big. But is it Blue? Electronic Colloquium on Computational Complexity (ECCC), 22:63, April 2015. Clément L. Canonne, Ilias Diakonikolas, Themis Gouleakis, and Ronitt Rubinfeld. Testing Shape Restrictions of Discrete Distributions. In Proceedings of STACS, 2016.


slide-98
SLIDE 98

See also [CDGR17] (full version). Clément L. Canonne, Ilias Diakonikolas, Themis Gouleakis, and Ronitt Rubinfeld. Testing shape restrictions of discrete distributions. Theory of Computing Systems, pages 1–59, 2017. Yu Cheng, Ilias Diakonikolas, and Alistair Stewart. Playing anonymous games using simple strategies. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, Proceedings of SODA, pages 616–631, Philadelphia, PA, USA, 2017. Society for Industrial and Applied Mathematics. Siu-on Chan, Ilias Diakonikolas, Rocco A. Servedio, and Xiaorui Sun. Learning mixtures of structured distributions over discrete domains. In Proceedings of SODA, pages 1380–1394, 2013. Siu-on Chan, Ilias Diakonikolas, Rocco A. Servedio, and Xiaorui Sun. Efficient density estimation via piecewise polynomial approximation. In Proceedings of STOC, pages 604–613. ACM, 2014. Siu-on Chan, Ilias Diakonikolas, Rocco A. Servedio, and Xiaorui Sun. Near-optimal density estimation in near-linear time using variable-width histograms.


slide-99
SLIDE 99

In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, pages 1844–1852, 2014. Siu-on Chan, Ilias Diakonikolas, Gregory Valiant, and Paul Valiant. Optimal algorithms for testing closeness of discrete distributions. In Proceedings of SODA, pages 1193–1203, 2014. Ilias Diakonikolas, Themis Gouleakis, John Peebles, and Eric Price. Collision-based testers are optimal for uniformity and closeness. Electronic Colloquium on Computational Complexity (ECCC), 23:178, 2016. Ilias Diakonikolas and Daniel M. Kane. A new approach for testing properties of discrete distributions. In Proceedings of FOCS. IEEE Computer Society, 2016. Oded Goldreich, Shafi Goldwasser, and Dana Ron. Property testing and its connection to learning and approximation. Journal of the ACM, 45(4):653–750, July 1998. Oded Goldreich, editor. Property Testing: Current Research and Surveys. Springer, 2010. LNCS 6390.


slide-100
SLIDE 100

Oded Goldreich. Introduction to Property Testing. Forthcoming, 2017. Oded Goldreich and Dana Ron. On testing expansion in bounded-degree graphs. Technical Report TR00-020, Electronic Colloquium on Computational Complexity (ECCC), 2000. Jiantao Jiao, Yanjun Han, and Tsachy Weissman. Minimax Estimation of the L_1 Distance. ArXiv e-prints, May 2017. Reut Levi, Dana Ron, and Ronitt Rubinfeld. Testing properties of collections of distributions. Theory of Computing, 9:295–347, 2013. Liam Paninski. A coincidence-based test for uniformity given very sparsely sampled discrete data. IEEE Transactions on Information Theory, 54(10):4750–4755, 2008. Dana Ron. Property Testing: A Learning Theory Perspective.


slide-101
SLIDE 101

Foundations and Trends in Machine Learning, 1(3):307–402, 2008. Dana Ron. Algorithmic and analysis techniques in property testing. Foundations and Trends in Theoretical Computer Science, 5:73–205, 2010. Ronitt Rubinfeld and Madhu Sudan. Robust characterization of polynomials with applications to program testing. SIAM Journal on Computing, 25(2):252–271, 1996. Ronitt Rubinfeld. Taming big probability distributions. XRDS: Crossroads, The ACM Magazine for Students, 19(1):24, September 2012. Paul Valiant. Testing symmetric properties of distributions. SIAM Journal on Computing, 40(6):1927–1968, 2011. Gregory Valiant and Paul Valiant. Estimating the unseen: An n/ log n-sample estimator for entropy and support size, shown optimal via new CLTs. In Proceedings of STOC, pages 685–694, 2011. Gregory Valiant and Paul Valiant. An automatic inequality prover and instance optimal identity testing.


slide-102
SLIDE 102

In Proceedings of FOCS, 2014. Gregory Valiant and Paul Valiant. An automatic inequality prover and instance optimal identity testing. SIAM Journal on Computing, 46(1):429–455, 2017. Journal version of [VV14].
