Frontiers in Distribution Testing: A Sample of What to Expect
Too Early for Puns?
Clément Canonne
October 14, 2017
Columbia University, Stanford University
Background, Context, and Motivation
Expensive access: pricey data
Model
[Table of sample complexities not recovered; its entries cite [AD15, CDGR16, CDS17].]
Caveat: The above is not entirely accurate, and only the (usually) dominant term is included. For instance, the sample complexity of equivalence is actually $\Theta(\max(n^{2/3}/\varepsilon^{4/3}, \sqrt{n}/\varepsilon^2))$; for monotonicity, the current best upper bound has an additional $1/\varepsilon^4$ term, while for PBDs the lower bound of $\Omega(n^{1/4}/\varepsilon^2)$ is almost matched by an $O(n^{1/4}/\varepsilon^2 + \log^2(1/\varepsilon)/\varepsilon^2)$ upper bound. Don’t sue me.
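To see which of the two terms in the equivalence bound dominates, here is a minimal Python sketch that evaluates both with all constants dropped. The function names are made up for illustration, and the returned values are orders of magnitude only, not actual sample counts.

```python
import math


def equivalence_terms(n: int, eps: float) -> tuple[float, float]:
    """Return the two terms n^(2/3)/eps^(4/3) and sqrt(n)/eps^2 (constants ignored)."""
    if n < 1 or not 0 < eps <= 1:
        raise ValueError("need n >= 1 and 0 < eps <= 1")
    first = n ** (2 / 3) / eps ** (4 / 3)
    second = math.sqrt(n) / eps ** 2
    return first, second


def equivalence_sample_complexity(n: int, eps: float) -> float:
    """Order of magnitude of max(n^(2/3)/eps^(4/3), sqrt(n)/eps^2)."""
    return max(equivalence_terms(n, eps))


def dominant_term(n: int, eps: float) -> str:
    """Name the larger of the two terms for the given n and eps."""
    first, second = equivalence_terms(n, eps)
    return "n^(2/3)/eps^(4/3)" if first >= second else "sqrt(n)/eps^2"
```

Setting the two terms equal gives a crossover at eps = n^(-1/4): above it the first term is the larger one, below it the second. For n = 10^4 the crossover sits at eps = 0.1.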
[Definitions not recovered: distances between distributions over a domain Ω, including the ℓ1 distance, with expressions indexed by i ∈ Ω, S ⊆ Ω, and x ∈ Ω.]
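The surviving fragments here mention the ℓ1 distance and, later, dTV. Assuming the slide gave the standard definition of total variation distance (the largest gap p(S) − q(S) over events S ⊆ Ω, which equals half the ℓ1 distance), the following Python sketch computes it both ways. Representing distributions as dictionaries and the function names are choices made for this illustration, not something taken from the talk.

```python
from itertools import combinations


def tv_distance(p: dict, q: dict) -> float:
    """Total variation distance as half the l1 distance between p and q."""
    domain = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in domain)


def tv_distance_over_events(p: dict, q: dict) -> float:
    """Same quantity as the maximum of p(S) - q(S) over all subsets S.

    Brute force over every subset, so only usable on very small domains.
    """
    domain = sorted(set(p) | set(q))
    best = 0.0
    for size in range(len(domain) + 1):
        for event in combinations(domain, size):
            gap = sum(p.get(x, 0.0) - q.get(x, 0.0) for x in event)
            best = max(best, gap)
    return best
```

On a small example such as p = {"a": 0.5, "b": 0.5} and q = {"a": 0.25, "b": 0.25, "c": 0.5}, both functions agree, with the maximum attained by the event {"a", "b"}.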
[Slides not recovered. Surviving fragments: comparisons of the form "... vs. dTV(...)", the ℓ2 distance, "(in the ℓ2 sense) using this structure", an ℓ2 tester, and bounds involving n and log n.]
Technically, as Jiantao’s talk will describe, a more accurate statement is: whatever estimation the plug-in empirical estimator can perform with k log k samples, the optimal scheme performs with k samples. “Enlarge your sample,” if you will.
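For readers unfamiliar with the term, a plug-in empirical estimator computes the quantity of interest directly on the empirical distribution of the samples. The Python sketch below does this for Shannon entropy. Entropy is picked here purely as an example, since the text does not say which quantity is being estimated, and the function name is hypothetical. This is the baseline estimator only; it does not implement the optimal scheme.

```python
import math
from collections import Counter


def plugin_entropy(samples: list) -> float:
    """Plug-in estimate: Shannon entropy (in nats) of the empirical distribution."""
    if not samples:
        raise ValueError("need at least one sample")
    m = len(samples)
    counts = Counter(samples)
    # Empirical probability of each observed symbol is count / m;
    # unobserved symbols contribute nothing to the sum.
    return -sum((c / m) * math.log(c / m) for c in counts.values())
```

With few samples relative to the domain size, this estimate tends to fall below the true entropy because unseen elements are ignored, which is one way to read the gap between k log k and k samples.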
[Content not recovered: bounds involving n, ε², and log n.]