Entropy property testing with finitely many errors

Changlong Wu (Univ. of Hawaii, Manoa)
Joint work with Narayana Santhanam (Univ. of Hawaii, Manoa)

ISIT 2020 Online Talk, June 2020
Introduction
Meta-question: when will a scientist find a perfect theory, eventually almost surely?

Consider a scientist building a theory that describes a natural phenomenon by making observations. The scientist may refine the theory every time new observations arrive (e.g., Newton → Einstein).

Will the scientist perpetually refine the theory, or settle on a perfect theory after making finitely many observations?
A toy example
Let p be a distribution over {1, 2, · · · , m}, and let H(p) be the entropy of p. For some fixed h ∈ [0, log m], we would like to decide: Is H(p) = h? by observing i.i.d. samples X1, X2, · · · ∼ p.

This seems to be an ill-posed problem, since no rule can decide correctly for distributions p with H(p) arbitrarily close to, but not equal to, h.
We are allowed to sample as long as we want, but after some point we must make the right decision.

We show that for any h ∈ [0, log m], there exists a universal decision rule Φ such that for any distribution p over [m], we have

Φ(X_1^n) → 1{H(p) = h} almost surely as n → ∞,

where X1, X2, · · · ∼ p independently. In other words, Φ makes the right decision eventually almost surely.
Proof?
Let p̂n be the empirical distribution obtained from n samples of p.

A standard concentration inequality yields a number N such that for all n ≥ N,

P(‖p̂n − p‖TV ≥ log²(n)/√n) ≤ 1/n².

Since the entropy function is uniformly continuous over a bounded support, there is a function t(n) → 0 such that for all n ≥ N,

P(|H(p̂n) − H(p)| ≥ t(n)) ≤ 1/n².
The decision rule is as follows: if |H(p̂n) − h| ≤ t(n) we decide "yes", otherwise we decide "no".

Now, if indeed H(p) = h, the Borel–Cantelli lemma shows that the rule is correct for all but finitely many n ≥ N with probability 1.

If H(p) ≠ h, then since t(n) → 0 there exists a number Np such that, with probability 1, for all but finitely many n ≥ Np,

|H(p̂n) − h| > |H(p) − h| − t(n) > t(n),

so the rule correctly decides "no".
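To make the rule concrete, here is a minimal Python sketch of the plug-in decision rule. It is an illustration, not the construction from the paper: the threshold t(n) below is a hypothetical choice that mirrors the log²(n)/√n rate above.

```python
import numpy as np

def empirical_entropy(samples):
    """Plug-in entropy estimate H(p_hat_n) from i.i.d. samples (in nats)."""
    _, counts = np.unique(samples, return_counts=True)
    freqs = counts / len(samples)
    return -np.sum(freqs * np.log(freqs))

def decide_entropy_equals(samples, h):
    """Decide 'Is H(p) = h?': yes iff |H(p_hat_n) - h| <= t(n)."""
    n = len(samples)
    t = np.log(n) ** 2 / np.sqrt(n)  # illustrative threshold, t(n) -> 0
    return abs(empirical_entropy(samples) - h) <= t

# Demo: p uniform over {0, 1, 2, 3}, so H(p) = log 4.
rng = np.random.default_rng(0)
samples = rng.integers(0, 4, size=100_000)
print(decide_entropy_equals(samples, h=np.log(4)))  # True for large n
```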
Testing general entropy property
Let P be a class of distributions over N, and let A ⊂ R+. For which combinations of P and A can we find a decision rule Φ such that

Φ(X_1^n) → 1{H(p) ∈ A} almost surely as n → ∞

for all p ∈ P and X1, X2, · · · i.i.d. ∼ p?
Fσ-separable
Sets A ⊂ R+ and Ac = R+\A are said to be Fσ-separable if there exist collections of sets {Bn}n∈N and {Cn}n∈N such that
1. A = ⋃n∈N Bn and Ac = ⋃n∈N Cn;
2. for all n ∈ N, Bn ⊂ Bn+1 and Cn ⊂ Cn+1;
3. for all n ∈ N, inf{|x − y| : x ∈ Bn, y ∈ Cn} > 0.
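As a quick sanity check of the definition (our example, not from the slides): A = [0, h] and Ac = (h, ∞) are Fσ-separable, witnessed by the sets

```latex
\[
  B_n = [0,\, h], \qquad C_n = \Bigl[\, h + \tfrac{1}{n},\ \infty \Bigr).
\]
\[
  A = \bigcup_{n \in \mathbb{N}} B_n, \qquad
  A^{c} = \bigcup_{n \in \mathbb{N}} C_n, \qquad
  \inf\bigl\{ |x - y| : x \in B_n,\ y \in C_n \bigr\} = \tfrac{1}{n} > 0.
\]
```

Both chains are increasing (here {Bn} is constant), so all three conditions hold.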
Bounded support case
Theorem. For any A ⊂ [0, log m], one can decide "Is H(p) ∈ A?" eventually almost surely for all distributions p over [m] iff A and Ac are Fσ-separable.
Infinite Alphabets
Does the result extend to distributions on the naturals with arbitrary support? The answer is no; we prove the following theorem:

Theorem. For any k ≥ 1, there is no decision rule that decides
1. Is H(p) ≥ k?
2. Is H(p) finite?
eventually almost surely for all distributions over N.

The proof uses a diagonalization argument...
We note the following somewhat surprising theorem:

Theorem. For any k ≥ 1, there exists a decision rule that decides "Is H(p) > k?" eventually almost surely for all distributions over N.

The difference from the H(p) ≥ k case is that one can construct an estimator Ĥ such that Ĥ(X_1^n) ≤ H(p) and Ĥ(X_1^n) → H(p) almost surely. Decide "yes" if Ĥ(X_1^n) > k and "no" otherwise.
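As a rough illustration of the one-sided estimator idea (our sketch; the paper constructs its own estimator Ĥ), one can take the plug-in entropy, which is biased low in expectation, minus a vanishing margin that guards against upward fluctuations:

```python
import numpy as np

def lower_entropy_estimate(samples):
    """Heuristic lower estimate of H(p): plug-in entropy minus a vanishing
    margin. Illustration only; not the paper's estimator."""
    n = len(samples)
    _, counts = np.unique(samples, return_counts=True)
    freqs = counts / n
    plugin = -np.sum(freqs * np.log(freqs))
    return plugin - np.log(n) / np.sqrt(n)  # illustrative margin -> 0

def decide_entropy_exceeds(samples, k):
    """Decide 'Is H(p) > k?': yes iff the lower estimate already exceeds k."""
    return lower_entropy_estimate(samples) > k
```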
Preparing for the main result: Tail entropy
For a function ρ : N → R+ and a class P of distributions over N, we say the tail entropy of P is eventually dominated by ρ if for every p ∈ P there exists a number Np such that for all n ≥ Np,

Hn(p) = Σi≥n −pi log pi ≤ ρ(n).
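In code, the tail entropy of a (truncated) probability vector is straightforward; a minimal sketch:

```python
import math

def tail_entropy(p, n):
    """H_n(p) = sum over i >= n of -p_i * log(p_i), with p indexed from 1
    as on the slide (p[0] holds p_1)."""
    return sum(-q * math.log(q) for q in p[n - 1:] if q > 0)
```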
Main Result
Theorem. Let ρ : N → R+ be an arbitrary function with ρ(n) → 0 as n → ∞, let P be eventually dominated by ρ, and let A ⊂ R+. Then there exists a decision rule that decides "Is H(p) ∈ A?" eventually almost surely for all p ∈ P iff A and Ac are Fσ-separable.
Sketch of Proof
1. Since A and Ac are Fσ-separable, there exist B1 ⊂ B2 ⊂ · · · ⊂ A and C1 ⊂ C2 ⊂ · · · ⊂ Ac with A = ⋃Bn and Ac = ⋃Cn such that for all n,

inf{|x − y| : x ∈ Bn, y ∈ Cn} = εn > 0.
Sketch of Proof (Cont.)
2. Define

Pn = {p ∈ P : H(p) ∈ Bn ∪ Cn and ∀k > N(n), Hk(p) ≤ ρ(k)},

where N(n) ↗ +∞ is chosen so that ρ(N(n)) ≤ εn/8.
3. We have P = ⋃Pn and Pn ⊂ Pn+1, by the eventual dominance of ρ and the properties of {Bn} and {Cn}.
4. By construction, the problem restricted to Pn can be decided with arbitrary confidence using a bounded number of samples. Let bn denote the sample complexity that achieves confidence 1 − 2⁻ⁿ.

5. The decision rule for P is as follows: when the sample size equals bn, we use the decision rule for Pn to make the decision, and we retain that decision until the sample size reaches bn+1. Repeat the process for n + 1.
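Structurally, the combined rule of step 5 looks like the following Python sketch, assuming the per-class testers and sample complexities bn from step 4 are given (both arguments are hypothetical placeholders):

```python
def staged_decision_rule(testers, budgets):
    """Combine per-class testers into one rule, as in step 5.

    testers[n](samples) decides 'Is H(p) in A?' for p in P_n with confidence
    1 - 2**(-n) once given budgets[n] samples; budgets is increasing.
    The combined rule re-decides each time the sample count reaches the next
    budget, and repeats its last answer in between (None before stage 0).
    """
    def rule(samples):
        decision = None
        for tester, b in zip(testers, budgets):
            if len(samples) < b:
                break
            decision = tester(samples[:b])  # decision made at sample size b
        return decision
    return rule
```

Since every p ∈ P lies in Pn for all large n and the stage-n error probability 2⁻ⁿ is summable, Borel–Cantelli gives that only finitely many stages err, so the rule is correct eventually almost surely.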
When do we have eventual dominance?
Clearly, the class of distributions with finite support is eventually dominated.

The following lemma shows that a finite first moment is also sufficient:

Lemma. Let P be the class of all distributions over N with finite first moment. Then P is eventually dominated by ρ(n) = log²(n)/n.
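A quick numerical sanity check of the lemma (our sketch, using the hypothetical heavy-tailed choice pi ∝ 1/i³, which has finite first moment; tail_entropy is as defined earlier):

```python
import math

# p_i proportional to 1/i^3, truncated at M (finite first moment)
M = 10**6
weights = [1.0 / i**3 for i in range(1, M + 1)]
Z = sum(weights)
p = [w / Z for w in weights]

def tail_entropy(p, n):
    return sum(-q * math.log(q) for q in p[n - 1:] if q > 0)

for n in (10, 100, 1000):
    print(n, tail_entropy(p, n), math.log(n) ** 2 / n)
# the tail entropy falls well below log^2(n)/n as n grows
```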
Relation with regularization
A model class P together with a binary property f : P → {0, 1} is said to be regularizable if P can be decomposed as P = ⋃n∈N Pn such that each Pn is uniformly testable for the property f.

Our result shows that a model class is regularizable for some property iff the class is finitely decidable by testing the same property.
In conclusion
1. Under mild conditions, we completely characterized the decidability of entropy properties of distributions over N.
2. Our approach also yields elementary proofs of the results in (Cover, 1973), (Dembo-Peres, 1994), and (Koplowitz et al., 1995).
3. A full version of this work, with more problem setups, is available at: https://arxiv.org/abs/2001.03710
Related Work
- Problems of a similar flavor were initiated by (Cover, 1973).
- A substantial extension of Cover's work appears in (Dembo-Peres, 1994).
- A line of research follows this work: (Kulkarni-Tse, 1994), (Koplowitz et al., 1995), (Newman, 2016), (Newman, 2019).
- A prediction analogue appears in (Santhanam-Anantharam, 2016) and (Wu-Santhanam, 2019).
- A deterministic computational analogue was extensively studied in the TCS community; see (Zeugmann-Zilles, 2006) for a survey.
Thank you!