
Application of Information Theory, Lecture 8

Kolmogorov Complexity and Other Entropy Measures

Iftach Haitner

Tel Aviv University.

December 16, 2014

Part I Kolmogorov Complexity


Description length

◮ What is the description length of the following strings?

  • 1. 010101010101010101010101010101010101
  • 2. 011010100000100111100110011001111110
  • 3. 111010100110001100111100010101011111

  • 1. Eighteen copies of 01
  • 2. The first 36 bits of the binary expansion of √2 − 1
  • 3. Looks random, but has 22 ones out of 36 (answers 2 and 3 are checked in the sketch below)

◮ Berry's paradox: Let s be "the smallest positive integer that cannot be described in twelve English words"
◮ The above is a definition of s using fewer than twelve English words...
◮ Solution: the word "described" in the definition of s is not well defined
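As an aside (not part of the original slides), a minimal C++ sketch that regenerates answer 2 and verifies the count in answer 3; the constant 36 and the strings are taken from the examples above:

    #include <cstdio>
    #include <cmath>
    #include <string>

    int main() {
        // Answer 2: the first 36 bits of the binary expansion of sqrt(2) - 1.
        // A double carries ~52 mantissa bits, enough for this illustration.
        double r = std::sqrt(2.0) - 1.0;
        for (int i = 0; i < 36; ++i) {
            r *= 2.0;
            int bit = static_cast<int>(r);   // integer part = next binary digit
            std::printf("%d", bit);
            r -= bit;
        }
        std::printf("\n");

        // Answer 3: the third string "looks random" but is biased; count its ones.
        std::string s = "111010100110001100111100010101011111";
        int ones = 0;
        for (char c : s) ones += (c == '1');
        std::printf("%d ones out of %zu\n", ones, s.size());   // 22 out of 36
        return 0;
    }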


Kolmogorov complexity

◮ For a string x ∈ {0, 1}∗, let K(x) be the length of the shortest C++ program (written in binary) that outputs x (on empty input)
◮ Now the term "described" is well defined.
◮ Why C++? All (complete) programming languages/computational models are essentially equivalent.
◮ Let K′(x) be the description length of x in another complete language; then |K(x) − K′(x)| ≤ const.
◮ What is K(x) for x = 0101 . . . 01 (n pairs)?
◮ "For i = 1 to n: print 01" (made concrete in the sketch below), so K(x) ≤ log n + const
◮ This is considered small complexity; we typically ignore log n factors.
◮ What is K(x) for x being the first n digits of π? Again K(x) ≤ log n + const, since a constant-size program computes π to any requested precision
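The one-line program from the bullet above, written out as a compilable C++ sketch; the literal value of n is the only part of the program text that grows with the output, and writing it down takes about log n bits:

    #include <cstdio>

    int main() {
        const long long n = 18;            // ~log n bits of "real" information in the program text
        for (long long i = 0; i < n; ++i)  // constant-size loop, independent of n
            std::printf("01");
        std::printf("\n");
        return 0;
    }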


More examples

◮ What is K(x) for x ∈ {0, 1}^n with k ones?
◮ Recall that C(n, k) ≤ 2^(n·h(k/n))
◮ Hence K(x) ≤ log n + n·h(k/n) + const: describe k and the index of x among the C(n, k) strings of length n with k ones
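To make the bound concrete, here is a C++ sketch (my own illustration, reusing the third string from the earlier slide) that computes the lexicographic index of an n-bit string among all strings with the same number of ones; the pair (k, index) describes x in roughly log n + log C(n, k) ≤ log n + n·h(k/n) bits:

    #include <cstdio>
    #include <cstdint>
    #include <cmath>
    #include <string>

    // Binomial coefficient in 64-bit arithmetic (exact for the sizes used here).
    uint64_t binom(int n, int k) {
        if (k < 0 || k > n) return 0;
        uint64_t r = 1;
        for (int i = 1; i <= k; ++i) r = r * (n - k + i) / i;
        return r;
    }

    // Lexicographic rank of x among all length-n strings with the same number of ones.
    uint64_t rank_among_equal_weight(const std::string& x) {
        int n = (int)x.size(), k = 0;
        for (char c : x) k += (c == '1');
        uint64_t r = 0;
        for (int i = 0; i < n; ++i) {
            if (x[i] == '1') {
                r += binom(n - i - 1, k);   // strings that put a '0' here come first
                --k;
            }
        }
        return r;
    }

    int main() {
        std::string x = "111010100110001100111100010101011111";  // string 3 from the slides
        int n = (int)x.size(), k = 0;
        for (char c : x) k += (c == '1');
        uint64_t idx = rank_among_equal_weight(x), total = binom(n, k);
        // k and idx together pin down x: about 32 bits instead of 36.
        std::printf("n=%d k=%d index=%llu of C(n,k)=%llu (log2 ~ %.1f bits)\n",
                    n, k, (unsigned long long)idx, (unsigned long long)total,
                    std::log2((double)total));
        return 0;
    }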


Bounds

◮ K(x) ≤ |x| + const
◮ Proof: "output x"
◮ Most sequences have high Kolmogorov complexity:
◮ At most 2^(n−1) (C++) programs of length ≤ n − 2
◮ 2^n strings of length n
◮ Hence, at least 1/2 of the n-bit strings have Kolmogorov complexity at least n − 1
◮ In particular, a random sequence has Kolmogorov complexity ≈ n


Conditional Kolmogorov complexity

◮ K(x|y) — the Kolmogorov complexity of x given y: the length of the shortest program that outputs x on input y
◮ Chain rule: K(x, y) ≈ K(y) + K(x|y)


H vs. K

H(X) speaks about a random variable X and K(x) about a string x, but
◮ Both quantities measure the amount of uncertainty or randomness in an object
◮ Both measure the number of bits it takes to describe an object
◮ Another property: Let X1, . . . , Xn be iid; then whp K(X1, . . . , Xn) ≈ H(X1, . . . , Xn) = n·H(X1)
◮ Proof: AEP
◮ Example: for coin flips with bias (0.7, 0.3), whp we get a string with K(x) ≈ n · h(0.3)


Universal compression

◮ A program of length K(x) that outputs x compresses x into K(x) bits of information.
◮ Example: the length of the human genome is about 6 · 10^9 bits
◮ But the code is redundant
◮ The relevant number for measuring the number of possible values is the Kolmogorov complexity of the code.
◮ No one knows its value...


Universal probability

K(x) = min_{p : p()=x} |p|, where p() denotes the output of the C++ program defined by p.

Definition 1
The universal probability of a string x is P_U(x) = ∑_{p : p()=x} 2^(−|p|) = Pr_{p←{0,1}^∞}[p() = x]

◮ Namely, the probability that a program picked at random prints x.
◮ Insensitive (up to a constant factor) to the computation model.
◮ Interpretation: P_U(x) is the probability that you observe x in nature.
◮ Computer as an intelligence amplifier

Theorem 2
∃ c > 0 such that 2^(−K(x)) ≤ P_U(x) ≤ c · 2^(−K(x)) for every x ∈ {0, 1}∗.

◮ The interesting part is P_U(x) ≤ c · 2^(−K(x))
◮ Hence, for X ∼ P_U, it holds that |E[K(X)] − H(X)| ≤ log c


Proving Theorem 2

◮ We need to find c > 0 such that K(x) ≤ log(1/P_U(x)) + c for every x ∈ {0, 1}∗
◮ In other words, find a program that outputs x and whose length is log(1/P_U(x)) + c
◮ Idea: the program points to the leaf of x in the Shannon code for P_U, in which x sits at depth ⌈log(1/P_U(x))⌉ (sketched below for an explicitly given distribution)
◮ Problem: P_U is not computable
◮ Solution: compute better and better estimates of the tree for P_U, along with the "mapping" from the tree nodes back to codewords.
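To illustrate only the Shannon-code step (not the construction for P_U itself, which is handled by Program 3 below), here is a small C++ sketch for a toy distribution of my own choosing: each symbol gets the first ⌈log2(1/p)⌉ bits of the cumulative probability placed before it, so its leaf sits at depth ⌈log2(1/p)⌉:

    #include <cstdio>
    #include <cmath>
    #include <algorithm>
    #include <string>
    #include <vector>

    int main() {
        // Toy distribution (an assumption for the illustration), sorted by decreasing probability.
        std::vector<std::pair<std::string, double>> dist = {
            {"a", 0.5}, {"b", 0.25}, {"c", 0.125}, {"d", 0.125}};
        std::sort(dist.begin(), dist.end(),
                  [](const auto& u, const auto& v) { return u.second > v.second; });

        double F = 0.0;                                      // cumulative probability so far
        for (const auto& [sym, p] : dist) {
            int depth = (int)std::ceil(std::log2(1.0 / p));  // leaf depth = ceil(log 1/p)
            std::string code;
            double f = F;
            for (int i = 0; i < depth; ++i) {                // binary expansion of F, truncated
                f *= 2.0;
                int bit = (int)f;
                code += char('0' + bit);
                f -= bit;
            }
            std::printf("%s  p=%.3f  depth=%d  codeword=%s\n", sym.c_str(), p, depth, code.c_str());
            F += p;
        }
        return 0;
    }

With these probabilities the codewords come out prefix free (0, 10, 110, 111), matching the promised depths.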


Proving Theorem 2

◮ Initialize T to be the infinite binary tree.

Program 3 (M)
Enumerate over all programs in {0, 1}∗: at round i, emulate the first i programs (one after the other) for i steps each, and do: if a program p outputs a string x and (∗, x, n(x)) ∉ T, place (p, x, n(x)) at an unused depth-n(x) node of T, for n(x) = ⌈log(1/P̂_U(x))⌉ + 1 and P̂_U(x) = ∑_{p′ : emulated p′ has output x} 2^(−|p′|)

◮ The program never gets stuck (it can always add the node).
Proof: Let x ∈ {0, 1}∗. The depths assigned for x strictly decrease over the execution of M, and each satisfies 2^(−n(x)) ≤ P̂_U(x)/2 ≤ P_U(x)/2; hence, at each point, ∑_{(·,x,n)∈T} 2^(−n) ≤ P_U(x). Since ∑_x P_U(x) ≤ 1, an unused node of the required depth always exists, by the Kraft inequality.
◮ ∀ x ∈ {0, 1}∗: M adds a node (·, x, ·) to T at depth 2 + ⌈log(1/P_U(x))⌉
Proof: P̂_U(x) converges to P_U(x)
◮ For x ∈ {0, 1}∗, let ℓ(x) be the location of its (2 + ⌈log(1/P_U(x))⌉)-depth node
◮ Program for printing x: run M until it assigns the node at location ℓ(x), and output the string recorded there


Applications

◮ (Another) proof that there are infinitely many primes.
◮ Assume there are finitely many primes p1, . . . , pm
◮ Any n-bit integer x can be written as x = p1^d1 · p2^d2 · · · pm^dm
◮ Each di ≤ n, hence describing di takes ≤ log n bits
◮ Hence, K(x) ≤ m · log n + const
◮ But for most n-bit numbers K(x) ≥ n − 1, a contradiction (a numeric check follows below)
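A small numeric check of the counting step (the value m = 25 and the range of n are arbitrary choices of mine): if only m primes existed, every n-bit integer would be described by m exponents of ⌈log2(n + 1)⌉ bits each, which for large n is far less than the n − 1 bits most n-bit strings require:

    #include <cstdio>
    #include <cmath>

    int main() {
        const int m = 25;                     // hypothetical: suppose only 25 primes existed
        for (int n = 64; n <= 4096; n *= 2) {
            // Each exponent d_i is at most n, so it fits in ceil(log2(n+1)) bits.
            int bits_per_exponent = (int)std::ceil(std::log2(n + 1.0));
            int description_bits  = m * bits_per_exponent;
            std::printf("n=%4d: exponent description = %4d bits, but most x need K(x) >= %4d\n",
                        n, description_bits, n - 1);
        }
        return 0;
    }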


Computability of K

◮ Can we compute K(x)?
◮ Answer: No.
◮ Proof: Assume K is computable by a program of length C
◮ Let s be the smallest positive integer s.t. K(s) > 2C + 10,000
◮ s can be computed by the following program:
  • 1. x = 0
  • 2. While (K(x) ≤ 2C + 10,000): x++
  • 3. Output x
◮ Thus K(s) < C + log C + log 10,000 + const < 2C + 10,000, a contradiction
◮ Berry's paradox, revisited:
◮ s — the smallest positive number with K(s) > 10,000
◮ This is not a paradox, since the description of s is not short.


Explicit large complexity strings

◮ Can we give an explicit example of a string x with large K(x)?

Theorem 4
∃ a constant C s.t. the statement "K(x) ≥ C" cannot be proven for any x (under any reasonable axiom system).

◮ For most strings K(x) > C + 1, but this cannot be proven even for a single string
◮ K(x) ≥ C is an example of a theorem that cannot be proven, and that for most x's cannot be disproved.
◮ Proof: for an integer C, define the program TC:
  • 1. y = 0
  • 2. If y is a proof of the statement "K(x) > C" for some x, output x and halt
  • 3. y++; go to 2
◮ |TC| = log C + D, where D is a constant
◮ Take C such that C > log C + D
◮ If TC halts and outputs x, then K(x) ≤ log C + D < C, contradicting the fact that there is a proof that K(x) > C.


Part II Other Entropy Measures


Other entropy measures

Let X ∼ p be a random variable over X.

◮ Recall that the Shannon entropy of X is
  H(X) = Σ_{x∈X} −p(x) · log p(x) = E_X [−log p(X)]

◮ Max entropy of X is H0(X) = log |Supp(X)|

◮ Min entropy of X is H∞(X) = min_{x∈X} {−log p(x)} = −log max_{x∈X} {p(x)}

◮ Collision probability of X is CP(X) = Σ_{x∈X} p(x)²
  (the probability of a collision when drawing two independent samples from X)

◮ Collision entropy/Rényi entropy of X is H2(X) = −log CP(X)

◮ More generally, for α ≠ 1: Hα(X) = (1/(1−α)) · log Σ_{i=1}^n p_i^α = (α/(1−α)) · log ‖p‖_α

◮ H∞(X) ≤ H2(X) ≤ H(X) ≤ H0(X) (Jensen); equality holds iff X is uniform over X

◮ For instance, CP(X) ≤ Σ_x p(x) · max_{x′} p(x′) = max_{x′} p(x′). Hence,
  H2(X) = −log CP(X) ≥ −log max_{x′} p(x′) = H∞(X).

◮ H2(X) ≤ 2 · H∞(X)

◮ Proof: CP(X) ≥ (max_{x′} p(x′))². Hence, H2(X) = −log CP(X) ≤ −2 · log max_{x′} p(x′) = 2 · H∞(X).
  (A numerical sanity check follows below.)

Iftach Haitner (TAU) Application of Information Theory, Lecture 8 December 16, 2014 17 / 24
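
The chain of inequalities above is easy to verify numerically. The following is a minimal Python sketch (not part of the lecture; the example distribution is an arbitrary illustrative choice) that computes H0, H∞, H2 and H for a finite distribution and checks both H∞ ≤ H2 ≤ H ≤ H0 and H2 ≤ 2·H∞.

import math

def entropies(p):
    """p: list of probabilities (summing to 1) of a finite rv X."""
    supp = [q for q in p if q > 0]
    H0 = math.log2(len(supp))                     # max entropy
    Hinf = -math.log2(max(supp))                  # min entropy
    CP = sum(q * q for q in supp)                 # collision probability
    H2 = -math.log2(CP)                           # collision / Renyi entropy
    H = -sum(q * math.log2(q) for q in supp)      # Shannon entropy
    return H0, Hinf, H2, H

# Example: a biased distribution over four symbols.
H0, Hinf, H2, H = entropies([0.5, 0.25, 0.125, 0.125])
assert Hinf <= H2 <= H <= H0
assert H2 <= 2 * Hinf
print(f"H_inf={Hinf:.3f} <= H_2={H2:.3f} <= H={H:.3f} <= H_0={H0:.3f}")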

slide-139
SLIDE 139

Other entropy measures, cont

◮ No simple chain rule.

◮ Let X = ⊥ w.p. 1/2 and uniform over {0, 1}ⁿ otherwise, and let Y be the indicator for X = ⊥.

◮ Then H∞(X | Y = 1) = 0 and H∞(X | Y = 0) = n, but H∞(X) = 1. (See the sketch below.)

Iftach Haitner (TAU) Application of Information Theory, Lecture 8 December 16, 2014 18 / 24
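
To make the counterexample concrete, here is a tiny Python sketch (not from the lecture; n = 8 is an arbitrary choice) computing the three min-entropies in the last bullet.

import math

n = 8
# Distribution of X: the symbol None plays the role of ⊥.
pX = {None: 0.5}
pX.update({x: 0.5 / 2**n for x in range(2**n)})

def min_entropy(p):
    return -math.log2(max(p.values()))

print(min_entropy(pX))                                       # H_inf(X)       = 1
print(min_entropy({None: 1.0}))                              # H_inf(X | Y=1) = 0
print(min_entropy({x: 1 / 2**n for x in range(2**n)}))       # H_inf(X | Y=0) = n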

slide-140
SLIDE 140

Section 1 Shannon to Min entropy

Iftach Haitner (TAU) Application of Information Theory, Lecture 8 December 16, 2014 19 / 24

slide-154
SLIDE 154

Shannon to Min entropy

Given rv X ∼ p, let Xⁿ denote n independent copies of X, and let pⁿ(x₁, . . . , xₙ) = Π_{i=1}^n p(xᵢ).

Lemma 5
Let X ∼ p and let ε > 0. Then Pr[−log pⁿ(Xⁿ) ≤ n · (H(X) − ε)] < 2 · e^{−2ε²n}.

Proof: (quantitative) AEP.

◮ A_{n,ε} := {x ∈ Supp(Xⁿ): 2^{−n(H(X)+ε)} ≤ pⁿ(x) ≤ 2^{−n(H(X)−ε)}}

◮ −log pⁿ(x) ≥ n · (H(X) − ε) for any x ∈ A_{n,ε}

Proposition 6 (Hoeffding’s inequality)
Let Z₁, . . . , Zₙ be iid over [0, 1] with expectation µ. Then
Pr[ |(Σ_{j=1}^n Zⱼ)/n − µ| ≥ ε ] ≤ 2 · e^{−2ε²n} for every ε > 0.

◮ Taking Zᵢ = log p(Xᵢ), it follows that Pr[Xⁿ ∉ A_{n,ε}] ≤ 2 · e^{−2ε²n}

Corollary 7
∃ rv W that is (2 · e^{−2ε²n})-close to Xⁿ, and H∞(W) ≥ n · (H(X) − ε).

Proof: W = Xⁿ if Xⁿ ∈ A_{n,ε}, and “well spread” outside Supp(Xⁿ) otherwise. (A small simulation of Lemma 5 follows below.)

Iftach Haitner (TAU) Application of Information Theory, Lecture 8 December 16, 2014 20 / 24
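
A minimal simulation sketch of Lemma 5 (not from the lecture): it uses a Bernoulli(0.6) source and works in nats, so that each summand −ln p(Xᵢ) lies in [0, 1] (both −ln 0.6 and −ln 0.4 are below 1) and Hoeffding's inequality applies with the stated constant; n, ε and the number of trials are arbitrary illustrative choices.

import math, random

p, n, eps, trials = 0.6, 200, 0.1, 10_000
H = -(p * math.log(p) + (1 - p) * math.log(1 - p))   # Shannon entropy, in nats

def neg_log_prob_of_sample():
    # -log p^n(X^n) = sum of -ln p(X_i) over n iid draws from Bernoulli(p).
    return sum(-math.log(p) if random.random() < p else -math.log(1 - p)
               for _ in range(n))

bad = sum(neg_log_prob_of_sample() <= n * (H - eps) for _ in range(trials))
print("empirical Pr[-log p^n(X^n) <= n(H - eps)]:", bad / trials)
print("Hoeffding bound 2*exp(-2*eps^2*n):        ", 2 * math.exp(-2 * eps**2 * n))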

slide-162
SLIDE 162

Shannon to Min entropy, conditional version

Lemma 8
Let (X, Y) ∼ p and let ε > 0. Then
Pr_{(xⁿ,yⁿ)←(X,Y)ⁿ}[ −log pⁿ_{Xⁿ|Yⁿ}(xⁿ|yⁿ) ≤ n · (H(X|Y) − ε) ] < 2 · e^{−2ε²n}.

Proof: same proof, letting Zᵢ = log p_{X|Y}(Xᵢ|Yᵢ).

Corollary 9
∃ rv W over Xⁿ × Yⁿ that is (2 · e^{−2ε²n})-close to (X, Y)ⁿ, with

◮ SD(W_{Yⁿ}, Yⁿ) = 0, and

◮ H∞(W | W_{Yⁿ} = y) ≥ n · (H(X|Y) − ε), for any y ∈ Supp(Yⁿ)

Proof: ?

Iftach Haitner (TAU) Application of Information Theory, Lecture 8 December 16, 2014 21 / 24

slide-163
SLIDE 163

Section 2 Renyi-entropy to Uniform Distribution

Iftach Haitner (TAU) Application of Information Theory, Lecture 8 December 16, 2014 22 / 24

slide-171
SLIDE 171

Pairwise independent hashing

Definition 10 (pairwise independent function family)
A function family G = {g : D → R} is pairwise independent if ∀ x ≠ x′ ∈ D and y, y′ ∈ R, it holds that Pr_{g←G}[g(x) = y ∧ g(x′) = y′] = (1/|R|)².

◮ Example: for D = {0, 1}ⁿ and R = {0, 1}ᵐ, let G = {(A, b) ∈ {0, 1}^{m×n} × {0, 1}ᵐ} with (A, b)(x) = A·x + b (arithmetic over GF(2)).

◮ 2-universal families: Pr_{g←G}[g(x) = g(x′)] ≤ 1/|R| for every x ≠ x′.

◮ Example of a universal family that is not pairwise independent?

◮ Many-wise independence

◮ We identify functions with their description.

◮ Amazingly useful tool. (An exact check of the affine example for n = m = 2 follows below.)

Iftach Haitner (TAU) Application of Information Theory, Lecture 8 December 16, 2014 23 / 24
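
As a sanity check of the affine example above (a sketch, not part of the lecture; the chosen pair x ≠ x′ and the dimensions n = m = 2 are arbitrary), the following Python snippet enumerates the entire family and verifies that (g(x), g(x′)) is exactly uniform over R × R for that pair.

from itertools import product
from collections import Counter

n = m = 2

def apply_g(A, b, x):
    # (A, b)(x) = A*x + b, all arithmetic mod 2.
    return tuple((sum(A[i][j] * x[j] for j in range(n)) + b[i]) % 2 for i in range(m))

x, xp = (0, 1), (1, 1)          # any fixed pair x != x'
counts = Counter()
total = 0
for A in product(product((0, 1), repeat=n), repeat=m):   # all m-by-n 0/1 matrices
    for b in product((0, 1), repeat=m):                   # all shift vectors
        counts[(apply_g(A, b, x), apply_g(A, b, xp))] += 1
        total += 1

# Pairwise independence: every (y, y') pair occurs with probability (1/|R|)^2 = 1/16.
assert len(counts) == (2**m) ** 2
assert all(c / total == (1 / 2**m) ** 2 for c in counts.values())
print("pairwise independence verified for x =", x, "x' =", xp)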

slide-184
SLIDE 184

Leftover hash lemma

Lemma 11 (leftover hash lemma)
Let X be a rv over {0, 1}ⁿ with H2(X) ≥ k, let G = {g : {0, 1}ⁿ → {0, 1}ᵐ} be 2-universal, and let G ← G. Then SD((G, G(X)), (G, ∼{0, 1}ᵐ)) ≤ (1/2) · 2^{(m−k)/2}.

Extraction.

Lemma 12
Let p be a distribution over U with CP(p) ≤ (1 + δ)/|U|. Then SD(p, ∼U) ≤ √δ/2.

Proof: Let q be the uniform distribution over U.

◮ ‖p − q‖₂² = Σ_{u∈U} (p(u) − q(u))² = ‖p‖₂² + ‖q‖₂² − 2⟨p, q⟩ = CP(p) − 1/|U| ≤ δ/|U|

◮ Chebyshev sum inequality: (Σ_{i=1}^n aᵢ)² ≤ n · Σ_{i=1}^n aᵢ²

◮ Hence, ‖p − q‖₁² ≤ |U| · ‖p − q‖₂²

◮ Thus, SD(p, q) = (1/2) · ‖p − q‖₁ ≤ √δ/2.

To deduce Lemma 11, notice that CP(G, G(X)) ≤ (1/|G|) · (2^{−k} + 2^{−m}) = (1 + 2^{m−k}) / |G × {0, 1}ᵐ|,
so applying Lemma 12 with δ = 2^{m−k} gives the claimed bound. (A tiny numerical check follows below.)

Iftach Haitner (TAU) Application of Information Theory, Lecture 8 December 16, 2014 24 / 24
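
A numerical sketch of Lemma 11 (not from the lecture; the support set and parameters are illustrative choices): it computes SD((G, G(X)), (G, uniform)) exactly for the affine family over GF(2) with n = 3, m = 1 and X uniform over a 4-element subset of {0,1}³ (so H2(X) = k = 2), and compares the result with the bound (1/2)·2^{(m−k)/2}.

from itertools import product
import math

n, m = 3, 1
support = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]   # arbitrary 4-element support
pX = {x: 0.25 for x in support}
k = -math.log2(sum(q * q for q in pX.values()))           # collision entropy H_2(X) = 2

def apply_g(A, b, x):
    # (A, b)(x) = A*x + b over GF(2).
    return tuple((sum(A[i][j] * x[j] for j in range(n)) + b[i]) % 2 for i in range(m))

sd_sum, family_size = 0.0, 0
for A in product(product((0, 1), repeat=n), repeat=m):
    for b in product((0, 1), repeat=m):
        out = {}                                          # distribution of g(X) for this g
        for x, q in pX.items():
            y = apply_g(A, b, x)
            out[y] = out.get(y, 0.0) + q
        sd = 0.5 * sum(abs(out.get(y, 0.0) - 1 / 2**m) for y in product((0, 1), repeat=m))
        sd_sum += sd
        family_size += 1

print("SD((G, G(X)), (G, uniform)) =", sd_sum / family_size)   # average over g of SD(g(X), U_m)
print("LHL bound (1/2)*2^((m-k)/2) =", 0.5 * 2 ** ((m - k) / 2))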