
Application of Information Theory, Lecture 8

Kolmogorov Complexity and Other Entropy Measures

Iftach Haitner

Tel Aviv University.

December 16, 2014

Part I Kolmogorov Complexity


Description length

◮ What is the description length of the following strings?

  • 1. 010101010101010101010101010101010101
  • 2. 011010100000100111100110011001111110
  • 3. 111010100110001100111100010101011111

  • 1. Eighteen copies of 01
  • 2. The first 36 bits of the binary expansion of √2 − 1
  • 3. Looks random, but has 22 ones out of 36 (answers 2 and 3 are checked in the sketch below)

◮ Berry's paradox: Let s be "the smallest positive integer that cannot be described in twelve English words"
◮ The above is a definition of s using fewer than twelve English words...
◮ Solution: the word "described" in the definition of s is not well defined
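As an aside (not part of the original slides), a minimal C++ sketch that regenerates answer 2 and verifies the count in answer 3; the constant 36 and the strings are taken from the examples above:

    #include <cstdio>
    #include <cmath>
    #include <string>

    int main() {
        // Answer 2: the first 36 bits of the binary expansion of sqrt(2) - 1.
        // A double carries ~52 mantissa bits, enough for this illustration.
        double r = std::sqrt(2.0) - 1.0;
        for (int i = 0; i < 36; ++i) {
            r *= 2.0;
            int bit = static_cast<int>(r);   // integer part = next binary digit
            std::printf("%d", bit);
            r -= bit;
        }
        std::printf("\n");

        // Answer 3: the third string "looks random" but is biased; count its ones.
        std::string s = "111010100110001100111100010101011111";
        int ones = 0;
        for (char c : s) ones += (c == '1');
        std::printf("%d ones out of %zu\n", ones, s.size());   // 22 out of 36
        return 0;
    }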


Kolmogorov complexity

◮ For a string x ∈ {0, 1}∗, let K(x) be the length of the shortest C++ program (written in binary) that outputs x (on empty input)
◮ Now the term "described" is well defined.
◮ Why C++? All (complete) programming languages/computational models are essentially equivalent.
◮ Let K′(x) be the description length of x in another complete language; then |K(x) − K′(x)| ≤ const.
◮ What is K(x) for x = 0101 . . . 01 (n pairs)?
◮ "For i = 1 to n: print 01" (made concrete in the sketch below), so K(x) ≤ log n + const
◮ This is considered small complexity; we typically ignore log n factors.
◮ What is K(x) for x being the first n digits of π? Again K(x) ≤ log n + const, since a constant-size program computes π to any requested precision
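The one-line program from the bullet above, written out as a compilable C++ sketch; the literal value of n is the only part of the program text that grows with the output, and writing it down takes about log n bits:

    #include <cstdio>

    int main() {
        const long long n = 18;            // ~log n bits of "real" information in the program text
        for (long long i = 0; i < n; ++i)  // constant-size loop, independent of n
            std::printf("01");
        std::printf("\n");
        return 0;
    }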


More examples

◮ What is K(x) for x ∈ {0, 1}^n with k ones?
◮ Recall that C(n, k) ≤ 2^(n·h(k/n))
◮ Hence K(x) ≤ log n + n·h(k/n) + const: describe k and the index of x among the C(n, k) strings of length n with k ones
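To make the bound concrete, here is a C++ sketch (my own illustration, reusing the third string from the earlier slide) that computes the lexicographic index of an n-bit string among all strings with the same number of ones; the pair (k, index) describes x in roughly log n + log C(n, k) ≤ log n + n·h(k/n) bits:

    #include <cstdio>
    #include <cstdint>
    #include <cmath>
    #include <string>

    // Binomial coefficient in 64-bit arithmetic (exact for the sizes used here).
    uint64_t binom(int n, int k) {
        if (k < 0 || k > n) return 0;
        uint64_t r = 1;
        for (int i = 1; i <= k; ++i) r = r * (n - k + i) / i;
        return r;
    }

    // Lexicographic rank of x among all length-n strings with the same number of ones.
    uint64_t rank_among_equal_weight(const std::string& x) {
        int n = (int)x.size(), k = 0;
        for (char c : x) k += (c == '1');
        uint64_t r = 0;
        for (int i = 0; i < n; ++i) {
            if (x[i] == '1') {
                r += binom(n - i - 1, k);   // strings that put a '0' here come first
                --k;
            }
        }
        return r;
    }

    int main() {
        std::string x = "111010100110001100111100010101011111";  // string 3 from the slides
        int n = (int)x.size(), k = 0;
        for (char c : x) k += (c == '1');
        uint64_t idx = rank_among_equal_weight(x), total = binom(n, k);
        // k and idx together pin down x: about 32 bits instead of 36.
        std::printf("n=%d k=%d index=%llu of C(n,k)=%llu (log2 ~ %.1f bits)\n",
                    n, k, (unsigned long long)idx, (unsigned long long)total,
                    std::log2((double)total));
        return 0;
    }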


Bounds

◮ K(x) ≤ |x| + const
◮ Proof: "output x"
◮ Most sequences have high Kolmogorov complexity:
◮ At most 2^(n−1) (C++) programs of length ≤ n − 2
◮ 2^n strings of length n
◮ Hence, at least 1/2 of the n-bit strings have Kolmogorov complexity at least n − 1
◮ In particular, a random sequence has Kolmogorov complexity ≈ n


Conditional Kolmogorov complexity

◮ K(x|y) — the Kolmogorov complexity of x given y: the length of the shortest program that outputs x on input y
◮ Chain rule: K(x, y) ≈ K(y) + K(x|y)


H vs. K

H(X) speaks about a random variable X and K(x) about a string x, but
◮ Both quantities measure the amount of uncertainty or randomness in an object
◮ Both measure the number of bits it takes to describe an object
◮ Another property: Let X1, . . . , Xn be iid; then whp K(X1, . . . , Xn) ≈ H(X1, . . . , Xn) = n·H(X1)
◮ Proof: AEP
◮ Example: for coin flips with bias (0.7, 0.3), whp we get a string with K(x) ≈ n · h(0.3)


Universal compression

◮ A program of length K(x) that outputs x compresses x into K(x) bits of information.
◮ Example: the length of the human genome is about 6 · 10^9 bits
◮ But the code is redundant
◮ The relevant number for measuring the number of possible values is the Kolmogorov complexity of the code.
◮ No one knows its value...


Universal probability

K(x) = min_{p : p()=x} |p|, where p() denotes the output of the C++ program defined by p.

Definition 1
The universal probability of a string x is P_U(x) = ∑_{p : p()=x} 2^(−|p|) = Pr_{p←{0,1}^∞}[p() = x]

◮ Namely, the probability that a program picked at random prints x.
◮ Insensitive (up to a constant factor) to the computation model.
◮ Interpretation: P_U(x) is the probability that you observe x in nature.
◮ Computer as an intelligence amplifier

Theorem 2
∃ c > 0 such that 2^(−K(x)) ≤ P_U(x) ≤ c · 2^(−K(x)) for every x ∈ {0, 1}∗.

◮ The interesting part is P_U(x) ≤ c · 2^(−K(x))
◮ Hence, for X ∼ P_U, it holds that |E[K(X)] − H(X)| ≤ log c


Proving Theorem 2

◮ We need to find c > 0 such that K(x) ≤ log(1/P_U(x)) + c for every x ∈ {0, 1}∗
◮ In other words, find a program that outputs x and whose length is log(1/P_U(x)) + c
◮ Idea: the program points to the leaf of x in the Shannon code for P_U, in which x sits at depth ⌈log(1/P_U(x))⌉ (sketched below for an explicitly given distribution)
◮ Problem: P_U is not computable
◮ Solution: compute better and better estimates of the tree for P_U, along with the "mapping" from the tree nodes back to codewords.
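To illustrate only the Shannon-code step (not the construction for P_U itself, which is handled by Program 3 below), here is a small C++ sketch for a toy distribution of my own choosing: each symbol gets the first ⌈log2(1/p)⌉ bits of the cumulative probability placed before it, so its leaf sits at depth ⌈log2(1/p)⌉:

    #include <cstdio>
    #include <cmath>
    #include <algorithm>
    #include <string>
    #include <vector>

    int main() {
        // Toy distribution (an assumption for the illustration), sorted by decreasing probability.
        std::vector<std::pair<std::string, double>> dist = {
            {"a", 0.5}, {"b", 0.25}, {"c", 0.125}, {"d", 0.125}};
        std::sort(dist.begin(), dist.end(),
                  [](const auto& u, const auto& v) { return u.second > v.second; });

        double F = 0.0;                                      // cumulative probability so far
        for (const auto& [sym, p] : dist) {
            int depth = (int)std::ceil(std::log2(1.0 / p));  // leaf depth = ceil(log 1/p)
            std::string code;
            double f = F;
            for (int i = 0; i < depth; ++i) {                // binary expansion of F, truncated
                f *= 2.0;
                int bit = (int)f;
                code += char('0' + bit);
                f -= bit;
            }
            std::printf("%s  p=%.3f  depth=%d  codeword=%s\n", sym.c_str(), p, depth, code.c_str());
            F += p;
        }
        return 0;
    }

With these probabilities the codewords come out prefix free (0, 10, 110, 111), matching the promised depths.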


Proving Theorem 2

◮ Initialize T to be the infinite binary tree.

Program 3 (M)
Enumerate over all programs in {0, 1}∗: at round i, emulate the first i programs (one after the other) for i steps each, and do: if a program p outputs a string x and (∗, x, n(x)) ∉ T, place (p, x, n(x)) at an unused depth-n(x) node of T, for n(x) = ⌈log(1/P̂_U(x))⌉ + 1 and P̂_U(x) = ∑_{p′ : emulated p′ has output x} 2^(−|p′|)

◮ The program never gets stuck (it can always add the node).
Proof: Let x ∈ {0, 1}∗. The depths assigned for x strictly decrease over the execution of M, and each satisfies 2^(−n(x)) ≤ P̂_U(x)/2 ≤ P_U(x)/2; hence, at each point, ∑_{(·,x,n)∈T} 2^(−n) ≤ P_U(x). Since ∑_x P_U(x) ≤ 1, an unused node of the required depth always exists, by the Kraft inequality.
◮ ∀ x ∈ {0, 1}∗: M adds a node (·, x, ·) to T at depth 2 + ⌈log(1/P_U(x))⌉
Proof: P̂_U(x) converges to P_U(x)
◮ For x ∈ {0, 1}∗, let ℓ(x) be the location of its (2 + ⌈log(1/P_U(x))⌉)-depth node
◮ Program for printing x: run M until it assigns the node at location ℓ(x), and output the string recorded there


Applications

◮ (Another) proof that there are infinitely many primes.
◮ Assume there are finitely many primes p1, . . . , pm
◮ Any n-bit integer x can be written as x = p1^d1 · p2^d2 · · · pm^dm
◮ Each di ≤ n, hence describing di takes ≤ log n bits
◮ Hence, K(x) ≤ m · log n + const
◮ But for most n-bit numbers K(x) ≥ n − 1, a contradiction (a numeric check follows below)
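A small numeric check of the counting step (the value m = 25 and the range of n are arbitrary choices of mine): if only m primes existed, every n-bit integer would be described by m exponents of ⌈log2(n + 1)⌉ bits each, which for large n is far less than the n − 1 bits most n-bit strings require:

    #include <cstdio>
    #include <cmath>

    int main() {
        const int m = 25;                     // hypothetical: suppose only 25 primes existed
        for (int n = 64; n <= 4096; n *= 2) {
            // Each exponent d_i is at most n, so it fits in ceil(log2(n+1)) bits.
            int bits_per_exponent = (int)std::ceil(std::log2(n + 1.0));
            int description_bits  = m * bits_per_exponent;
            std::printf("n=%4d: exponent description = %4d bits, but most x need K(x) >= %4d\n",
                        n, description_bits, n - 1);
        }
        return 0;
    }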


Computability of K

◮ Can we compute K(x)?
◮ Answer: No.
◮ Proof: Assume K is computable by a program of length C
◮ Let s be the smallest positive integer s.t. K(s) > 2C + 10,000
◮ s can be computed by the following program:
  • 1. x = 0
  • 2. While (K(x) ≤ 2C + 10,000): x++
  • 3. Output x
◮ Thus K(s) < C + log C + log 10,000 + const < 2C + 10,000, a contradiction
◮ Berry's paradox, revisited:
◮ s — the smallest positive number with K(s) > 10,000
◮ This is not a paradox, since the description of s is not short.


Explicit large complexity strings

◮ Can we give an explicit example of a string x with large K(x)?

Theorem 4
∃ a constant C s.t. the statement "K(x) ≥ C" cannot be proven for any x (under any reasonable axiom system).

◮ For most strings K(x) > C + 1, but this cannot be proven even for a single string
◮ K(x) ≥ C is an example of a theorem that cannot be proven, and that for most x's cannot be disproved.
◮ Proof: for an integer C, define the program TC:
  • 1. y = 0
  • 2. If y is a proof of the statement "K(x) > C" for some x, output x and halt
  • 3. y++; go to 2
◮ |TC| = log C + D, where D is a constant
◮ Take C such that C > log C + D
◮ If TC halts and outputs x, then K(x) ≤ log C + D < C, contradicting the fact that there is a proof that K(x) > C.


Part II Other Entropy Measures


Other entropy measures

Let X ∼ p be a random variable over X.

◮ Recall that the Shannon entropy of X is
  H(X) = Σ_{x∈X} −p(x) · log p(x) = E_X [−log p(X)]

◮ Max entropy of X is H0(X) = log |Supp(X)|

◮ Min entropy of X is H∞(X) = min_{x∈X} {−log p(x)} = −log max_{x∈X} {p(x)}

◮ Collision probability of X is CP(X) = Σ_{x∈X} p(x)²
  (the probability of a collision when drawing two independent samples from X)

◮ Collision entropy/Rényi entropy of X is H2(X) = −log CP(X)

◮ More generally, for α ≠ 1: Hα(X) = (1/(1−α)) · log Σ_{i=1}^n p_i^α = (α/(1−α)) · log ‖p‖_α

◮ H∞(X) ≤ H2(X) ≤ H(X) ≤ H0(X) (Jensen); equality holds iff X is uniform over X

◮ For instance, CP(X) ≤ Σ_x p(x) · max_{x′} p(x′) = max_{x′} p(x′). Hence,
  H2(X) = −log CP(X) ≥ −log max_{x′} p(x′) = H∞(X).

◮ H2(X) ≤ 2 · H∞(X)

◮ Proof: CP(X) ≥ (max_{x′} p(x′))². Hence, H2(X) = −log CP(X) ≤ −2 · log max_{x′} p(x′) = 2 · H∞(X).
  (A numerical sanity check follows below.)

Iftach Haitner (TAU) Application of Information Theory, Lecture 8 December 16, 2014 17 / 24
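
The chain of inequalities above is easy to verify numerically. The following is a minimal Python sketch (not part of the lecture; the example distribution is an arbitrary illustrative choice) that computes H0, H∞, H2 and H for a finite distribution and checks both H∞ ≤ H2 ≤ H ≤ H0 and H2 ≤ 2·H∞.

import math

def entropies(p):
    """p: list of probabilities (summing to 1) of a finite rv X."""
    supp = [q for q in p if q > 0]
    H0 = math.log2(len(supp))                     # max entropy
    Hinf = -math.log2(max(supp))                  # min entropy
    CP = sum(q * q for q in supp)                 # collision probability
    H2 = -math.log2(CP)                           # collision / Renyi entropy
    H = -sum(q * math.log2(q) for q in supp)      # Shannon entropy
    return H0, Hinf, H2, H

# Example: a biased distribution over four symbols.
H0, Hinf, H2, H = entropies([0.5, 0.25, 0.125, 0.125])
assert Hinf <= H2 <= H <= H0
assert H2 <= 2 * Hinf
print(f"H_inf={Hinf:.3f} <= H_2={H2:.3f} <= H={H:.3f} <= H_0={H0:.3f}")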

slide-139
SLIDE 139

Other entropy measures, cont

◮ No simple chain rule.

◮ Let X = ⊥ w.p. 1/2 and uniform over {0, 1}ⁿ otherwise, and let Y be the indicator for X = ⊥.

◮ Then H∞(X | Y = 1) = 0 and H∞(X | Y = 0) = n, but H∞(X) = 1. (See the sketch below.)

Iftach Haitner (TAU) Application of Information Theory, Lecture 8 December 16, 2014 18 / 24
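
To make the counterexample concrete, here is a tiny Python sketch (not from the lecture; n = 8 is an arbitrary choice) computing the three min-entropies in the last bullet.

import math

n = 8
# Distribution of X: the symbol None plays the role of ⊥.
pX = {None: 0.5}
pX.update({x: 0.5 / 2**n for x in range(2**n)})

def min_entropy(p):
    return -math.log2(max(p.values()))

print(min_entropy(pX))                                       # H_inf(X)       = 1
print(min_entropy({None: 1.0}))                              # H_inf(X | Y=1) = 0
print(min_entropy({x: 1 / 2**n for x in range(2**n)}))       # H_inf(X | Y=0) = n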

slide-140
SLIDE 140

Section 1 Shannon to Min entropy

Iftach Haitner (TAU) Application of Information Theory, Lecture 8 December 16, 2014 19 / 24

slide-154
SLIDE 154

Shannon to Min entropy

Given rv X ∼ p, let Xⁿ denote n independent copies of X, and let pⁿ(x₁, . . . , xₙ) = Π_{i=1}^n p(xᵢ).

Lemma 5
Let X ∼ p and let ε > 0. Then Pr[−log pⁿ(Xⁿ) ≤ n · (H(X) − ε)] < 2 · e^{−2ε²n}.

Proof: (quantitative) AEP.

◮ A_{n,ε} := {x ∈ Supp(Xⁿ): 2^{−n(H(X)+ε)} ≤ pⁿ(x) ≤ 2^{−n(H(X)−ε)}}

◮ −log pⁿ(x) ≥ n · (H(X) − ε) for any x ∈ A_{n,ε}

Proposition 6 (Hoeffding’s inequality)
Let Z₁, . . . , Zₙ be iid over [0, 1] with expectation µ. Then
Pr[ |(Σ_{j=1}^n Zⱼ)/n − µ| ≥ ε ] ≤ 2 · e^{−2ε²n} for every ε > 0.

◮ Taking Zᵢ = log p(Xᵢ), it follows that Pr[Xⁿ ∉ A_{n,ε}] ≤ 2 · e^{−2ε²n}

Corollary 7
∃ rv W that is (2 · e^{−2ε²n})-close to Xⁿ, and H∞(W) ≥ n · (H(X) − ε).

Proof: W = Xⁿ if Xⁿ ∈ A_{n,ε}, and “well spread” outside Supp(Xⁿ) otherwise. (A small simulation of Lemma 5 follows below.)

Iftach Haitner (TAU) Application of Information Theory, Lecture 8 December 16, 2014 20 / 24
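
A minimal simulation sketch of Lemma 5 (not from the lecture): it uses a Bernoulli(0.6) source and works in nats, so that each summand −ln p(Xᵢ) lies in [0, 1] (both −ln 0.6 and −ln 0.4 are below 1) and Hoeffding's inequality applies with the stated constant; n, ε and the number of trials are arbitrary illustrative choices.

import math, random

p, n, eps, trials = 0.6, 200, 0.1, 10_000
H = -(p * math.log(p) + (1 - p) * math.log(1 - p))   # Shannon entropy, in nats

def neg_log_prob_of_sample():
    # -log p^n(X^n) = sum of -ln p(X_i) over n iid draws from Bernoulli(p).
    return sum(-math.log(p) if random.random() < p else -math.log(1 - p)
               for _ in range(n))

bad = sum(neg_log_prob_of_sample() <= n * (H - eps) for _ in range(trials))
print("empirical Pr[-log p^n(X^n) <= n(H - eps)]:", bad / trials)
print("Hoeffding bound 2*exp(-2*eps^2*n):        ", 2 * math.exp(-2 * eps**2 * n))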

slide-162
SLIDE 162

Shannon to Min entropy, conditional version

Lemma 8
Let (X, Y) ∼ p and let ε > 0. Then
Pr_{(xⁿ,yⁿ)←(X,Y)ⁿ}[ −log pⁿ_{Xⁿ|Yⁿ}(xⁿ|yⁿ) ≤ n · (H(X|Y) − ε) ] < 2 · e^{−2ε²n}.

Proof: same proof, letting Zᵢ = log p_{X|Y}(Xᵢ|Yᵢ).

Corollary 9
∃ rv W over Xⁿ × Yⁿ that is (2 · e^{−2ε²n})-close to (X, Y)ⁿ, with

◮ SD(W_{Yⁿ}, Yⁿ) = 0, and

◮ H∞(W | W_{Yⁿ} = y) ≥ n · (H(X|Y) − ε), for any y ∈ Supp(Yⁿ)

Proof: ?

Iftach Haitner (TAU) Application of Information Theory, Lecture 8 December 16, 2014 21 / 24

slide-163
SLIDE 163

Section 2 Renyi-entropy to Uniform Distribution

Iftach Haitner (TAU) Application of Information Theory, Lecture 8 December 16, 2014 22 / 24

slide-171
SLIDE 171

Pairwise independent hashing

Definition 10 (pairwise independent function family)
A function family G = {g : D → R} is pairwise independent if ∀ x ≠ x′ ∈ D and y, y′ ∈ R, it holds that Pr_{g←G}[g(x) = y ∧ g(x′) = y′] = (1/|R|)².

◮ Example: for D = {0, 1}ⁿ and R = {0, 1}ᵐ, let G = {(A, b) ∈ {0, 1}^{m×n} × {0, 1}ᵐ} with (A, b)(x) = A·x + b (arithmetic over GF(2)).

◮ 2-universal families: Pr_{g←G}[g(x) = g(x′)] ≤ 1/|R| for every x ≠ x′.

◮ Example of a universal family that is not pairwise independent?

◮ Many-wise independence

◮ We identify functions with their description.

◮ Amazingly useful tool. (An exact check of the affine example for n = m = 2 follows below.)

Iftach Haitner (TAU) Application of Information Theory, Lecture 8 December 16, 2014 23 / 24
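
As a sanity check of the affine example above (a sketch, not part of the lecture; the chosen pair x ≠ x′ and the dimensions n = m = 2 are arbitrary), the following Python snippet enumerates the entire family and verifies that (g(x), g(x′)) is exactly uniform over R × R for that pair.

from itertools import product
from collections import Counter

n = m = 2

def apply_g(A, b, x):
    # (A, b)(x) = A*x + b, all arithmetic mod 2.
    return tuple((sum(A[i][j] * x[j] for j in range(n)) + b[i]) % 2 for i in range(m))

x, xp = (0, 1), (1, 1)          # any fixed pair x != x'
counts = Counter()
total = 0
for A in product(product((0, 1), repeat=n), repeat=m):   # all m-by-n 0/1 matrices
    for b in product((0, 1), repeat=m):                   # all shift vectors
        counts[(apply_g(A, b, x), apply_g(A, b, xp))] += 1
        total += 1

# Pairwise independence: every (y, y') pair occurs with probability (1/|R|)^2 = 1/16.
assert len(counts) == (2**m) ** 2
assert all(c / total == (1 / 2**m) ** 2 for c in counts.values())
print("pairwise independence verified for x =", x, "x' =", xp)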

slide-184
SLIDE 184

Leftover hash lemma

Lemma 11 (leftover hash lemma)
Let X be a rv over {0, 1}ⁿ with H2(X) ≥ k, let G = {g : {0, 1}ⁿ → {0, 1}ᵐ} be 2-universal, and let G ← G. Then SD((G, G(X)), (G, ∼{0, 1}ᵐ)) ≤ (1/2) · 2^{(m−k)/2}.

Extraction.

Lemma 12
Let p be a distribution over U with CP(p) ≤ (1 + δ)/|U|. Then SD(p, ∼U) ≤ √δ/2.

Proof: Let q be the uniform distribution over U.

◮ ‖p − q‖₂² = Σ_{u∈U} (p(u) − q(u))² = ‖p‖₂² + ‖q‖₂² − 2⟨p, q⟩ = CP(p) − 1/|U| ≤ δ/|U|

◮ Chebyshev sum inequality: (Σ_{i=1}^n aᵢ)² ≤ n · Σ_{i=1}^n aᵢ²

◮ Hence, ‖p − q‖₁² ≤ |U| · ‖p − q‖₂²

◮ Thus, SD(p, q) = (1/2) · ‖p − q‖₁ ≤ √δ/2.

To deduce Lemma 11, notice that CP(G, G(X)) ≤ (1/|G|) · (2^{−k} + 2^{−m}) = (1 + 2^{m−k}) / |G × {0, 1}ᵐ|,
so applying Lemma 12 with δ = 2^{m−k} gives the claimed bound. (A tiny numerical check follows below.)

Iftach Haitner (TAU) Application of Information Theory, Lecture 8 December 16, 2014 24 / 24
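
A numerical sketch of Lemma 11 (not from the lecture; the support set and parameters are illustrative choices): it computes SD((G, G(X)), (G, uniform)) exactly for the affine family over GF(2) with n = 3, m = 1 and X uniform over a 4-element subset of {0,1}³ (so H2(X) = k = 2), and compares the result with the bound (1/2)·2^{(m−k)/2}.

from itertools import product
import math

n, m = 3, 1
support = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]   # arbitrary 4-element support
pX = {x: 0.25 for x in support}
k = -math.log2(sum(q * q for q in pX.values()))           # collision entropy H_2(X) = 2

def apply_g(A, b, x):
    # (A, b)(x) = A*x + b over GF(2).
    return tuple((sum(A[i][j] * x[j] for j in range(n)) + b[i]) % 2 for i in range(m))

sd_sum, family_size = 0.0, 0
for A in product(product((0, 1), repeat=n), repeat=m):
    for b in product((0, 1), repeat=m):
        out = {}                                          # distribution of g(X) for this g
        for x, q in pX.items():
            y = apply_g(A, b, x)
            out[y] = out.get(y, 0.0) + q
        sd = 0.5 * sum(abs(out.get(y, 0.0) - 1 / 2**m) for y in product((0, 1), repeat=m))
        sd_sum += sd
        family_size += 1

print("SD((G, G(X)), (G, uniform)) =", sd_sum / family_size)   # average over g of SD(g(X), U_m)
print("LHL bound (1/2)*2^((m-k)/2) =", 0.5 * 2 ** ((m - k) / 2))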