Formal Modeling in Cognitive Science

Lecture 30: Codes; Kraft Inequality; Source Coding Theorem

Frank Keller

School of Informatics, University of Edinburgh
keller@inf.ed.ac.uk

March 16, 2005


1 Codes
   Source Codes
   Properties of Codes

2 Coding Theorems
   Kraft Inequality
   Shannon Information
   Source Coding Theorem


Source Codes

Definition: Source Code
A source code C for a random variable X is a mapping from x ∈ X to {0, 1}∗. Let C(x) denote the code word for x and l(x) denote the length of C(x). Here, {0, 1}∗ is the set of all finite binary strings (we will only consider binary codes).

Definition: Expected Length
The expected length L(C) of a source code C(x) for a random variable with probability distribution f(x) is:

L(C) = ∑_{x∈X} f(x) l(x)


Source Codes

Example
Let X be a random variable with the following distribution and code word assignment:

x      a    b    c    d
f(x)   1/2  1/4  1/8  1/8
C(x)   0    10   110  111

The expected code length of X is:

L(C) = ∑_{x∈X} f(x) l(x) = 1/2 · 1 + 1/4 · 2 + 1/8 · 3 + 1/8 · 3 = 1.75
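The computation is easy to reproduce; here is a minimal Python sketch of the expected-length formula, using the distribution and code from the example above (the variable names are ours):

# Expected code length L(C) = sum of f(x) * l(x) over x in X,
# for the example distribution and code above.
f = {"a": 1/2, "b": 1/4, "c": 1/8, "d": 1/8}       # probability distribution f(x)
C = {"a": "0", "b": "10", "c": "110", "d": "111"}  # code word assignment C(x)

L = sum(f[x] * len(C[x]) for x in f)  # l(x) is the length of C(x)
print(L)  # 1.75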


Properties of Codes

Definition: Non-singular Code
A code is called non-singular if every x ∈ X maps into a different string in {0, 1}∗.

If a code is non-singular, then we can transmit a value of X unambiguously. However, what happens if we want to transmit several values of X in a row?

We could use a special symbol to separate the code words. However, this is not an efficient use of the special symbol; instead, we use self-punctuating codes (prefix codes).


Properties of Codes

Definition: Extension
The extension C∗ of a code C is:

C∗(x₁x₂ . . . xₙ) = C(x₁)C(x₂) . . . C(xₙ)

where C(x₁)C(x₂) . . . C(xₙ) indicates the concatenation of the corresponding code words.

Definition: Uniquely Decodable
A code is called uniquely decodable if its extension is non-singular.

If the code is uniquely decodable, then for each string there is only one source string that produced it; however, we have to look at the whole string to do the decoding.


Properties of Codes

Definition: Prefix Code
A code is called a prefix code (instantaneous code) if no code word is a prefix of another code word. We don't have to wait for the whole string to be able to decode it; the end of a code word can be recognized instantaneously.

Example
The code in the previous example is a prefix code. Take the following sequence: 01011111010. The first symbol, 0, tells us we have an a; the next two symbols, 10, have to correspond to b; the next three symbols, 111, have to correspond to a d, etc. The decoded sequence is: abdcb.
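Instantaneous decoding is simple to express in code. A minimal Python sketch (the helper function and its name are ours, not from the lecture):

def decode_prefix(code, bits):
    # Invert the code: map each code word back to its symbol.
    inverse = {w: x for x, w in code.items()}
    decoded, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:  # end of a code word recognized instantaneously
            decoded.append(inverse[buffer])
            buffer = ""
    assert buffer == "", "input ended in the middle of a code word"
    return "".join(decoded)

C = {"a": "0", "b": "10", "c": "110", "d": "111"}
print(decode_prefix(C, "01011111010"))  # abdcb

The greedy match is safe precisely because of the prefix property: once the buffer equals a code word, no longer code word can start with that string.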


Properties of Codes

Example
The following table illustrates the different classes of codes:

x   Singular   Non-singular,         Uniq. decodable,   Instant.
               not uniq. decodable   not instant.
a   0          0                     10                 0
b   0          010                   00                 10
c   0          01                    11                 110
d   0          10                    110                111

Kraft Inequality

Problem: construct an instantaneous code of minimum expected length for a given random variable. The following inequality holds:

Theorem: Kraft Inequality
For an instantaneous code C for a random variable X, the code word lengths l(x) must satisfy the inequality:

∑_{x∈X} 2^{−l(x)} ≤ 1

Conversely, if the code word lengths satisfy this inequality, then there exists an instantaneous code with these word lengths.
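Checking the inequality for a given set of code word lengths takes one line of Python; a small sketch using the lengths from the running example:

# Kraft sum for the code word lengths of the running example (a, b, c, d).
lengths = [1, 2, 3, 3]
kraft_sum = sum(2 ** -l for l in lengths)
print(kraft_sum, kraft_sum <= 1)  # 1.0 True -> an instantaneous code with these lengths exists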


Kraft Inequality

We can illustrate the Kraft inequality using a coding tree. Start with a tree that contains all three-bit codes:

[Coding tree: a complete binary tree of depth three, with internal nodes 0, 1, 00, 01, 10, 11 and leaves 000, 001, 010, 011, 100, 101, 110, 111]


Kraft Inequality

For each code word, prune all the branches below it (as they violate the prefix condition). For example, if we decide to use the code word 0, we prune the entire subtree below 0:

[Coding tree with the subtree below node 0 pruned, i.e., the nodes 00, 01, 000, 001, 010, and 011 are removed]


Kraft Inequality

Now if we decide to use the code word 10, we also prune the subtree below 10:

[Coding tree with the subtrees below 0 and 10 pruned; the remaining code words are 0, 10, 110, and 111]

The remaining leaves constitute a prefix code. Kraft inequality:

∑_{x∈X} 2^{−l(x)} = 2^{−1} + 2^{−2} + 2^{−3} + 2^{−3} = 1/2 + 1/4 + 1/8 + 1/8 = 1
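The pruning procedure can be mimicked in code: process the lengths in increasing order and, at each depth, take the next node that is not below an already chosen code word. A minimal Python sketch of this canonical construction (our own helper, not from the lecture):

def code_from_lengths(lengths):
    # Build prefix code words for lengths that satisfy the Kraft inequality.
    assert sum(2 ** -l for l in lengths) <= 1, "Kraft inequality violated"
    words, next_word, prev_len = [], 0, 0
    for l in sorted(lengths):
        next_word <<= (l - prev_len)  # descend to depth l in the coding tree
        words.append(format(next_word, f"0{l}b"))
        next_word += 1                # move to the sibling subtree: this is the pruning step
        prev_len = l
    return words

print(code_from_lengths([1, 2, 3, 3]))  # ['0', '10', '110', '111']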


Shannon Information

The Kraft inequality tells us that an instantaneous code exists. But we are interested in finding the optimal code, i.e., one that minimizes the expected code length L(C).

Theorem: Shannon Information
The expected length L(C) of a code C for the random variable X with distribution f(x) is minimal if the code word lengths l(x) are given by:

l(x) = −log f(x)

This quantity is called the Shannon information. Shannon information is pointwise entropy (compare mutual information and pointwise mutual information).


Shannon Information

Example
Consider the following random variable with the optimal code lengths given by the Shannon information:

x      a    b    c    d
f(x)   1/2  1/4  1/8  1/8
l(x)   1    2    3    3

The expected code length L(C) for the optimal code is:

L(C) = ∑_{x∈X} f(x) l(x) = −∑_{x∈X} f(x) log f(x) = 1.75

Note that this is the same as the entropy of X, H(X).
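Because every probability here is a negative power of two, the Shannon information is an integer for each value, and the optimal code achieves the entropy exactly; a quick Python check:

from math import log2

f = {"a": 1/2, "b": 1/4, "c": 1/8, "d": 1/8}
l = {x: -log2(p) for x, p in f.items()}    # Shannon information -log f(x); all integers here
L = sum(f[x] * l[x] for x in f)            # expected code length
H = -sum(p * log2(p) for p in f.values())  # entropy H(X)
print(l, L, H)  # {'a': 1.0, 'b': 2.0, 'c': 3.0, 'd': 3.0} 1.75 1.75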


Lower Bound on Expected Length

This observation about the relation between the entropy and the expected length of the optimal code can be generalized:

Theorem: Lower Bound on Expected Length
Let C be an instantaneous code for the random variable X. Then the expected code length L(C) is bounded by:

L(C) ≥ H(X)


Upper Bound on Expected Length

Of course we are also interested in an upper bound, i.e., in the maximum expected length that the optimal code can have:

Theorem: Source Coding Theorem
Let C be a code with optimal code lengths, i.e., l(x) = −log f(x), for the random variable X with distribution f(x). Then the expected length L(C) is bounded by:

H(X) ≤ L(C) < H(X) + 1

Why is the upper bound H(X) + 1 and not H(X)? Because the Shannon information sometimes gives us fractional lengths, which we have to round up.


Source Coding Theorem

Example
Consider the following random variable with the optimal code lengths given by the Shannon information:

x      a     b     c    d     e
f(x)   0.25  0.25  0.2  0.15  0.15
l(x)   2.0   2.0   2.3  2.7   2.7

The entropy of this random variable is H(X) = 2.2855. The source coding theorem tells us:

2.2855 ≤ L(C) < 3.2855

where L(C) is the expected length of the optimal code.


Source Coding Theorem

Example
Now consider the following code, which tries to match the code words to the optimal code lengths as closely as possible:

x      a   b   c   d    e
C(x)   00  10  11  010  011
l(x)   2   2   2   3    3

The expected code length for this code is therefore L(C) = 2.30. This is very close to the optimal code length of H(X) = 2.2855.
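The bound from the source coding theorem is easy to verify numerically for this code; a short Python check (our own verification, not part of the lecture):

from math import log2

f = {"a": 0.25, "b": 0.25, "c": 0.2, "d": 0.15, "e": 0.15}
C = {"a": "00", "b": "10", "c": "11", "d": "010", "e": "011"}

H = -sum(p * log2(p) for p in f.values())        # entropy H(X)
L = sum(f[x] * len(C[x]) for x in f)             # expected code length L(C)
print(round(H, 4), round(L, 2), H <= L < H + 1)  # 2.2855 2.3 True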


Finding the Optimal Code

The source coding theorem tells us the properties of the optimal code, but not how to find it. A number of algorithms exist:

Huffman coding: a simple algorithm that finds a theoretically optimal code for a random variable (upper bound H(X) + 1); key idea: construct the coding tree in reverse, starting with the two least probable values of X (see the sketch after this list);

Shannon-Fano coding: constructs a code based on the cumulative distribution F(x); a faster algorithm, but it yields a less optimal code (upper bound H(X) + 2);

arithmetic coding: first estimate the probability distribution of the source, then compute an optimal code.
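Huffman's algorithm is short enough to sketch in full. A minimal Python version using the standard heapq module (a sketch of the merge-the-two-least-probable idea above; the function name and tie-breaking are ours):

import heapq
from itertools import count

def huffman(f):
    # Each heap entry is (probability, tiebreaker, partial code for a subtree).
    # The tiebreaker avoids comparing dicts when probabilities are equal.
    ties = count()
    heap = [(p, next(ties), {x: ""}) for x, p in f.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)  # the two least probable subtrees
        p2, _, c2 = heapq.heappop(heap)
        merged = {x: "0" + w for x, w in c1.items()}       # extend one branch with 0
        merged.update({x: "1" + w for x, w in c2.items()}) # and the other with 1
        heapq.heappush(heap, (p1 + p2, next(ties), merged))
    return heap[0][2]

print(huffman({"a": 1/2, "b": 1/4, "c": 1/8, "d": 1/8}))
# e.g. {'a': '0', 'b': '10', 'c': '110', 'd': '111'}; the exact code words
# depend on tie-breaking, but the code lengths are always optimal.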


Summary

A code is uniquely decodable if there is only one possible source sequence for every code sequence;

a code is instantaneous (a prefix code) if no code word is a prefix of another code word;

the optimal length of a code word is given by its Shannon information: −log f(x);

source coding theorem: the expected length of the optimal code is bounded by entropy: H(X) ≤ L(C) < H(X) + 1;

algorithms exist for finding the optimal code for a given random variable and its probability distribution.
