A Quantitative Measure of Relevance Based on Kelly Gambling Theory


SLIDE 1

A Quantitative Measure of Relevance Based on Kelly Gambling Theory

Mathias Winther Madsen
Institute for Logic, Language, and Computation, University of Amsterdam

SLIDE 2
PLAN

  • Why?
  • How?
  • Examples

SLIDE 3

Why?

SLIDE 4

Why?

SLIDE 5

How?

SLIDE 6

Why not use Shannon information?

Claude Shannon (1916 – 2001)

H(X) = E[ log (1 / Pr(X = x)) ]

SLIDE 7

Why not use Shannon information?

Information Content = Prior Uncertainty − Posterior Uncertainty

(cf. Klir 2008; Shannon 1948)

SLIDE 8

Why not use Shannon information?

Pr(X = 1) = 0.15
Pr(X = 2) = 0.19
Pr(X = 3) = 0.23
Pr(X = 4) = 0.21
Pr(X = 5) = 0.22

What is the value of X?

H(X) = E[ log (1 / Pr(X = x)) ] = 2.31
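A minimal Python sketch (not part of the original slides) that reproduces the 2.31-bit figure from the distribution above:

import math

# Distribution of X from the slide above
p = [0.15, 0.19, 0.23, 0.21, 0.22]

# Shannon entropy in bits: H(X) = E[ log2(1 / Pr(X = x)) ]
H = sum(px * math.log2(1 / px) for px in p)
print(round(H, 2))  # 2.31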

SLIDE 9

Why not use Shannon information?

Pr(X = 1) = 0.15
Pr(X = 2) = 0.19
Pr(X = 3) = 0.23
Pr(X = 4) = 0.21
Pr(X = 5) = 0.22

Is X = 2? Is X = 3? Is X = 5? Is X in {4, 5}?

Expected number of questions: 2.34
SLIDE 10

What color are my socks? H(p) = − ∑ p log p = 6.53 bits of entropy.

SLIDE 11

How?

SLIDE 12

Why not use value-of-information?

Value-of-Information = Posterior Expectation − Prior Expectation

SLIDE 13

Why not use value-of-information?

Rules:

  • Your capital can be distributed freely
  • Bets on the actual outcome are returned twofold
  • Bets on all other outcomes are lost

SLIDE 14

Why not use value-of-information?

Optimal strategy by expected payoff: degenerate gambling, i.e. bet everything on Heads or everything on Tails.

SLIDE 15

Why not use value-of-information?

[Figures: capital over successive rounds; probability distribution of the rate of return R]

SLIDE 16

Why not use value-of-information?

Rate of return:

Ri = (Capital at time i + 1) / (Capital at time i)

Long-run behavior: E[ R1 · R2 · R3 · · · Rn ]

[Figure: probability distribution of the rate of return R]

SLIDE 17

Why not use value-of-information?

Rate of return:

Ri = (Capital at time i + 1) / (Capital at time i)

Long-run behavior: the capital R1 · R2 · R3 · · · Rn converges to 0 in probability as n → ∞, even though its expectation E[ R1 · R2 · R3 · · · Rn ] is maximized.

[Figure: probability distribution of the rate of return R]
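This can be checked numerically. Below is an illustrative Python sketch (one reading of the rules above, assuming a fair coin and 2-for-1 payoffs, which the slides do not state explicitly) of the degenerate strategy that stakes everything on Heads every round:

import random

# Assumption (not stated on the slides): a fair coin, correct bets returned
# twofold. Degenerate strategy: stake the entire capital on Heads each round.
def final_capital(n_rounds):
    capital = 1.0
    for _ in range(n_rounds):
        capital = 2 * capital if random.random() < 0.5 else 0.0
    return capital

trials = 100_000
for n in (1, 5, 10, 20):
    ruined = sum(final_capital(n) == 0.0 for _ in range(trials)) / trials
    # Under these assumptions E[R1 * R2 * ... * Rn] = 1 for every n,
    # yet the probability of ruin is 1 - 2**(-n) -> 1, so the capital
    # R1 * R2 * ... * Rn converges to 0 in probability.
    print(f"n = {n}: fraction ruined = {ruined:.4f} (theory: {1 - 2.0**-n:.4f})")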

SLIDE 18

Optimal reinvestment

Daniel Bernoulli (1700 – 1782) John Larry Kelly, Jr. (1923 – 1965)

SLIDE 19

Optimal reinvestment

Doubling rate:

Wi = log( (Capital at time i + 1) / (Capital at time i) )    (so R = 2^W)

SLIDE 20

Optimal reinvestment

Doubling rate:

Wi = log( (Capital at time i + 1) / (Capital at time i) )    (so R = 2^W)

Long-run behavior:

R1 · R2 · R3 · · · Rn = 2^(W1 + W2 + W3 + · · · + Wn) ≈ 2^(n E[W]) for n → ∞ (by the law of large numbers)

SLIDE 21

Optimal reinvestment

Logarithmic expectation E[W] = ∑ p(x) log( b(x) o(x) ) is maximized by proportional gambling (b* = p).

Arithmetic expectation E[R] = ∑ p(x) b(x) o(x) is maximized by degenerate gambling.
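An illustrative Python sketch of this contrast; the numbers are assumed, not from the slides (three outcomes with probabilities p and uniform 3-for-1 odds o):

import math

p = [0.5, 0.3, 0.2]          # assumed outcome probabilities
o = [3.0, 3.0, 3.0]          # assumed odds: every outcome pays 3-for-1

def expected_return(b):      # arithmetic expectation  E[R] = sum p(x) b(x) o(x)
    return sum(px * bx * ox for px, bx, ox in zip(p, b, o))

def doubling_rate(b):        # logarithmic expectation E[W] = sum p(x) log2(b(x) o(x))
    if any(px > 0 and bx == 0 for px, bx in zip(p, b)):
        return float("-inf") # some possible outcome gets no bet: eventual ruin
    return sum(px * math.log2(bx * ox) for px, bx, ox in zip(p, b, o))

degenerate = [1.0, 0.0, 0.0]     # everything on the most probable outcome
proportional = p                 # Kelly's proportional bet b*(x) = p(x)

print(expected_return(degenerate), expected_return(proportional))   # 1.5  vs 1.14
print(doubling_rate(degenerate), doubling_rate(proportional))       # -inf vs ~0.10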

SLIDE 22

Measuring relevant information

Amount of relevant information = Posterior expected doubling rate − Prior expected doubling rate

SLIDE 23

Measuring relevant information

Definition (Relevant Information): For an agent with utility function u, the amount of relevant information contained in the message Y = y is

K(y) = max_s ∑_x Pr(x | y) log u(s, x) − max_s ∑_x Pr(x) log u(s, x)

(the posterior optimal doubling rate minus the prior optimal doubling rate).
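A minimal Python sketch of this definition, under assumptions that are not in the slides: a proportional-gambling utility u(s, x) = s(x) · o(x) with fixed odds o, a fair binary X, and an observation Y that matches X with probability 0.9. For this utility the inner maximum is attained at s equal to the conditioning distribution, so no numerical optimization is needed:

import math

# Assumed model: X is a fair bit, Y equals X with probability 0.9.
p_x = {0: 0.5, 1: 0.5}
p_y_given_x = {(y, x): (0.9 if y == x else 0.1) for x in (0, 1) for y in (0, 1)}
odds = {0: 2.0, 1: 2.0}                      # u(s, x) = s(x) * odds(x)

def posterior(y):
    joint = {x: p_x[x] * p_y_given_x[(y, x)] for x in p_x}
    total = sum(joint.values())
    return {x: joint[x] / total for x in joint}

def optimal_doubling_rate(q):
    # max_s sum_x q(x) log2(s(x) * odds(x)) is attained at s = q (Gibbs' inequality)
    return sum(q[x] * math.log2(q[x] * odds[x]) for x in q)

def relevant_information(y):                 # K(y): posterior minus prior optimum
    return optimal_doubling_rate(posterior(y)) - optimal_doubling_rate(p_x)

print(round(relevant_information(1), 3))     # 0.531 bits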

SLIDE 24

Measuring relevant information

  • Expected relevant information is non-negative.
  • Relevant information equals the maximal fraction of future gains you can pay for a piece of information without loss.
  • When u has the form u(s, x) = v(x) s(x) for some non-negative function v, relevant information equals Shannon information.

K(y) = max_s ∑_x Pr(x | y) log u(s, x) − max_s ∑_x Pr(x) log u(s, x)

SLIDE 25

Example: Code-breaking

SLIDE 26

Example: Code-breaking

? ? ? ?

Entropy: H = 4. Accumulated information: I(X; Y) = 0

SLIDE 27

Example: Code-breaking

1 ? ? ?

Entropy: H = 3. Accumulated information: I(X; Y) = 1

1 bit!

SLIDE 28

Example: Code-breaking

1 ? ?

Entropy: H = 2. Accumulated information: I(X; Y) = 2

1 bit!

SLIDE 29

Example: Code-breaking

1 1 ?

Entropy: H = 1. Accumulated information: I(X; Y) = 3

1 bit!

SLIDE 30

Example: Code-breaking

1 1 1

Entropy: H = 0. Accumulated information: I(X; Y) = 4

1 bit!

SLIDE 31

Example: Code-breaking

1 1 1

Entropy: H = 0. Accumulated information: I(X; Y) = 4

1 bit 1 bit 1 bit 1 bit

SLIDE 32

Example: Code-breaking

Rules:

  • You can invest a fraction f of your capital in the guessing game.
  • If you guess the correct code, you get your investment back 16-fold: u = 1 − f + 16f.
  • Otherwise, you lose it: u = 1 − f.

? ? ? ?

W(f) = (15/16) log(1 − f) + (1/16) log(1 − f + 16f)

SLIDE 33

Example: Code-breaking

? ? ? ?

Optimal strategy: f* = 0. Optimal doubling rate: W(f*) = 0.00

W(f) = (15/16) log(1 − f) + (1/16) log(1 − f + 16f)

SLIDE 34

Example: Code-breaking

1 ? ? ?

Optimal strategy: f* = 1/15. Optimal doubling rate: W(f*) = 0.04

0.04 bits

W(f) = (7/8) log(1 − f) + (1/8) log(1 − f + 16f)

SLIDE 35

Example: Code-breaking

1 ? ?

Optimal strategy: f* = 3/15. Optimal doubling rate: W(f*) = 0.26

0.22 bits

W(f) = (3/4) log(1 − f) + (1/4) log(1 − f + 16f)

SLIDE 36

Example: Code-breaking

1 1 ?

Optimal strategy: f* = 7/15. Optimal doubling rate: W(f*) = 1.05

0.79 bits

W(f) = (1/2) log(1 − f) + (1/2) log(1 − f + 16f)

SLIDE 37

Example: Code-breaking

1 1 1

Optimal strategy: f* = 1. Optimal doubling rate: W(f*) = 4.00

2.95 bits

W(f) = log(1 − f + 16f)

SLIDE 38

Example: Code-breaking

? ? ? ?

Digit revealed    Raw information (drop in entropy)    Relevant information (increase in doubling rate)
1st               1.00                                 0.04
2nd               1.00                                 0.22
3rd               1.00                                 0.79
4th               1.00                                 2.95
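Both columns can be reproduced with a short Python sketch (not from the slides; it only uses the 16-for-1 payout and the success probabilities 1/16, 1/8, 1/4, 1/2, 1 established above, with logs in base 2):

import math

O = 16.0                                   # correct guesses are repaid 16-fold

def kelly(p):
    """Optimal fraction f* and doubling rate for
    W(f) = (1 - p) log2(1 - f) + p log2(1 - f + O f)."""
    f = max(0.0, min(1.0, (p * O - 1) / (O - 1)))      # f* = (pO - 1)/(O - 1)
    w = (1 - p) * math.log2(1 - f) if p < 1 else 0.0
    w += p * math.log2(1 - f + O * f)
    return f, w

previous = 0.0
for known in range(5):                     # number of code digits revealed
    p = 2.0 ** -(4 - known)                # chance of guessing the remainder
    f, w = kelly(p)
    print(known, round(f, 3), round(w, 2), round(w - previous, 2))
    previous = w
# doubling rates 0.00, 0.04, 0.26, 1.05, 4.00
# relevant information per digit: 0.04, 0.22, 0.79, 2.95 (raw: 1 bit each)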

SLIDE 39

Example: Randomization

SLIDE 40

Example: Randomization

def choose():
    if flip():
        if flip():
            return ROCK
        else:
            return PAPER
    else:
        return SCISSORS

With flip() a fair coin flip, this produces the distribution (1/2, 1/4, 1/4), not the target (1/3, 1/3, 1/3).

SLIDE 41

Example: Randomization

Rules:

  • You (1) and the adversary (2) both bet $1
  • You move first
  • The winner takes the whole pool

W(p) = log min { p1 + 2 p2, p2 + 2 p3, p3 + 2 p1 }

SLIDE 42

Example: Randomization

Best accessible strategy: p* = (1, 0, 0). Doubling rate: W(p*) = −∞

W(p) = log min { p1 + 2 p2, p2 + 2 p3, p3 + 2 p1 }

SLIDE 43

Example: Randomization

Best accessible strategy: p* = (1/2, 1/2, 0). Doubling rate: W(p*) = −1.00

W(p) = log min { p1 + 2 p2, p2 + 2 p3, p3 + 2 p1 }

SLIDE 44

Example: Randomization

Best accessible strategy: p* = (2/4, 1/4, 1/4). Doubling rate: W(p*) = −0.42

W(p) = log min { p1 + 2 p2, p2 + 2 p3, p3 + 2 p1 }

SLIDE 45

Example: Randomization

Best accessible strategy: p* = (3/8, 3/8, 2/8). Doubling rate: W(p*) = −0.19

W(p) = log min { p1 + 2 p2, p2 + 2 p3, p3 + 2 p1 }

SLIDE 46

Example: Randomization

Best accessible strategy: p* = (6/16, 5/16, 5/16). Doubling rate: W(p*) = −0.09

W(p) = log min { p1 + 2 p2, p2 + 2 p3, p3 + 2 p1 }

SLIDE 47

Example: Randomization

Coin flips    Distribution           Doubling rate    Increase in doubling rate
0             (1, 0, 0)              −∞
1             (1/2, 1/2, 0)          −1.00            ∞
2             (1/2, 1/4, 1/4)        −0.42            0.58
3             (3/8, 3/8, 2/8)        −0.19            0.23
4             (6/16, 5/16, 5/16)     −0.09            0.10
...           ...                    ...
∞             (1/3, 1/3, 1/3)        0.00
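The doubling-rate column can be reproduced with a few lines of Python (not from the slides; it just evaluates the W(p) formula from the earlier slides in log base 2):

import math

def W(p):
    # W(p) = log2 min{ p1 + 2 p2, p2 + 2 p3, p3 + 2 p1 }
    worst = min(p[0] + 2 * p[1], p[1] + 2 * p[2], p[2] + 2 * p[0])
    return math.log2(worst) if worst > 0 else float("-inf")

# Best distribution over the three moves reachable with n coin flips
strategies = [
    (0, (1, 0, 0)),
    (1, (1/2, 1/2, 0)),
    (2, (1/2, 1/4, 1/4)),
    (3, (3/8, 3/8, 2/8)),
    (4, (6/16, 5/16, 5/16)),
    ("inf", (1/3, 1/3, 1/3)),
]
for flips, p in strategies:
    print(flips, p, round(W(p), 2))
# -inf, -1.0, -0.42, -0.19, -0.09, 0.0, matching the doubling rates above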

SLIDE 48
SLIDE 49

January: Project course in information theory

  • Day 1: Uncertainty and Inference. Probability theory: semantics and expressivity; random variables; generative Bayesian models; stochastic processes. Uncertainty and information: uncertainty as cost; the Hartley measure; Shannon information content and entropy; Huffman coding.
  • Day 2: Counting Typical Sequences. The law of large numbers; typical sequences and the source coding theorem; stochastic processes and entropy rates; the source coding theorem for stochastic processes; examples.
  • Day 3: Guessing and Gambling. Evidence, likelihood ratios, and competitive prediction; Kullback-Leibler divergence; examples of diverging stochastic models; expressivity and the bias/variance tradeoff; doubling rates and proportional betting; card color prediction.
  • Day 4: Asking Questions and Engineering Answers. Questions and answers (or experiments and observations); mutual information; coin weighing; the maximum entropy principle; the channel coding theorem.
  • Day 5: Informative Descriptions and Residual Randomness. The practical problem of source coding; Kraft's inequality and prefix codes; arithmetic coding; Kolmogorov complexity; tests of randomness; asymptotic equivalence of complexity and entropy.

Now with MORE SHANNON!

SLIDE 50
SLIDE 51