Convergence of Random Processes — DS GA 1002 Probability and Statistics for Data Science

SLIDE 1

Convergence of Random Processes

DS GA 1002 Probability and Statistics for Data Science

http://www.cims.nyu.edu/~cfgranda/pages/DSGA1002_fall17

Carlos Fernandez-Granda

SLIDE 2

Aim

Define convergence for random processes. Describe two convergence phenomena: the law of large numbers and the central limit theorem.

SLIDE 3

Outline: Types of convergence · Law of Large Numbers · Central Limit Theorem · Monte Carlo simulation

SLIDE 4

Convergence of deterministic sequences

A deterministic sequence of real numbers x_1, x_2, ... converges to x ∈ ℝ,

  lim_{i→∞} x_i = x,

if x_i is arbitrarily close to x as i grows: for any ε > 0 there is an i_0 such that for all i > i_0, |x_i − x| < ε.

Problem: random sequences do not have fixed values.

SLIDE 5

Convergence with probability one

Consider a discrete random process X̃ and a random variable X defined on the same probability space. If we fix the outcome ω, X̃(i, ω) is a deterministic sequence and X(ω) is a constant, so we can determine whether

  lim_{i→∞} X̃(i, ω) = X(ω)

for that particular ω.

SLIDE 6

Convergence with probability one

X̃ converges with probability one to X if

  P({ ω | ω ∈ Ω, lim_{i→∞} X̃(ω, i) = X(ω) }) = 1

Deterministic convergence occurs with probability one.

SLIDE 7

Puddle

The initial amount of water is uniform between 0 and 1 gallon. After a time interval i there is i times less water:

  D̃(ω, i) := ω / i,  i = 1, 2, ...
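As a quick numerical sketch (not part of the original slides), we can sample a few outcomes ω and watch each fixed-ω sequence shrink toward zero:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample a few outcomes omega uniformly on (0, 1).
omegas = rng.uniform(0.0, 1.0, size=3)

# For each fixed omega, D(omega, i) = omega / i is a deterministic,
# decreasing sequence that tends to 0 as i grows.
for omega in omegas:
    sequence = [omega / i for i in (1, 10, 100, 1000)]
    print(f"omega = {omega:.2f}:", sequence)
```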

SLIDE 8

Puddle

[Plot: D̃(ω, i) versus i = 1, ..., 10 for ω = 0.31, ω = 0.89, and ω = 0.52]

SLIDE 9

Puddle

If we fix ω ∈ (0, 1),

  lim_{i→∞} D̃(ω, i) = lim_{i→∞} ω / i = 0,

so D̃ converges to zero with probability one.
SLIDE 10

Puddle

[Plot: D̃(ω, i) versus i = 1, ..., 50]
SLIDE 11

Alternative idea

Idea: instead of fixing ω and checking deterministic convergence:

  1. Measure how close X̃(i) and X are for a fixed i using a deterministic quantity
  2. Check whether the quantity tends to zero
SLIDE 12

Convergence in mean square

The mean square of Y − X measures how close X and Y are. If E[(X − Y)²] = 0 then X = Y with probability one.

Proof: By Markov's inequality, for any ε > 0

  P((Y − X)² > ε) ≤ E[(X − Y)²] / ε = 0

SLIDE 13

Convergence in mean square

X̃ converges to X in mean square if

  lim_{i→∞} E[(X − X̃(i))²] = 0

SLIDE 14

Convergence in probability

Alternative measure: the probability that |Y − X| > ε for small ε.

X̃ converges to X in probability if for any ε > 0

  lim_{i→∞} P(|X − X̃(i)| > ε) = 0
SLIDE 15

Conv. in mean square implies conv. in probability

By Markov's inequality, for any ε > 0

  lim_{i→∞} P(|X − X̃(i)| > ε) = lim_{i→∞} P((X − X̃(i))² > ε²)
                               ≤ lim_{i→∞} E[(X − X̃(i))²] / ε²
                               = 0

Convergence with probability one also implies convergence in probability.

SLIDE 20

Convergence in distribution

The distribution of X̃(i) converges to the distribution of X.

X̃ converges in distribution to X if

  lim_{i→∞} F_{X̃(i)}(x) = F_X(x)

for all x at which F_X is continuous.

SLIDE 21

Convergence in distribution

Convergence in distribution does not imply that X̃(i) and X are close as i → ∞! Convergence in probability does imply convergence in distribution.

SLIDE 22

Binomial tends to Poisson

X̃(i) is binomial with parameters i and p := λ/i. X is a Poisson random variable with parameter λ. X̃(i) converges to X in distribution:

  lim_{i→∞} p_{X̃(i)}(x) = lim_{i→∞} C(i, x) p^x (1 − p)^{i−x} = λ^x e^{−λ} / x! = p_X(x)
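A minimal numerical check of this limit (a sketch, not from the slides): evaluate the binomial pmf at a fixed x for growing i with p = λ/i, and compare against the Poisson pmf.

```python
from math import comb, exp, factorial

lam, x = 3.0, 2  # Poisson parameter and evaluation point, chosen for illustration

def binomial_pmf(i, p, x):
    # pmf of a binomial with i trials and success probability p
    return comb(i, x) * p**x * (1 - p) ** (i - x)

poisson_pmf = lam**x * exp(-lam) / factorial(x)
for i in (40, 80, 400, 10_000):
    # Binomial with parameters i and p = lam / i, as on the slide
    print(i, binomial_pmf(i, lam / i, x))
print("Poisson limit:", poisson_pmf)
```

The binomial values approach the Poisson value as i grows, mirroring slides 23 through 26.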

SLIDE 23

Probability mass function of X̃(40)

[Plot: pmf over k = 1, ..., 40]

SLIDE 24

Probability mass function of X̃(80)

[Plot: pmf over k = 1, ..., 40]

SLIDE 25

Probability mass function of X̃(400)

[Plot: pmf over k = 1, ..., 40]

SLIDE 26

Probability mass function of X

[Plot: pmf over k = 1, ..., 40]

SLIDE 27

Outline: Types of convergence · Law of Large Numbers · Central Limit Theorem · Monte Carlo simulation

SLIDE 28

Moving average

The moving average Ã of a discrete random process X̃ is

  Ã(i) := (1/i) Σ_{j=1}^{i} X̃(j)
slide-29
SLIDE 29

Weak law of large numbers

Let X be an iid discrete random process with mean µ

X := µ and

bounded variance σ2 The average A of X converges in mean square to µ
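The mean-square convergence can be checked empirically (a sketch, not in the slides): estimate E[(Ã(i) − µ)²] for an iid geometric process with p = 0.4, for which µ = 1/p = 2.5 and σ² = (1 − p)/p² = 3.75, and compare with σ²/i.

```python
import numpy as np

rng = np.random.default_rng(0)

# iid geometric process with p = 0.4: mean mu = 1/p, variance sigma^2 = (1-p)/p^2
p = 0.4
mu, var = 1 / p, (1 - p) / p**2
trials = 5000

for i in (10, 100, 1000):
    samples = rng.geometric(p, size=(trials, i))
    averages = samples.mean(axis=1)          # one realization of A(i) per trial
    mse = np.mean((averages - mu) ** 2)      # estimate of E[(A(i) - mu)^2]
    print(i, mse, var / i)                   # empirical MSE vs sigma^2 / i
```

The empirical mean-square error shrinks roughly like σ²/i, as the proof on the next slide shows.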

SLIDE 30

Proof

  E[Ã(i)] = E[(1/i) Σ_{j=1}^{i} X̃(j)] = (1/i) Σ_{j=1}^{i} E[X̃(j)] = µ

  Var[Ã(i)] = Var[(1/i) Σ_{j=1}^{i} X̃(j)] = (1/i²) Σ_{j=1}^{i} Var[X̃(j)] = σ²/i

  lim_{i→∞} E[(Ã(i) − µ)²] = lim_{i→∞} E[(Ã(i) − E[Ã(i)])²]
                           = lim_{i→∞} Var[Ã(i)]
                           = lim_{i→∞} σ²/i
                           = 0
slide-43
SLIDE 43

Strong law of large numbers

Let X be an iid discrete random process with mean µ

X := µ and

bounded variance σ2 The average A of X converges with probability one to µ

SLIDE 44

iid standard Gaussian

[Plot: moving average vs mean of the iid sequence, i up to 50]

SLIDE 45

iid standard Gaussian

[Plot: moving average vs mean of the iid sequence, i up to 500]

SLIDE 46

iid standard Gaussian

[Plot: moving average vs mean of the iid sequence, i up to 5000]

SLIDE 47

iid geometric with p = 0.4

[Plot: moving average vs mean of the iid sequence, i up to 50]

SLIDE 48

iid geometric with p = 0.4

[Plot: moving average vs mean of the iid sequence, i up to 500]

SLIDE 49

iid geometric with p = 0.4

[Plot: moving average vs mean of the iid sequence, i up to 5000]

SLIDE 50

iid Cauchy

[Plot: moving average vs median of the iid sequence, i up to 50]

SLIDE 51

iid Cauchy

[Plot: moving average vs median of the iid sequence, i up to 500]

SLIDE 52

iid Cauchy

[Plot: moving average vs median of the iid sequence, i up to 5000]
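The Cauchy plots illustrate why the law of large numbers needs a finite mean: the Cauchy distribution has no mean, so its moving average never settles. A quick sketch (not in the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# iid standard Cauchy samples; the distribution has no mean, so the
# law of large numbers does not apply and the moving average keeps jumping.
x = rng.standard_cauchy(5000)
moving_average = np.cumsum(x) / np.arange(1, x.size + 1)
for i in (10, 100, 1000, 5000):
    print(i, moving_average[i - 1])  # no convergence toward a fixed value
```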

SLIDE 53

Outline: Types of convergence · Law of Large Numbers · Central Limit Theorem · Monte Carlo simulation

SLIDE 54

Central Limit Theorem

Let X̃ be an iid discrete random process with mean µ_X̃ := µ and bounded variance σ². Then √i (Ã(i) − µ) converges in distribution to a Gaussian random variable with mean 0 and variance σ². The average Ã(i) is approximately Gaussian with mean µ and variance σ²/i.
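We can check this numerically (a sketch, not from the slides) with iid exponential(λ = 2) samples, for which µ = σ = 1/2: the standardized averages √i (Ã(i) − µ) should have mean near 0 and standard deviation near σ.

```python
import numpy as np

rng = np.random.default_rng(0)

lam, i, trials = 2.0, 1000, 50_000
mu = sigma = 1 / lam  # exponential: mean 1/lambda, standard deviation 1/lambda

# Each row is one realization of the process up to time i.
samples = rng.exponential(1 / lam, size=(trials, i))

# Standardized averages sqrt(i) * (A(i) - mu), one per trial.
z = np.sqrt(i) * (samples.mean(axis=1) - mu)
print(z.mean(), z.std())  # approximately 0 and sigma = 0.5
```

A histogram of `z` would look Gaussian, as in the histogram slides that follow.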

SLIDE 55

Height data

Example: data from a population of 25,000 people. We compare the histogram of the heights with the pdf of a Gaussian random variable fitted to the data.

SLIDE 56

Height data

[Plot: histogram of heights (inches) vs fitted Gaussian pdf]

SLIDE 57

Sketch of proof

The pdf of a sum of two independent random variables is the convolution of their pdfs:

  f_{X+Y}(z) = ∫_{−∞}^{∞} f_X(z − y) f_Y(y) dy

Repeated convolutions of any pdf with bounded variance result in a Gaussian!

SLIDE 58

Repeated convolutions

[Plot: pdf after i = 1, 2, 3, 4, 5 convolutions]

SLIDE 59

Repeated convolutions

[Plot: pdf after i = 1, 2, 3, 4, 5 convolutions]

SLIDE 60

iid exponential λ = 2, i = 10²

[Plot: histogram of the average Ã(i)]

SLIDE 61

iid exponential λ = 2, i = 10³

[Plot: histogram of the average Ã(i)]

SLIDE 62

iid exponential λ = 2, i = 10⁴

[Plot: histogram of the average Ã(i)]

SLIDE 63

iid geometric p = 0.4, i = 10²

[Plot: histogram of the average Ã(i)]

SLIDE 64

iid geometric p = 0.4, i = 10³

[Plot: histogram of the average Ã(i)]

SLIDE 65

iid geometric p = 0.4, i = 10⁴

[Plot: histogram of the average Ã(i)]

SLIDE 66

iid Cauchy, i = 10²

[Plot: histogram of the average Ã(i)]

SLIDE 67

iid Cauchy, i = 10³

[Plot: histogram of the average Ã(i)]

SLIDE 68

iid Cauchy, i = 10⁴

[Plot: histogram of the average Ã(i); the shape does not narrow with i]

SLIDE 69

Gaussian approximation to the binomial

X is binomial with parameters n and p. Computing the probability that X is in a certain interval requires summing its pmf over the interval; the central limit theorem provides a quick approximation.

  X = Σ_{i=1}^{n} B_i,  E(B_i) = p,  Var(B_i) = p(1 − p)

(1/n) X is approximately Gaussian with mean p and variance p(1 − p)/n, so X is approximately Gaussian with mean np and variance np(1 − p).

SLIDE 70

Gaussian approximation to the binomial

A basketball player makes each shot with probability p = 0.4 (shots are iid). What is the probability that she makes more than 420 shots out of 1000?

Exact answer:

  P(X ≥ 420) = Σ_{x=420}^{1000} p_X(x) = Σ_{x=420}^{1000} C(1000, x) 0.4^x 0.6^{1000−x} = 10.4 · 10⁻²

Approximation, with U a standard Gaussian:

  P(X ≥ 420) ≈ P(√(np(1 − p)) U + np ≥ 420) = P(U ≥ 1.29) = 1 − Φ(1.29) = 9.85 · 10⁻²
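Both numbers can be reproduced directly (a sketch, not from the slides): sum the binomial pmf for the exact tail, and use the Gaussian cdf for the approximation.

```python
from math import comb, erf, sqrt

n, p, k = 1000, 0.4, 420

# Exact tail probability: sum the binomial pmf from 420 to 1000.
exact = sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k, n + 1))

# CLT approximation: X is roughly Gaussian with mean np and variance np(1-p).
mu, sigma = n * p, sqrt(n * p * (1 - p))
approx = 0.5 * (1 - erf((k - mu) / (sigma * sqrt(2))))  # 1 - Phi(z)

print(exact, approx)  # about 0.104 and 0.0985, matching the slide
```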

SLIDE 74

Outline: Types of convergence · Law of Large Numbers · Central Limit Theorem · Monte Carlo simulation

SLIDE 75

Monte Carlo simulation

Simulation is a powerful tool in probability and statistics when models are too complex to derive closed-form solutions (life is not a homework problem!). Example: the game of solitaire.

SLIDE 76

Game of solitaire

Aim: compute the probability that you win at solitaire. If every permutation of the cards has the same probability,

  P(Win) = (number of permutations that lead to a win) / (total number of permutations)

Problem: characterizing the permutations that lead to a win is very difficult without playing out the game, and we can't just check them all because there are 52! ≈ 8 · 10⁶⁷ permutations! Solution: sample many permutations and compute the fraction of wins.

SLIDE 77

In the words of Stanislaw Ulam

The first thoughts and attempts I made to practice (the Monte Carlo Method) were suggested by a question which occurred to me in 1946 as I was convalescing from an illness and playing solitaires. The question was what are the chances that a Canfield solitaire laid out with 52 cards will come out successfully? After spending a lot of time trying to estimate them by pure combinatorial calculations, I wondered whether a more practical method than "abstract thinking" might not be to lay it out say one hundred times and simply observe and count the number of successful plays. This was already possible to envisage with the beginning of the new era of fast computers.

SLIDE 78

Monte Carlo approximation

Main principle: use simulation to approximate quantities that are challenging to compute exactly. To approximate the probability of an event E:

  1. Generate n independent samples of the indicator 1_E: I_1, I_2, ..., I_n
  2. Compute the average of the n samples

     Ã(n) := (1/n) Σ_{i=1}^{n} I_i

By the law of large numbers, Ã converges to P(E) as n → ∞, since E(1_E) = P(E).
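A concrete sketch of these two steps (the event E here, a standard Gaussian exceeding 1, is an illustrative assumption, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: n independent samples I_1, ..., I_n of the indicator of
# the event E = {Z > 1} for a standard Gaussian Z.
n = 100_000
indicators = rng.standard_normal(n) > 1.0

# Step 2: the Monte Carlo estimate is the average of the indicators.
estimate = indicators.mean()
print(estimate)  # close to 1 - Phi(1), about 0.1587
```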

SLIDE 79

Basketball league

A basketball league has m teams; in a season every pair of teams plays once. Teams are ordered: team 1 is best, team m is worst. Model: for 1 ≤ i < j ≤ m,

  P(team j beats team i) := 1 / (j − i + 1)

Games are independent.

SLIDE 80

Basketball league

Aim: compute the probability of the teams' ranks at the end of the season. The rank of team i is modeled as a random variable R_i. What is the pmf of R_1, R_2, ..., R_m?

SLIDE 81

m = 3

  Game outcomes (winner)    Rank              Probability
  1-2   1-3   2-3           R1   R2   R3
  1     1     2             1    2    3       1/6
  1     1     3             1    3    2       1/6
  1     3     2             1    1    1       1/12
  1     3     3             2    3    1       1/12
  2     1     2             2    1    3       1/6
  2     1     3             1    1    1       1/6
  2     3     2             3    1    2       1/12
  2     3     3             3    2    1       1/12

SLIDE 82

m = 3

  Probability mass function
  Rank   R1     R2    R3
  1      7/12   1/2   5/12
  2      1/4    1/4   1/4
  3      1/6    1/4   1/3

SLIDE 83

Basketball league

Problem: the number of possible outcomes is 2^{m(m−1)/2}! For m = 10 this is larger than 10¹³. Solution: apply Monte Carlo approximation.
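A sketch of the Monte Carlo approximation for m = 3 (an illustrative implementation, not the course's code; ties share the best rank, consistent with the table on slide 81):

```python
import numpy as np

rng = np.random.default_rng(0)

m, n = 3, 2000
counts = np.zeros((m, m))  # counts[t, r]: times team t+1 finished with rank r+1

for _ in range(n):
    wins = np.zeros(m, dtype=int)
    # Simulate one season: every pair of teams plays once.
    for i in range(m):
        for j in range(i + 1, m):
            # Teams indexed from 0; P(team j beats team i) = 1 / (j - i + 1).
            if rng.random() < 1.0 / (j - i + 1):
                wins[j] += 1
            else:
                wins[i] += 1
    # Rank = 1 + number of teams with strictly more wins (ties share the best rank).
    for t in range(m):
        rank = 1 + np.sum(wins > wins[t])
        counts[t, rank - 1] += 1

print(counts / n)  # rows estimate the pmfs of R1, R2, R3; e.g. P(R1 = 1) = 7/12
```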

SLIDE 84

m = 3

  Game outcomes (winner)    Rank
  1-2   1-3   2-3           R1   R2   R3
  1     3     2             1    1    1
  1     1     3             1    3    2
  2     1     2             2    1    3
  2     3     2             3    1    2
  2     1     3             1    1    1
  1     1     2             1    2    3
  2     1     3             1    1    1
  2     3     2             3    1    2
  1     1     2             1    2    3
  2     3     2             3    1    2

SLIDE 85

m = 3

  Estimated pmf (n = 10); exact values in parentheses
  Rank   R1            R2           R3
  1      0.6 (0.583)   0.7 (0.5)    0.3 (0.417)
  2      0.1 (0.25)    0.2 (0.25)   0.4 (0.25)
  3      0.3 (0.167)   0.1 (0.25)   0.3 (0.333)

SLIDE 86

m = 3

  Estimated pmf (n = 2,000); exact values in parentheses
  Rank   R1              R2             R3
  1      0.582 (0.583)   0.496 (0.5)    0.417 (0.417)
  2      0.248 (0.25)    0.261 (0.25)   0.244 (0.25)
  3      0.171 (0.167)   0.245 (0.25)   0.339 (0.333)

SLIDE 87

Running times

[Plot: running time (seconds, log scale) versus number of teams m, for exact computation and Monte Carlo approximation]

SLIDE 88

Error

  m    Average error
  3    9.28 · 10⁻³
  4    12.7 · 10⁻³
  5    7.95 · 10⁻³
  6    7.12 · 10⁻³
  7    7.12 · 10⁻³

SLIDE 89

m = 5

SLIDE 90

m = 20

SLIDE 91

m = 100