Convergence of Random Processes — DS GA 1002 Probability and Statistics for Data Science

SLIDE 1

Convergence of Random Processes

DS GA 1002 Probability and Statistics for Data Science

http://www.cims.nyu.edu/~cfgranda/pages/DSGA1002_fall17

Carlos Fernandez-Granda

SLIDE 2

Aim

Define convergence for random processes. Describe two convergence phenomena: the law of large numbers and the central limit theorem.

SLIDE 3

Outline: Types of convergence · Law of Large Numbers · Central Limit Theorem · Monte Carlo simulation

SLIDE 4

Convergence of deterministic sequences

A deterministic sequence of real numbers x_1, x_2, ... converges to x ∈ ℝ,

  lim_{i→∞} x_i = x,

if x_i is arbitrarily close to x as i grows: for any ε > 0 there is an i_0 such that for all i > i_0, |x_i − x| < ε.

Problem: random sequences do not have fixed values.

SLIDE 5

Convergence with probability one

Consider a discrete random process X̃ and a random variable X defined on the same probability space. If we fix the outcome ω, X̃(i, ω) is a deterministic sequence and X(ω) is a constant, so we can determine whether

  lim_{i→∞} X̃(i, ω) = X(ω)

for that particular ω.

SLIDE 6

Convergence with probability one

X̃ converges with probability one to X if

  P({ ω | ω ∈ Ω, lim_{i→∞} X̃(ω, i) = X(ω) }) = 1

Deterministic convergence occurs with probability one.

SLIDE 7

Puddle

The initial amount of water is uniform between 0 and 1 gallon. After a time interval i there is i times less water:

  D̃(ω, i) := ω / i,  i = 1, 2, ...
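As a quick numerical sketch (not part of the original slides), we can sample a few outcomes ω and watch each fixed-ω sequence shrink toward zero:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample a few outcomes omega uniformly on (0, 1).
omegas = rng.uniform(0.0, 1.0, size=3)

# For each fixed omega, D(omega, i) = omega / i is a deterministic,
# decreasing sequence that tends to 0 as i grows.
for omega in omegas:
    sequence = [omega / i for i in (1, 10, 100, 1000)]
    print(f"omega = {omega:.2f}:", sequence)
```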

SLIDE 8

Puddle

[Plot: D̃(ω, i) versus i = 1, ..., 10 for ω = 0.31, ω = 0.89, and ω = 0.52]

SLIDE 9

Puddle

If we fix ω ∈ (0, 1),

  lim_{i→∞} D̃(ω, i) = lim_{i→∞} ω / i = 0,

so D̃ converges to zero with probability one.
SLIDE 10

Puddle

[Plot: D̃(ω, i) versus i = 1, ..., 50]
SLIDE 11

Alternative idea

Idea: instead of fixing ω and checking deterministic convergence:

  1. Measure how close X̃(i) and X are for a fixed i using a deterministic quantity
  2. Check whether the quantity tends to zero
SLIDE 12

Convergence in mean square

The mean square of Y − X measures how close X and Y are. If E[(X − Y)²] = 0 then X = Y with probability one.

Proof: By Markov's inequality, for any ε > 0

  P((Y − X)² > ε) ≤ E[(X − Y)²] / ε = 0

SLIDE 13

Convergence in mean square

X̃ converges to X in mean square if

  lim_{i→∞} E[(X − X̃(i))²] = 0

SLIDE 14

Convergence in probability

Alternative measure: the probability that |Y − X| > ε for small ε.

X̃ converges to X in probability if for any ε > 0

  lim_{i→∞} P(|X − X̃(i)| > ε) = 0
SLIDE 15

Conv. in mean square implies conv. in probability

By Markov's inequality, for any ε > 0

  lim_{i→∞} P(|X − X̃(i)| > ε) = lim_{i→∞} P((X − X̃(i))² > ε²)
                               ≤ lim_{i→∞} E[(X − X̃(i))²] / ε²
                               = 0

Convergence with probability one also implies convergence in probability.

SLIDE 20

Convergence in distribution

The distribution of X̃(i) converges to the distribution of X.

X̃ converges in distribution to X if

  lim_{i→∞} F_{X̃(i)}(x) = F_X(x)

for all x at which F_X is continuous.

SLIDE 21

Convergence in distribution

Convergence in distribution does not imply that X̃(i) and X are close as i → ∞! Convergence in probability does imply convergence in distribution.

SLIDE 22

Binomial tends to Poisson

X̃(i) is binomial with parameters i and p := λ/i. X is a Poisson random variable with parameter λ. X̃(i) converges to X in distribution:

  lim_{i→∞} p_{X̃(i)}(x) = lim_{i→∞} C(i, x) p^x (1 − p)^{i−x} = λ^x e^{−λ} / x! = p_X(x)
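A minimal numerical check of this limit (a sketch, not from the slides): evaluate the binomial pmf at a fixed x for growing i with p = λ/i, and compare against the Poisson pmf.

```python
from math import comb, exp, factorial

lam, x = 3.0, 2  # Poisson parameter and evaluation point, chosen for illustration

def binomial_pmf(i, p, x):
    # pmf of a binomial with i trials and success probability p
    return comb(i, x) * p**x * (1 - p) ** (i - x)

poisson_pmf = lam**x * exp(-lam) / factorial(x)
for i in (40, 80, 400, 10_000):
    # Binomial with parameters i and p = lam / i, as on the slide
    print(i, binomial_pmf(i, lam / i, x))
print("Poisson limit:", poisson_pmf)
```

The binomial values approach the Poisson value as i grows, mirroring slides 23 through 26.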

SLIDE 23

Probability mass function of X̃(40)

[Plot: pmf over k = 1, ..., 40]

SLIDE 24

Probability mass function of X̃(80)

[Plot: pmf over k = 1, ..., 40]

SLIDE 25

Probability mass function of X̃(400)

[Plot: pmf over k = 1, ..., 40]

SLIDE 26

Probability mass function of X

[Plot: pmf over k = 1, ..., 40]

SLIDE 27

Outline: Types of convergence · Law of Large Numbers · Central Limit Theorem · Monte Carlo simulation

SLIDE 28

Moving average

The moving average Ã of a discrete random process X̃ is

  Ã(i) := (1/i) Σ_{j=1}^{i} X̃(j)
slide-29
SLIDE 29

Weak law of large numbers

Let X be an iid discrete random process with mean µ

X := µ and

bounded variance σ2 The average A of X converges in mean square to µ
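The mean-square convergence can be checked empirically (a sketch, not in the slides): estimate E[(Ã(i) − µ)²] for an iid geometric process with p = 0.4, for which µ = 1/p = 2.5 and σ² = (1 − p)/p² = 3.75, and compare with σ²/i.

```python
import numpy as np

rng = np.random.default_rng(0)

# iid geometric process with p = 0.4: mean mu = 1/p, variance sigma^2 = (1-p)/p^2
p = 0.4
mu, var = 1 / p, (1 - p) / p**2
trials = 5000

for i in (10, 100, 1000):
    samples = rng.geometric(p, size=(trials, i))
    averages = samples.mean(axis=1)          # one realization of A(i) per trial
    mse = np.mean((averages - mu) ** 2)      # estimate of E[(A(i) - mu)^2]
    print(i, mse, var / i)                   # empirical MSE vs sigma^2 / i
```

The empirical mean-square error shrinks roughly like σ²/i, as the proof on the next slide shows.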

SLIDE 30

Proof

  E[Ã(i)] = E[(1/i) Σ_{j=1}^{i} X̃(j)] = (1/i) Σ_{j=1}^{i} E[X̃(j)] = µ

  Var[Ã(i)] = Var[(1/i) Σ_{j=1}^{i} X̃(j)] = (1/i²) Σ_{j=1}^{i} Var[X̃(j)] = σ²/i

  lim_{i→∞} E[(Ã(i) − µ)²] = lim_{i→∞} E[(Ã(i) − E[Ã(i)])²]
                           = lim_{i→∞} Var[Ã(i)]
                           = lim_{i→∞} σ²/i
                           = 0
slide-43
SLIDE 43

Strong law of large numbers

Let X be an iid discrete random process with mean µ

X := µ and

bounded variance σ2 The average A of X converges with probability one to µ

SLIDE 44

iid standard Gaussian

[Plot: moving average vs mean of the iid sequence, i up to 50]

SLIDE 45

iid standard Gaussian

[Plot: moving average vs mean of the iid sequence, i up to 500]

SLIDE 46

iid standard Gaussian

[Plot: moving average vs mean of the iid sequence, i up to 5000]

SLIDE 47

iid geometric with p = 0.4

[Plot: moving average vs mean of the iid sequence, i up to 50]

SLIDE 48

iid geometric with p = 0.4

[Plot: moving average vs mean of the iid sequence, i up to 500]

SLIDE 49

iid geometric with p = 0.4

[Plot: moving average vs mean of the iid sequence, i up to 5000]

SLIDE 50

iid Cauchy

[Plot: moving average vs median of the iid sequence, i up to 50]

SLIDE 51

iid Cauchy

[Plot: moving average vs median of the iid sequence, i up to 500]

SLIDE 52

iid Cauchy

[Plot: moving average vs median of the iid sequence, i up to 5000]
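The Cauchy plots illustrate why the law of large numbers needs a finite mean: the Cauchy distribution has no mean, so its moving average never settles. A quick sketch (not in the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# iid standard Cauchy samples; the distribution has no mean, so the
# law of large numbers does not apply and the moving average keeps jumping.
x = rng.standard_cauchy(5000)
moving_average = np.cumsum(x) / np.arange(1, x.size + 1)
for i in (10, 100, 1000, 5000):
    print(i, moving_average[i - 1])  # no convergence toward a fixed value
```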

SLIDE 53

Outline: Types of convergence · Law of Large Numbers · Central Limit Theorem · Monte Carlo simulation

SLIDE 54

Central Limit Theorem

Let X̃ be an iid discrete random process with mean µ_X̃ := µ and bounded variance σ². Then √i (Ã(i) − µ) converges in distribution to a Gaussian random variable with mean 0 and variance σ². The average Ã(i) is approximately Gaussian with mean µ and variance σ²/i.
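We can check this numerically (a sketch, not from the slides) with iid exponential(λ = 2) samples, for which µ = σ = 1/2: the standardized averages √i (Ã(i) − µ) should have mean near 0 and standard deviation near σ.

```python
import numpy as np

rng = np.random.default_rng(0)

lam, i, trials = 2.0, 1000, 50_000
mu = sigma = 1 / lam  # exponential: mean 1/lambda, standard deviation 1/lambda

# Each row is one realization of the process up to time i.
samples = rng.exponential(1 / lam, size=(trials, i))

# Standardized averages sqrt(i) * (A(i) - mu), one per trial.
z = np.sqrt(i) * (samples.mean(axis=1) - mu)
print(z.mean(), z.std())  # approximately 0 and sigma = 0.5
```

A histogram of `z` would look Gaussian, as in the histogram slides that follow.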

SLIDE 55

Height data

Example: data from a population of 25,000 people. We compare the histogram of the heights with the pdf of a Gaussian random variable fitted to the data.

SLIDE 56

Height data

[Plot: histogram of heights (inches) vs fitted Gaussian pdf]

SLIDE 57

Sketch of proof

The pdf of a sum of two independent random variables is the convolution of their pdfs:

  f_{X+Y}(z) = ∫_{−∞}^{∞} f_X(z − y) f_Y(y) dy

Repeated convolutions of any pdf with bounded variance result in a Gaussian!

SLIDE 58

Repeated convolutions

[Plot: pdf after i = 1, 2, 3, 4, 5 convolutions]

SLIDE 59

Repeated convolutions

[Plot: pdf after i = 1, 2, 3, 4, 5 convolutions]

SLIDE 60

iid exponential λ = 2, i = 10²

[Plot: histogram of the average Ã(i)]

SLIDE 61

iid exponential λ = 2, i = 10³

[Plot: histogram of the average Ã(i)]

SLIDE 62

iid exponential λ = 2, i = 10⁴

[Plot: histogram of the average Ã(i)]

SLIDE 63

iid geometric p = 0.4, i = 10²

[Plot: histogram of the average Ã(i)]

SLIDE 64

iid geometric p = 0.4, i = 10³

[Plot: histogram of the average Ã(i)]

SLIDE 65

iid geometric p = 0.4, i = 10⁴

[Plot: histogram of the average Ã(i)]

SLIDE 66

iid Cauchy, i = 10²

[Plot: histogram of the average Ã(i)]

SLIDE 67

iid Cauchy, i = 10³

[Plot: histogram of the average Ã(i)]

SLIDE 68

iid Cauchy, i = 10⁴

[Plot: histogram of the average Ã(i); the shape does not narrow with i]

SLIDE 69

Gaussian approximation to the binomial

X is binomial with parameters n and p. Computing the probability that X is in a certain interval requires summing its pmf over the interval; the central limit theorem provides a quick approximation.

  X = Σ_{i=1}^{n} B_i,  E(B_i) = p,  Var(B_i) = p(1 − p)

(1/n) X is approximately Gaussian with mean p and variance p(1 − p)/n, so X is approximately Gaussian with mean np and variance np(1 − p).

SLIDE 70

Gaussian approximation to the binomial

A basketball player makes each shot with probability p = 0.4 (shots are iid). What is the probability that she makes more than 420 shots out of 1000?

Exact answer:

  P(X ≥ 420) = Σ_{x=420}^{1000} p_X(x) = Σ_{x=420}^{1000} C(1000, x) 0.4^x 0.6^{1000−x} = 10.4 · 10⁻²

Approximation, with U a standard Gaussian:

  P(X ≥ 420) ≈ P(√(np(1 − p)) U + np ≥ 420) = P(U ≥ 1.29) = 1 − Φ(1.29) = 9.85 · 10⁻²
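Both numbers can be reproduced directly (a sketch, not from the slides): sum the binomial pmf for the exact tail, and use the Gaussian cdf for the approximation.

```python
from math import comb, erf, sqrt

n, p, k = 1000, 0.4, 420

# Exact tail probability: sum the binomial pmf from 420 to 1000.
exact = sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k, n + 1))

# CLT approximation: X is roughly Gaussian with mean np and variance np(1-p).
mu, sigma = n * p, sqrt(n * p * (1 - p))
approx = 0.5 * (1 - erf((k - mu) / (sigma * sqrt(2))))  # 1 - Phi(z)

print(exact, approx)  # about 0.104 and 0.0985, matching the slide
```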

SLIDE 74

Outline: Types of convergence · Law of Large Numbers · Central Limit Theorem · Monte Carlo simulation

SLIDE 75

Monte Carlo simulation

Simulation is a powerful tool in probability and statistics when models are too complex to derive closed-form solutions (life is not a homework problem!). Example: the game of solitaire.

SLIDE 76

Game of solitaire

Aim: compute the probability that you win at solitaire. If every permutation of the cards has the same probability,

  P(Win) = (number of permutations that lead to a win) / (total number of permutations)

Problem: characterizing the permutations that lead to a win is very difficult without playing out the game, and we can't just check them all because there are 52! ≈ 8 · 10⁶⁷ permutations! Solution: sample many permutations and compute the fraction of wins.

SLIDE 77

In the words of Stanislaw Ulam

The first thoughts and attempts I made to practice (the Monte Carlo Method) were suggested by a question which occurred to me in 1946 as I was convalescing from an illness and playing solitaires. The question was what are the chances that a Canfield solitaire laid out with 52 cards will come out successfully? After spending a lot of time trying to estimate them by pure combinatorial calculations, I wondered whether a more practical method than "abstract thinking" might not be to lay it out say one hundred times and simply observe and count the number of successful plays. This was already possible to envisage with the beginning of the new era of fast computers.

SLIDE 78

Monte Carlo approximation

Main principle: use simulation to approximate quantities that are challenging to compute exactly. To approximate the probability of an event E:

  1. Generate n independent samples of the indicator 1_E: I_1, I_2, ..., I_n
  2. Compute the average of the n samples

     Ã(n) := (1/n) Σ_{i=1}^{n} I_i

By the law of large numbers, Ã converges to P(E) as n → ∞, since E(1_E) = P(E).
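A concrete sketch of these two steps (the event E here, a standard Gaussian exceeding 1, is an illustrative assumption, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: n independent samples I_1, ..., I_n of the indicator of
# the event E = {Z > 1} for a standard Gaussian Z.
n = 100_000
indicators = rng.standard_normal(n) > 1.0

# Step 2: the Monte Carlo estimate is the average of the indicators.
estimate = indicators.mean()
print(estimate)  # close to 1 - Phi(1), about 0.1587
```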

SLIDE 79

Basketball league

A basketball league has m teams; in a season every pair of teams plays once. Teams are ordered: team 1 is best, team m is worst. Model: for 1 ≤ i < j ≤ m,

  P(team j beats team i) := 1 / (j − i + 1)

Games are independent.

SLIDE 80

Basketball league

Aim: compute the probability of the teams' ranks at the end of the season. The rank of team i is modeled as a random variable R_i. What is the pmf of R_1, R_2, ..., R_m?

SLIDE 81

m = 3

  Game outcomes (winner)    Rank              Probability
  1-2   1-3   2-3           R1   R2   R3
  1     1     2             1    2    3       1/6
  1     1     3             1    3    2       1/6
  1     3     2             1    1    1       1/12
  1     3     3             2    3    1       1/12
  2     1     2             2    1    3       1/6
  2     1     3             1    1    1       1/6
  2     3     2             3    1    2       1/12
  2     3     3             3    2    1       1/12

SLIDE 82

m = 3

  Probability mass function
  Rank   R1     R2    R3
  1      7/12   1/2   5/12
  2      1/4    1/4   1/4
  3      1/6    1/4   1/3

SLIDE 83

Basketball league

Problem: the number of possible outcomes is 2^{m(m−1)/2}! For m = 10 this is larger than 10¹³. Solution: apply Monte Carlo approximation.
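A sketch of the Monte Carlo approximation for m = 3 (an illustrative implementation, not the course's code; ties share the best rank, consistent with the table on slide 81):

```python
import numpy as np

rng = np.random.default_rng(0)

m, n = 3, 2000
counts = np.zeros((m, m))  # counts[t, r]: times team t+1 finished with rank r+1

for _ in range(n):
    wins = np.zeros(m, dtype=int)
    # Simulate one season: every pair of teams plays once.
    for i in range(m):
        for j in range(i + 1, m):
            # Teams indexed from 0; P(team j beats team i) = 1 / (j - i + 1).
            if rng.random() < 1.0 / (j - i + 1):
                wins[j] += 1
            else:
                wins[i] += 1
    # Rank = 1 + number of teams with strictly more wins (ties share the best rank).
    for t in range(m):
        rank = 1 + np.sum(wins > wins[t])
        counts[t, rank - 1] += 1

print(counts / n)  # rows estimate the pmfs of R1, R2, R3; e.g. P(R1 = 1) = 7/12
```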

SLIDE 84

m = 3

  Game outcomes (winner)    Rank
  1-2   1-3   2-3           R1   R2   R3
  1     3     2             1    1    1
  1     1     3             1    3    2
  2     1     2             2    1    3
  2     3     2             3    1    2
  2     1     3             1    1    1
  1     1     2             1    2    3
  2     1     3             1    1    1
  2     3     2             3    1    2
  1     1     2             1    2    3
  2     3     2             3    1    2

SLIDE 85

m = 3

  Estimated pmf (n = 10); exact values in parentheses
  Rank   R1            R2           R3
  1      0.6 (0.583)   0.7 (0.5)    0.3 (0.417)
  2      0.1 (0.25)    0.2 (0.25)   0.4 (0.25)
  3      0.3 (0.167)   0.1 (0.25)   0.3 (0.333)

SLIDE 86

m = 3

  Estimated pmf (n = 2,000); exact values in parentheses
  Rank   R1              R2             R3
  1      0.582 (0.583)   0.496 (0.5)    0.417 (0.417)
  2      0.248 (0.25)    0.261 (0.25)   0.244 (0.25)
  3      0.171 (0.167)   0.245 (0.25)   0.339 (0.333)

SLIDE 87

Running times

[Plot: running time (seconds, log scale) versus number of teams m, for exact computation and Monte Carlo approximation]

SLIDE 88

Error

  m    Average error
  3    9.28 · 10⁻³
  4    12.7 · 10⁻³
  5    7.95 · 10⁻³
  6    7.12 · 10⁻³
  7    7.12 · 10⁻³

SLIDE 89

m = 5

SLIDE 90

m = 20

SLIDE 91

m = 100