CS70: Jean Walrand: Lecture 35.

Conditional Expectation / Properties of Conditional Expectation / Conditional Expectation, Continuous Probability

Warning: This lecture is rated R.

1. Conditional Expectation
◮ Review
◮ Going Viral
◮ Wald's Identity
◮ CE = MMSE
2. Continuous Probability
◮ Motivation.
◮ Continuous Random Variables.
◮ Cumulative Distribution Function.
◮ Probability Density Function.
◮ Expectation and Variance.

Definition
Let X and Y be RVs on Ω. The conditional expectation of Y given X is defined as E[Y | X] = g(X), where

  g(x) := E[Y | X = x] := ∑_y y Pr[Y = y | X = x].

Theorem
(a) X, Y independent ⇒ E[Y | X] = E[Y];
(b) E[aY + bZ | X] = a E[Y | X] + b E[Z | X];
(c) E[Y h(X) | X] = h(X) E[Y | X], ∀ h(·);
(d) E[h(X) E[Y | X]] = E[h(X) Y], ∀ h(·);
(e) E[E[Y | X]] = E[Y].

Application: Going Viral

Consider a social network (e.g., Twitter). You start a rumor (e.g., Walrand is really weird). You have d friends; in this example, d = 4. Each of your friends retweets w.p. p. Each of your friends has d friends, etc. Does the rumor spread? Does it die out (mercifully)?

Let X_n be the number of people who tweet the rumor at level n, with X_1 = 1 (you), and let X = ∑_{n=1}^∞ X_n be the total number of tweets.

Fact: E[X] < ∞ iff pd < 1.

Proof:
Given X_n = k, X_{n+1} = B(kd, p). Hence, E[X_{n+1} | X_n = k] = kpd.
Thus, E[X_{n+1} | X_n] = pd X_n. Consequently, E[X_n] = (pd)^{n−1}, n ≥ 1.
If pd < 1, then E[X_1 + ··· + X_n] ≤ (1 − pd)^{−1}, so E[X] ≤ (1 − pd)^{−1}.
If pd ≥ 1, then for all C one can find n s.t. E[X] ≥ E[X_1 + ··· + X_n] ≥ C.
In fact, one can show that pd ≥ 1 ⇒ Pr[X = ∞] > 0.

An easy extension: assume that everyone has an independent number D_i of friends with E[D_i] = d. Then the same fact holds. To see this, note that given X_n = k, and given the numbers of friends D_1 = d_1, ..., D_k = d_k of these X_n people, one has

  X_{n+1} = B(d_1 + ··· + d_k, p).

Hence, E[X_{n+1} | X_n = k, D_1 = d_1, ..., D_k = d_k] = p(d_1 + ··· + d_k).
Thus, E[X_{n+1} | X_n = k, D_1, ..., D_k] = p(D_1 + ··· + D_k).
Consequently, E[X_{n+1} | X_n = k] = E[p(D_1 + ··· + D_k)] = pdk.
Finally, E[X_{n+1} | X_n] = pd X_n, and E[X_{n+1}] = pd E[X_n]. We conclude as before.
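The recursion E[X_n] = (pd)^{n−1} is easy to check numerically. Here is a minimal Monte Carlo sketch (not from the lecture; the values of p and d, the number of generations, and the trial count are illustrative), taking X_1 = 1 as above:

```python
import random

def generation_means(p, d, num_gens, trials=10_000):
    """Average size of each generation of the rumor over many trials.
    Given X_n = k, the next generation is X_{n+1} ~ Binomial(k*d, p)."""
    totals = [0] * num_gens
    for _ in range(trials):
        x = 1  # X_1 = 1: you start the rumor
        for n in range(num_gens):
            totals[n] += x
            # each of the x*d friend slots retweets independently w.p. p
            x = sum(1 for _ in range(x * d) if random.random() < p)
    return [t / trials for t in totals]

p, d = 0.2, 4  # pd = 0.8 < 1, so the rumor dies out in expectation
for n, avg in enumerate(generation_means(p, d, 6), start=1):
    print(f"E[X_{n}] ~ {avg:.3f}   (pd)^{n-1} = {(p*d)**(n-1):.3f}")
```

With pd = 0.8 the averages decay geometrically; rerunning with p = 0.3 (so pd = 1.2) shows them grow instead.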
Application: Wald's Identity

Here is an extension of an identity we used in the last slide.

Theorem (Wald's Identity)
Assume that X_1, X_2, ... and Z are independent, where Z takes values in {0, 1, 2, ...} and E[X_n] = µ for all n ≥ 1. Then,

  E[X_1 + ··· + X_Z] = µ E[Z].

Proof:
E[X_1 + ··· + X_Z | Z = k] = µk. Thus, E[X_1 + ··· + X_Z | Z] = µZ.
Hence, E[X_1 + ··· + X_Z] = E[µZ] = µ E[Z].

CE = MMSE

E[Y | X] is the 'best' guess about Y based on X. Specifically, it is the function g(X) of X that minimizes E[(Y − g(X))²].

Theorem (CE = MMSE)
g(X) := E[Y | X] is the function of X that minimizes E[(Y − g(X))²].

Proof:
First recall the projection property of CE (this is property (d) above, rearranged):

  E[(Y − E[Y | X]) h(X)] = 0, ∀ h(·).

That is, the error Y − E[Y | X] is orthogonal to any h(X).

Now let h(X) be any function of X. Then

  E[(Y − h(X))²] = E[(Y − g(X) + g(X) − h(X))²]
                 = E[(Y − g(X))²] + E[(g(X) − h(X))²] + 2 E[(Y − g(X))(g(X) − h(X))].

But E[(Y − g(X))(g(X) − h(X))] = 0 by the projection property.
Thus, E[(Y − h(X))²] ≥ E[(Y − g(X))²].

E[Y | X] and L[Y | X] as projections

L[Y | X] is the projection of Y on {a + bX : a, b ∈ ℜ}: LLSE.
E[Y | X] is the projection of Y on {g(X), g(·) : ℜ → ℜ}: MMSE.
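To see CE = MMSE and the two projections in action, here is a small numerical sketch. The model Y = X² + noise with X a die roll is my own toy example, not from the slides; it is chosen so that the best guess is nonlinear.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.integers(1, 7, size=n)          # X: a fair die roll
y = x ** 2 + rng.normal(0, 1, size=n)   # toy model: Y = X^2 + noise

# Empirical conditional expectation g(v) ~ E[Y | X = v]
g = {v: y[x == v].mean() for v in range(1, 7)}
mse_ce = np.mean((y - np.array([g[v] for v in x])) ** 2)

# Best linear guess L[Y|X] = a + bX, fit by least squares
b, a = np.polyfit(x, y, 1)
mse_lin = np.mean((y - (a + b * x)) ** 2)

print(f"MSE of E[Y|X]: {mse_ce:.3f}")   # close to 1, the noise variance
print(f"MSE of L[Y|X]: {mse_lin:.3f}")  # larger: no line captures X^2
```

Consistent with the proof above, the linear guess pays exactly the extra penalty E[(g(X) − L[Y | X])²] on top of the MMSE.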
Continuous Probability: James Bond.

◮ Escapes from SPECTRE sometime during a 1,000 mile flight.
◮ Uniformly likely to be at any point along the path.

What is the chance he is at any given point along the path?
Discrete setting: uniform over Ω = {1, ..., 1000}.
Continuous setting: probability at any point in [0, 1000]?
Probability at any one of an infinite number of points is ... uh ... 0?

Continuous Probability: the interval!

Consider [a, b] ⊆ [0, ℓ] (for James, ℓ = 1000). Let [a, b] also denote the event that the point is in the interval [a, b]. Then

  Pr[[a, b]] = length of [a, b] / length of [0, ℓ] = (b − a)/ℓ = (b − a)/1000.

Again, [a, b] ⊆ Ω = [0, ℓ] are events. Events in this space are unions of intervals.
Example: the event A = "within 50 miles of base" is [0, 50] ∪ [950, 1000]. Clearly,

  Pr[A] = Pr[[0, 50]] + Pr[[950, 1000]] = 50/1000 + 50/1000 = 1/10.

Shooting.

Another Bond example: SPECTRE is chasing him in a buggy. Bond shoots at the buggy and hits it at a random spot. What is the chance he hits the gas tank? The gas tank is a circle of radius one foot, and the buggy is a 4 × 5 rectangle:

  Ω = {(x, y) : x ∈ [0, 4], y ∈ [0, 5]}.

The size of the event is π(1)² = π; the "size" of the sample space is 4 × 5 = 20. Since the hit is uniform, the probability of the event is π/20.

Buffon's needle.

Throw a needle at random on a board with horizontal lines, 1 unit apart; the needle has length 1. What is the probability that the needle hits a line?

Sample space: the possible positions of the needle, described by the center position (X, Y) and the orientation Θ. Relevant quantities: the X coordinate doesn't matter; Y := distance from the center to the closest line, Y ∈ [0, 1/2]; Θ := angle to the vertical, Θ ∈ [−π/2, π/2]. The needle intersects a line exactly when Y ≤ (1/2) cos Θ. Hence,

  Pr["intersects"] = ∫_{−π/2}^{π/2} Pr[Θ ∈ [θ, θ + dθ]] Pr[Y ≤ (1/2) cos θ]
                   = ∫_{−π/2}^{π/2} [dθ/π] · [(1/2) cos θ]/(1/2)
                   = (1/π) [sin θ]_{−π/2}^{π/2} = 2/π.
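The 2/π answer is easy to sanity-check by simulation. A minimal sketch (illustrative, not from the slides) that samples Y and Θ exactly as in the calculation above:

```python
import math, random

def buffon(trials=1_000_000):
    """Estimate Pr[needle intersects a line] for a unit-length needle
    thrown on lines one unit apart."""
    hits = 0
    for _ in range(trials):
        y = random.uniform(0, 0.5)                         # distance to closest line
        theta = random.uniform(-math.pi / 2, math.pi / 2)  # angle to vertical
        if y <= 0.5 * math.cos(theta):                     # intersection condition
            hits += 1
    return hits / trials

print("estimate:", buffon())      # ~ 0.6366
print("2/pi:    ", 2 / math.pi)
```

Historically, this gives a (slow) way to estimate π: throw many needles, count the hits, and invert the 2/π formula.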
Continuous Random Variables: CDF

Use Pr[a ≤ X ≤ b] instead of Pr[X = a]. Specifying this for all a and b specifies the behavior! Simpler still: Pr[X ≤ x] for all x. The cumulative distribution function (CDF) of X is

  F(x) = Pr[X ≤ x].

Then
  Pr[a < X ≤ b] = Pr[X ≤ b] − Pr[X ≤ a] = F(b) − F(a).
Idea: consider the two events X ≤ a and X ≤ b. Their difference is the event a < X ≤ b.

Example: CDF. Bond's position X:

  F(x) = Pr[X ≤ x] = { 0 for x < 0;  x/1000 for 0 ≤ x ≤ 1000;  1 for x > 1000 }.

Probability that Bond is within 50 miles of the center:

  Pr[450 < X ≤ 550] = Pr[X ≤ 550] − Pr[X ≤ 450] = 550/1000 − 450/1000 = 100/1000 = 1/10.

Example: CDF. A dartboard: a random location on a circle of radius 1. Random variable: Y, the distance from the center. The probability of landing within y of the center is

  Pr[Y ≤ y] = area of small circle / area of dartboard = πy²/π = y².

Hence,

  F_Y(y) = Pr[Y ≤ y] = { 0 for y < 0;  y² for 0 ≤ y ≤ 1;  1 for y > 1 }.

Is the dart more likely to be near .5 or near .1? Probability of being between .5 and .6 of the center:

  Pr[0.5 < Y ≤ 0.6] = Pr[Y ≤ 0.6] − Pr[Y ≤ 0.5] = F_Y(0.6) − F_Y(0.5) = .36 − .25 = .11.

Density function.

The probability of being "near x" is Pr[x < X ≤ x + δ], which goes to 0 as δ goes to zero. Try Pr[x < X ≤ x + δ]/δ instead, and take the limit as δ goes to zero:

  lim_{δ→0} Pr[x < X ≤ x + δ]/δ = lim_{δ→0} (Pr[X ≤ x + δ] − Pr[X ≤ x])/δ
                                 = lim_{δ→0} (F_X(x + δ) − F_X(x))/δ = dF_X(x)/dx.

Definition (Density): a probability density function for a random variable X with CDF F_X(x) = Pr[X ≤ x] is the function f_X(x) = F'_X(x), i.e.,

  F_X(x) = ∫_{−∞}^{x} f_X(u) du.

Thus, for small δ,

  Pr[X ∈ (x, x + δ]] = F_X(x + δ) − F_X(x) ≈ f_X(x) δ.

Examples: Density.

Example: "dart" board. Recall that F_Y(y) = y² for 0 ≤ y ≤ 1 (0 below, 1 above). Thus,

  f_Y(y) = F'_Y(y) = { 2y for 0 ≤ y ≤ 1;  0 otherwise }.

Example: uniform over the interval [0, 1000]:

  f_X(x) = F'_X(x) = { 1/1000 for 0 ≤ x ≤ 1000;  0 otherwise }.

Example: uniform over the interval [0, ℓ]:

  f_X(x) = F'_X(x) = { 1/ℓ for 0 ≤ x ≤ ℓ;  0 otherwise }.

The cumulative distribution function (CDF) and the probability density function (PDF) give full information. Use whichever is convenient.
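To tie the CDF and the density together numerically, here is a short sketch (illustrative code, not part of the lecture) that samples uniform points on the unit dartboard and checks F_Y(y) = y² and Pr[0.5 < Y ≤ 0.6] = .11:

```python
import math, random

def sample_distance():
    """Distance from the center of a uniformly random point on the
    unit disk, sampled by rejection from the enclosing square."""
    while True:
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if x * x + y * y <= 1:
            return math.hypot(x, y)

trials = 200_000
dists = [sample_distance() for _ in range(trials)]

# Empirical CDF vs F_Y(y) = y^2
for y in (0.25, 0.5, 0.75):
    emp = sum(d <= y for d in dists) / trials
    print(f"F_Y({y}) ~ {emp:.3f}   y^2 = {y * y:.4f}")

# Pr[0.5 < Y <= 0.6] = F_Y(0.6) - F_Y(0.5) = 0.11
emp = sum(0.5 < d <= 0.6 for d in dists) / trials
print(f"Pr[0.5 < Y <= 0.6] ~ {emp:.3f}   exact: 0.11")
```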