Gov 51: Expectation, Variance, and Sample Means, Matthew Blackwell

SLIDE 1

Gov 51: Expectation, Variance, and Sample Means

Matthew Blackwell

Harvard University

SLIDE 2

Remember our goal

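[Diagram: Population and Sample, with "probability" pointing from population to sample and "inference" pointing from sample back to population.]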

  • We want to learn about the chance process that generated our data.
  • Last time: entire probability distributions. Is there something simpler?

SLIDE 3

How can we summarize distributions?

  • Two numerical summaries of the distribution are useful:
  • 1. Mean/expectation: where the center of the distribution is.
  • 2. Variance/standard deviation: how spread out the distribution is around the center.
  • These are population parameters, so we don’t get to observe them directly,
  • but we’ll use our sample to learn about them.

SLIDE 4

Two ways to calculate averages

  • Calculate the average of $\{1, 1, 1, 3, 4, 4, 5, 5\}$:

$$\frac{1 + 1 + 1 + 3 + 4 + 4 + 5 + 5}{8} = 3$$

  • Alternative way to calculate the average, based on frequency weights:

$$1 \times \frac{3}{8} + 3 \times \frac{1}{8} + 4 \times \frac{2}{8} + 5 \times \frac{2}{8} = 3$$

  • Each value times how often that value occurs in the data.
  • We’ll use this intuition to create an average/mean for r.v.s.
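The two calculations above can be checked with a short Python sketch. The variable names are illustrative and not part of the original slides; exact fractions are used so the two results can be compared without rounding error.

```python
from collections import Counter
from fractions import Fraction

data = [1, 1, 1, 3, 4, 4, 5, 5]

# Way 1: add everything up and divide by the number of observations.
plain_mean = Fraction(sum(data), len(data))

# Way 2: each distinct value times the share of observations equal to it.
counts = Counter(data)
weighted_mean = sum(
    value * Fraction(count, len(data)) for value, count in counts.items()
)

print(plain_mean, weighted_mean)  # both equal 3
```

Replacing the observed shares with probabilities gives the weighted-average formula used for random variables.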

SLIDE 5

Expectation

  • We write $\mathbb{E}(X)$ for the mean of an r.v. $X$.
  • For discrete $X \in \{x_1, x_2, \ldots, x_k\}$ with $k$ levels:

$$\mathbb{E}[X] = \sum_{j=1}^{k} x_j \, \mathbb{P}(X = x_j)$$

  • Weighted average of the values of the r.v., weighted by the probability of each value occurring.
  • If $X$ is the age of a randomly selected registered voter, then $\mathbb{E}(X)$ is the average age in the population of registered voters.

  • Notation notes:
  • Lots of other ways to refer to this: expectation or expected value
  • Often called the population mean to distinguish from the sample mean.
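As a minimal sketch of the expectation formula for a discrete r.v., the helper below takes a probability mass function stored as a dictionary. The function name `expectation` and the fair-die example are assumptions chosen for illustration; neither appears in the slides.

```python
from fractions import Fraction


def expectation(pmf):
    """Mean of a discrete r.v. given as {value: probability}."""
    return sum(value * prob for value, prob in pmf.items())


# Hypothetical example: a fair six-sided die, each face with probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

print(expectation(die))  # 7/2
```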

SLIDE 6

Properties of the expected value

  • We use properties of $\mathbb{E}(X)$ to avoid using the formula every time.
  • Let $X$ and $Y$ be r.v.s and $a$ and $b$ be constants.
  • 1. $\mathbb{E}(a) = a$
  • Constants don’t vary.
  • 2. $\mathbb{E}(aX) = a\mathbb{E}(X)$
  • Suppose $X$ is income in dollars; income in units of 10,000 dollars is just $X/10000$.
  • The mean of this new variable is the mean of income in dollars divided by 10,000.
  • 3. $\mathbb{E}(aX + bY) = a\mathbb{E}(X) + b\mathbb{E}(Y)$
  • Expectations can be distributed across sums.
  • $X$ is partner 1’s income, $Y$ is partner 2’s income.
  • Mean household income is the sum of each partner’s mean income.
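The three properties can be verified numerically on a small made-up example. The joint distribution of the two partners' incomes below is hypothetical, as are the constants `a = 2` and `b = 3`. The incomes are deliberately not independent, since linearity of expectation does not require independence.

```python
from fractions import Fraction

# Hypothetical joint distribution of two partners' incomes in dollars.
# Keys are (x, y) pairs; equal incomes are more likely than unequal ones.
joint = {
    (30_000, 30_000): Fraction(2, 5),
    (30_000, 60_000): Fraction(1, 10),
    (60_000, 30_000): Fraction(1, 10),
    (60_000, 60_000): Fraction(2, 5),
}


def expect(g):
    """E[g(X, Y)] under the joint distribution above."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


a, b = 2, 3  # arbitrary constants for the check
mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)

const_mean = expect(lambda x, y: a)               # property 1
scaled_mean = expect(lambda x, y: a * x)          # property 2
combo_mean = expect(lambda x, y: a * x + b * y)   # property 3
household_mean = expect(lambda x, y: x + y)       # property 3 with a = b = 1

print(const_mean == a)
print(scaled_mean == a * mean_x)
print(combo_mean == a * mean_x + b * mean_y)
print(household_mean == mean_x + mean_y)
```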

SLIDE 7

Variance

  • The variance measures the spread of the distribution:

$$\mathbb{V}[X] = \mathbb{E}[(X - \mathbb{E}[X])^2]$$

  • Weighted average of the squared distances from the mean.
  • Larger deviations (+ or −) ⇝ higher variance
  • If $X$ is the age of a randomly selected registered voter, $\mathbb{V}[X]$ is the variance of age across all registered voters in the population.
  • Sometimes called the population variance to contrast with the sample variance.
  • Standard deviation: square root of the variance: $SD(X) = \sqrt{\mathbb{V}[X]}$.
  • Useful because it’s on the scale of the original variable.
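A small sketch of the variance and standard deviation calculations, applied to a fair six-sided die as a hypothetical example. The helper names `expectation` and `variance` are illustrative and not taken from the slides.

```python
import math
from fractions import Fraction


def expectation(pmf):
    """Mean of a discrete r.v. given as {value: probability}."""
    return sum(value * prob for value, prob in pmf.items())


def variance(pmf):
    """Probability-weighted average of squared distances from the mean."""
    mu = expectation(pmf)
    return sum((value - mu) ** 2 * prob for value, prob in pmf.items())


# Hypothetical example: a fair six-sided die.
die = {face: Fraction(1, 6) for face in range(1, 7)}

var_die = variance(die)       # exact fraction, 35/12
sd_die = math.sqrt(var_die)   # float, on the same scale as the die faces

print(var_die, sd_die)
```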

SLIDE 8

Properties of variances

  • Some properties of variance are useful for calculation.
  • 1. If $b$ is a constant, then $\mathbb{V}[b] = 0$.
  • 2. If $a$ and $b$ are constants, $\mathbb{V}[aX + b] = a^2\mathbb{V}[X]$.
  • 3. In general, $\mathbb{V}[X + Y] \neq \mathbb{V}[X] + \mathbb{V}[Y]$.
  • If $X$ and $Y$ are independent, then $\mathbb{V}[X + Y] = \mathbb{V}[X] + \mathbb{V}[Y]$.
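These properties can be checked by exact enumeration. The sketch below uses fair dice as a hypothetical example, with `a = 3` and `b = 10` as arbitrary constants. The dependent case sets $Y$ equal to $X$, which is one simple way to break independence.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p1 = Fraction(1, 6)  # probability of each face of a fair die


def var(pairs):
    """Variance of a discrete r.v. given as (value, probability) pairs."""
    pairs = list(pairs)
    mu = sum(v * p for v, p in pairs)
    return sum((v - mu) ** 2 * p for v, p in pairs)


var_x = var((x, p1) for x in faces)

# Property 1: a constant has zero variance.
var_const = var([(7, Fraction(1))])

# Property 2 with a = 3, b = 10: the shift drops out, the scale is squared.
var_scaled = var((3 * x + 10, p1) for x in faces)

# Independent case: two separate dice, each (x, y) pair has probability 1/36.
var_indep_sum = var((x + y, p1 * p1) for x, y in product(faces, faces))

# Dependent case: Y is the same die as X, so X + Y = 2X.
var_dep_sum = var((x + x, p1) for x in faces)

print(var_const == 0)
print(var_scaled == 9 * var_x)
print(var_indep_sum == 2 * var_x)
print(var_dep_sum == 2 * var_x)  # False: the variances do not add here
```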

SLIDE 9

Sums and means are random variables

  • If $X_1$ and $X_2$ are r.v.s, then $X_1 + X_2$ is an r.v.
  • It has a mean $\mathbb{E}[X_1 + X_2]$ and a variance $\mathbb{V}[X_1 + X_2]$.
  • The sample mean is a function of sums, so it is an r.v. too:

$$\bar{X} = \frac{X_1 + X_2}{2}$$

  • Example: the average age of two randomly selected respondents.

SLIDE 10

Distribution of sums/means

             $X_1$   $X_2$   $X_1 + X_2$   $\bar{X}$
draw 1       44      32      76            38
draw 2       27      50      77            38.5
draw 3       34      48      82            41
draw 4       68      28      96            48
⋮            ⋮       ⋮       ⋮             ⋮

  • The $X_1 + X_2$ column: distribution of the sum
  • The $\bar{X}$ column: distribution of the mean
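The repeated draws in the table can be imitated with a short simulation. Everything specific here is an assumption for illustration: the population of ages (uniform on 18 to 90), the seed, and the number of draws. The slides do not say how their example ages were generated.

```python
import random
import statistics

rng = random.Random(51)  # fixed seed so the run is reproducible

# Hypothetical population of ages, uniform on 18..90 purely for illustration.
population = list(range(18, 91))

sums, means = [], []
for _ in range(10_000):
    # Each draw picks two respondents independently (with replacement).
    x1, x2 = rng.choice(population), rng.choice(population)
    sums.append(x1 + x2)
    means.append((x1 + x2) / 2)

# Each list approximates a distribution: one for the sum, one for the mean.
print(statistics.mean(sums), statistics.pvariance(sums))
print(statistics.mean(means), statistics.pvariance(means))
```

Because the mean is the sum divided by 2, its simulated variance is one quarter of the sum's variance, in line with the rule $\mathbb{V}[aX + b] = a^2\mathbb{V}[X]$.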

SLIDE 11

Independent and identical r.v.s

  • Independent and identically distributed r.v.s, $X_1, \ldots, X_n$
  • Random sample of $n$ respondents on a survey question.
  • Written “i.i.d.”
  • Independent: the value that $X_i$ takes doesn’t affect the distribution of $X_j$
  • Identically distributed: the distribution of $X_i$ is the same for all $i$
  • $\mathbb{E}(X_1) = \mathbb{E}(X_2) = \cdots = \mathbb{E}(X_n) = \mu$
  • $\mathbb{V}(X_1) = \mathbb{V}(X_2) = \cdots = \mathbb{V}(X_n) = \sigma^2$

SLIDE 12

Distribution of the sample mean

  • Sample mean of i.i.d. random variables:

$$\bar{X}_n = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

  • $\bar{X}_n$ is a random variable; what is its distribution?
  • What is the expectation of this distribution, $\mathbb{E}[\bar{X}_n]$?
  • What is the variance of this distribution, $\mathbb{V}[\bar{X}_n]$?

SLIDE 13

Properties of the sample mean

Mean and variance of the sample mean

Suppose that $X_1, \ldots, X_n$ are i.i.d. r.v.s with $\mathbb{E}[X_i] = \mu$ and $\mathbb{V}[X_i] = \sigma^2$. Then:

$$\mathbb{E}[\bar{X}_n] = \mu \qquad \mathbb{V}[\bar{X}_n] = \frac{\sigma^2}{n}$$
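Both results follow from the properties of expectation and variance stated earlier. The derivation below is a sketch that is not spelled out in the slides; independence is needed only in the step where the variance of the sum becomes the sum of the variances.

```latex
\begin{align*}
\mathbb{E}[\bar{X}_n]
  &= \mathbb{E}\!\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right]
   = \frac{1}{n}\sum_{i=1}^{n} \mathbb{E}[X_i]
   = \frac{1}{n} \cdot n\mu
   = \mu \\[6pt]
\mathbb{V}[\bar{X}_n]
  &= \mathbb{V}\!\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right]
   = \frac{1}{n^2}\,\mathbb{V}\!\left[\sum_{i=1}^{n} X_i\right]
   = \frac{1}{n^2}\sum_{i=1}^{n} \mathbb{V}[X_i]
   = \frac{1}{n^2} \cdot n\sigma^2
   = \frac{\sigma^2}{n}
\end{align*}
```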

  • Key insights:
  • Sample mean is on average equal to the population mean
  • Variance of $\bar{X}_n$ depends on the population variance of $X_i$ and the sample size
  • Standard deviation of the sample mean is called its standard error:

$$SE = \sqrt{\mathbb{V}[\bar{X}_n]} = \frac{\sigma}{\sqrt{n}}$$
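A simulation can illustrate that the spread of the sample mean is close to $\sigma/\sqrt{n}$. The population of ages, the sample size `n = 25`, the seed, and the number of replications are all assumptions made for this sketch and are not taken from the slides.

```python
import math
import random
import statistics

rng = random.Random(51)  # fixed seed so the run is reproducible

# Hypothetical population of ages, uniform on 18..90 purely for illustration.
population = list(range(18, 91))
mu = statistics.mean(population)
sigma = statistics.pstdev(population)


def sample_mean(n):
    """Mean of n i.i.d. draws (sampling with replacement) from the population."""
    return statistics.mean([rng.choice(population) for _ in range(n)])


n = 25
means = [sample_mean(n) for _ in range(5_000)]

theoretical_se = sigma / math.sqrt(n)
simulated_se = statistics.pstdev(means)

print(mu, statistics.mean(means))    # the two should be close
print(theoretical_se, simulated_se)  # the two should be close
```

Increasing `n` shrinks both standard errors, while the average of the simulated means stays near the population mean.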
