SLIDE 1

Slide Set 5 CLRM: sample properties of OLS

Pietro Coretto pcoretto@unisa.it

Econometrics

Master in Economics and Finance (MEF) Università degli Studi di Napoli “Federico II”

Version: Saturday 28th December, 2019 (h16:05)

  • P. Coretto • MEF

CLRM: sample properties of OLS 1 / 16

By finite-sample we mean n < +∞, that is, a finite sample size. Note: the terminology "finite sample" means a sample size n that is not "too large". When n is "large enough" the asymptotic regime takes over, but that is the object of the second part of the course. We will investigate properties of both b and s². A1–A4 are the assumptions from the previous slide sets.

SLIDE 2

Finite sample properties of b

Proposition (unbiasedness of b). Assume A1, A2 and A3. Then E[b | X] = β, that is, b is an unbiased estimator of β conditional on X.

Proof. First note that, by the linearity of expectations, E[b | X] = β whenever E[b − β | X] = 0. Recall the decomposition of the estimation error

  b − β = (X′X)⁻¹X′ε

then

  E[b − β | X] = E[(X′X)⁻¹X′ε | X]

Pull out what's known (functions of X are constants given X) and obtain

  E[b − β | X] = (X′X)⁻¹X′ E[ε | X] = 0

where the last equality uses A3: E[ε | X] = 0.
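This conditional unbiasedness can be checked numerically: hold X fixed, redraw ε many times, and average the resulting OLS estimates. A minimal numpy sketch (the design matrix, β and the error scale are made-up illustration values, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

n, K = 100, 3
beta = np.array([1.0, 2.0, -0.5])

# Fix the design matrix X once: everything below is conditional on X
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])

# Redraw epsilon many times and recompute b = (X'X)^{-1} X'y each time
R = 5000
b_draws = np.empty((R, K))
for r in range(R):
    eps = rng.normal(scale=1.5, size=n)   # E[eps | X] = 0 (assumption A3)
    y = X @ beta + eps
    b_draws[r] = np.linalg.solve(X.T @ X, X.T @ y)

b_bar = b_draws.mean(axis=0)              # Monte Carlo estimate of E[b | X]
print(b_bar)                              # close to beta
```

Averaging over replications approximates E[b | X]; with 5000 draws the mean lands within Monte Carlo noise of β.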


Note that this is a conditional unbiasedness statement, and it is stronger than the more traditional unconditional statement. It is stronger because, by the law of iterated expectations, E[b | X] = β ⟹ E[b] = E[E[b | X]] = β.

SLIDE 3

Proposition (variance-covariance of b). Assume A1, A2, A3 and A4. Then the following hold:

(a) Var[b | X] = σ²(X′X)⁻¹

(b) (Gauss-Markov theorem) Let b0 be any estimator of β that is linear in y and unbiased. Then Var[b0 | X] ⪰ Var[b | X]

(c) Cov[b, e | X] = 0

Recall that ⪰ means "bigger or equal" in the matrix sense: A ⪰ B when A − B is positive semi-definite (PSD). Because of (b) the OLS estimator is also called BLUE = Best Linear Unbiased Estimator. "Best" here is in terms of mean squared error (MSE): since b is unbiased, its MSE equals its variance.


Proof of part (a). Since β is not a random variable its variance is zero, hence

  Var[b | X] = Var[b − β | X] = Var[(X′X)⁻¹X′ε | X]
       = Var[Aε | X],   for A = (X′X)⁻¹X′
       = A Var[ε | X] A′
       = (X′X)⁻¹X′ Var[ε | X] X(X′X)⁻¹   (∗)
       = (X′X)⁻¹X′ (σ²I_n) X(X′X)⁻¹
       = σ²(X′X)⁻¹(X′X)(X′X)⁻¹
       = σ²(X′X)⁻¹

(∗) here A′ = X(X′X)⁻¹ because, for an invertible matrix B, (B′)⁻¹ = (B⁻¹)′, so ((X′X)⁻¹)′ = (X′X)⁻¹.
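The closed form σ²(X′X)⁻¹ can likewise be verified by simulation: with X held fixed, the empirical variance-covariance matrix of b across replications should match it. A numpy sketch with made-up values:

```python
import numpy as np

rng = np.random.default_rng(1)

n, K, sigma = 80, 2, 2.0
beta = np.array([0.5, 1.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # fixed design

# Theoretical conditional variance: sigma^2 (X'X)^{-1}
theory = sigma**2 * np.linalg.inv(X.T @ X)

R = 20000
b_draws = np.empty((R, K))
for r in range(R):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    b_draws[r] = np.linalg.solve(X.T @ X, X.T @ y)

empirical = np.cov(b_draws, rowvar=False)               # Monte Carlo Var[b | X]
print(np.round(theory, 4))
print(np.round(empirical, 4))
```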

SLIDE 4

Proof of part (b). First note that OLS is a linear map of y: define A = (X′X)⁻¹X′, then b = Ay. Write an alternative linear estimator b0 (assuming it exists) as

  b0 = Cy = (D + A)y,   with D = C − A

Then

  b0 = (D + A)y = Dy + Ay = D(Xβ + ε) + b = DXβ + Dε + b

By assumption b0 is unbiased conditional on X, therefore E[b0 | X] = β.


Therefore

  E[b0 | X] = E[DXβ + Dε + b | X]
       = DXβ + E[Dε | X] + E[b | X]
       = DXβ + D E[ε | X] + β
       = DXβ + β

Since this must equal β for every β, unbiasedness of b0 implies DX = 0, hence

  b0 = Dε + b     (5.1)

Now subtract β from both sides of (5.1) and obtain the estimation error of b0:

  b0 − β = Dε + (b − β) = Dε + Aε = (D + A)ε

SLIDE 5

In order to show that OLS is BLUE we need to show that, for any b0 linear in y and unbiased, Var[b0 | X] ⪰ Var[b | X], that is, Var[b0 | X] − Var[b | X] is PSD. Same trick as before: the variance of a constant is zero, so we start from Var[b0 | X] = Var[b0 − β | X].


  Var[b0 | X] = Var[b0 − β | X] = Var[(D + A)ε | X]
       = (D + A) Var[ε | X] (D + A)′
       = σ²(D + A)(D′ + A′)
       = σ²(DD′ + AD′ + DA′ + AA′)

Now

  DA′ = DX(X′X)⁻¹ = 0,   and hence AD′ = (DA′)′ = 0
  AA′ = (X′X)⁻¹X′X(X′X)⁻¹ = (X′X)⁻¹

so that

  Var[b0 | X] = σ²[DD′ + (X′X)⁻¹] = σ²DD′ + σ²(X′X)⁻¹

SLIDE 6

The matrix DD′ is PSD: for any vector z, z′DD′z = (D′z)′(D′z) ≥ 0. Finally

  Var[b0 | X] − Var[b | X] = σ²DD′

which is PSD. This completes the proof.
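A concrete competitor makes Gauss-Markov tangible. For instance, OLS computed on only the first half of the sample is still linear in y and unbiased (its C matrix is [(X₁′X₁)⁻¹X₁′, 0]), so by the theorem its conditional variance must dominate that of full-sample OLS. A numpy sketch, with the half-sample estimator as my own illustrative choice of b0:

```python
import numpy as np

rng = np.random.default_rng(2)

n, sigma = 200, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])

# Full-sample OLS: Var[b | X] = sigma^2 (X'X)^{-1}
V_ols = sigma**2 * np.linalg.inv(X.T @ X)

# Competitor b0 = OLS on the first half only: linear in y, unbiased,
# with Var[b0 | X] = sigma^2 (X1'X1)^{-1}
X1 = X[: n // 2]
V_b0 = sigma**2 * np.linalg.inv(X1.T @ X1)

# Gauss-Markov: Var[b0 | X] - Var[b | X] must be PSD
diff = V_b0 - V_ols
eigvals = np.linalg.eigvalsh(diff)
print(eigvals)          # all eigenvalues >= 0 (up to rounding)
```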


Proof of part (c). First note that e = Mε; in fact, recall that MX = 0, so

  e = My = M(Xβ + ε) = MXβ + Mε = Mε

Now work out the formula of the covariance between the two vectors:

  Cov[b, e | X] = E[(b − E[b | X])(e − E[e | X])′ | X]
       = E[(X′X)⁻¹X′ε (Mε)′ | X]
       = E[(X′X)⁻¹X′εε′M | X]    (since M′ = M)
       = (X′X)⁻¹X′ E[εε′ | X] M    (pull out what's known)
       = σ²(X′X)⁻¹X′M

Since X′M = (MX)′ = 0, then Cov[b, e | X] = 0.
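Part (c) has an exact counterpart inside any single dataset: since e = Mε and X′M = 0, the residuals are orthogonal to the columns of X, i.e. X′e = 0. A quick numpy check on simulated data (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

n, K = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b                 # residuals: e = My = M eps

print(X.T @ e)                # zero up to floating-point error: X'e = X'M eps = 0
```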

SLIDE 7

Finite sample properties of s2

In order to do inference on β we need the variance of b, but it depends on σ². Since εᵢ is recovered through eᵢ, the natural analog estimator of σ² would be

  σ̂² = (1/n) Σᵢ₌₁ⁿ eᵢ² = e′e / n

However, this is biased because the residuals are two-stage estimates of the εᵢ: we need to pre-estimate β to get e. An unbiased estimator is obtained by correcting the biased one with the degrees of freedom, as usual:

  s² = (1/(n − K)) Σᵢ₌₁ⁿ eᵢ² = e′e / (n − K)


Proposition (unbiased estimation of σ²). Assume A1, A2, A3, A4. Then s² is an unbiased estimator of σ².

Proof.

  E[e′e | X] = E[(Mε)′Mε | X] = E[ε′M′Mε | X] = E[ε′Mε | X]
       = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ mᵢⱼ E[εᵢεⱼ | X]    (verify this!)
       = σ² Σᵢ₌₁ⁿ mᵢᵢ
       = σ² tr(M)

where M′M = M because M is symmetric and idempotent.

SLIDE 8

  tr(M) = tr(I_n − P) = n − tr(P)

and, using the cyclic property tr(AB) = tr(BA),

  tr(P) = tr[X(X′X)⁻¹X′] = tr[(X′X)(X′X)⁻¹] = tr(I_K) = K

Therefore E[e′e | X] = (n − K)σ². This proves that the analog estimator σ̂² is biased; in fact

  E[σ̂² | X] = E[e′e/n | X] = ((n − K)/n) σ²

It also shows that s² is unbiased; in fact

  E[s² | X] = E[e′e/(n − K) | X] = ((n − K)/(n − K)) σ² = σ²
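The factor (n − K)/n is visible in a small simulation: with X fixed, the Monte Carlo mean of e′e/n undershoots σ² by exactly that factor, while e′e/(n − K) centers on σ². A numpy sketch (n, K and σ² are made-up illustration values):

```python
import numpy as np

rng = np.random.default_rng(4)

n, K, sigma2 = 20, 3, 1.0                 # small n makes the bias visible
beta = np.array([1.0, 0.5, -0.5])
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])  # fixed design

R = 20000
naive = np.empty(R)                       # e'e / n        (biased)
s2 = np.empty(R)                          # e'e / (n - K)  (unbiased)
for r in range(R):
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)
    naive[r] = e @ e / n
    s2[r] = e @ e / (n - K)

print(naive.mean())                       # approx (n - K)/n * sigma2 = 0.85
print(s2.mean())                          # approx sigma2 = 1.0
```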


Estimated standard errors of b

We know that Var[b | X] = σ²(X′X)⁻¹, which depends on population quantities. We can estimate the variance matrix of b based on the plug-in principle:

  V̂ar(b) = s²(X′X)⁻¹

Let Se(bₖ) be the square root of the kth element on the diagonal of V̂ar(b); then Se(bₖ) is the standard error of the parameter estimate bₖ, that is

  Se(bₖ) = s √[(X′X)⁻¹]ₖₖ
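Putting the pieces together, the standard errors come straight from s² and (X′X)⁻¹. A self-contained numpy sketch on simulated data (all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

n, K = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([2.0, 1.0, -0.5]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                     # OLS estimate
e = y - X @ b                             # residuals
s2 = e @ e / (n - K)                      # unbiased estimate of sigma^2

var_b = s2 * XtX_inv                      # plug-in estimate of Var[b | X]
se = np.sqrt(np.diag(var_b))              # standard errors Se(b_k)
print(np.column_stack([b, se]))           # estimates next to their std. errors
```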
