

SLIDE 1

Gov 2000: 10. Multiple Regression in Matrix Form

Matthew Blackwell

Fall 2016

SLIDE 2
  • 1. Matrix algebra review
  • 2. Matrix Operations
  • 3. Linear model in matrix form
  • 4. OLS in matrix form
  • 5. OLS inference in matrix form

SLIDE 3

Where are we? Where are we going?

  • Last few weeks: regression estimation and inference with one and two independent variables, varying effects
  • This week: the general regression model with arbitrary covariates

  • Next week: what happens when assumptions are wrong

SLIDE 4

Nunn & Wantchekon

  • Are there long-term, persistent effects of the slave trade on Africans today?
  • Basic idea: compare levels of interpersonal trust ($Y_i$) across different levels of historical slave exports for a respondent's ethnic group
  • Problem: ethnic groups and respondents might differ in their interpersonal trust in ways that correlate with the severity of slave exports
  • One solution: try to control for relevant differences between groups via multiple regression

SLIDE 5

Nunn & Wantchekon

  • Whaaaaa? Bold letters, quotation marks, what is this?
  • Today's goal is to decipher this type of writing

SLIDE 6

Multiple Regression in R

nunn <- foreign::read.dta("../data/Nunn_Wantchekon_AER_2011.dta")
mod <- lm(trust_neighbors ~ exports + age + male + urban_dum +
            malaria_ecology, data = nunn)
summary(mod)
##
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)      1.5030370  0.0218325   68.84   <2e-16 ***
## exports         -0.0010208  0.0000409  -24.94   <2e-16 ***
## age              0.0050447  0.0004724   10.68   <2e-16 ***
## male             0.0278369  0.0138163    2.01    0.044 *
## urban_dum       -0.2738719  0.0143549  -19.08   <2e-16 ***
## malaria_ecology  0.0194106  0.0008712   22.28   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.978 on 20319 degrees of freedom
##   (1497 observations deleted due to missingness)
## Multiple R-squared: 0.0604, Adjusted R-squared: 0.0602
## F-statistic: 261 on 5 and 20319 DF, p-value: <2e-16

SLIDE 7

Why matrices and vectors?

SLIDE 9

Why matrices and vectors?

  • Here's one way to write the full multiple regression model:

$$y_i = \beta_0 + x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{ik}\beta_k + u_i$$

  • Notation is going to get needlessly messy as we add variables.
  • Matrices are clean, but they are like a foreign language.
  • You need to build intuitions over a long period of time.

SLIDE 10

Quick note about interpretation

๐‘ง๐‘— = ๐›พ0 + ๐‘ฆ๐‘—1๐›พ1 + ๐‘ฆ๐‘—2๐›พ2 + โ‹ฏ + ๐‘ฆ๐‘—๐‘™๐›พ๐‘™ + ๐‘ฃ๐‘—

  • In this model, ๐›พ1 is the efgect of a one-unit change in ๐‘ฆ๐‘—1

conditional on all other ๐‘ฆ๐‘—๐‘˜.

  • Jargon โ€œpartial efgect,โ€ โ€œceteris paribus,โ€ โ€œall else equal,โ€

โ€œconditional on the covariates,โ€ etc

  • Notation change: lower-case letters here are random variables.

SLIDE 11

1/ Matrix algebra review

SLIDE 12

Vectors

  • A vector is just a list of numbers (or random variables).
  • A $1 \times k$ row vector has these numbers arranged in a row:

$$\mathbf{b} = \begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_k \end{bmatrix}$$

  • A $k \times 1$ column vector arranges the numbers in a column:

$$\mathbf{a} = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_k \end{bmatrix}$$

  • Convention: we'll assume that a vector is a column vector, and vectors will be written with lowercase bold lettering ($\mathbf{b}$)

SLIDE 13

Vector examples

  • Vector of all covariates for a particular unit $i$ (see the R sketch below):

$$\mathbf{x}_i = \begin{bmatrix} 1 \\ x_{i1} \\ x_{i2} \\ \vdots \\ x_{ik} \end{bmatrix}$$

  • For the Nunn-Wantchekon data, we might have:

$$\mathbf{x}_i = \begin{bmatrix} 1 \\ \text{exports}_i \\ \text{age}_i \\ \text{male}_i \end{bmatrix}$$

SLIDE 14

Matrices

  • A matrix is just a rectangular array of numbers.
  • We say that a matrix is $n \times k$ ("$n$ by $k$") if it has $n$ rows and $k$ columns.
  • Uppercase bold denotes a matrix:

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1k} \\ a_{21} & a_{22} & \cdots & a_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nk} \end{bmatrix}$$

  • Generic entry: $a_{ij}$, the entry in row $i$ and column $j$

SLIDE 15

Examples of matrices

  • One example of a matrix that we'll use a lot is the design matrix, which has a column of ones and then one column for each independent variable in the regression:

$$\mathbf{X} = \begin{bmatrix} 1 & \text{exports}_1 & \text{age}_1 & \text{male}_1 \\ 1 & \text{exports}_2 & \text{age}_2 & \text{male}_2 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & \text{exports}_n & \text{age}_n & \text{male}_n \end{bmatrix}$$

SLIDE 16

Design matrix in R

head(model.matrix(mod), 8)
##   (Intercept) exports age male urban_dum malaria_ecology
## 1           1     855  40    0         0           28.15
## 2           1     855  25    1         0           28.15
## 3           1     855  38    1         1           28.15
## 4           1     855  37    1         0           28.15
## 5           1     855  31    1         0           28.15
## 6           1     855  45    0         0           28.15
## 7           1     855  20    1         0           28.15
## 8           1     855  31    0         0           28.15

dim(model.matrix(mod))
## [1] 20325 6

SLIDE 17

2/ Matrix Operations

SLIDE 18

Transpose

  • The transpose of a matrix $\mathbf{A}$ is the matrix created by switching the rows and columns of $\mathbf{A}$; it is denoted $\mathbf{A}'$.
  • The $k$th column of $\mathbf{A}$ becomes the $k$th row of $\mathbf{A}'$:

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix} \qquad \mathbf{A}' = \begin{bmatrix} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \end{bmatrix}$$

  • If $\mathbf{A}$ is $n \times k$, then $\mathbf{A}'$ will be $k \times n$.
  • Also written $\mathbf{A}^{\mathsf{T}}$

SLIDE 19

Transposing vectors

  • Transposing will turn a $k \times 1$ column vector into a $1 \times k$ row vector and vice versa:

$$\mathbf{x}_i = \begin{bmatrix} 1 \\ x_{i1} \\ x_{i2} \\ \vdots \\ x_{ik} \end{bmatrix} \qquad \mathbf{x}_i' = \begin{bmatrix} 1 & x_{i1} & x_{i2} & \cdots & x_{ik} \end{bmatrix}$$

SLIDE 20

Transposing in R

a <- matrix(1:6, ncol = 3, nrow = 2)
a
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

t(a)
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
## [3,]    5    6

SLIDE 21

Write matrices as vectors

  • A matrix is just a collection of vectors (row or column)
  • As row vectors:

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} = \begin{bmatrix} \mathbf{a}_1' \\ \mathbf{a}_2' \end{bmatrix} \quad \text{with row vectors} \quad \mathbf{a}_1' = \begin{bmatrix} a_{11} & a_{12} & a_{13} \end{bmatrix} \quad \mathbf{a}_2' = \begin{bmatrix} a_{21} & a_{22} & a_{23} \end{bmatrix}$$

  • Or we can define it in terms of column vectors:

$$\mathbf{B} = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{bmatrix} = \begin{bmatrix} \mathbf{b}_1 & \mathbf{b}_2 \end{bmatrix}$$

where $\mathbf{b}_1$ and $\mathbf{b}_2$ represent the columns of $\mathbf{B}$.

  • $j$ subscripts the columns of a matrix: $\mathbf{x}_j$
  • $i$ and $t$ will be used for rows: $\mathbf{x}_i'$

SLIDE 22

Design matrix

  • Design matrix as a series of row vectors:

$$\mathbf{X} = \begin{bmatrix} 1 & \text{exports}_1 & \text{age}_1 & \text{male}_1 \\ 1 & \text{exports}_2 & \text{age}_2 & \text{male}_2 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & \text{exports}_n & \text{age}_n & \text{male}_n \end{bmatrix} = \begin{bmatrix} \mathbf{x}_1' \\ \mathbf{x}_2' \\ \vdots \\ \mathbf{x}_n' \end{bmatrix}$$

  • Design matrix as a series of column vectors:

$$\mathbf{X} = \begin{bmatrix} \mathbf{1} & \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_k \end{bmatrix}$$

SLIDE 23

Addition and subtraction

  • How do we add or subtract matrices and vectors?
  • First, the matrices/vectors need to be conformable, meaning that the dimensions have to be the same.
  • Let $\mathbf{A}$ and $\mathbf{B}$ both be $2 \times 2$ matrices. Then $\mathbf{C} = \mathbf{A} + \mathbf{B}$, where we add each cell together (see the R sketch below):

$$\mathbf{A} + \mathbf{B} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} + \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} = \begin{bmatrix} a_{11} + b_{11} & a_{12} + b_{12} \\ a_{21} + b_{21} & a_{22} + b_{22} \end{bmatrix} = \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix} = \mathbf{C}$$

SLIDE 24

Scalar multiplication

  • A scalar is just a single number: you can think of it sort of like a $1 \times 1$ matrix.
  • When we multiply a matrix by a scalar, we just multiply each element/cell by that scalar (sketched in R below):

$$b\mathbf{A} = b \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = \begin{bmatrix} b \times a_{11} & b \times a_{12} \\ b \times a_{21} & b \times a_{22} \end{bmatrix}$$

SLIDE 25

3/ Linear model in matrix form

SLIDE 26

The linear model with new notation

  • Remember that we wrote the linear model as the following for all $i \in \{1, \ldots, n\}$:

$$y_i = \beta_0 + x_i\beta_1 + z_i\beta_2 + u_i$$

  • Imagine we had an $n$ of 4. We could write out each formula:

$$y_1 = \beta_0 + x_1\beta_1 + z_1\beta_2 + u_1 \quad \text{(unit 1)}$$
$$y_2 = \beta_0 + x_2\beta_1 + z_2\beta_2 + u_2 \quad \text{(unit 2)}$$
$$y_3 = \beta_0 + x_3\beta_1 + z_3\beta_2 + u_3 \quad \text{(unit 3)}$$
$$y_4 = \beta_0 + x_4\beta_1 + z_4\beta_2 + u_4 \quad \text{(unit 4)}$$

SLIDE 27

The linear model with new notation

๐‘ง1 = ๐›พ0 + ๐‘ฆ1๐›พ1 + ๐‘จ1๐›พ2 + ๐‘ฃ1 (unit 1) ๐‘ง2 = ๐›พ0 + ๐‘ฆ2๐›พ1 + ๐‘จ2๐›พ2 + ๐‘ฃ2 (unit 2) ๐‘ง3 = ๐›พ0 + ๐‘ฆ3๐›พ1 + ๐‘จ3๐›พ2 + ๐‘ฃ3 (unit 3) ๐‘ง4 = ๐›พ0 + ๐‘ฆ4๐›พ1 + ๐‘จ4๐›พ2 + ๐‘ฃ4 (unit 4)

  • We can write this as:

โŽก โŽข โŽข โŽข โŽฃ ๐‘ง1 ๐‘ง2 ๐‘ง3 ๐‘ง4 โŽค โŽฅ โŽฅ โŽฅ โŽฆ = โŽก โŽข โŽข โŽข โŽฃ 1 1 1 1 โŽค โŽฅ โŽฅ โŽฅ โŽฆ ๐›พ0 + โŽก โŽข โŽข โŽข โŽฃ ๐‘ฆ1 ๐‘ฆ2 ๐‘ฆ3 ๐‘ฆ4 โŽค โŽฅ โŽฅ โŽฅ โŽฆ ๐›พ1 + โŽก โŽข โŽข โŽข โŽฃ ๐‘จ1 ๐‘จ2 ๐‘จ3 ๐‘จ4 โŽค โŽฅ โŽฅ โŽฅ โŽฆ ๐›พ2 + โŽก โŽข โŽข โŽข โŽฃ ๐‘ฃ1 ๐‘ฃ2 ๐‘ฃ3 ๐‘ฃ4 โŽค โŽฅ โŽฅ โŽฅ โŽฆ

  • Outcome is a linear combination of the the ๐ฒ, ๐ด, and ๐ฏ vectors

SLIDE 28

Grouping things into matrices

  • Can we write this in a more compact form? Yes! Let $\mathbf{X}$ and $\boldsymbol{\beta}$ be the following:

$$\underset{(4 \times 3)}{\mathbf{X}} = \begin{bmatrix} 1 & x_1 & z_1 \\ 1 & x_2 & z_2 \\ 1 & x_3 & z_3 \\ 1 & x_4 & z_4 \end{bmatrix} \qquad \underset{(3 \times 1)}{\boldsymbol{\beta}} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}$$

SLIDE 29

Matrix multiplication by a vector

  • We can write this more compactly as a matrix (post-)multiplied by a vector:

$$\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} \beta_0 + \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} \beta_1 + \begin{bmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 \end{bmatrix} \beta_2 = \mathbf{X}\boldsymbol{\beta}$$

  • Multiplication of a matrix by a vector is just the linear combination of the columns of the matrix, with the vector elements as the weights/coefficients.
  • And the left-hand side here only uses scalars times vectors, which is easy!

SLIDE 30

General matrix by vector multiplication

  • $\mathbf{A}$ is an $n \times k$ matrix
  • $\mathbf{b}$ is a $k \times 1$ column vector
  • Columns of $\mathbf{A}$ have to match rows of $\mathbf{b}$
  • Let $\mathbf{a}_j$ be the $j$th column of $\mathbf{A}$. Then we can write:

$$\underset{(n \times 1)}{\mathbf{c}} = \mathbf{A}\mathbf{b} = b_1\mathbf{a}_1 + b_2\mathbf{a}_2 + \cdots + b_k\mathbf{a}_k$$

  • $\mathbf{c}$ is a linear combination of the columns of $\mathbf{A}$ (see the R sketch below)
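A numerical sketch in R with toy numbers: the matrix-by-vector product really is a weighted sum of the columns.

A <- matrix(1:6, nrow = 3, ncol = 2)  # 3 x 2 matrix
b <- c(2, -1)                         # 2 x 1 vector
A %*% b                               # matrix-by-vector product
b[1] * A[, 1] + b[2] * A[, 2]         # same: linear combination of columns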

SLIDE 31

Back to regression

  • $\mathbf{X}$ is the $n \times (k+1)$ design matrix of independent variables
  • $\boldsymbol{\beta}$ is the $(k+1) \times 1$ column vector of coefficients
  • $\mathbf{X}\boldsymbol{\beta}$ will be $n \times 1$:

$$\mathbf{X}\boldsymbol{\beta} = \beta_0\mathbf{1} + \beta_1\mathbf{x}_1 + \beta_2\mathbf{x}_2 + \cdots + \beta_k\mathbf{x}_k$$

  • Thus, we can compactly write the linear model as the following:

$$\underset{(n \times 1)}{\mathbf{y}} = \underset{(n \times 1)}{\mathbf{X}\boldsymbol{\beta}} + \underset{(n \times 1)}{\mathbf{u}}$$

SLIDE 32

Inner product

  • The inner (or dot) product of two column vectors $\mathbf{a}$ and $\mathbf{b}$ (of equal dimension, $k \times 1$):

$$\langle \mathbf{a}, \mathbf{b} \rangle = \mathbf{a}'\mathbf{b} = a_1b_1 + a_2b_2 + \cdots + a_kb_k$$

  • If $\mathbf{a}'\mathbf{b} = 0$, we say that the two vectors are orthogonal (see the R sketch below).
  • With $\mathbf{c} = \mathbf{A}\mathbf{b}$, we can write the entries of $\mathbf{c}$ as inner products: $c_i = \mathbf{a}_i'\mathbf{b}$, where $\mathbf{a}_i'$ is the $i$th row of $\mathbf{A}$.
  • If $\mathbf{x}_i'$ is the $i$th row of $\mathbf{X}$, then we can write the linear model as:

$$y_i = \mathbf{x}_i'\boldsymbol{\beta} + u_i = \beta_0 + x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{ik}\beta_k + u_i$$
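In R, a toy sketch: the inner product is t(a) %*% b or, more simply, sum(a * b).

a <- c(1, 0, 2)
b <- c(3, 5, -1)
t(a) %*% b    # inner product, returned as a 1 x 1 matrix
sum(a * b)    # the same number as a plain scalar: 1*3 + 0*5 + 2*(-1) = 1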

SLIDE 33

4/ OLS in matrix form

SLIDE 34

Matrix multiplication

  • What if, instead of a column vector $\mathbf{b}$, we have a matrix $\mathbf{B}$ with dimensions $k \times m$?
  • How do we do multiplication like $\mathbf{C} = \mathbf{A}\mathbf{B}$?
  • Each column of the new matrix is just matrix-by-vector multiplication:

$$\mathbf{C} = \begin{bmatrix} \mathbf{c}_1 & \mathbf{c}_2 & \cdots & \mathbf{c}_m \end{bmatrix} \qquad \mathbf{c}_j = \mathbf{A}\mathbf{b}_j$$

  • Thus, each column of $\mathbf{C}$ is a linear combination of the columns of $\mathbf{A}$.

SLIDE 35

Properties of matrix multiplication

  • Matrix multiplication is not commutative: in general, $\mathbf{A}\mathbf{B} \neq \mathbf{B}\mathbf{A}$
  • It is associative and distributive:

$$\mathbf{A}(\mathbf{B}\mathbf{C}) = (\mathbf{A}\mathbf{B})\mathbf{C} \qquad \mathbf{A}(\mathbf{B} + \mathbf{C}) = \mathbf{A}\mathbf{B} + \mathbf{A}\mathbf{C}$$

  • The transpose of a product: $(\mathbf{A}\mathbf{B})' = \mathbf{B}'\mathbf{A}'$ (checked numerically below)
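Both properties are easy to check in R, a sketch with toy 2 x 2 matrices:

A <- matrix(c(1, 2, 0, 1), nrow = 2)
B <- matrix(c(0, 1, 1, 0), nrow = 2)
identical(A %*% B, B %*% A)            # FALSE: order matters
all.equal(t(A %*% B), t(B) %*% t(A))   # TRUE: the transpose rule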

SLIDE 36

Square matrices and the diagonal

  • A square matrix has equal numbers of rows and columns.
  • The identity matrix, $\mathbf{I}_k$, is a $k \times k$ square matrix with 1s along the diagonal and 0s everywhere else:

$$\mathbf{I}_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

  • The $k \times k$ identity matrix multiplied by any $m \times k$ matrix returns that matrix: $\mathbf{A}\mathbf{I}_k = \mathbf{A}$

SLIDE 37

Identity matrix

  • To get the diagonal of a matrix in R, use the diag() function:

b <- matrix(1:4, nrow = 2, ncol = 2)
b
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
diag(b)
## [1] 1 4

  • diag() also creates identity matrices in R:

diag(3)
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1

SLIDE 38

Multiple linear regression in matrix form

  • Let $\hat{\boldsymbol{\beta}}$ be the vector of estimated regression coefficients and $\hat{\mathbf{y}}$ be the vector of fitted values (computed in the R sketch below):

$$\hat{\boldsymbol{\beta}} = \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \vdots \\ \hat{\beta}_k \end{bmatrix} \qquad \hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}}$$

  • It might be helpful to see this again more written out:

$$\hat{\mathbf{y}} = \begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_n \end{bmatrix} = \mathbf{X}\hat{\boldsymbol{\beta}} = \begin{bmatrix} 1 \cdot \hat{\beta}_0 + x_{11}\hat{\beta}_1 + x_{12}\hat{\beta}_2 + \cdots + x_{1k}\hat{\beta}_k \\ 1 \cdot \hat{\beta}_0 + x_{21}\hat{\beta}_1 + x_{22}\hat{\beta}_2 + \cdots + x_{2k}\hat{\beta}_k \\ \vdots \\ 1 \cdot \hat{\beta}_0 + x_{n1}\hat{\beta}_1 + x_{n2}\hat{\beta}_2 + \cdots + x_{nk}\hat{\beta}_k \end{bmatrix}$$

SLIDE 39

Residuals

  • We can easily write the residuals in matrix form:

$$\hat{\mathbf{u}} = \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}$$

  • The norm or length of a vector generalizes Euclidean distance and is just the square root of the sum of the squared entries:

$$\lVert \mathbf{a} \rVert = \sqrt{a_1^2 + a_2^2 + \cdots + a_k^2}$$

  • We can write the norm in terms of the inner product: $\lVert \mathbf{a} \rVert^2 = \mathbf{a}'\mathbf{a}$
  • Thus we can compactly write the sum of the squared residuals as (see the R sketch below):

$$\lVert \hat{\mathbf{u}} \rVert^2 = \hat{\mathbf{u}}'\hat{\mathbf{u}} = \sum_{i=1}^{n} \hat{u}_i^2$$

SLIDE 40

OLS estimator in matrix form

  • OLS still minimizes the sum of the squared residuals:

$$\hat{\boldsymbol{\beta}} = \arg\min_{\mathbf{b} \in \mathbb{R}^{k+1}} \lVert \mathbf{y} - \mathbf{X}\mathbf{b} \rVert^2$$

  • Take (matrix) derivatives, set equal to 0
  • Resulting first-order conditions:

$$\mathbf{X}'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = 0$$

  • Rearranging:

$$\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y}$$

  • In order to isolate $\hat{\boldsymbol{\beta}}$, we need to move the $\mathbf{X}'\mathbf{X}$ term to the other side of the equals sign.
  • We've learned about matrix multiplication, but what about matrix "division"?

SLIDE 41

Scalar inverses

  • What is division in its simplest form? $\frac{1}{a}$ is the value such that $a \cdot \frac{1}{a} = 1$.
  • For some algebraic expression $au = b$, let's solve for $u$:

$$\frac{1}{a} \cdot au = \frac{1}{a} \cdot b \quad \Longrightarrow \quad u = \frac{b}{a}$$

  • We need a matrix version of $\frac{1}{a}$.

SLIDE 42

Matrix inverses

  • Definition: If it exists, the inverse of a square matrix $\mathbf{A}$, denoted $\mathbf{A}^{-1}$, is the matrix such that $\mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$.
  • We can use the inverse to solve (systems of) equations, as in the R sketch below:

$$\mathbf{A}\mathbf{u} = \mathbf{b} \;\Longrightarrow\; \mathbf{A}^{-1}\mathbf{A}\mathbf{u} = \mathbf{A}^{-1}\mathbf{b} \;\Longrightarrow\; \mathbf{I}\mathbf{u} = \mathbf{A}^{-1}\mathbf{b} \;\Longrightarrow\; \mathbf{u} = \mathbf{A}^{-1}\mathbf{b}$$

  • If the inverse exists, we say that $\mathbf{A}$ is invertible or nonsingular.
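In R, a toy system of two equations: solve(A) computes the inverse, and solve(A, b) solves Au = b directly.

A <- matrix(c(2, 1, 1, 3), nrow = 2)
b <- c(5, 10)
solve(A) %*% b  # u = A^{-1} b: the solution (1, 3)
solve(A, b)     # same answer, computed more stably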

SLIDE 43

Back to OLS

  • Let's assume, for now, that the inverse of $\mathbf{X}'\mathbf{X}$ exists (we'll come back to this).
  • Then we can write the OLS estimator as the following:

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$

  • Memorize this: "ex prime ex inverse ex prime y." Sear it into your soul.

SLIDE 44

Understanding check

  • Suppose $\mathbf{y}$ is $n \times 1$ and $\mathbf{X}$ is $n \times (k+1)$.
  • What are the dimensions of $\mathbf{X}'\mathbf{X}$?
  • True/False: $\mathbf{X}'\mathbf{X}$ is symmetric.

▶ Note: A square matrix is symmetric if $\mathbf{A} = \mathbf{A}'$.

  • What are the dimensions of $(\mathbf{X}'\mathbf{X})^{-1}$?
  • What are the dimensions of $\mathbf{X}'\mathbf{y}$?
  • What are the dimensions of $\hat{\boldsymbol{\beta}}$?

SLIDE 45

Implications of OLS

  • We can generalize some mechanical results about OLS (verified numerically in the sketch below).
  • The independent variables are orthogonal to the residuals:

$$\mathbf{X}'\hat{\mathbf{u}} = \mathbf{X}'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = 0$$

  • The fitted values are orthogonal to the residuals:

$$\hat{\mathbf{y}}'\hat{\mathbf{u}} = (\mathbf{X}\hat{\boldsymbol{\beta}})'\hat{\mathbf{u}} = \hat{\boldsymbol{\beta}}'\mathbf{X}'\hat{\mathbf{u}} = 0$$

SLIDE 46

OLS by hand in R

ฬ‚ ๐œธ = (๐˜โ€ฒ๐˜)โˆ’1๐˜โ€ฒ๐ณ

  • First we need to get the design matrix and the response:

X <- model.matrix(trust_neighbors ~ exports + age + male + urban_dum + malaria_ecology, data = nunn) dim(X) ## [1] 20325 6 ## model.frame always puts the response in the first column y <- model.frame(trust_neighbors ~ exports + age + male + urban_dum + malaria_ecology, data = nunn)[,1] length(y) ## [1] 20325

SLIDE 47

OLS by hand in R

ฬ‚ ๐œธ = (๐˜โ€ฒ๐˜)โˆ’1๐˜โ€ฒ๐ณ

  • Use the solve() for inverses and %*% for matrix

multiplication:

solve(t(X) %*% X) %*% t(X) %*% y ## (Intercept) exports age male urban_dum ## [1,] 1.503 -0.001021 0.005045 0.02784

  • 0.2739

## malaria_ecology ## [1,] 0.01941 coef(mod) ## (Intercept) exports age male ## 1.503037

  • 0.001021

0.005045 0.027837 ## urban_dum malaria_ecology ##

  • 0.273872

0.019411

SLIDE 48

Intuition for the OLS in matrix form

ฬ‚ ๐œธ = (๐˜โ€ฒ๐˜)โˆ’1๐˜โ€ฒ๐ณ

  • Whatโ€™s the intuition here?
  • โ€œNumeratorโ€ ๐˜โ€ฒ๐ณ: is roughly composed of the covariances

between the columns of ๐˜ and ๐ณ

  • โ€œDenominatorโ€ ๐˜โ€ฒ๐˜ is roughly composed of the sample

variances and covariances of variables within ๐˜

  • Thus, we have something like:

ฬ‚ ๐œธ โ‰ˆ (variance of ๐˜)โˆ’1(covariance of ๐˜ & ๐ณ)

  • This is a rough sketch and isnโ€™t strictly true, but it can

provide intuition.

SLIDE 49

5/ OLS inference in matrix form

SLIDE 50

Random vectors

  • A random vector is a vector of random variables:

$$\mathbf{x}_i = \begin{bmatrix} x_{i1} \\ x_{i2} \end{bmatrix}$$

  • Here, $\mathbf{x}_i$ is a random vector and $x_{i1}$ and $x_{i2}$ are random variables.
  • When we talk about the distribution of $\mathbf{x}_i$, we are talking about the joint distribution of $x_{i1}$ and $x_{i2}$.

SLIDE 51

Distribution of random vectors

  • Expectation of random vectors:

$$\mathbb{E}[\mathbf{x}_i] = \begin{bmatrix} \mathbb{E}[x_{i1}] \\ \mathbb{E}[x_{i2}] \end{bmatrix}$$

  • Variance of random vectors:

$$\mathbb{V}[\mathbf{x}_i] = \begin{bmatrix} \mathbb{V}[x_{i1}] & \text{Cov}[x_{i1}, x_{i2}] \\ \text{Cov}[x_{i1}, x_{i2}] & \mathbb{V}[x_{i2}] \end{bmatrix}$$

  • Properties of this variance-covariance matrix (see the R sketch below):

▶ if $\mathbf{a}$ is constant, then $\mathbb{V}[\mathbf{a}'\mathbf{x}_i] = \mathbf{a}'\mathbb{V}[\mathbf{x}_i]\mathbf{a}$
▶ if the matrix $\mathbf{A}$ and vector $\mathbf{b}$ are constant, then $\mathbb{V}[\mathbf{A}\mathbf{x}_i + \mathbf{b}] = \mathbf{A}\mathbb{V}[\mathbf{x}_i]\mathbf{A}'$
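The sample analogue of the first property, sketched in R with simulated data:

set.seed(1)
x <- matrix(rnorm(2000), ncol = 2)  # 1,000 draws of a 2-vector
a <- c(1, 2)
var(x %*% a)                        # sample variance of a'x
t(a) %*% var(x) %*% a               # a' V-hat[x] a: identical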

SLIDE 52

Most general OLS assumptions

  • 1. Linearity: $y_i = \mathbf{x}_i'\boldsymbol{\beta} + u_i$
  • 2. Random/iid sample: $(y_i, \mathbf{x}_i')$ are an iid sample from the population.
  • 3. No perfect collinearity: $\mathbf{X}$ is an $n \times (k+1)$ matrix with rank $k+1$.
  • 4. Zero conditional mean: $\mathbb{E}[u_i \mid \mathbf{x}_i] = 0$
  • 5. Homoskedasticity: $\mathbb{V}[u_i \mid \mathbf{x}_i] = \sigma_u^2$
  • 6. Normality: $u_i \mid \mathbf{x}_i \sim N(0, \sigma_u^2)$

SLIDE 53

Matrix rank

  • Definition: The rank of a matrix is the maximum number of linearly independent columns.
  • Definition: The columns of a matrix $\mathbf{X}$ are linearly independent if $\mathbf{X}\mathbf{b} = 0$ if and only if $\mathbf{b} = 0$, where

$$\mathbf{X}\mathbf{b} = b_1\mathbf{x}_1 + b_2\mathbf{x}_2 + \cdots + b_k\mathbf{x}_k$$

  • Example violation: one column is a linear function of the others (see the R sketch below).

▶ 3 covariates with $\mathbf{x}_1 = \mathbf{x}_2 + \mathbf{x}_3$:

$$0 = b_1\mathbf{x}_1 + b_2\mathbf{x}_2 + b_3\mathbf{x}_3 = b_1(\mathbf{x}_2 + \mathbf{x}_3) + b_2\mathbf{x}_2 + b_3\mathbf{x}_3 = (b_1 + b_2)\mathbf{x}_2 + (b_1 + b_3)\mathbf{x}_3$$

  • …which equals 0 whenever $b_1 = -b_2 = -b_3$ ⇒ not linearly independent!
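A quick R sketch, with toy data and a deliberately collinear column: the rank comes out one short of the number of columns.

set.seed(1)
x2 <- rnorm(10)
x3 <- rnorm(10)
X_toy <- cbind(1, x2 + x3, x2, x3)  # second column = sum of the last two
qr(X_toy)$rank                      # 3 < 4 columns, so not full rank
## [1] 3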

SLIDE 54

Rank and matrix inversion

  • If ๐˜ is ๐‘œ ร— (๐‘™ + 1) has rank ๐‘™ + 1, then all of its columns are

linearly independent

โ–ถ Generalization of no perfect collinearity to arbitrary ๐‘™.

  • ๐˜ has rank ๐‘™ + 1 โ‡ (๐˜โ€ฒ๐˜) has rank ๐‘™ + 1
  • If a square (๐‘™ + 1) ร— (๐‘™ + 1) matrix has rank ๐‘™ + 1, then it is

invertible.

  • ๐˜ has rank ๐‘™ + 1 โ‡ (๐˜โ€ฒ๐˜)โˆ’1 exists and is unique.

SLIDE 55

Zero conditional mean error

  • Combining the zero conditional mean error and iid assumptions, we have:

$$\mathbb{E}[u_i \mid \mathbf{X}] = \mathbb{E}[u_i \mid \mathbf{x}_i] = 0$$

  • Stacking these into the vector of errors:

$$\mathbb{E}[\mathbf{u} \mid \mathbf{X}] = \begin{bmatrix} \mathbb{E}[u_1 \mid \mathbf{X}] \\ \mathbb{E}[u_2 \mid \mathbf{X}] \\ \vdots \\ \mathbb{E}[u_n \mid \mathbf{X}] \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \mathbf{0}$$

SLIDE 56

Expectation of OLS

  • It is useful to write OLS as:

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'(\mathbf{X}\boldsymbol{\beta} + \mathbf{u}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{u} = \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{u}$$

  • Under assumptions 1-4, OLS is conditionally unbiased for $\boldsymbol{\beta}$:

$$\mathbb{E}[\hat{\boldsymbol{\beta}} \mid \mathbf{X}] = \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbb{E}[\mathbf{u} \mid \mathbf{X}] = \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{0} = \boldsymbol{\beta}$$

  • This implies that OLS is unconditionally unbiased as well: $\mathbb{E}[\hat{\boldsymbol{\beta}}] = \boldsymbol{\beta}$ (simulated in the sketch below)

SLIDE 57

Variance of OLS

  • What about $\mathbb{V}[\hat{\boldsymbol{\beta}} \mid \mathbf{X}]$?
  • Using some facts about variances and matrices, we can derive:

$$\mathbb{V}[\hat{\boldsymbol{\beta}} \mid \mathbf{X}] = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\,\mathbb{V}[\mathbf{u} \mid \mathbf{X}]\,\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}$$

  • What is the covariance matrix of the errors, $\mathbb{V}[\mathbf{u} \mid \mathbf{X}]$?

$$\mathbb{V}[\mathbf{u} \mid \mathbf{X}] = \begin{bmatrix} \mathbb{V}[u_1 \mid \mathbf{X}] & \text{cov}[u_1, u_2 \mid \mathbf{X}] & \cdots & \text{cov}[u_1, u_n \mid \mathbf{X}] \\ \text{cov}[u_2, u_1 \mid \mathbf{X}] & \mathbb{V}[u_2 \mid \mathbf{X}] & \cdots & \text{cov}[u_2, u_n \mid \mathbf{X}] \\ \vdots & & \ddots & \vdots \\ \text{cov}[u_n, u_1 \mid \mathbf{X}] & \text{cov}[u_n, u_2 \mid \mathbf{X}] & \cdots & \mathbb{V}[u_n \mid \mathbf{X}] \end{bmatrix}$$

  • This matrix is symmetric since $\text{cov}[u_i, u_j \mid \mathbf{X}] = \text{cov}[u_j, u_i \mid \mathbf{X}]$.

SLIDE 58

Homoskedasticity

  • By homoskedasticity and iid, for any units $i$, $s$, $t$:

▶ $\mathbb{V}[u_i \mid \mathbf{X}] = \mathbb{V}[u_i \mid \mathbf{x}_i] = \sigma_u^2$ (constant variance)
▶ $\text{cov}[u_s, u_t \mid \mathbf{X}] = 0$ for $s \neq t$ (uncorrelated errors)

  • Then the covariance matrix of the errors is simply:

$$\mathbb{V}[\mathbf{u} \mid \mathbf{X}] = \sigma_u^2\mathbf{I}_n = \begin{bmatrix} \sigma_u^2 & 0 & \cdots & 0 \\ 0 & \sigma_u^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_u^2 \end{bmatrix}$$

  • Thus, we have the following:

$$\mathbb{V}[\hat{\boldsymbol{\beta}} \mid \mathbf{X}] = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'(\sigma_u^2\mathbf{I}_n)\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} = \sigma_u^2(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} = \sigma_u^2(\mathbf{X}'\mathbf{X})^{-1}$$

SLIDE 59

Sampling variance for OLS estimates

  • Under assumptions 1-5, the sampling variance of the OLS estimator can be written in matrix form as the following:

$$\mathbb{V}[\hat{\boldsymbol{\beta}} \mid \mathbf{X}] = \sigma_u^2(\mathbf{X}'\mathbf{X})^{-1}$$

  • This symmetric matrix looks like this:

$$\begin{bmatrix} \mathbb{V}[\hat{\beta}_0 \mid \mathbf{X}] & \text{Cov}[\hat{\beta}_0, \hat{\beta}_1 \mid \mathbf{X}] & \cdots & \text{Cov}[\hat{\beta}_0, \hat{\beta}_k \mid \mathbf{X}] \\ \text{Cov}[\hat{\beta}_0, \hat{\beta}_1 \mid \mathbf{X}] & \mathbb{V}[\hat{\beta}_1 \mid \mathbf{X}] & \cdots & \text{Cov}[\hat{\beta}_1, \hat{\beta}_k \mid \mathbf{X}] \\ \vdots & \vdots & \ddots & \vdots \\ \text{Cov}[\hat{\beta}_0, \hat{\beta}_k \mid \mathbf{X}] & \text{Cov}[\hat{\beta}_1, \hat{\beta}_k \mid \mathbf{X}] & \cdots & \mathbb{V}[\hat{\beta}_k \mid \mathbf{X}] \end{bmatrix}$$

SLIDE 60

Inference in the general setting

  • Under assumptions 1-5, in large samples:

$$\frac{\hat{\beta}_j - \beta_j}{\widehat{\text{se}}[\hat{\beta}_j]} \sim N(0, 1)$$

  • In small samples, under assumptions 1-6:

$$\frac{\hat{\beta}_j - \beta_j}{\widehat{\text{se}}[\hat{\beta}_j]} \sim t_{n-(k+1)}$$

  • Thus, under the null of $H_0: \beta_j = 0$, we know that

$$\frac{\hat{\beta}_j}{\widehat{\text{se}}[\hat{\beta}_j]} \sim t_{n-(k+1)}$$

  • Here, the estimated SEs come from (computed by hand in the R sketch below):

$$\widehat{\mathbb{V}}[\hat{\boldsymbol{\beta}}] = \hat{\sigma}_u^2(\mathbf{X}'\mathbf{X})^{-1} \qquad \hat{\sigma}_u^2 = \frac{\hat{\mathbf{u}}'\hat{\mathbf{u}}}{n - (k + 1)}$$

SLIDE 61

Covariance matrix in R

  • We can access this estimated covariance matrix, $\hat{\sigma}_u^2(\mathbf{X}'\mathbf{X})^{-1}$, in R:

vcov(mod)
##                   (Intercept)    exports        age       male
## (Intercept)      0.0004766593  1.164e-07 -7.956e-06 -6.676e-05
## exports          0.0000001164  1.676e-09 -3.659e-10  7.283e-09
## age             -0.0000079562 -3.659e-10  2.231e-07 -7.765e-07
## male            -0.0000667572  7.283e-09 -7.765e-07  1.909e-04
## urban_dum       -0.0000965843 -4.861e-08  7.108e-07 -1.711e-06
## malaria_ecology -0.0000069094 -2.124e-08  2.324e-10 -1.017e-07
##                  urban_dum malaria_ecology
## (Intercept)     -9.658e-05      -6.909e-06
## exports         -4.861e-08      -2.124e-08
## age              7.108e-07       2.324e-10
## male            -1.711e-06      -1.017e-07
## urban_dum        2.061e-04       2.724e-09
## malaria_ecology  2.724e-09       7.590e-07

SLIDE 62

Standard errors from the covariance matrix

  • Note that the diagonal entries are the variances, so the square root of the diagonal gives the standard errors:

sqrt(diag(vcov(mod)))
##     (Intercept)         exports             age            male
##      0.02183253      0.00004094      0.00047237      0.01381627
##       urban_dum malaria_ecology
##      0.01435491      0.00087123

coef(summary(mod))[, "Std. Error"]
##     (Intercept)         exports             age            male
##      0.02183253      0.00004094      0.00047237      0.01381627
##       urban_dum malaria_ecology
##      0.01435491      0.00087123

SLIDE 63

Nunn & Wantchekon

SLIDE 64

Wrapping up

  • You have the full power of matrices.
  • Key to writing the OLS estimator and discussing higher-level concepts in regression and beyond.
  • Next week: diagnosing and fixing problems with the linear model.
