Introduction to Big Data and Machine Learning: OLS matrix derivation

SLIDE 1

Introduction to Big Data and Machine Learning OLS matrix derivation

Dr. Mihail

August 26, 2019

(Dr. Mihail) Intro Big Data August 26, 2019 1 / 9

SLIDE 2

Ordinary least squares

Matrix form

Let $X$ be an $n \times k$ matrix in which each of the $n$ rows is one observation. We assume every model has a constant (bias) term, so the first column of $X$ is all 1's and the remaining $k - 1$ columns hold the explanatory variables.

Let $y$ be an $n \times 1$ vector of observations on the dependent variable.

Let $\epsilon$ be an $n \times 1$ vector of disturbances, or errors.

Let $\beta$ be a $k \times 1$ vector of unknown population parameters that we wish to estimate.

$$
\underbrace{\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}}_{n \times 1}
=
\underbrace{\begin{bmatrix}
1 & X_{11} & X_{21} & \cdots & X_{k-1,1} \\
1 & X_{12} & X_{22} & \cdots & X_{k-1,2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & X_{1n} & X_{2n} & \cdots & X_{k-1,n}
\end{bmatrix}}_{n \times k}
\underbrace{\begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{bmatrix}}_{k \times 1}
+
\underbrace{\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}}_{n \times 1}
\tag{1}
$$
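As a rough illustration of this setup, the sketch below builds a small design matrix with a leading column of ones and generates y from it. The sizes, the parameter values, the noise scale, and all variable names are assumptions chosen for the example; none of them come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

n, k = 6, 3  # hypothetical sizes: 6 observations, 3 parameters (constant included)

# k - 1 explanatory variables, plus a leading column of ones for the constant
regressors = rng.normal(size=(n, k - 1))
X = np.column_stack([np.ones(n), regressors])  # shape (n, k)

beta = np.array([1.0, 2.0, -0.5])     # illustrative "true" parameters, shape (k,)
eps = rng.normal(scale=0.1, size=n)   # disturbances, shape (n,)

# Stacked form: all n observations at once
y = X @ beta + eps

# Row-by-row form for the first observation: constant + variables + error
y_first = beta[0] + regressors[0] @ beta[1:] + eps[0]

print(X.shape, beta.shape, eps.shape, y.shape)
```

The matrix product reproduces the row-by-row sum for every observation, which is all the stacked notation is saying.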

SLIDE 3

Ordinary least squares

Matrix form

Stacking the $n$ observations as in equation (1) gives a system that can be written more succinctly as

$$
y = X\beta + \epsilon \tag{2}
$$

SLIDE 4

Ordinary least squares

Matrix form

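We wish to estimate $\hat{\beta}$, the value that minimizes the sum of the squared residuals $\sum_{i=1}^{n} e_i^2$.

The vector of residuals is given by $e = y - X\hat{\beta}$.

The sum of squared residuals is given by $e'e$. (Not to be confused with $ee'$, the $n \times n$ matrix associated with the covariance of the residuals.)

Sum of squared residuals

$$
e'e =
\begin{bmatrix} e_1 & e_2 & \cdots & e_n \end{bmatrix}
\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}
= e_1 \times e_1 + e_2 \times e_2 + \cdots + e_n \times e_n
\tag{3}
$$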
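To make the distinction between the inner and outer products concrete, here is a minimal sketch with a made-up residual vector (the three values are arbitrary, not from the slides). It shows that e'e is a single number equal to the sum of squares, while ee' is an n x n matrix.

```python
import numpy as np

e = np.array([0.5, -1.0, 2.0])  # hypothetical residual vector, n = 3

ssr_inner = e @ e          # e'e: inner product, a scalar
ssr_sum = np.sum(e ** 2)   # explicit sum of squared residuals
outer = np.outer(e, e)     # ee': outer product, an n x n matrix

print(ssr_inner, ssr_sum, outer.shape)
```

The diagonal of the outer product holds the individual squared residuals, so its trace recovers the same scalar.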

SLIDE 5

Ordinary least squares

Sum of squares

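$$
\begin{aligned}
e'e &= (y - X\hat{\beta})'(y - X\hat{\beta}) \\
    &= y'y - \hat{\beta}'X'y - y'X\hat{\beta} + \hat{\beta}'X'X\hat{\beta} \\
    &= y'y - 2\hat{\beta}'X'y + \hat{\beta}'X'X\hat{\beta}
\end{aligned}
\tag{4}
$$

We used this identity: $y'X\hat{\beta} = (y'X\hat{\beta})' = \hat{\beta}'X'y$. It holds because $y'X\hat{\beta}$ is a scalar ($1 \times 1$), and a scalar equals its own transpose.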
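The expansion can be checked numerically. The sketch below uses randomly generated X, y, and an arbitrary candidate vector b standing in for the estimate; the data and names are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 8, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = rng.normal(size=n)
b = rng.normal(size=k)  # arbitrary candidate for beta-hat, not the minimizer

# Left-hand side: e'e computed directly from the residuals
e = y - X @ b
lhs = e @ e

# Right-hand side: y'y - 2 b'X'y + b'X'Xb
rhs = y @ y - 2 * b @ X.T @ y + b @ X.T @ X @ b

# The two cross terms are the same scalar
cross_1 = y @ X @ b    # y'Xb
cross_2 = b @ X.T @ y  # b'X'y

print(np.isclose(lhs, rhs), np.isclose(cross_1, cross_2))
```

Because the identity is algebraic, it holds for any b, not only for the least squares solution.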

SLIDE 6

Ordinary least squares

Matrix differentiation review

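$$
\frac{\partial a'b}{\partial b} = \frac{\partial b'a}{\partial b} = a \tag{5}
$$

where $a$ and $b$ are $K \times 1$ vectors.

$$
\frac{\partial b'Ab}{\partial b} = 2Ab \tag{6}
$$

where $A$ is any symmetric matrix. Note that you can write the derivative either as the column vector $2Ab$ or, transposed, as the row vector $2b'A$.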
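One way to build confidence in these two rules is to compare them with a finite-difference gradient. The helper `numerical_gradient` below is a hypothetical utility written for this sketch, and the vectors and matrix are random; nothing here is taken from the slides.

```python
import numpy as np

def numerical_gradient(f, b, h=1e-6):
    """Central-difference approximation of the gradient of a scalar function f at b."""
    grad = np.zeros_like(b)
    for j in range(b.size):
        step = np.zeros_like(b)
        step[j] = h
        grad[j] = (f(b + step) - f(b - step)) / (2 * h)
    return grad

rng = np.random.default_rng(2)
K = 4
a = rng.normal(size=K)
b = rng.normal(size=K)
M = rng.normal(size=(K, K))
A = M + M.T  # symmetrize so that A equals its transpose

# Rule (5): the gradient of a'b with respect to b is a
grad_linear = numerical_gradient(lambda v: a @ v, b)

# Rule (6): the gradient of b'Ab with respect to b is 2Ab when A is symmetric
grad_quadratic = numerical_gradient(lambda v: v @ A @ v, b)

print(np.allclose(grad_linear, a, atol=1e-6))
print(np.allclose(grad_quadratic, 2 * A @ b, atol=1e-5))
```

If A were not symmetric, the gradient of the quadratic form would be (A + A')b, which is why the symmetry assumption matters.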

SLIDE 7

Ordinary least squares

Matrix differentiation review

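Applying rules (5) and (6), with $X'y$ playing the role of $a$:

$$
\frac{\partial\, 2\beta'X'y}{\partial \beta} = \frac{\partial\, 2\beta'(X'y)}{\partial \beta} = 2X'y \tag{7}
$$

and, with $A = X'X$,

$$
\frac{\partial\, \beta'X'X\beta}{\partial \beta} = \frac{\partial\, \beta'A\beta}{\partial \beta} = 2A\beta = 2X'X\beta \tag{8}
$$

since $X'X$ is a symmetric $K \times K$ matrix.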

SLIDE 8

Ordinary least squares

Parameter estimation

The $\hat{\beta}$ that minimizes the sum of squared residuals is obtained by computing the derivative of $e'e$ with respect to $\hat{\beta}$:

$$
\frac{\partial e'e}{\partial \hat{\beta}} = -2X'y + 2X'X\hat{\beta} \tag{9}
$$

Setting the derivative equal to 0 and solving for $\hat{\beta}$:

$$
-2X'y + 2X'X\hat{\beta} = 0 \tag{10}
$$

$$
(X'X)\hat{\beta} = X'y \tag{11}
$$

$X'X$ is always square ($k \times k$) and symmetric. Both $X$ and $y$ are known from our data.
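The following sketch treats the last equation as a linear system and solves it with `np.linalg.solve`, then checks that the gradient vanishes at the solution. The simulated data, the coefficient values, and the function names `ssr` and `ssr_gradient` are assumptions introduced for this example.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=n)

def ssr(b):
    """Sum of squared residuals e'e for a candidate parameter vector b."""
    e = y - X @ b
    return e @ e

def ssr_gradient(b):
    """Derivative of e'e with respect to b: -2X'y + 2X'Xb."""
    return -2 * X.T @ y + 2 * X.T @ X @ b

# Solve (X'X) b = X'y as a linear system
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

grad_at_solution = ssr_gradient(beta_hat)               # numerically zero
perturbed = beta_hat + 0.01 * rng.normal(size=k)        # a nearby point

print(np.allclose(grad_at_solution, 0.0, atol=1e-8))
print(ssr(perturbed) > ssr(beta_hat))
```

Moving away from the solution in any direction raises the sum of squared residuals here, consistent with the stationary point being a minimum.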

SLIDE 9

Ordinary least squares

Parameter estimation

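$$
(X'X)\hat{\beta} = X'y \tag{12}
$$

$X'X$ is always square ($k \times k$) and symmetric. Both $X$ and $y$ are known from our data, so, provided $X'X$ is invertible (that is, $X$ has full column rank), we can multiply both sides by the inverse $(X'X)^{-1}$, yielding:

$$
(X'X)^{-1}(X'X)\hat{\beta} = (X'X)^{-1}X'y \tag{13}
$$

$$
I\hat{\beta} = (X'X)^{-1}X'y \tag{14}
$$

Or finally:

$$
\hat{\beta} = (X'X)^{-1}X'y \tag{15}
$$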
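A minimal sketch of the closed-form estimator follows, using simulated data with made-up coefficients (`beta_true` and the noise scale are assumptions for the example). It compares the explicit-inverse formula against NumPy's `np.linalg.lstsq`, which solves the same least squares problem without forming the inverse.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([1.0, 2.0, -3.0])  # illustrative parameters
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# The formula requires X'X to be invertible, i.e. X has full column rank
rank = np.linalg.matrix_rank(X)

# Closed form: (X'X)^{-1} X'y
XtX = X.T @ X
beta_inverse = np.linalg.inv(XtX) @ X.T @ y

# Same problem solved by a least squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(rank == k)
print(np.allclose(beta_inverse, beta_lstsq))
```

The explicit inverse mirrors the derivation, but in practice a solver such as `np.linalg.lstsq` or `np.linalg.solve` is usually preferred because it tends to be more stable numerically when the columns of X are nearly collinear.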
