

SLIDE 1

Mathematical Tools for Neural and Cognitive Science, Fall semester 2018

Section 2: Least Squares

Least Squares (outline)

  • Standard regression: fit data with a weighted sum of regressors. Solution via calculus, orthogonality, SVD
  • Choosing regressors, overfitting
  • Robustness: weighted regression, iterative outlier trimming, robust error functions, iterative re-weighting
  • Constrained regression: linear, quadratic constraints
  • Total Least Squares (TLS) regression, and Principal Components Analysis (PCA)

Least squares regression:

$$\min_\beta \sum_n (y_n - \beta x_n)^2$$

This is the "objective" or "error" function, shown in the space of measurements (x, y).

[Gauss, 1795 - age 18]

SLIDE 2

[Figure: a sequence of candidate slopes β in the (x, y) measurement space, each paired with the corresponding value of the error]

Optimum of the "objective function":

$$\min_\beta \sum_n (y_n - \beta x_n)^2$$

We can solve this with calculus... [on board]
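As a concrete check (not part of the original slides), here is a minimal NumPy sketch of the calculus solution: setting the derivative of the objective with respect to β to zero gives the closed form β_opt = Σ_n x_n y_n / Σ_n x_n². The data values below are hypothetical, and a brute-force scan of the objective confirms its minimum lands at β_opt.

```python
import numpy as np

# Hypothetical data: y roughly proportional to x, plus noise
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 30)
y = 2.0 * x + 0.1 * rng.standard_normal(x.size)

# Setting d/dbeta of sum_n (y_n - beta*x_n)^2 to zero gives the closed form:
beta_opt = np.sum(x * y) / np.sum(x * x)

# A brute-force scan of the objective attains its minimum at (approximately) beta_opt
betas = np.linspace(0.0, 4.0, 401)
errors = np.array([np.sum((y - b * x) ** 2) for b in betas])
print(beta_opt, betas[np.argmin(errors)])
```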

SLIDE 3

... or linear algebra:

$$\min_\beta \sum_n (y_n - \beta x_n)^2 = \min_\beta \|\vec{y} - \beta\vec{x}\|^2$$

[Figure: regressor $\vec{x}$, observation $\vec{y}$, and residual error, each plotted as a vector of samples]

Geometry: $\beta_{\rm opt}$ scales the regressor $\vec{x}$ so that the residual error vector $\vec{y} - \beta_{\rm opt}\vec{x}$ is as short as possible.

Note: this is not the 2D (x, y) measurement space of the previous plots!
SLIDE 4

Multiple regression:

$$\min_{\vec{\beta}} \Big\|\vec{y} - \sum_k \beta_k \vec{x}_k\Big\|^2 = \min_{\vec{\beta}} \|\vec{y} - X\vec{\beta}\|^2$$

[Figure: regressors $\vec{x}_0$, $\vec{x}_1$, $\vec{x}_2$ with weights $\beta_0$, $\beta_1$, $\beta_2$, and the observation $\vec{y}$, each plotted as a vector of samples]

Solution via the Orthogonality Principle: construct the matrix $X$ containing columns $\vec{x}_1$ and $\vec{x}_2$. The error vector $\vec{y} - X\vec{\beta}_{\rm opt}$ must be orthogonal to the 2D vector space containing all linear combinations of $\vec{x}_1$ and $\vec{x}_2$. Orthogonality condition:

$$X^T\big(\vec{y} - X\vec{\beta}_{\rm opt}\big) = \vec{0}$$

Alternatively, use the SVD... Solution:

$$\vec{\beta}^*_{\rm opt} = S^{\#}\,\vec{y}^*, \qquad \beta^*_{{\rm opt},k} = y^*_k / s_k \ \text{ for each } k$$

[on board: transformations, elliptical geometry]
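A minimal NumPy sketch (not from the slides) of both routes: the orthogonality condition $X^T(\vec{y} - X\vec{\beta}) = 0$ rearranged into the normal equations, and the SVD route, where $\vec{y}^* = U^T\vec{y}$ is assumed to be the observation expressed in the rotated SVD coordinates and $\beta^*_k = y^*_k / s_k$. The regressors and noise level are hypothetical.

```python
import numpy as np

# Hypothetical regressors (columns of X) and observation y
rng = np.random.default_rng(1)
n = 30
X = np.column_stack([np.ones(n),
                     np.sin(np.linspace(0.0, 3.0, n)),
                     np.linspace(0.0, 1.0, n)])
y = X @ np.array([0.5, 2.0, -1.0]) + 0.05 * rng.standard_normal(n)

# Orthogonality principle: X^T (y - X beta) = 0  =>  (X^T X) beta = X^T y
beta_ortho = np.linalg.solve(X.T @ X, X.T @ y)

# SVD route: X = U S V^T; y* = U^T y, beta*_k = y*_k / s_k, then beta = V beta*
U, s, Vt = np.linalg.svd(X, full_matrices=False)
beta_star = (U.T @ y) / s
beta_svd = Vt.T @ beta_star

print(np.allclose(beta_ortho, beta_svd))   # both routes agree
```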
SLIDE 5


Optimization problems

  • Quadratic: closed-form solution, guaranteed
  • Convex: iterative descent, guaranteed
  • Smooth (C2): iterative descent, (possibly) non-unique
  • Otherwise: heuristics, exhaustive search (pain & suffering)

Interpretation: what does it mean? Note that these all give the same regression fit. [Anscombe, 1973]

Polynomial regression: how many terms?

SLIDE 6

[Figure: empirical (data) error and "true" model error as a function of model complexity]

(to be continued, when we get to “statistics”...)
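A hedged NumPy illustration of the trade-off sketched above: as the polynomial degree grows, the empirical (training) error can only shrink, while error measured against an independent draw from the same underlying model eventually stops improving. The "true" model, noise level, and degrees are hypothetical choices.

```python
import numpy as np

# Hypothetical "true" model and two independent noisy draws from it
rng = np.random.default_rng(2)
x = np.linspace(-1, 1, 30)
true_y = np.sin(2.0 * x)
y_train = true_y + 0.2 * rng.standard_normal(x.size)
y_fresh = true_y + 0.2 * rng.standard_normal(x.size)

for degree in range(10):
    coeffs = np.polyfit(x, y_train, degree)       # least-squares polynomial fit
    fit = np.polyval(coeffs, x)
    train_err = np.mean((y_train - fit) ** 2)     # empirical (data) error: shrinks with degree
    fresh_err = np.mean((y_fresh - fit) ** 2)     # error on an independent draw: eventually stalls or grows
    print(f"degree {degree}: train {train_err:.3f}, fresh {fresh_err:.3f}")
```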

Weighted Least Squares

$$\min_\beta \sum_n \big[w_n(y_n - \beta x_n)\big]^2 = \min_\beta \|W(\vec{y} - \beta\vec{x})\|^2$$

where $W$ is a diagonal matrix. Solution via a simple extension of the basic regression solution (i.e., let $\vec{y}^* = W\vec{y}$ and $\vec{x}^* = W\vec{x}$, and solve for $\beta$).
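A minimal sketch of that extension in NumPy, assuming a single regressor and hypothetical per-sample weights: scale both the observation and the regressor by the weights, then apply the basic closed-form solution.

```python
import numpy as np

# Hypothetical data and per-sample weights w_n (W = diag(w))
rng = np.random.default_rng(3)
x = np.linspace(-1, 1, 30)
y = 1.5 * x + 0.1 * rng.standard_normal(x.size)
w = rng.uniform(0.5, 2.0, size=x.size)

# Let y* = W y and x* = W x, then apply the basic least-squares solution to (x*, y*)
x_star = w * x
y_star = w * y
beta_wls = np.sum(x_star * y_star) / np.sum(x_star * x_star)
print(beta_wls)
```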

[Figure: observed data with an "outlier", the true value, and the regression fit]

Solution 1: "trimming"...

SLIDE 7

[Figure: observed data with an "outlier", the true value, the regression fit, and the trimmed regression fit]

When done iteratively (discard the outlier, re-fit, repeat), this is a so-called "greedy" method. When do you stop?

Solution 2: Use a “robust” error metric. For example:

“Lorentzian” f(d) = log(c2 + d2) f(d) = d2 Note: generally can’t obtain solution directly (i.e., requires an iterative optimization procedure). In some cases, can use iteratively re-weighted least squares (IRLS)...

Iteratively Re-weighted Least Squares (IRLS), one of many variants:

initialize: $w_n^{(0)} = 1$

iterate:

$$\beta^{(i)} = \arg\min_\beta \sum_n w_n^{(i)}\,(y_n - \beta x_n)^2$$

$$w_n^{(i+1)} = \frac{f\big(y_n - \beta^{(i)} x_n\big)}{\big(y_n - \beta^{(i)} x_n\big)^2}$$
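A minimal NumPy sketch of this IRLS variant using the Lorentzian error metric with the hypothetical choice c = 1 (so that f(d)/d² stays positive and approaches 1 for small residuals); the data, injected outlier, and iteration count are also hypothetical. A small constant guards the division when a residual is exactly zero.

```python
import numpy as np

def lorentzian(d):
    # Robust error metric f(d) = log(c^2 + d^2), with the hypothetical choice c = 1
    return np.log(1.0 + d ** 2)

# Hypothetical data with one injected "outlier"
rng = np.random.default_rng(4)
x = np.linspace(-2, 2, 40)
y = 0.8 * x + 0.05 * rng.standard_normal(x.size)
y[-1] += 3.0

w = np.ones_like(x)                            # initialize: w_n^(0) = 1
for _ in range(20):                            # iterate (one of many variants)
    # weighted least-squares step: beta = argmin_beta sum_n w_n (y_n - beta x_n)^2
    beta = np.sum(w * x * y) / np.sum(w * x * x)
    d = y - beta * x
    w = lorentzian(d) / (d ** 2 + 1e-12)       # re-weight: w_n = f(d_n) / d_n^2
print(beta)                                    # the outlier is down-weighted across iterations
```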

SLIDE 8

Constrained Least Squares

Linear constraint:

$$\min_{\vec{\beta}} \|\vec{y} - X\vec{\beta}\|^2, \quad \text{where } \vec{c}\cdot\vec{\beta} = \alpha$$

Quadratic constraint:

$$\min_{\vec{\beta}} \|\vec{y} - X\vec{\beta}\|^2, \quad \text{where } \|C\vec{\beta}\|^2 = 1$$

Both can be solved exactly using linear algebra (SVD)... [on board, with geometry]

Standard Least Squares regression objective: squared error of the "dependent" variable,

$$\min_\beta \|\vec{y} - \beta\vec{x}\|^2$$

Total Least Squares regression (a.k.a. "orthogonal regression"):

$$\min_{\hat{u}} \|D\hat{u}\|^2, \quad \text{where } \|\hat{u}\|^2 = 1$$

The error is the squared distance from the fitted line... Note: the "data" matrix $D$ now includes both the x and y coordinates.
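A minimal NumPy sketch of TLS on hypothetical data: the x and y coordinates are stacked (and mean-centered, an assumed preprocessing step) into the data matrix D, and the unit vector û minimizing ‖Dû‖² is the right singular vector of D associated with its smallest singular value; the fitted line is orthogonal to û.

```python
import numpy as np

# Hypothetical data scattered around a line
rng = np.random.default_rng(5)
x = np.linspace(-1, 1, 50)
y = 0.6 * x + 0.05 * rng.standard_normal(x.size)

# The "data" matrix holds both x and y coordinates (mean-centered, an assumed step)
D = np.column_stack([x - x.mean(), y - y.mean()])

# min_u ||D u||^2 subject to ||u||^2 = 1: u is the right singular vector of D
# associated with its smallest singular value
U, s, Vt = np.linalg.svd(D, full_matrices=False)
u = Vt[-1]                     # unit vector normal to the fitted line
slope_tls = -u[0] / u[1]       # the fitted line is orthogonal to u
print(slope_tls)
```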

SLIDE 9

Solution via the SVD: writing $D = USV^T$,

$$\|USV^T\hat{u}\|^2 = \|SV^T\hat{u}\|^2 = \|S\hat{u}^*\|^2 = \|\vec{u}^{**}\|^2, \quad \text{where } \hat{u}^* = V^T\hat{u}, \ \vec{u}^{**} = S\hat{u}^*$$

[Figure: the set of $\hat{u}$'s of length 1 (i.e., unit vectors), the rotated set of $\hat{u}^*$'s (also unit vectors), and the stretched $\vec{u}^{**}$'s, with the min and max directions marked; only the first two components of $\vec{u}^{**}$ are shown for three example $\hat{u}$'s (the rest are zero!)]

Eigenvectors/eigenvalues

Define the symmetric matrix:

$$C = M^T M = (USV^T)^T(USV^T) = V S^T U^T U S V^T = V(S^T S)V^T$$

If $\vec{v}_k$ is the kth column of $V$, then:

$$C\vec{v}_k = V(S^T S)V^T\vec{v}_k = V(S^T S)\hat{e}_k = s_k^2\, V\hat{e}_k = s_k^2\,\vec{v}_k$$

$\vec{v}_k$, the kth column of $V$, is called an eigenvector of $C$:

  • "rotate, stretch, rotate back"
  • matrix C "summarizes" the shape of the data
  • the output is a rescaled copy of the input
  • the scale factor $s_k^2$ is called the eigenvalue associated with $\vec{v}_k$

And, for arbitrary vectors $\vec{x}$:

$$C\vec{x} = \sum_k s_k^2\,\big(\vec{v}_k^T\vec{x}\big)\,\vec{v}_k$$
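A quick numerical check in NumPy (not from the slides) of both statements: each column $\vec{v}_k$ of $V$ is an eigenvector of $C = M^T M$ with eigenvalue $s_k^2$, and an arbitrary vector $\vec{x}$ satisfies $C\vec{x} = \sum_k s_k^2 (\vec{v}_k^T\vec{x})\vec{v}_k$. The matrix M here is a hypothetical random example.

```python
import numpy as np

# Hypothetical data matrix M
rng = np.random.default_rng(6)
M = rng.standard_normal((40, 3))

# Symmetric matrix C = M^T M = V (S^T S) V^T
C = M.T @ M
U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Each column v_k of V is an eigenvector of C with eigenvalue s_k^2
for k in range(3):
    v_k = Vt[k]                # row k of V^T is column k of V
    print(np.allclose(C @ v_k, s[k] ** 2 * v_k))

# For an arbitrary vector x: C x = sum_k s_k^2 (v_k^T x) v_k
x = rng.standard_normal(3)
expansion = sum(s[k] ** 2 * (Vt[k] @ x) * Vt[k] for k in range(3))
print(np.allclose(C @ x, expansion))
```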