Lesson 16
LEAST SQUARES
- Let
  $$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}$$
  be an $n \times n$ matrix with complex entries (i.e., $A \in \mathbb{C}^{n \times n}$).
- It is helpful to view $A$ as a row vector whose columns are in $\mathbb{C}^n$: $A = (a_1 \,|\, \cdots \,|\, a_n)$.
- Recall that if $A$ is nonsingular, then we can always solve the linear system $Ac = b$, for
  $$b = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}, \quad c = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix} \in \mathbb{C}^n$$
- What if $A$ is singular? Can we find $c$ so that $Ac$ is "close" to $b$?
- In other words, for $c = (c_1, \ldots, c_n)^\top$, we want
  $$c_1 a_1 + \cdots + c_n a_n \approx b$$
- More generally, let $A \in \mathbb{C}^{m \times n}$: $A = (a_1 \,|\, \cdots \,|\, a_n)$ for $a_k \in \mathbb{C}^m$.
- Can we numerically compute $c_1, \ldots, c_n$ so that
  $$c_1 a_1 + \cdots + c_n a_n \approx b\,?$$
- More precisely, we find $c$ such that $\|Ac - b\|_2$ takes its minimal value.
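As a concrete illustration (a minimal sketch with made-up data, not from the slides), here is the problem being posed in NumPy; `np.linalg.lstsq` is used only as a black-box reference for the minimizer:

```python
import numpy as np

# A small made-up overdetermined system: m = 5 equations, n = 2 unknowns.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))   # columns a_1, a_2 in R^5
b = rng.standard_normal(5)

# The quantity to minimize is the 2-norm of the residual Ac - b.
c_guess = np.zeros(2)
print(np.linalg.norm(A @ c_guess - b, 2))   # residual of a (bad) guess

# NumPy's built-in solver returns the minimizing c for comparison.
c_star, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.linalg.norm(A @ c_star - b, 2))    # the minimal residual norm
```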
- Let's review the real case: $A \in \mathbb{R}^{m \times n}$, $c \in \mathbb{R}^n$ and $b \in \mathbb{R}^m$.
- Minimizing $\|Ac - b\|$ is equivalent to minimizing $\|Ac - b\|^2$.
- We simplify:
  $$\|Ac - b\|^2 = (Ac - b)^\top (Ac - b) = \|Ac\|^2 - (Ac)^\top b - b^\top Ac + \|b\|^2 = c^\top A^\top A c - 2 c^\top A^\top b + \|b\|^2$$
- We can heuristically assume that the minimum is a stationary point of this expression; i.e., we want $0 = \nabla_c \|Ac - b\|^2$.
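A quick numerical sanity check of this expansion (a sketch with made-up data): the gradient of the quadratic form above, $2A^\top A c - 2A^\top b$, should match a finite-difference approximation of $\nabla_c \|Ac - b\|^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)
c = rng.standard_normal(3)

f = lambda c: np.linalg.norm(A @ c - b) ** 2

# Analytic gradient from the expansion c'A'Ac - 2c'A'b + |b|^2.
grad = 2 * A.T @ A @ c - 2 * A.T @ b

# Central finite differences for each partial derivative.
eps = 1e-6
fd = np.array([(f(c + eps * e) - f(c - eps * e)) / (2 * eps)
               for e in np.eye(3)])
print(np.allclose(grad, fd, atol=1e-4))  # True
```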
Theorem: Suppose $A$ has linearly independent columns. The vector $c = (A^\top A)^{-1} A^\top b$ is the unique minimizer of $\|Ac - b\|$.
Proof:
- We first remark that $A^\top A$ is positive definite, i.e., $x^\top A^\top A x > 0$ for all (real) $x \neq 0$. (Why?)
- Minimizing $\|Ac - b\|$ is equivalent to minimizing $\|Ac - b\|^2$.
- For all $x$, we have
  $$\begin{aligned}
  \|A(c + x) - b\|^2 &= (c + x)^\top A^\top A (c + x) - 2 (c + x)^\top A^\top b + \|b\|^2 \\
  &= c^\top A^\top A (c + x) + x^\top A^\top A (c + x) - 2 (c + x)^\top A^\top b + \|b\|^2 \\
  &= x^\top A^\top A x + c^\top A^\top A c + 2 x^\top A^\top A c - 2 (c + x)^\top A^\top b + \|b\|^2 \\
  &= x^\top A^\top A x + c^\top A^\top A c + 2 x^\top A^\top A (A^\top A)^{-1} A^\top b - 2 (c + x)^\top A^\top b + \|b\|^2 \\
  &= x^\top A^\top A x + c^\top A^\top A c + 2 x^\top A^\top b - 2 (c + x)^\top A^\top b + \|b\|^2 \\
  &= x^\top A^\top A x + c^\top A^\top A c - 2 c^\top A^\top b + \|b\|^2
  \end{aligned}$$
- The last three terms are independent of $x$, and by positive definiteness the first term is minimized exactly when $x$ is zero!
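The following sketch (made-up data) mirrors the proof numerically: perturbing the normal-equations solution $c$ by any $x$ increases the squared residual by exactly $x^\top A^\top A x$:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 3))   # generic, hence independent, columns
b = rng.standard_normal(8)

# Normal-equations solution; solve() is preferred over forming an inverse.
c = np.linalg.solve(A.T @ A, A.T @ b)

# Any perturbation x of c increases the squared residual by x'A'Ax,
# mirroring the expansion in the proof.
x = rng.standard_normal(3)
r0 = np.linalg.norm(A @ c - b) ** 2
r1 = np.linalg.norm(A @ (c + x) - b) ** 2
print(r1 - r0, x @ (A.T @ A) @ x)   # the two numbers agree
```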
GENERAL INNER PRODUCT SPACES
- Consider a row vector of elements $a_1, \ldots, a_n \in V$:
  $$A = (a_1 \,|\, \cdots \,|\, a_n)$$
- We can associate with $A$ a Gram matrix
  $$K = \begin{pmatrix} \langle a_1, a_1 \rangle & \cdots & \langle a_1, a_n \rangle \\ \vdots & \ddots & \vdots \\ \langle a_n, a_1 \rangle & \cdots & \langle a_n, a_n \rangle \end{pmatrix}$$
- In the case where $V = \mathbb{R}^m$, the Gram matrix is precisely the matrix we used in least squares: $K = A^\top A$.
- In the case where $V = \mathbb{C}^m$, we get the similar $K = A^* A$.
Proposition: The Gram matrix $K$ is Hermitian: $K^* = K$.
Proof:
- Follows from the fact that $\langle u, v \rangle = \overline{\langle v, u \rangle}$:
  $$K^* = \begin{pmatrix} \overline{\langle a_1, a_1 \rangle} & \cdots & \overline{\langle a_n, a_1 \rangle} \\ \vdots & \ddots & \vdots \\ \overline{\langle a_1, a_n \rangle} & \cdots & \overline{\langle a_n, a_n \rangle} \end{pmatrix} = \begin{pmatrix} \langle a_1, a_1 \rangle & \cdots & \langle a_1, a_n \rangle \\ \vdots & \ddots & \vdots \\ \langle a_n, a_1 \rangle & \cdots & \langle a_n, a_n \rangle \end{pmatrix} = K$$
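A one-line numerical check of this (a sketch with made-up complex data), using $K = A^*A$ for $V = \mathbb{C}^m$:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))

K = A.conj().T @ A                 # Gram matrix for V = C^m
print(np.allclose(K, K.conj().T))  # True: K is Hermitian
```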
Proposition: Let $A = (a_1 \,|\, \cdots \,|\, a_n)$ and $K$ denote the associated Gram matrix. For $x, y \in \mathbb{C}^n$ we have $\langle Ay, Ax \rangle = y^* K x$.
Proof:
$$y^* K x = y^* \begin{pmatrix} \langle a_1, a_1 \rangle & \cdots & \langle a_1, a_n \rangle \\ \vdots & \ddots & \vdots \\ \langle a_n, a_1 \rangle & \cdots & \langle a_n, a_n \rangle \end{pmatrix} x = y^* \begin{pmatrix} \langle a_1,\, a_1 x_1 + \cdots + a_n x_n \rangle \\ \vdots \\ \langle a_n,\, a_1 x_1 + \cdots + a_n x_n \rangle \end{pmatrix} = \langle a_1 y_1 + \cdots + a_n y_n,\; a_1 x_1 + \cdots + a_n x_n \rangle = \langle Ay, Ax \rangle$$
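A numerical spot-check of this identity in $V = \mathbb{C}^m$ with $\langle u, v \rangle = u^* v$ (a sketch with made-up data):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))
x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
y = rng.standard_normal(3) + 1j * rng.standard_normal(3)

K = A.conj().T @ A
lhs = (A @ y).conj() @ (A @ x)   # <Ay, Ax> with <u, v> = u* v
rhs = y.conj() @ K @ x           # y* K x
print(np.isclose(lhs, rhs))      # True
```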
Proposition: The Gram matrix $K$ is positive semi-definite. If the vectors $a_1, \ldots, a_n$ are linearly independent, then the Gram matrix is positive definite.
Proof:
- $x^* K x = \langle Ax, Ax \rangle = \|Ax\|^2 \geq 0$
- Linear independence of $a_1, \ldots, a_n$ shows that $Ax = a_1 x_1 + \cdots + a_n x_n = 0$ if and only if $x = 0$.
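A sketch (made-up data) checking both claims via the eigenvalues of $K$: they are nonnegative in general, and a dependent column produces a zero eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 3))      # generic, hence independent, columns
print(np.linalg.eigvalsh(A.T @ A))   # all eigenvalues strictly positive

# Make the third column a combination of the first two: K becomes singular.
A2 = np.column_stack([A[:, 0], A[:, 1], A[:, 0] + A[:, 1]])
print(np.linalg.eigvalsh(A2.T @ A2))  # smallest eigenvalue is (numerically) 0
```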
- Two more useful relationships:
  $$\langle Ax, b \rangle = \Big\langle \sum_{k=1}^n x_k a_k,\, b \Big\rangle = \sum_{k=1}^n \bar{x}_k \langle a_k, b \rangle = x^* \begin{pmatrix} \langle a_1, b \rangle \\ \vdots \\ \langle a_n, b \rangle \end{pmatrix}$$
  $$\langle b, Ax \rangle = \Big\langle b,\, \sum_{k=1}^n x_k a_k \Big\rangle = \sum_{k=1}^n x_k \langle b, a_k \rangle = \sum_{k=1}^n x_k \overline{\langle a_k, b \rangle} = \begin{pmatrix} \langle a_1, b \rangle \\ \vdots \\ \langle a_n, b \rangle \end{pmatrix}^* x$$
Theorem: Suppose the columns of $A = (a_1 \,|\, \cdots \,|\, a_n)$ are linearly independent in an inner product space $V$. Let $K$ be the associated Gram matrix. The vector
$$c = K^{-1} \begin{pmatrix} \langle a_1, b \rangle \\ \vdots \\ \langle a_n, b \rangle \end{pmatrix}$$
is the unique minimizer of $\|Ac - b\|$ (where the norm is the norm associated with the inner product).
Proof: Exercise
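To see the theorem at work outside of $\mathbb{C}^m$, here is a sketch of continuous least squares on $V = C[0,1]$ with $\langle f, g \rangle = \int_0^1 f(t)g(t)\,dt$, fitting $b(t) = e^t$ by a quadratic. The monomial basis and the use of `scipy.integrate.quad` are illustrative choices, not from the slides:

```python
import numpy as np
from scipy.integrate import quad

# Fit b(t) = exp(t) by span{1, t, t^2} in the L^2(0,1) inner product.
# The a_k are monomials, so the Gram matrix has entries
# <t^i, t^j> = int_0^1 t^(i+j) dt = 1/(i+j+1)  (the 3x3 Hilbert matrix).
n = 3
K = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])

# Right-hand side: <a_k, b> = int_0^1 t^k exp(t) dt, via quadrature.
rhs = np.array([quad(lambda t, k=k: t**k * np.exp(t), 0, 1)[0]
                for k in range(n)])

c = np.linalg.solve(K, rhs)   # c = K^{-1} (<a_1,b>, ..., <a_n,b>)^T
print(c)                      # coefficients of the best quadratic fit
```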
ORTHONORMAL VECTORS AND CALCULATING LEAST SQUARES APPROXIMATIONS
- A set of nonzero vectors $v_1, \ldots, v_n$ is called orthogonal if $\langle v_i, v_j \rangle = 0$ whenever $i \neq j$.
- They are called orthonormal if they are orthogonal and all vectors are of unit norm: $1 = \|v_i\|$, or equivalently, $\langle v_i, v_i \rangle = 1$.
- The Gram matrix of orthonormal vectors is the identity!
  $$K = \begin{pmatrix} \langle v_1, v_1 \rangle & \cdots & \langle v_1, v_n \rangle \\ \vdots & \ddots & \vdots \\ \langle v_n, v_1 \rangle & \cdots & \langle v_n, v_n \rangle \end{pmatrix} = \begin{pmatrix} 1 & & \\ & \ddots & \\ & & 1 \end{pmatrix} = I$$
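A quick check (sketch): `np.linalg.qr` produces orthonormal columns, and their Gram matrix is indeed the identity:

```python
import numpy as np

rng = np.random.default_rng(6)
# Orthonormalize some random columns (reduced QR gives orthonormal q_k).
Q, _ = np.linalg.qr(rng.standard_normal((5, 3)))

K = Q.T @ Q                        # Gram matrix of orthonormal vectors
print(np.allclose(K, np.eye(3)))   # True: K is the identity
```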
ORTHOGONAL AND UNITARY MATRICES
- A matrix $Q \in \mathbb{R}^{n \times n}$ is orthogonal provided that $Q^\top Q = I$.
- A matrix $Q \in \mathbb{C}^{n \times n}$ is unitary provided that $Q^* Q = I$. Note that every orthogonal matrix is also unitary, but not vice versa.
- In other words, $Q^{-1} = Q^*$.
- Every unitary matrix also satisfies $Q Q^* = I$.
- Multiplying a vector by a unitary matrix does not change the 2-norm:
  $$\|Qv\|^2 = (Qv)^* (Qv) = v^* Q^* Q v = v^* v = \|v\|^2$$
- Thus we view unitary matrices as generalizations of rotations and reflections.
- Exercise: show that every rotation and reflection in $\mathbb{R}^2$ corresponds to an orthogonal matrix.
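For instance (a sketch relating to the exercise above), a rotation of $\mathbb{R}^2$ is orthogonal and preserves the 2-norm:

```python
import numpy as np

theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # rotation in R^2

v = np.array([3.0, 4.0])
print(np.linalg.norm(v), np.linalg.norm(Q @ v))   # both 5.0
print(np.allclose(Q.T @ Q, np.eye(2)))            # Q is orthogonal
```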
- This means that, for least squares, $\|Av - b\|$ is minimal if and only if $\|QAv - Qb\|$ is also minimal.
- Question: can we compute a $Q$ so that the latter is easier?
LEAST SQUARES FOR UPPER TRIANGULAR MATRICES
- Suppose a rectangular matrix $R \in \mathbb{C}^{m \times n}$ for $m > n$ is upper triangular:
  $$R = \begin{pmatrix} r_{11} & \cdots & r_{1n} \\ & \ddots & \vdots \\ & & r_{nn} \\ & \mathbf{0} & \end{pmatrix} = \begin{pmatrix} \hat{R} \\ \mathbf{0} \end{pmatrix}$$
  where $\hat{R}$ is $n \times n$.
- Then $\|Rc - b\|$ is minimized by
  $$c = (R^* R)^{-1} R^* b = (\hat{R}^* \hat{R})^{-1} \hat{R}^* \hat{b} = \hat{R}^{-1} \hat{R}^{-*} \hat{R}^* \hat{b} = \hat{R}^{-1} \hat{b},$$
  where $\hat{b} = (b_1, \ldots, b_n)^\top$ consists of the first $n$ entries of $b$.
- Thus least squares can be solved by simple back-substitution.
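A sketch of this with made-up data, using `scipy.linalg.solve_triangular` for the back-substitution; note that only $\hat{R}$ and $\hat{b}$ enter the solve:

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(7)
m, n = 6, 3
R = np.triu(rng.standard_normal((m, n)))   # upper triangular, m > n
b = rng.standard_normal(m)

# Only the top n x n block R_hat and the first n entries of b matter.
c = solve_triangular(R[:n, :n], b[:n])     # back-substitution

# Agrees with a general least squares solver.
c_ref, *_ = np.linalg.lstsq(R, b, rcond=None)
print(np.allclose(c, c_ref))               # True
```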
- Suppose we have a QR decomposition of a rectangular matrix $A \in \mathbb{C}^{m \times n}$:
  $$A = QR$$
  where $Q \in \mathbb{C}^{m \times m}$ is unitary and $R \in \mathbb{C}^{m \times n}$ is upper triangular.
- Then minimizing $\|Ac - b\|$ is equivalent to minimizing
  $$\|Q^* A c - Q^* b\| = \|Rc - Q^* b\|$$
- Thus we have reduced solving least squares problems to calculating a QR decomposition. We will see this is much more accurate than constructing and applying $(A^* A)^{-1} A^*$.
- Next lecture we will discuss computation of QR decompositions.
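Putting the pieces together, here is a sketch (made-up data) of QR-based least squares using NumPy's built-in `qr`; it agrees with the normal-equations solution while avoiding the explicit formation of $A^* A$:

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)

# Full QR: Q is 6x6 unitary (here real orthogonal), R is 6x3 triangular.
Q, R = np.linalg.qr(A, mode='complete')

# Minimize ||Rc - Q*b||: back-substitute on the top n x n block of R.
n = A.shape[1]
Qb = Q.T @ b
c = np.linalg.solve(R[:n, :n], Qb[:n])

# Same answer as the normal equations (but better conditioned in general).
c_ne = np.linalg.solve(A.T @ A, A.T @ b)
print(np.allclose(c, c_ne))   # True
```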