Innovations
ECE 539/639, Portland State University
J. McNames (Ver. 1.02)

Notation

\[ \hat{x} = K_o y \]

  • Boldface will be used to denote random variables (vectors) in this set of slides
  • Normal face will be used to denote non-random variables (known constants, scalars, vectors, and matrices)
  • Upper case letters will be reserved for matrices, as before
  • The time index will be indicated by a subscript rather than a parenthetical expression, e.g., $x_n$
  • Our goal is still to estimate a random vector $x_n$ from the observed random vector $y_n$
    – Thus the roles of $x_n$ and $y_n$ have been reversed
  • The operator $^*$ will be used to denote the conjugate transpose (formerly $^H$)


Overview of Innovations Topics

  • Notation
  • Normal equations and MMSE revisited (generalized)
  • Derivation
  • Equivalence
  • Time-updates
  • Computational savings?

Linear Estimation Problem Definition Revisited

\[ \hat{x} = K_o y \]

  • Note again that the roles of $x$ and $y$ have been reversed
  • Also, the coefficient vector (formerly $c_o$) is now a matrix $K_o$
  • Suppose we wish to estimate a random vector $x \in \mathbb{C}^{\ell\times 1}$ from a random vector $y \in \mathbb{C}^{p\times 1}$
  • The estimator will be denoted as $\hat{x}$ for now
  • As before, we will restrict our attention to linear estimators for the time being
  • However, this is more general than our earlier discussion because $x$ is a vector


Introduction

  • Most of the lecture materials on the Kalman filter will be drawn from [1]
  • Best book on the Kalman filter that I'm aware of
  • Much more thorough than most books on statistical signal processing


The Error Covariance Matrix

\[ \mathrm{E}[|\tilde{z}|^2] = \mathrm{E}[\tilde{z}\tilde{z}^*] = \mathrm{E}[(a^*\tilde{x})(a^*\tilde{x})^*] = a^*\,\mathrm{E}[\tilde{x}\tilde{x}^*]\,a = a^* P(K) a \]

  • For any given vector $a$, the MSE is minimized by finding the error covariance matrix such that
    \[ a^* P(K) a \ge a^* P(K_o) a \quad\Longleftrightarrow\quad a^*[P(K) - P(K_o)]a \ge 0 \]
  • This means that an equivalent problem is to find $K_o$ such that $P(K) - P(K_o)$ is nonnegative definite for all $K$
  • The solution to this problem is independent of $a$!
  • Commonly denoted as $P(K) \ge P(K_o)$
  • Thus we can generalize the notion of minimizing the MSE in the vector case to mean minimizing the matrix error covariance $P(K) = \mathrm{E}[\tilde{x}\tilde{x}^*]$ (checked numerically in the sketch below)

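A quick numerical check of this claim (not part of the original slides; the dimensions and covariances are made up for illustration): for any competing gain $K$, $P(K) - P(K_o)$ should be nonnegative definite, so $a^*P(K)a \ge a^*P(K_o)a$ for every $a$.

```python
import numpy as np

rng = np.random.default_rng(0)
l, p = 3, 5                               # illustrative dimensions of x and y

# Random joint covariance R = [[Rx, Rxy], [Ryx, Ry]] >= 0 (real case, so * = T)
A = rng.standard_normal((l + p, l + p))
R = A @ A.T
Rx, Rxy = R[:l, :l], R[:l, l:]
Ryx, Ry = R[l:, :l], R[l:, l:]

def P(K):
    """Error covariance P(K) = Rx - K Ryx - Rxy K* + K Ry K*."""
    return Rx - K @ Ryx - Rxy @ K.T + K @ Ry @ K.T

Ko = Rxy @ np.linalg.inv(Ry)              # optimal gain (derived later in the slides)
K = rng.standard_normal((l, p))           # an arbitrary competing gain

# P(K) - P(Ko) should have no (significantly) negative eigenvalues
print(np.linalg.eigvalsh(P(K) - P(Ko)).min() >= -1e-9)   # True
```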

Inner Products

\[ \langle x, y\rangle \triangleq \mathrm{E}[xy^*] \qquad \|x\|^2 \triangleq \langle x, x\rangle \qquad x \perp y \;:\; \langle x, y\rangle = 0 \]

  • The $\langle\cdot,\cdot\rangle$ notation is used to denote an inner product
  • It is unusual that our inner product is actually an (expected) outer product of vectors
  • Nonetheless, all of the properties of an inner product space are satisfied and the projection theorem holds
  • Thus the optimal estimate is still orthogonal to the residuals (errors): $\langle y,\, x - K_o y\rangle = 0$
  • This generalizes all of the solutions to any inner product space
  • For all of our applications, $\langle\cdot,\cdot\rangle$ just denotes the expected outer product of random vectors

Solving the Normal Equations (again)

\[
\begin{aligned}
P(K) &= \mathrm{E}\left[(x - Ky)(x - Ky)^*\right] \\
     &= \mathrm{E}[xx^*] - K\,\mathrm{E}[yx^*] - \mathrm{E}[xy^*]K^* + K\,\mathrm{E}[yy^*]K^* \\
     &= R_x - K R_{yx} - R_{xy}K^* + K R_y K^* \\
     &= \begin{bmatrix} I & -K \end{bmatrix}
        \begin{bmatrix} R_x & R_{xy} \\ R_{yx} & R_y \end{bmatrix}
        \begin{bmatrix} I \\ -K^* \end{bmatrix}
      = \begin{bmatrix} I & -K \end{bmatrix} R \begin{bmatrix} I \\ -K^* \end{bmatrix}
\end{aligned}
\]

  • Suppose $R_y > 0$ (positive definite, invertible); then we can factor the joint covariance matrix in upper-lower block-triangular form as
    \[
    \begin{bmatrix} R_x & R_{xy} \\ R_{yx} & R_y \end{bmatrix}
    =
    \begin{bmatrix} I & R_{xy}R_y^{-1} \\ 0 & I \end{bmatrix}
    \begin{bmatrix} R_{\tilde{x}} & 0 \\ 0 & R_y \end{bmatrix}
    \begin{bmatrix} I & 0 \\ R_y^{-1}R_{yx} & I \end{bmatrix}
    \]
  • where $R_{xy} = R_{yx}^*$ and
    \[ R_{\tilde{x}} \triangleq R_x - R_{xy}R_y^{-1}R_{yx} \]
    is called the Schur complement of $R_x$ in $R$ (the factorization is checked numerically below)

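As a sanity check (an illustrative sketch with a randomly generated joint covariance, not the lecture's code), the block-triangular factorization and the Schur complement can be verified numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
l, p = 2, 4
A = rng.standard_normal((l + p, l + p))
R = A @ A.T                                   # joint covariance [[Rx, Rxy], [Ryx, Ry]]
Rx, Rxy = R[:l, :l], R[:l, l:]
Ryx, Ry = R[l:, :l], R[l:, l:]

Ry_inv = np.linalg.inv(Ry)
Rxt = Rx - Rxy @ Ry_inv @ Ryx                 # Schur complement R_x~ (the error covariance)

U = np.block([[np.eye(l), Rxy @ Ry_inv], [np.zeros((p, l)), np.eye(p)]])
M = np.block([[Rxt, np.zeros((l, p))], [np.zeros((p, l)), Ry]])
L = np.block([[np.eye(l), np.zeros((l, p))], [Ry_inv @ Ryx, np.eye(p)]])

print(np.allclose(R, U @ M @ L))              # True: R factors as claimed
```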

Problem Definition: Revisiting the MSE

\[ P(K) \triangleq \mathrm{E}\left[(x - \hat{x})(x - \hat{x})^*\right] = \mathrm{E}[\tilde{x}\tilde{x}^*] \]

  • If we have multiple outputs that we are estimating simultaneously, how do we calculate the mean squared error (MSE)?
  • What if estimating some elements of $x$ is more important than others?
  • Let's say we were interested in a linear combination of the individual errors for each element of $x$:
    \[
    z \triangleq a^* x \qquad
    \hat{z} = \mathrm{E}[z \mid y] = \mathrm{E}[a^*x \mid y] = a^*\,\mathrm{E}[x \mid y] = a^*\hat{x} \qquad
    \tilde{z} \triangleq z - \hat{z} = a^*(x - \hat{x}) = a^*\tilde{x}
    \]
  • How is the minimum of $\mathrm{E}[|\tilde{z}|^2]$ related to the error covariance matrix $P(K)$?


Normal Equations Revisited

More generally, you can show that the MMSE estimator is given by any solution of the normal equations, even when $R_y$ is not invertible:
\[ K_o R_y = R_{xy} \]
These look quite different from the normal equations we obtained earlier, $R c_o = d$, but they become the same if we consider that we are now estimating $x$ from $y$. Suppose $x$ is a scalar; then we can express the normal equations in a form similar to the one we worked with before:
\[
\underbrace{K_o}_{1\times p}\;\underbrace{R_y}_{p\times p} = \underbrace{R_{xy}}_{1\times p}
\qquad\Longleftrightarrow\qquad
\underbrace{R_y}_{p\times p}\;\underbrace{K_o^*}_{p\times 1} = \underbrace{R_{yx}}_{p\times 1}
\]


Normal Equations: Completing the Square

Then

\[
\begin{aligned}
P(K) &= \begin{bmatrix} I & -K \end{bmatrix}
        \begin{bmatrix} R_x & R_{xy} \\ R_{yx} & R_y \end{bmatrix}
        \begin{bmatrix} I \\ -K^* \end{bmatrix} \\
     &= \begin{bmatrix} I & -K \end{bmatrix}
        \begin{bmatrix} I & R_{xy}R_y^{-1} \\ 0 & I \end{bmatrix}
        \begin{bmatrix} R_{\tilde{x}} & 0 \\ 0 & R_y \end{bmatrix}
        \begin{bmatrix} I & 0 \\ R_y^{-1}R_{yx} & I \end{bmatrix}
        \begin{bmatrix} I \\ -K^* \end{bmatrix} \\
     &= \begin{bmatrix} I & R_{xy}R_y^{-1} - K \end{bmatrix}
        \begin{bmatrix} R_{\tilde{x}} & 0 \\ 0 & R_y \end{bmatrix}
        \begin{bmatrix} I \\ R_y^{-1}R_{yx} - K^* \end{bmatrix} \\
     &= R_{\tilde{x}} + (R_{xy}R_y^{-1} - K)\,R_y\,(R_{xy}R_y^{-1} - K)^* \\
     &= R_x - R_{xy}R_y^{-1}R_{yx} + (R_{xy}R_y^{-1} - K)\,R_y\,(R_{xy}R_y^{-1} - K)^*
\end{aligned}
\]


Exploiting Structure

  • So far we have tried to estimate one random variable (formerly $y$, now $x_n$) from another (formerly $x$, now $y$)
  • In many applications the random variables have additional structure that we can exploit
    – More insightful solutions
    – Can frame the problem in terms of known quantities and not just the auto- and cross-correlations
    – Can reduce the computation
  • Example: stationary FIR filters

Normal Equations: Finding the Optimal Solution

\[
\begin{aligned}
P(K) &= R_{\tilde{x}} + (R_{xy}R_y^{-1} - K)\,R_y\,(R_{xy}R_y^{-1} - K)^* \\
a^* P(K) a &= a^* R_{\tilde{x}}\, a + a^*(R_{xy}R_y^{-1} - K)\,R_y\,(R_{xy}R_y^{-1} - K)^* a
\end{aligned}
\]

Let $\acute{a} \triangleq (R_{xy}R_y^{-1} - K)^* a$. Then

\[
a^* P(K) a = a^* R_{\tilde{x}}\, a + \acute{a}^* R_y \acute{a},
\qquad \acute{a}^* R_y \acute{a} \ge 0 \text{ for any vector } \acute{a}
\qquad\Longrightarrow\qquad
a^* P(K) a \ge a^* R_{\tilde{x}}\, a
\]

Since the final expression doesn't depend on $K$, $P(K)$ is minimized when $K = K_o \triangleq R_{xy}R_y^{-1}$, and the minimum error covariance matrix is

\[ P(K_o) = R_{\tilde{x}} = R_x - R_{xy}R_y^{-1}R_{yx} = R_x - R_{xy}K_o^* \]

(A small numerical sketch of computing $K_o$ and $P(K_o)$ follows below.)

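A minimal numerical sketch of this result (invented dimensions and sample covariances, not code from the lecture): solve $K_o R_y = R_{xy}$ as a linear system rather than forming $R_y^{-1}$ explicitly, and form the minimum error covariance.

```python
import numpy as np

rng = np.random.default_rng(2)
l, p, n = 2, 4, 100_000

# Draw jointly correlated, zero-mean samples of x (l x n) and y (p x n)
mix = rng.standard_normal((l + p, l + p))
xy = mix @ rng.standard_normal((l + p, n))
x, y = xy[:l], xy[l:]

Rx  = x @ x.T / n                  # sample covariances (real case, * = T)
Rxy = x @ y.T / n
Ry  = y @ y.T / n

# Ko Ry = Rxy  <=>  Ry Ko* = Ryx, so solve a linear system instead of inverting Ry
Ko = np.linalg.solve(Ry, Rxy.T).T
P_min = Rx - Ko @ Rxy.T            # P(Ko) = Rx - Rxy Ry^{-1} Ryx

x_hat = Ko @ y                     # linear MMSE estimate of each column of x
err = x - x_hat
print(np.allclose(err @ err.T / n, P_min, atol=1e-8))   # empirical error covariance = P(Ko)
```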


Large Linear Estimation Problems

Recall from the projection theorem that our estimator is given by the orthogonal projection of $x \in \mathbb{C}^{\ell\times 1}$ onto the set of all observations $\{y_0, y_1, \ldots, y_N\}$, where $y_n \in \mathbb{C}^{p\times 1}$:
\[ K_o R_y = R_{xy} \]

\[
\underbrace{R_y}_{(N+1)p \times (N+1)p} = \left[\langle y_i, y_j\rangle\right]_{i,j=0}^{N} =
\begin{bmatrix}
\|y_0\|^2 & \langle y_0, y_1\rangle & \cdots & \langle y_0, y_N\rangle \\
\langle y_1, y_0\rangle & \|y_1\|^2 & \cdots & \langle y_1, y_N\rangle \\
\vdots & \vdots & \ddots & \vdots \\
\langle y_N, y_0\rangle & \langle y_N, y_1\rangle & \cdots & \|y_N\|^2
\end{bmatrix}
\]

\[
\underbrace{R_{xy}}_{\ell \times (N+1)p} = \left[\langle x, y_i\rangle\right]_{i=0}^{N} =
\begin{bmatrix} \langle x, y_0\rangle & \langle x, y_1\rangle & \cdots & \langle x, y_N\rangle \end{bmatrix}
\]

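To make the size issue concrete, here is a small illustrative construction (invented sizes and signal model, not from the lecture) of the stacked observation vector and the resulting $(N+1)p \times (N+1)p$ matrix $R_y$ from samples:

```python
import numpy as np

rng = np.random.default_rng(3)
l, p, N, trials = 2, 3, 9, 20_000

# Stack N+1 observations of a p-vector process into y = col{y0, ..., yN}
Y = rng.standard_normal(((N + 1) * p, trials))        # (N+1)p x trials sample columns
C = rng.standard_normal((l, (N + 1) * p))             # mixing matrix tying x to the observations
x = C @ Y + 0.1 * rng.standard_normal((l, trials))

Ry  = Y @ Y.T / trials          # (N+1)p x (N+1)p -- grows quadratically with N
Rxy = x @ Y.T / trials          # l x (N+1)p
print(Ry.shape, Rxy.shape)      # (30, 30) (2, 30)
```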

Normal Equations for Random Processes

\[ K_o R_y = R_{xy} \]

  • The signal processing context has a bit more structure than the general linear estimation problem
  • Suppose
    – We can make sequential observations of a vector-valued random process $y_n$
    – We have made $N+1$ observations of $y_n$
  • Our goal is then to estimate a random vector $x \in \mathbb{C}^{\ell\times 1}$ from the entire set of observations $y_n \in \mathbb{C}^{p\times 1}$ for $n = 0, \ldots, N$
  • If we collect all $N+1$ observation vectors into a composite $(N+1)p \times 1$ vector $y = \mathrm{col}\{y_0, \ldots, y_N\}$, then the solution is still given by the normal equations
  • But now the auto-correlation matrix $R_y$ is enormous: $(N+1)p \times (N+1)p$


Property 1: Linearity of Linear MMSE Estimators

There are two important properties of projections onto a linear space $\mathcal{L}\{y\}$. The first is that, for any two matrices $A_1$ and $A_2$ of appropriate dimensions, with $\acute{x} \triangleq A_1 x_1 + A_2 x_2$,

\[
R_{\acute{x}y} = \langle \acute{x}, y\rangle = \langle A_1 x_1 + A_2 x_2,\; y\rangle
= A_1\langle x_1, y\rangle + A_2\langle x_2, y\rangle
= A_1 R_{x_1 y} + A_2 R_{x_2 y}
\]

\[
\begin{aligned}
\hat{\acute{x}} = \widehat{(A_1 x_1 + A_2 x_2)}
&= R_{\acute{x}y} R_y^{-1} y \\
&= (A_1 R_{x_1 y} + A_2 R_{x_2 y}) R_y^{-1} y \\
&= A_1 R_{x_1 y} R_y^{-1} y + A_2 R_{x_2 y} R_y^{-1} y \\
&= A_1 \hat{x}_1 + A_2 \hat{x}_2
\end{aligned}
\]

Thus $\widehat{(A_1 x_1 + A_2 x_2)} = A_1\hat{x}_1 + A_2\hat{x}_2$.


Normal Equations: Opportunities

\[ K_o R_y = R_{xy} \]

  • In general, if $R_y$ is of dimension $M$, it requires $O(M^3)$ operations to solve the normal equations
  • On modern computers this can be slow (seconds) if $M \ge 2000$ (7 s on my computer)
  • For large $M$, there may also be storage problems
  • If $M = (N+1)p$ is growing (the signal processing context), these problems are much worse
  • Ideally, we could update our estimator each time a new observation arrived and then discard the observation
  • This only works when the problem has some special structure
  • We know that in the stationary FIR filter case, the solution requires only $O(N^2)$ operations

Equivalent, but Orthogonal Observations

\[ \mathcal{L}_N \triangleq \mathcal{L}\{y_0, \ldots, y_N\} = \mathcal{L}\{e_0, \ldots, e_N\} \]

  • Fortunately, we can transform $\{y_0, \ldots, y_N\}$ into an equivalent set of orthogonal vectors $\{e_0, \ldots, e_N\}$ such that $e_i \perp e_j$ for all $i \ne j$
  • It would be nicer still if we could efficiently calculate $e_{N+1}$ from $y_{N+1}$ and $e_N$ without having to recalculate all of the other $\{e_0, \ldots, e_N\}$
  • Thus, we want the random process $e_n$ to have a recursive relationship to $y_n$
  • Then we can calculate $e_{N+1}$ as
    \[ e_{N+1} = y_{N+1} - \hat{y}_{N+1|N} \]
    where $\hat{y}_{N+1|N} \triangleq$ the linear MMSE estimator of $y_{N+1}$ given $\{y_0, \ldots, y_N\}$, i.e., $\hat{y}_{N+1|N} = \langle y_{N+1}, y\rangle \|y\|^{-2} y = R_{y_{N+1}y} R_y^{-1} y$
  • This is just the forward prediction error filter (generalized)

Property 2: The Orthogonality Condition

\[
\begin{aligned}
\hat{x}|_{y_1,y_2} = R_{xy}R_y^{-1}y
&= \begin{bmatrix} R_{xy_1} & R_{xy_2} \end{bmatrix}
   \begin{bmatrix} R_{y_1} & R_{y_1y_2} \\ R_{y_2y_1} & R_{y_2} \end{bmatrix}^{-1}
   \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \\
&= \begin{bmatrix} R_{xy_1} & R_{xy_2} \end{bmatrix}
   \begin{bmatrix} R_{y_1} & 0 \\ 0 & R_{y_2} \end{bmatrix}^{-1}
   \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}
   \qquad (\text{using } R_{y_1y_2} = 0) \\
&= \begin{bmatrix} R_{xy_1} & R_{xy_2} \end{bmatrix}
   \begin{bmatrix} R_{y_1}^{-1} & 0 \\ 0 & R_{y_2}^{-1} \end{bmatrix}
   \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \\
&= \begin{bmatrix} R_{xy_1}R_{y_1}^{-1} & R_{xy_2}R_{y_2}^{-1} \end{bmatrix}
   \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \\
&= R_{xy_1}R_{y_1}^{-1}y_1 + R_{xy_2}R_{y_2}^{-1}y_2 \\
&= \hat{x}|_{y_1} + \hat{x}|_{y_2}
\end{aligned}
\]

\[ \hat{x}|_{y_1,y_2} = \hat{x}|_{y_1} + \hat{x}|_{y_2} \quad \text{if and only if} \quad y_1 \perp y_2 \;\; (R_{y_1y_2} = 0) \]

(A numerical illustration follows below.)

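An illustrative sample-based check (made-up dimensions and signal model, not from the slides): when two observation blocks are uncorrelated, the joint estimate is simply the sum of the two individual estimates.

```python
import numpy as np

rng = np.random.default_rng(4)
l, p, n = 2, 3, 200_000

y1 = rng.standard_normal((p, n))            # independent blocks, so y1 is orthogonal to y2
y2 = rng.standard_normal((p, n))
B1 = rng.standard_normal((l, p))
B2 = rng.standard_normal((l, p))
x = B1 @ y1 + B2 @ y2 + 0.1 * rng.standard_normal((l, n))

def lmmse(x, y):
    """Linear MMSE estimate of x from y using sample covariances (zero mean assumed)."""
    Rxy = x @ y.T / y.shape[1]
    Ry = y @ y.T / y.shape[1]
    Ko = np.linalg.solve(Ry, Rxy.T).T
    return Ko @ y

x_joint = lmmse(x, np.vstack([y1, y2]))     # estimate from the stacked observation
x_split = lmmse(x, y1) + lmmse(x, y2)       # sum of the individual estimates

print(np.abs(x_joint - x_split).max())      # small, since the sample R_{y1 y2} is nearly 0
```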

Finding Equivalent, Orthogonal Observations

The estimates can be obtained easily from the prior estimates using the orthogonality condition:

\[
\begin{aligned}
\hat{y}_{N+1|N} &\triangleq \text{the linear MMSE estimator of } y_{N+1} \text{ given } \{y_0, \ldots, y_N\} = \langle y_{N+1}, y\rangle \|y\|^{-2} y \\
&= \text{the linear MMSE estimator of } y_{N+1} \text{ given } \{e_0, \ldots, e_N\} = \langle y_{N+1}, e\rangle \|e\|^{-2} e \\
&= \sum_{n=0}^{N} \hat{y}_{N+1|e_n} \qquad \text{since } e_n \perp e_k \text{ for } n \ne k \\
&= \sum_{n=0}^{N} \langle y_{N+1}, e_n\rangle \|e_n\|^{-2} e_n
\end{aligned}
\]


Linear Estimation with Orthogonal Observations

\[ \underbrace{K_o}_{\ell\times(N+1)p}\;\underbrace{R_y}_{(N+1)p\times(N+1)p} = \underbrace{R_{xy}}_{\ell\times(N+1)p} \]

  • If $R_y$ were a diagonal matrix, this would be much easier to solve
  • Only $O\bigl((N+1)p\bigr)$ operations, rather than $O\bigl([(N+1)p]^3\bigr)$
  • Equivalently, it would be much easier to invert if $R_y$ were block diagonal: $O\bigl((N+1)p^3\bigr)$ operations
  • In other words, $y_i \perp y_j$ for all $i \ne j$
  • In this case
    \[
    \hat{x}_N \triangleq \text{the linear MMSE estimator of } x \text{ given } \{y_0, \ldots, y_N\}
    = R_{xy}R_y^{-1}y = \langle x, y\rangle \|y\|^{-2} y
    = \sum_{n=0}^{N} R_{xy_n} R_{y_n}^{-1} y_n
    \]
  • This would not happen in most applications

Finding the Innovations

  • If our only goal was to diagonalize the covariance matrix $R_y$, there are many possible transformations:
    \[ \epsilon \triangleq A y \qquad R_\epsilon = \langle \epsilon, \epsilon\rangle = A R_y A^* = \text{a diagonal matrix} \]
  • If we want $\epsilon$ to contain the same information, $A$ must be nonsingular:
    \[ R_y = A^{-1} R_\epsilon A^{-*}, \qquad R_\epsilon \text{ diagonal} \]
  • If $A$ is an $M \times M$ matrix, there are $M^2$ free parameters in $A$ and only $M(M+1)/2$ constraints imposed by requiring $R_\epsilon$ to be a diagonal matrix
  • However, we also desire $A$ to be lower triangular so that $e_n$ is a causal function of $y_n$ and vice versa


Gram-Schmidt Orthogonalization

Then the next orthogonalized random vector can be obtained from

\[ e_{N+1} = y_{N+1} - \hat{y}_{N+1|N} = y_{N+1} - \sum_{n=0}^{N} \langle y_{N+1}, e_n\rangle \|e_n\|^{-2} e_n \]

which can be initialized with $e_0 = y_0$.

  • This is the Gram-Schmidt orthogonalization procedure for random vectors (a sample-based sketch follows below)
  • Note, however, that this does not tell you how to calculate the cross-covariance $\langle y_{N+1}, e_n\rangle$ or the autocovariance matrices $R_{e_n} = \|e_n\|^2$ from $R_{y_n} = \|y_n\|^2$
  • To obtain these matrices requires $O\bigl([(N+1)p]^3\bigr)$ operations
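A sample-based sketch of this procedure (illustrative code, not the lecture's; the inner product $\langle a, b\rangle = \mathrm{E}[ab^*]$ is estimated from sample columns):

```python
import numpy as np

def innovations_gram_schmidt(ys):
    """Classical Gram-Schmidt on zero-mean random vectors.

    Each y in ys is a (p x trials) array of samples; <a, b> = E[a b*] is
    estimated by the sample average a @ b.T / trials.  Returns [e0, ..., eN].
    """
    es = []
    for y in ys:
        e = y.copy()
        for ek in es:
            R_y_ek = y @ ek.T / y.shape[1]             # <y_{N+1}, e_k>
            R_ek = ek @ ek.T / ek.shape[1]             # ||e_k||^2
            e -= R_y_ek @ np.linalg.solve(R_ek, ek)    # subtract the projection onto e_k
        es.append(e)
    return es

# Example: an AR(1)-like correlated vector process
rng = np.random.default_rng(5)
p, N, trials = 2, 4, 50_000
ys = [rng.standard_normal((p, trials))]
for _ in range(N):
    ys.append(0.8 * ys[-1] + rng.standard_normal((p, trials)))

es = innovations_gram_schmidt(ys)
print(np.abs(es[0] @ es[3].T / trials).max())          # ~ 0: the innovations are orthogonal
```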

Finding the Innovations Continued

\[ R_y = LDL^* \qquad e = L^{-1}y \qquad R_e = L^{-1} R_y L^{-*} = D \]

  • We have three constraints on the linear transformation $\epsilon \triangleq Ay$:
    1. The linear transformation of $y$ must be invertible ($A$ is nonsingular)
    2. The linear transformation must be causal ($A$ is lower triangular)
    3. $R_e$ must be diagonal
  • If $L$ has a unit diagonal and $R_y$ is positive definite, this factorization is unique
  • These are the same innovations that are obtained via the Gram-Schmidt procedure: $\epsilon = e$, $A = L^{-1}$ (see the LDL sketch below)

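A small sketch (scalar blocks, numpy only; the unit-diagonal $L$ is built from a Cholesky factor rather than a dedicated LDL routine, which is an implementation choice, not the lecture's): factor $R_y = LDL^*$ and whiten with $L^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(6)
M, n = 6, 100_000

A = rng.standard_normal((M, M))
Ry = A @ A.T                                   # positive definite covariance

# Ry = G G* (Cholesky)  =>  L = G diag(G)^{-1} has a unit diagonal and Ry = L D L*
G = np.linalg.cholesky(Ry)
d = np.diag(G)
L = G / d                                      # divide column j by G[j, j]
D = np.diag(d**2)
print(np.allclose(Ry, L @ D @ L.T))            # True

# Whitening: e = L^{-1} y is causal (L is lower triangular) and Re = D
y = G @ rng.standard_normal((M, n))            # samples with covariance Ry
e = np.linalg.solve(L, y)
Re = e @ e.T / n
print(np.abs(Re - D).max())                    # small (sampling error only)
```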

Innovations

\[ e_{N+1} = y_{N+1} - \sum_{n=0}^{N} \langle y_{N+1}, e_n\rangle \|e_n\|^{-2} e_n \]

  • Note that $e_{N+1}$ is the portion of the random variation in $y_{N+1}$ that cannot be estimated from the previous observations $\{y_0, \ldots, y_N\}$
  • The remainder contains the "new information"
  • It is in this sense that they are called innovations:
    \[ \{e_n\} = \text{the innovations process associated with } \{y_n\} \]
  • Each innovation $e_n$ is uncorrelated with the prior innovations and thus only contains the "new information"
  • In other words, it is a white noise process

Estimation with Innovations

The estimate of $x$ is now straightforward:

\[
\hat{x}|_N \triangleq \text{the MMSE estimator of } x \text{ given } \{y_0, \ldots, y_N\}
= \text{the MMSE estimator of } x \text{ given } \{e_0, \ldots, e_N\}
= \sum_{n=0}^{N} \langle x, e_n\rangle \|e_n\|^{-2} e_n
\]


Equivalence of Causal, Invertible Innovations and Gram-Schmidt

\[ y_n = e_n + \sum_{i=0}^{n-1} \langle y_n, e_i\rangle \|e_i\|^{-2} e_i \qquad\Longleftrightarrow\qquad y = Le \]

\[
L = \begin{bmatrix}
I & & & & \\
\langle y_1, e_0\rangle\|e_0\|^{-2} & I & & & \\
\langle y_2, e_0\rangle\|e_0\|^{-2} & \langle y_2, e_1\rangle\|e_1\|^{-2} & I & & \\
\vdots & \vdots & \vdots & \ddots & \\
\langle y_N, e_0\rangle\|e_0\|^{-2} & \langle y_N, e_1\rangle\|e_1\|^{-2} & \langle y_N, e_2\rangle\|e_2\|^{-2} & \cdots & I
\end{bmatrix}
\]

\[ R_y = LDL^* \qquad D = \mathrm{diag}\{\|e_n\|^2\} \]


Recursive Estimation with Innovations

The really cool thing is that we can update our estimate easily when $y_{N+1}$ shows up:

\[ e_{N+1} = y_{N+1} - \hat{y}_{N+1|N} = y_{N+1} - \sum_{n=0}^{N} \langle y_{N+1}, e_n\rangle \|e_n\|^{-2} e_n \]

\[
\hat{x}|_{N+1} = \sum_{n=0}^{N} \langle x, e_n\rangle \|e_n\|^{-2} e_n + \langle x, e_{N+1}\rangle \|e_{N+1}\|^{-2} e_{N+1}
= \hat{x}|_N + \langle x, e_{N+1}\rangle \|e_{N+1}\|^{-2} e_{N+1}
\]

(A sample-based sketch of this update follows below.)

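A sample-based sketch of this time update (invented signal model; inner products are estimated from sample columns, and this is not the lecture's code): each new observation contributes one new innovation and one additive correction to $\hat{x}$.

```python
import numpy as np

rng = np.random.default_rng(7)
l, n = 2, 100_000

def cov(a, b):
    """Sample estimate of <a, b> = E[a b*] for zero-mean sample columns."""
    return a @ b.T / a.shape[1]

x = rng.standard_normal((l, n))                 # the random vector to estimate
es = []                                         # innovations found so far
x_hat = np.zeros((l, n))                        # running estimate x_hat|_N

for N in range(5):
    y_new = 0.7 * x + rng.standard_normal((l, n))          # new observation y_{N+1}

    # e_{N+1} = y_{N+1} - sum_n <y_{N+1}, e_n> ||e_n||^{-2} e_n
    e_new = y_new.copy()
    for e in es:
        e_new -= cov(y_new, e) @ np.linalg.solve(cov(e, e), e)

    # x_hat|_{N+1} = x_hat|_N + <x, e_{N+1}> ||e_{N+1}||^{-2} e_{N+1}
    x_hat += cov(x, e_new) @ np.linalg.solve(cov(e_new, e_new), e_new)
    es.append(e_new)

    print(N + 1, float(np.mean((x - x_hat) ** 2)))   # MSE shrinks as observations accrue
```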

Innovations and Whitening

\[ y = Le \qquad y_n = e_n + \sum_{i=0}^{n-1} \langle y_n, e_i\rangle \|e_i\|^{-2} e_i \qquad R_y = LDL^* \qquad D = \mathrm{diag}\{\|e_n\|^2\} \]

  • Thus a white-noise innovations process can be obtained by performing an LDL (block matrix) decomposition of $R_y$
  • Sometimes $L$ is called the canonical modeling filter
  • Similarly, $L^{-1}$ is called the canonical whitening filter
  • The representation $y = Le$ is called the innovations representation of $y_n$


Summary

\[ e_{N+1} = y_{N+1} - \hat{y}_{N+1|N} \qquad \hat{x}|_{N+1} = \hat{x}|_N + \langle x, e_{N+1}\rangle \|e_{N+1}\|^{-2} e_{N+1} \]

  • The innovations process will ultimately help us
    – Reduce computation
    – Reduce storage requirements
    – Exploit structure in the covariance matrix $R_y$
    – Give us insight into the solution
  • This is essentially a whitening filter followed by an estimator
  • The same architecture is used for order-recursive algorithms and linear FIR/IIR estimation problems


Similarity of Time- and Order-Updates

\[ e_{N+1} = y_{N+1} - \hat{y}_{N+1|N} \qquad \hat{x}|_{N+1} = \hat{x}|_N + \langle x, e_{N+1}\rangle \|e_{N+1}\|^{-2} e_{N+1} \]

  • Note the similarity of these time updates with the order updates obtained from the Levinson algorithm:
    1. Estimate the new observations/variables from all of the previous observations/variables
    2. The residuals are white and orthogonal
    3. Estimate the variable of interest with the residuals

References

[1] T. Kailath, A. H. Sayed, and B. Hassibi. Linear Estimation. Prentice Hall, 2000.


Computational Savings?

\[ K_o R_y = R_{xy} \qquad R_y = LDL^* \]

  • So far the innovations have given us insight
  • They have not saved us any computation
  • Performing an LDL decomposition of an $M \times M$ matrix requires $O(M^3)$ operations, the same amount required to solve the normal equations directly
  • As always, we need additional structure in the matrices to reduce the computation
  • The state-space formulation gives us a structure that permits us to solve the normal equations much more efficiently
