Predictor-corrector ensemble filters for high-dimensional nonlinear systems and sparse data, and regularized ensemble Kalman filter for a wildfire model


SLIDE 1

Predictor-corrector ensemble filters for high-dimensional nonlinear systems and sparse data, and regularized ensemble Kalman filter for a wildfire model. Jan Mandel, University of Colorado at Denver and Health Sciences Center; NCAR MMM and IMAGe. The DDDAS wildfire team: Lynn Bennethum, Jonathan Beezley, Janice Coen, Craig Douglas, Leo Franca, Minjeong Kim, Bob Kremens, Deng Li, Guan Qin, Tony Vodacek

Supported by National Science Foundation Grant CNS-0325314 and NCAR ASP Faculty Fellowship

SLIDE 2

The challenge: data assimilation into highly nonlinear wildfire models

• Simulation code is a black box ⇒ ensemble filters
• Highly non-gaussian ⇒ predictor-corrector ensemble filters (Jan Mandel and Jonathan Beezley, CCM Report 232, 2006)
• Need to avoid nonphysical solutions ⇒ regularized EnKF (C. J. Johns and J. Mandel, CCM Report 221, 2005; Env. Ecol. Stat., to appear)
• Efficient parallel EnKF for large data (Jan Mandel, CCM Report 231, 2006)
• Data assimilation into a one-step reaction-diffusion fire model (Jan Mandel, Lynn S. Bennethum, Jonathan Beezley, Janice L. Coen, Craig C. Douglas, Leopoldo P. Franca, Minjeong Kim, and Anthony Vodacek, CCM Report 233, 2006)

Reports available at http://www-math.cudenver.edu/ccm/reports

SLIDE 3

Data Assimilation: Sequential Statistical Estimation

The state of the model is the probability density p(u) of the system state u.

Analysis cycle: p^{a,t_{k−1}} ⇒ (advance) p^{f,t_k} ⇒ (Bayes) p^{a,t_k}

p^{f,t_k}: "forecast" probability density at time t_k
p^{a,t_k}: "analysis" probability density at time t_k

The probability density is advanced in time by the model. At analysis time, new data d is incorporated by the Bayes theorem:

p^{a,t_k}(u) ∝ p(d|u) p^{f,t_k}(u)

Here ∝ means proportional; densities are always scaled so that ∫ p dν = 1, where ν is some underlying measure (Lebesgue in finite dimension). p(d|u) is the data likelihood = the probability density that the data value is d if the system state is u.
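As an illustration (not from the talk), a minimal sketch of this analysis step with the density discretized on a one-dimensional grid; the forecast density, data value, and Gaussian data error are hypothetical placeholders.

```python
import numpy as np

# Hypothetical 1D illustration of the Bayes analysis step: p^a ∝ p(d|u) p^f,
# with the density discretized on a grid and rescaled to integrate to 1.
u = np.linspace(-3.0, 3.0, 601)                    # grid of system states
du = u[1] - u[0]
p_forecast = np.exp(-0.5 * (u - 0.5)**2)           # unnormalized forecast density
p_forecast /= p_forecast.sum() * du

d, sigma = -0.2, 0.3                               # observed data and its error std
likelihood = np.exp(-0.5 * ((d - u) / sigma)**2)   # p(d|u) for h(u) = u

p_analysis = likelihood * p_forecast               # Bayes theorem, unnormalized
p_analysis /= p_analysis.sum() * du                # scale so that ∫ p dν = 1
```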

SLIDE 4

Data Likelihood

data likelihood = the probability density that the data value is d if the system state is u

The data likelihood can be obtained from the data error probability density p_ε (assumed known) and the observation function h(u) = what the data would be if there were no errors and the system state is u:

P(d − h(u) ∈ A) = ∫_A p_ε dν  ⇒  p(d|u) = p_ε(d − h(u))

This formula tacitly assumes that the data error is additive and does not depend on d and u.

All data must be accompanied by an error estimate (= error probability distribution), otherwise it is meaningless.
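A concrete sketch of p(d|u) = p_ε(d − h(u)) for a state vector; the observation function and the Gaussian data error here are illustrative assumptions, not the ones used in the experiments.

```python
import numpy as np

# Sketch of p(d|u) = p_eps(d - h(u)) with additive Gaussian data error.
def h(u):
    """Hypothetical observation function: observe the first state component."""
    return u[0]

def data_likelihood(d, u, sigma=0.3):
    """p(d|u) = p_eps(d - h(u)) with p_eps = N(0, sigma^2)."""
    r = d - h(u)
    return np.exp(-0.5 * (r / sigma)**2) / (np.sqrt(2 * np.pi) * sigma)

print(data_likelihood(d=1.0, u=np.array([0.8, 0.1])))
```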

SLIDE 5

Ensemble Approach

The model state (= the probability density of the system state) is represented by a weighted ensemble (u_k, w_k), k = 1, ..., N, where the u_k are system state vectors and the w_k are positive weights with Σ_{k=1}^N w_k = 1.

Simple case: all weights are the same, w_k = 1/N, and {u_k} is a sample from the model state p, u_k ∼ p (values of independent identically distributed random variables with density p).

SLIDE 6

Weighted ensemble

How to represent a probability density by the weights of an ensemble? The ensemble members are a sample from some probability density, u_k ∼ p_π. We want the weighted ensemble to represent another probability density p. For a given function f, consider the Monte Carlo approximation of the integral

∫_Ω f p dν = ∫_Ω f (p / p_π) p_π dν ≈ Σ_{k=1}^N w_k f(u_k)

with the weights

w_k = (1/N) p(u_k) / p_π(u_k).

So, given a sample {u_k} from p_π,

p ∼ (u_k, w_k),  w_k ∝ p(u_k) / p_π(u_k),  u_k ∼ p_π.
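A minimal importance-weighting sketch (the target and proposal densities below are hypothetical stand-ins): sample from p_π, weight by p/p_π, and use the weighted ensemble to approximate ∫ f p dν.

```python
import numpy as np

# Importance weights for representing a target density p by a sample from a
# proposal density p_pi (both densities chosen here only for illustration).
def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma)**2) / (np.sqrt(2 * np.pi) * sigma)

rng = np.random.default_rng(0)
N = 1000
u = rng.normal(0.0, 2.0, N)                           # u_k ~ p_pi = N(0, 2^2)

w = gauss_pdf(u, 1.0, 0.5) / gauss_pdf(u, 0.0, 2.0)   # w_k ∝ p(u_k) / p_pi(u_k)
w /= w.sum()                                          # normalize so the weights sum to 1

estimate = np.sum(w * u**2)     # ≈ ∫ f p dν with f(x) = x^2 (exact value: 1.25)
```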

SLIDE 7

Sequential Importance Sampling (SIS), a.k.a. Sequential Monte Carlo (SMC), a.k.a. Particle Filters (PF)

Given a forecast ensemble p^f ∼ (u^f_k, w^f_k) and a sample from a proposal density, u^a_k ∼ p_π, apply the Bayes theorem:

p^a(u) = p(u|d) ∝ p(d|u) p^f(u)  ⇒  (u^a_k, w^a_k) ∼ p^a,  w^a_k ∝ p^a(u^a_k) / p_π(u^a_k) ∝ p(d|u^a_k) p^f(u^a_k) / p_π(u^a_k)

SIS chooses u^a_k = u^f_k (it does not change the ensemble) and only updates the weights. Since already u^f_k ∼ p_π with w^f_k ∝ p^f(u^f_k) / p_π(u^f_k), this gives

w^a_k ∝ p(d|u^f_k) w^f_k
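A short sketch of the SIS weight update (the observation function, data error, and ensemble below are hypothetical): the members stay put and each weight is multiplied by its data likelihood.

```python
import numpy as np

# SIS analysis step: keep the ensemble members, multiply each weight by the
# data likelihood p(d | u^f_k), and renormalize. h, sigma, and the ensemble
# below are illustrative placeholders.
def sis_update(u_f, w_f, d, h, sigma):
    like = np.exp(-0.5 * ((d - h(u_f)) / sigma)**2)   # p(d | u^f_k), Gaussian error
    w_a = w_f * like                                  # w^a_k ∝ p(d|u^f_k) w^f_k
    return u_f, w_a / w_a.sum()

rng = np.random.default_rng(0)
u_f = rng.normal(0.0, 1.0, 100)            # forecast ensemble (scalar states)
w_f = np.full(100, 1.0 / 100)              # equal forecast weights
u_a, w_a = sis_update(u_f, w_f, d=0.7, h=lambda u: u, sigma=0.2)
```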

SLIDE 8

Sequential Importance Sampling with Resampling (SIR)

Trouble with SIS:
1. data likelihoods can be very small, then
2. the analysis ensemble represents the analysis density poorly
3. one or a few of the analysis weights will dominate

Standard solution: SIR (Gordon 1993, ...)
1. Resample by choosing u^a_k with probability ∝ w^a_k, and set all weights equal.
2. Rely on the stochastic behavior of the model to recover spread.

But there is still trouble:
1. Huge ensembles (thousands) are needed, because the analysis distribution is effectively approximated only by those ensemble members that have large weights, and the vast majority of the weights is infinitesimal.
2. If the model is not stochastic, an artificial perturbation is needed to recover the ensemble spread.

Solution: predictor-corrector filters (new). Place the analysis ensemble so that the weights are all reasonably large.
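For reference, a minimal multinomial resampling step of the kind used in SIR; this is a generic sketch, not necessarily the scheme used in the talk.

```python
import numpy as np

# Multinomial resampling: draw N members with probability proportional to their
# analysis weights, then reset all weights to 1/N. A basic SIR building block.
def resample(u_a, w_a, rng=np.random.default_rng()):
    N = len(w_a)
    idx = rng.choice(N, size=N, p=w_a)      # choose members with probability w^a_k
    return u_a[idx], np.full(N, 1.0 / N)    # equal weights after resampling
```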

SLIDE 9

Predictor-Corrector Filters (new)

Given a forecast ensemble and a proposal ensemble obtained by a predictor,

p^f ∼ (u^f_ℓ, w^f_ℓ),  p_π ∼ (u^a_k),

apply the Bayes theorem:

p^a(u) = p(u|d) ∝ p(d|u) p^f(u)  ⇒  (u^a_k, w^a_k) ∼ p^a,  w^a_k ∝ p^a(u^a_k) / p_π(u^a_k) ∝ p(d|u^a_k) p^f(u^a_k) / p_π(u^a_k)

Estimate the ratio of densities from the forecast ensemble (u^f_ℓ, w^f_ℓ) to get the analysis weights (corrector):

w^a_k ∝ p(d|u^a_k) e_k,  e_k ≈ p^f(u^a_k) / p_π(u^a_k)

Trouble:
1. density estimates in high dimension are intractable
2. need to estimate far away from and outside of the span of the sample

Solution:
1. the probability densities are not arbitrary: they come from probability measures on spaces of smooth functions, with low effective dimension
2. nonparametric estimation that depends only on the concept of distance
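A skeleton of one predictor-corrector analysis step under these definitions, as an illustration. The predictor is left abstract (any map producing the proposal ensemble, e.g., an EnKF update), and the density-ratio estimator e_k is passed in; a nearest-neighbor version of that estimator is sketched after slide 11. All function names below are placeholders.

```python
import numpy as np

# Predictor-corrector analysis step (sketch). `predictor` produces the proposal
# ensemble u^a_k from the forecast (e.g., an EnKF update); `likelihood` returns
# p(d | u); `density_ratio` estimates e_k ≈ p^f(u^a_k) / p_pi(u^a_k).
# All three callables are placeholders for the components described in the talk.
def predictor_corrector_step(u_f, w_f, d, predictor, likelihood, density_ratio):
    u_a = predictor(u_f, w_f, d)                         # predictor: proposal ensemble
    e = density_ratio(u_f, w_f, u_a)                     # corrector: density ratios e_k
    w_a = np.array([likelihood(d, u) for u in u_a]) * e  # w^a_k ∝ p(d|u^a_k) e_k
    return u_a, w_a / w_a.sum()
```
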
SLIDE 10

Nonparametric Density Estimation

Let {u_ℓ : ℓ = 1, ..., N} ∼ p and let B_h(x) = {y : ‖y − x‖ < h} be the ball with radius h and center x. The density estimate

p_{N,h}(x) = (number of u_ℓ ∈ B_h(x)) / (N · measure of B_h(x)) → p(x) in probability

if p is continuous at x and either h = h(N) is the distance from x to the k(N)-th nearest u_ℓ, with k(N) → ∞, k(N)/N → 0 as N → ∞ (Loftsgaarden-Quesenberry (1965) in R^n with Lebesgue measure), or h = h(N) → 0 with N · measure of B_h(x) → ∞ as N → ∞ (Rosenblatt (1956) in R^n, Dabo-Niang (2004) in Banach spaces).
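A small sketch of the k-nearest-neighbor form of this estimate in R^n with Lebesgue measure; the data set and the choice k = √N are illustrative.

```python
import numpy as np
from math import gamma, pi

# k-nearest-neighbor density estimate in R^n with Lebesgue measure:
# p(x) ≈ k / (N * volume of the ball whose radius is the distance to the
# k-th nearest sample point). The data and k = sqrt(N) are illustrative.
def knn_density(x, samples, k):
    N, n = samples.shape
    dists = np.sort(np.linalg.norm(samples - x, axis=1))
    h = dists[k - 1]                                 # bandwidth: k-th nearest distance
    ball_volume = pi**(n / 2) / gamma(n / 2 + 1) * h**n
    return k / (N * ball_volume)

rng = np.random.default_rng(0)
samples = rng.normal(size=(1000, 2))                 # sample from a standard normal in R^2
k = int(np.sqrt(len(samples)))
print(knn_density(np.zeros(2), samples, k))          # ≈ 1/(2π) ≈ 0.159
```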

SLIDE 11

Nonparametric Estimation of Importance Weights

Recall the analysis weights:

w^a_k ∝ p(d|u^a_k) e_k,  e_k ≈ p^f(u^a_k) / p_π(u^a_k)

Generalize the density estimate to a weighted ensemble (u_k, w_k) ∼ p:

( Σ_{ℓ: ‖u_ℓ − x‖ < h} w_ℓ ) / (measure of B_h(x)) ≈ p(x)

Recall the forecast ensemble (u^f_k, w^f_k)_{k=1}^N and the proposal ensemble (u^a_k)_{k=1}^N. Substitute, and the (generally unknown) measure of B_h(x) cancels out:

e_k = p^f(u^a_k) / p_π(u^a_k) ≈ ( Σ_{ℓ: ‖u^f_ℓ − u^a_k‖ < h} w^f_ℓ ) / ( Σ_{ℓ: ‖u^a_ℓ − u^a_k‖ < h} 1/N )

It remains to choose the norm ‖·‖ in the estimate. The bandwidth is chosen as on the previous page.
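A sketch of this density-ratio estimator with a plug-in norm (Euclidean here, standing in for the Sobolev norm used later) and the bandwidth taken as the distance to the k-th nearest proposal member, with k = √N as on slide 17.

```python
import numpy as np

# Nearest-neighbor estimate of e_k = p^f(u^a_k) / p_pi(u^a_k): ratio of the
# forecast weight mass to the proposal sample mass inside the ball B_h(u^a_k).
# u_f, u_a are 2D arrays with ensemble members as rows; `norm` is a placeholder
# for the problem-dependent norm (Sobolev norm for function-valued states).
def density_ratio(u_f, w_f, u_a, norm=lambda v: np.linalg.norm(v, axis=-1)):
    N = len(u_a)
    k = max(1, int(np.sqrt(N)))
    e = np.empty(N)
    for i in range(N):
        d_prop = norm(u_a - u_a[i])              # distances within the proposal ensemble
        h = np.sort(d_prop)[k]                   # bandwidth: k-th nearest, excluding self
        num = w_f[norm(u_f - u_a[i]) < h].sum()  # forecast weight mass in the ball
        den = (d_prop < h).sum() / N             # proposal sample mass in the ball
        e[i] = num / max(den, 1.0 / N)           # guard against an empty denominator
    return e
```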

SLIDE 12

Choosing the norm in the density ratio estimate

Need performance independent of dimension ⇒ work in infinite dimension; the model state is a probability distribution on a space of functions. The densities p^f, p^a, p_π need to exist w.r.t. some underlying measure ν. The ratio of densities p^f/p_π does not depend on ν as long as the densities exist and are nonzero, but the estimate does. The measure ν is linked to the norm via the measure of the ball ν(B_h(x)): we need ν(B_h(x)) > 0 and lim_{h→0} ν(B_h(x)) = 0. The density of the probability distribution φ of the system state w.r.t. ν exists ⇔ ∀A: ν(A) = 0 ⇒ φ(A) = 0 (Radon-Nikodym theorem).

SLIDE 13

Choosing the norm in the density ratio estimate (translation)

The norm is linked to the underlying measure via the measure of small balls, which need to have reasonable properties. The probability for a function in the state space to fall into a ball should be positive and have limit zero for small balls. The probability for a function in the state space to fall into a set of ν-measure zero should be zero. In infinite dimension, this is far from automatic, and it restricts the choice of the norm in the density estimate! Lebesgue measure in infinite dimension does not exist ⇒ use a gaussian measure. At least the initial ensemble should come from a probability distribution that has a density w.r.t. the measure ν.

SLIDE 14

Smooth random fields

The initial ensemble is constructed by perturbing the initial condition by smooth random fields; this gives the underlying measure ν and the associated norm. A random smooth function (random field) from some space V:

u(x) = Σ_{n=1}^∞ λ_n v_n e_n(x),  v_n ∼ N(0, 1)

{e_n} is a Fourier basis of V; 1D example: e_n(x) = sin(nx). The coefficients λ_n > 0, λ_n → 0 determine the smoothness: faster decay of λ_n ⇒ smoother u. Well known in geosciences as "random fields by FFT" (Evensen 1994, ...). The λ_n and e_n are the eigenvalues and eigenvectors of the covariance of the random field. Theory of gaussian measures: we must assume Σ_{n=1}^∞ λ_n² < ∞.
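A small sketch of generating such a field with a sine basis and a power-law decay of λ_n; the truncation level and the decay rate are illustrative (slide 21 uses λ_n = n^{−3} for the initial ensemble).

```python
import numpy as np

# Smooth random field u(x) = sum_n lambda_n * v_n * sin(n x), v_n ~ N(0,1),
# evaluated on a grid of [0, pi]. The truncation d and the decay lambda_n = n^(-3)
# are illustrative choices.
rng = np.random.default_rng(0)
d, npts = 500, 256
x = np.linspace(0.0, np.pi, npts)
n = np.arange(1, d + 1)
lam = n**(-3.0)                              # eigenvalue decay controls the smoothness
v = rng.standard_normal(d)                   # v_n ~ N(0, 1)
u = (lam * v) @ np.sin(np.outer(n, x))       # u(x) = sum_n lam_n v_n sin(n x)
```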

SLIDE 15

Gaussian measures and Sobolev spaces

Generating random fields by FFT is in fact sampling from a gaussian measure ν with the kernel K(u) = e^{−‖u‖²/2} on a space of functions with the norm

‖u‖² = Σ_{n=1}^∞ |c_n|² / λ_n,  u = Σ_{n=1}^∞ c_n e_n.

For certain λ_n, this norm is well known in PDEs as a Sobolev norm, and {u : ‖u‖² < ∞} is a Sobolev space. 1D example:

λ_n = 1 / (1 + n^{2m}),  u = Σ_{n=1}^∞ c_n sin(nx),  ‖u‖² = Σ_{n=1}^∞ (1 + n^{2m}) |c_n|² = (2/π) ∫_0^π (|u|² + |u^{(m)}|²) dx

Solutions of PDEs are naturally in Sobolev spaces! The smoothness of a function is measured by the highest order of the Sobolev space it is in. It is well known that the Sobolev norm as written here is equivalent to the norm involving all derivatives up to order m.
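A sketch of evaluating this Sobolev-type norm from the sine coefficients of a field; the order m and the use of coefficient vectors are illustrative assumptions. This is the kind of distance plugged into the density-ratio estimate for function-valued states.

```python
import numpy as np

# Sobolev-type norm ||u||^2 = sum_n (1 + n^(2m)) |c_n|^2 computed from the sine
# coefficients c_n of a field on [0, pi]. The order m is an illustrative choice.
def sobolev_norm(c, m=1):
    n = np.arange(1, len(c) + 1)
    return np.sqrt(np.sum((1.0 + n**(2 * m)) * np.abs(c)**2))

def sobolev_distance(c1, c2, m=1):
    """Distance between two fields given by their coefficient vectors."""
    return sobolev_norm(c1 - c2, m)
```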

SLIDE 16

Definition of a gaussian measure and its Cameron-Martin space

The gaussian measure ν is determined uniquely by the measure of cylindrical sets with a finite-dimensional base:

ν({ Σ_{n=1}^∞ v_n e_n : Σ_{n=1}^d v_n e_n ∈ S_d }) = ∫_{S_d} Π_{n=1}^d ( e^{−|u_n|²/(2λ_n)} / √(2πλ_n) ) du_1 ... du_d = const(d) ∫_{S_d} e^{−‖u‖²/2} du_1 ... du_d

In this sense, e^{−‖u‖²/2} is the "kernel" of the measure ν. The space V = {u : ‖u‖² < ∞} is known as the Cameron-Martin space of the measure ν. A gaussian measure is concentrated on and rotation invariant on its Cameron-Martin space.

SLIDE 17

Numerical results for predictor-corrector ensemble filters

Choices: the predictor is EnKF (a new version of EnKF for a weighted ensemble); the algorithm is called EnKF+SIS. The corrector uses density estimation with the bandwidth given by the k-th nearest member in the proposal ensemble, k = N^{1/2}. The norm (distance function) is the absolute value for scalar problems and the Sobolev norm for problems involving functions. For comparison, SIS with a very large ensemble is used as the "exact" solution. The forecast is also called the prior, and the analysis the posterior.

SLIDE 18

Predictor-Corrector EnKF+SIS

[Figure: four panels — prior distribution, data likelihood, EnKF, and EnKF+SIS densities]

EnKF predictor: forecast + data → proposal. SIS corrector: proposal → analysis. The images are from an actual numerical experiment (next page).

SLIDE 19

EnKF+SIS Numerical Experiment

• State u in a discrete version of the Sobolev space H^β(0, 2π) of periodic functions
• Observation function h(u) = Hu = u(3π/4)
• Ensemble size 100
• Forecast generated by a Gaussian distribution in H^β(0, 2π) in the frequency domain (Fourier transform)
• Data distribution generated from a histogram, bimodal, so the analysis is strongly non-Gaussian

[Figure: data likelihood]

Magnitude of the residual error h(u) − d (0.5 = 100%), averaged over 40 runs:

SIS/EnKF  EnKF   β   DOF
0.071     0.192  10  100
0.078     0.195  10  30
0.078     0.190  10  10
0.078     0.223  5   100
0.063     0.214  5   30
0.063     0.200  5   10
0.064     0.214  4   100
0.063     0.218  4   30
0.064     0.215  4   10
0.088     0.244  3   100
0.065     0.257  3   30
0.067     0.261  3   10
0.078     0.305  2   100
0.074     0.318  2   30
0.060     0.331  2   10
0.060     0.456  1   100
0.054     0.425  1   30
0.078     0.414  1   10
0.081     0.524  –   100
0.080     0.510  –   30
0.075     0.483  –   10

SLIDE 20

Scalar bimodal prior - ensemble size 100

[Figure: prior and likelihood, and exact vs. estimated posterior for SIS, EnKF, and EnKF+SIS]

SIS was not close, EnKF did not see the non-gaussian density. EnKF+SIS was best.

SLIDE 21

High-dimensional example, bimodal prior

Space of functions on [0, π] of the form u = Σ_{n=1}^d c_n sin(nx). Ensemble size N = 100. Dimension of the state space d = 500. Eigenvalues of the covariance: λ_n = n^{−3} to generate the initial ensemble and λ_n = n^{−2} for the density estimation.

SLIDE 22

High-dimensional example: bimodal prior

[Figure: prior ensemble and posterior ensembles from SIS, EnKF, and EnKF+SIS]

SIS does OK, EnKF cannot see the bimodal distribution, EnKF+SIS is OK.

SLIDE 23

High-dimensional example: sparse data, gaussian case

[Figure: prior ensemble and posterior ensembles from SIS, EnKF, and EnKF+SIS]

SIS cannot make a large update, EnKF and EnKF+SIS are fine.

SLIDE 24

Filtering for a Stochastic ODE

Simplest ODE model problem (the PDE fire model adds spatial diffusion):

ẋ = −f′(x) + κ · (white noise)

The potential f(x) = −2x² + x⁴ has 3 equilibria, and the solution switches between the stable equilibria ±1.

[Figure: a sample path x(t) switching between the equilibria ±1]
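A minimal Euler-Maruyama simulation of this double-well SDE; the time step, noise strength κ, and run length are illustrative choices.

```python
import numpy as np

# Euler-Maruyama simulation of dx = -f'(x) dt + kappa dW with the double-well
# potential f(x) = -2x^2 + x^4, so -f'(x) = 4x - 4x^3. Step size, kappa, and
# the number of steps are illustrative.
rng = np.random.default_rng(0)
dt, kappa, nsteps = 1e-3, 1.0, 50_000
x = np.empty(nsteps)
x[0] = 1.0                                   # start in the stable equilibrium +1
for i in range(1, nsteps):
    drift = 4.0 * x[i - 1] - 4.0 * x[i - 1]**3
    x[i] = x[i - 1] + drift * dt + kappa * np.sqrt(dt) * rng.standard_normal()
```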

SLIDE 25

Filtering results for a stochastic ODE

[Figure: ensemble means of SIS, EnKF, EnKF+SIS, the optimal solution, and the data over time]

• One transition from +1 to −1
• Data are sampled from one "reference" solution
• Ensemble size 100
• The ensemble mean is shown
• The optimal solution advances the probability distribution "exactly" by numerical solution of the Fokker-Planck equation and applies the Bayes theorem numerically to probability densities discretized by piecewise linear functions.

EnKF+SIS tracks the solution far better than either SIS or EnKF, which lag the transition by more analysis cycles.

SLIDE 26

Reaction-convection-diffusion wildfire model

Conservation of heat and conservation of fuel:

dT/dt = ∇ · (k∇T) + v⃗ · ∇T + A (S e^{−B/(T−T_a)} − C (T − T_a)),
dS/dt = −C_S S e^{−B/(T−T_a)},  T > T_a,

where T (K) is the temperature of the fire layer, S ∈ [0, 1] is the fuel supply mass fraction (the relative amount of fuel remaining), k (m²s⁻¹) is the diffusion coefficient, A (Ks⁻¹) is the temperature rise per second at the maximum burning rate with full initial fuel load and no cooling present, B (K) is the proportionality coefficient in the modified Arrhenius law, C (K⁻¹) is the scaled coefficient of the heat transfer to the environment, C_S (s⁻¹) is the fuel relative disappearance rate, T_a (K) is the ambient temperature, and v⃗ (ms⁻¹) is the given wind speed from the atmospheric data or atmospheric model.
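A minimal 1D explicit finite-difference sketch of these two equations, as an illustration of the model structure only; the grid, time step, and all coefficient values are placeholders, not the coefficients identified in the paper.

```python
import numpy as np

# Explicit Euler / central-difference sketch of the 1D heat and fuel equations.
# Parameter values are illustrative placeholders; periodic boundary conditions.
nx, dx, dt = 200, 1.0, 0.01
k, A, B, C, Cs, Ta, v = 1.0, 500.0, 500.0, 0.1, 0.05, 300.0, 1.0

T = np.full(nx, Ta)          # temperature (K)
T[90:110] = 1200.0           # hot spot to ignite a combustion wave
S = np.ones(nx)              # fuel mass fraction

def step(T, S):
    burning = T > Ta                                            # reaction only above T_a
    r = np.zeros(nx)
    r[burning] = np.exp(-B / (T[burning] - Ta))
    lap = (np.roll(T, -1) - 2 * T + np.roll(T, 1)) / dx**2      # diffusion term
    adv = v * (np.roll(T, -1) - np.roll(T, 1)) / (2 * dx)       # convection term
    dT = k * lap + adv + A * (S * r - C * (T - Ta))
    dS = -Cs * S * r
    return T + dt * dT, S + dt * dS

for _ in range(1000):
    T, S = step(T, S)
```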

SLIDE 27

Reaction-convection-diffusion wildfire model

Conservation of heat only:

dT/dt = ∇ · (k∇T) + v⃗ · ∇T + U′(T)

The potential U has 3 equilibria, similarly to the ODE model before:

[Figure: the potential U(T) with equilibria T_a = 300 K, T_i = 670 K, T_c = 1200 K; axes Temperature (K) vs. Potential U]

The middle, unstable equilibrium is the auto-ignition temperature. In the absence of the fuel balance equation, a combustion wave forms, switching from the higher-potential equilibrium (not burning) to the lower-potential equilibrium (burning state). The diffusion drives the wave movement even with no convection (no wind).

SLIDE 28

Combustion front profile

The fuel balance equation makes the fuel disappear, and the combustion front leaves behind burned-out fuel at the ambient temperature.

[Figure: temperature at a fixed sensor location vs. time; measured profile (solid) and simulation (dashed); axes time (seconds) vs. Temperature (°C)]

The coefficients were identified in 1D from a measured fire profile at a fixed sensor location (solid line, Kremens et al., 2003) and the combustion wave speed. The simulation is the dashed line.
SLIDE 29

Data assimilation by EnKF

The EnKF analysis step is a least squares fit that finds the analysis ensemble {u^a_k} minimizing the sum of:

(u^a_k − u^f_k)ᵀ Q⁻¹ (u^a_k − u^f_k), the difference between the forecast and analysis ensembles, weighted by the ensemble covariance Q, and

(d_k − H u^a_k)ᵀ R⁻¹ (d_k − H u^a_k), the difference between the data and the value of the observation function (here, multiplication by the observation matrix H), weighted by the data error covariance R.

Each member has the data perturbed by sampling the data error distribution.
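A minimal EnKF analysis step with perturbed observations, as a sketch of the least-squares update described above; the explicit matrix H, the dense ensemble covariance, and the ensemble shapes are illustrative assumptions (slide 31 notes the actual implementation avoids forming H).

```python
import numpy as np

# Stochastic (perturbed-observations) EnKF analysis step, small-dimension sketch.
def enkf_analysis(Uf, H, d, R, rng=np.random.default_rng()):
    """Uf: n-by-N forecast ensemble, H: m-by-n observation matrix,
    d: m-vector of data, R: m-by-m data error covariance."""
    n, N = Uf.shape
    A = Uf - Uf.mean(axis=1, keepdims=True)         # ensemble anomalies
    Q = A @ A.T / (N - 1)                           # ensemble covariance (n small here)
    K = Q @ H.T @ np.linalg.solve(H @ Q @ H.T + R, np.eye(len(d)))     # Kalman gain
    D = d[:, None] + rng.multivariate_normal(np.zeros(len(d)), R, N).T # perturbed data
    return Uf + K @ (D - H @ Uf)                    # analysis ensemble
```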

SLIDE 30

Regularization and EnKF (new)

EnKF produces the analysis ensemble in the span of the forecast ensemble. This results in nonphysical states, especially if the states in the span are far away from the data. For cheap numerical methods and a highly nonlinear problem, EnKF can easily knock the state out of the stability region. Therefore, we add a regularization (penalization) term to the least squares to prevent an analysis with large gradients:

(∇u^a_k − ∇u^f_k)ᵀ D⁻¹ (∇u^a_k − ∇u^f_k)

This is easily implemented by running the EnKF formulas a second time with the independent observation ∇u^a_k = ∇u^f_k with error covariance D. This results in the same posterior distribution as a direct implementation (though not the same analysis ensemble). The regularization is inspired by Tikhonov regularization for ill-posed problems. Penalizing large gradients seems more effective in preventing nonphysical solutions than penalizing the values of the solution.
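One possible reading of this second pass as code, offered only as a sketch: a discrete gradient operator plays the role of the observation matrix, each member "observes" the gradient of its own forecast member (perturbed with covariance D), and the usual EnKF update formulas are applied again. The gradient operator, grid spacing, and choice of D are illustrative assumptions.

```python
import numpy as np

# Sketch of the regularization pass: a second EnKF-style update in which the
# "data" for member k is the discrete gradient of its forecast member, with
# error covariance D. Shapes: Ua, Uf are n-by-N; D is (n-1)-by-(n-1).
def gradient_matrix(n, dx=1.0):
    """Forward-difference gradient operator as an (n-1)-by-n matrix."""
    G = np.zeros((n - 1, n))
    for i in range(n - 1):
        G[i, i], G[i, i + 1] = -1.0 / dx, 1.0 / dx
    return G

def regularized_pass(Ua, Uf, D, dx=1.0, rng=np.random.default_rng()):
    n, N = Ua.shape
    G = gradient_matrix(n, dx)
    A = Ua - Ua.mean(axis=1, keepdims=True)
    Q = A @ A.T / (N - 1)                                   # analysis ensemble covariance
    K = Q @ G.T @ np.linalg.inv(G @ Q @ G.T + D)            # gain for the gradient "data"
    pert = rng.multivariate_normal(np.zeros(n - 1), D, N).T # perturbations with covariance D
    return Ua + K @ (G @ Uf + pert - G @ Ua)
```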

SLIDE 31

Efficient parallel implementation of EnKF (new)

The observation matrix H is not needed; instead, the observation function h is called for each ensemble member. The algebra gives exactly the same result, and software development is much easier. An alternative EnKF implementation uses the Sherman-Morrison-Woodbury formula, with complexity growing linearly in the number of data points, using a cheap Cholesky decomposition instead of the SVD. Efficient parallel implementation using ScaLAPACK and MPI: each processor runs one or more ensemble members, and dense parallel linear algebra in the EnKF analysis ties them together.

SLIDES 32-70: no recoverable text content.
SLIDE 71

Conclusion and future work

The predictor-corrector filter with EnKF as the predictor combines the advantages of accurate large updates by EnKF with small ensembles and the correct handling of non-gaussian distributions by particle filters. The regularized EnKF works for the fire problem, but finding the parameters is tricky: the regularization, the spread of the initial ensemble, and the additional perturbation after each analysis step. Perturbation by a random smooth deformation of space is essential; adding random fields alone does not work.

Future: implementation of the predictor-corrector filter for the fire model, to handle non-gaussian distributions centered about burning and not-burning states. Predictor-corrector filter with the predictor by nudging.

Future: a new generation of filters that modify the forecast by deforming the space to move features of the state rather than making linear combinations. Linear combinations result in several fainter firelines, which take large ensembles to get rid of.

Is there a collection of numerical indicators to evaluate filter performance?