A General Class of Score-Driven Smoothers
Giuseppe Buccheri, Scuola Normale Superiore
Joint work with Giacomo Bormetti (University of Bologna), Fulvio Corsi (University of Pisa and City University of London) and Fabrizio Lillo (University of Bologna)
IAAE 2018
Key facts
Following Cox (1981), we divide time-varying parameter models into two classes:
1. Parameter-driven models: parameters evolve in time based on idiosyncratic innovations (e.g. local level, stochastic volatility, stochastic intensity)
2. Observation-driven models: parameters evolve in time based on nonlinear functions of past observations (e.g. GARCH, MEM, DCC, score-driven models)

We shall see that there is a trade-off between:
1. Estimation complexity and computational speed
◮ Here observation-driven models are superior
2. Flexibility
◮ Here parameter-driven models are superior

Why a difference in flexibility?
◮ Observation-driven: $\mathrm{Var}[f_{t+1} \mid \mathcal{F}_t] = 0$ but $\mathrm{Var}[f_{t+1}] > 0$
◮ Parameter-driven: $\mathrm{Var}[f_{t+1} \mid \mathcal{F}_t] > 0$ and $\mathrm{Var}[f_{t+1}] > 0$
A different interpretation
Consider a standard GARCH(1,1) model:
$$r_t = \sigma_t \epsilon_t, \quad \epsilon_t \sim N(0, 1), \qquad \sigma^2_{t+1} = c + a r_t^2 + b \sigma_t^2$$
There are two possible interpretations of the dynamic equation for $\sigma^2_{t+1}$:
1. It is the true DGP of volatility
2. Since $\sigma^2_{t+1}$ is $\mathcal{F}_t$-measurable, it can be seen as a filter, i.e. $\sigma^2_{t+1} = \mathrm{E}[\zeta^2_{t+1} \mid \mathcal{F}_t]$, where $\zeta_{t+1}$ is the volatility of the true, parameter-driven DGP (e.g. an SV model)

Interpretation 1 is more common in the financial econometrics literature, while interpretation 2 is closer to the filtering literature.
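To make interpretation 2 concrete, a small self-contained simulation sketch (ours, with arbitrary parameter values): a fixed-parameter GARCH(1,1) recursion run on returns generated by an SV model is not the DGP, yet it still tracks the latent SV variance, i.e. it acts as a filter.

```python
import numpy as np
rng = np.random.default_rng(42)

# Simulate an SV DGP: the log-variance follows a Gaussian AR(1).
n, phi, sig_eta = 2000, 0.97, 0.15
log_var = np.zeros(n)
for t in range(n - 1):
    log_var[t + 1] = phi * log_var[t] + sig_eta * rng.standard_normal()
true_var = np.exp(log_var)
r = np.sqrt(true_var) * rng.standard_normal(n)

# Run a fixed-parameter GARCH(1,1) recursion on the SV returns: it is
# misspecified as a DGP, but it filters the latent SV variance.
c, a, b = 0.02, 0.08, 0.90
sigma2 = np.full(n, true_var.mean())
for t in range(n - 1):
    sigma2[t + 1] = c + a * r[t]**2 + b * sigma2[t]

print("corr(GARCH filter, latent SV variance):",
      round(np.corrcoef(sigma2, true_var)[0, 1], 2))
```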
An example: ARCH filtering and smoothing
◮ "Filtering and forecasting with misspecified ARCH models I: Getting the right variance with the wrong model", Nelson (1992), JoE
◮ "Asymptotic filtering theory for univariate ARCH models", Nelson & Foster (1994), Ecta
◮ "Filtering and forecasting with misspecified ARCH models II: Making the right forecast with the wrong model", Nelson & Foster (1995), JoE
◮ "Asymptotically optimal smoothing with ARCH models", Nelson (1996), Ecta

Quoting Nelson (1992): "Note that our use of the term 'estimate' corresponds to its use in the filtering literature rather than the statistics literature; that is, an ARCH model with (given) fixed parameters produces 'estimates' of the true underlying conditional covariance matrix at each point in time, in the same sense that a Kalman filter produces 'estimates' of unobserved state variables in a linear system."
Motivations and Objectives
A key observation
◮ Observation-driven models as DGPs → all relevant information is contained in past observations → no room for smoothing
◮ Observation-driven models as filters → can benefit from using all observations → smoothing is useful

Related literature
◮ Little attention has been paid to the problem of smoothing with misspecified observation-driven models. Harvey (2013) proposed a smoothing algorithm for a dynamic Student-t location model.

Objective of this paper
◮ Fill the gap by proposing a methodology to smooth the filtered estimates of a general class of observation-driven models, namely the score-driven models of Creal et al. (2013) and Harvey (2013)
Filtering and smoothing in linear Gaussian models
Consider the general linear Gaussian model:
$$y_t = Z \alpha_t + \epsilon_t, \quad \epsilon_t \sim N(0, H), \qquad \alpha_{t+1} = c + T \alpha_t + \eta_t, \quad \eta_t \sim N(0, Q)$$
Kalman forward filter, $a_{t+1} = \mathrm{E}[\alpha_{t+1} \mid \mathcal{F}_t]$, $P_{t+1} = \mathrm{Var}[\alpha_{t+1} \mid \mathcal{F}_t]$:
$$v_t = y_t - Z a_t, \qquad F_t = Z P_t Z' + H,$$
$$a_{t+1} = c + T a_t + K_t v_t, \qquad P_{t+1} = T P_t (T - K_t Z)' + Q,$$
$$K_t = T P_t Z' F_t^{-1}, \qquad t = 1, \ldots, n$$
Kalman backward smoother, $\hat\alpha_t = \mathrm{E}[\alpha_t \mid \mathcal{F}_n]$, $\hat P_t = \mathrm{Var}[\alpha_t \mid \mathcal{F}_n]$, $t \le n$:
$$r_{t-1} = Z' F_t^{-1} v_t + L_t' r_t, \qquad N_{t-1} = Z' F_t^{-1} Z + L_t' N_t L_t,$$
$$\hat\alpha_t = a_t + P_t r_{t-1}, \qquad \hat P_t = P_t - P_t N_{t-1} P_t,$$
with $L_t = T - K_t Z$, $r_n = 0$, $N_n = 0$ and $t = n, \ldots, 1$.
◮ The conditional log-density is $\log p(y_t \mid \mathcal{F}_{t-1}) = -\frac{1}{2} \log |F_t| - \frac{1}{2} v_t' F_t^{-1} v_t$ (up to an additive constant)
◮ As $Z, H, T, Q$ are constant, the variance recursion has a fixed-point solution $\bar P$ that is referred to as the steady state of the Kalman filter
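As a concreteness check, a minimal scalar NumPy sketch of the mean recursions above (the function name and the univariate restriction are ours, for illustration; the variance smoother $N_t$ is omitted for brevity):

```python
import numpy as np

def kalman_filter_smoother(y, Z, H, c, T, Q, a1, P1):
    """Forward filter and backward (mean) smoother for the scalar
    linear Gaussian model y_t = Z a_t + eps_t, a_{t+1} = c + T a_t + eta_t."""
    n = len(y)
    a = np.zeros(n + 1); P = np.zeros(n + 1)
    v = np.zeros(n); F = np.zeros(n); K = np.zeros(n)
    a[0], P[0] = a1, P1
    # Forward pass: v_t = y_t - Z a_t, F_t = Z P_t Z' + H,
    # K_t = T P_t Z' / F_t, a_{t+1} = c + T a_t + K_t v_t.
    for t in range(n):
        v[t] = y[t] - Z * a[t]
        F[t] = Z * P[t] * Z + H
        K[t] = T * P[t] * Z / F[t]
        a[t + 1] = c + T * a[t] + K[t] * v[t]
        P[t + 1] = T * P[t] * (T - K[t] * Z) + Q
    # Backward pass: r_{t-1} = Z F_t^{-1} v_t + L_t r_t with L_t = T - K_t Z,
    # then alpha_hat_t = a_t + P_t r_{t-1}; r_n = 0.
    alpha_hat = np.zeros(n)
    r = 0.0
    for t in range(n - 1, -1, -1):
        r = Z / F[t] * v[t] + (T - K[t] * Z) * r
        alpha_hat[t] = a[t] + P[t] * r
    return a[:-1], alpha_hat
```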
A more general representation
Introduce the score and information matrix of the conditional density:
$$\nabla_t = \left( \frac{\partial \log p(y_t \mid \mathcal{F}_{t-1})}{\partial a_t'} \right)', \qquad \mathcal{I}_{t|t-1} = \mathrm{E}_{t-1}[\nabla_t \nabla_t']$$
After some simple algebra, we can rewrite the Kalman filtering and smoothing recursions for the mean in the steady state as:
$$a_{t+1} = c + T a_t + R \nabla_t, \quad (1)$$
where $R = T \bar P$, and:
$$r_{t-1} = \nabla_t + L_t' r_t, \quad (2) \qquad \hat\alpha_t = a_t + T^{-1} R\, r_{t-1}, \quad (3)$$
where $L_t = T - R\, \mathcal{I}_{t|t-1}$.
◮ Kalman recursions for the mean re-parametrized in terms of $\nabla_t$ and $\mathcal{I}_{t|t-1}$
◮ The new representation is more general, as it only relies on the conditional density $p(y_t \mid \mathcal{F}_{t-1})$, which is defined for any observation-driven model
◮ The Kalman filter is a score-driven process
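The "simple algebra" is worth spelling out. Since $F_t = Z P_t Z' + H$ does not depend on $a_t$, differentiating the Gaussian log-density gives

$$\nabla_t = Z' F_t^{-1} v_t, \qquad \mathcal{I}_{t|t-1} = \mathrm{E}_{t-1}[\nabla_t \nabla_t'] = Z' F_t^{-1} Z,$$

so that in the steady state ($P_t = \bar P$) the gain term becomes $K_t v_t = T \bar P Z' F_t^{-1} v_t = R \nabla_t$ with $R = T \bar P$, while $L_t = T - K_t Z = T - R\, \mathcal{I}_{t|t-1}$. Equations (2) and (3) then follow from $Z' F_t^{-1} v_t = \nabla_t$ and $P_t = \bar P = T^{-1} R$.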
The Score-Driven Smoother (SDS)
◮ Based on eq. (1), score-driven models can be viewed as approximate filters for nonlinear non-Gaussian state-space models
◮ By analogy, we can regard eq. (2), (3) as an approximate smoother for nonlinear non-Gaussian models

Assume $y_t \mid \mathcal{F}_{t-1} \sim p(y_t \mid f_t, \Theta)$, where $f_t$ is a vector of time-varying parameters and $\Theta$ collects all static parameters. In score-driven models:
$$f_{t+1} = \omega + A s_t + B f_t, \quad (4)$$
where $s_t = S_t \nabla_t$, $\nabla_t = \partial \log p(y_t \mid f_t, \Theta) / \partial f_t$ and $S_t = \mathcal{I}_{t|t-1}^{-\alpha}$, $\alpha \in [0, 1]$. We generalize eq. (2), (3) as:
$$r_{t-1} = s_t + (B - A S_t \mathcal{I}_{t|t-1})' r_t, \quad (5) \qquad \hat f_t = f_t + B^{-1} A\, r_{t-1}, \quad (6)$$
for $t = n, \ldots, 1$, with $r_n = 0$. We name the smoother (5), (6) the "Score-Driven Smoother" (SDS). It has the same structure as the Kalman backward smoothing recursions, but uses the score of the non-Gaussian density and is nonlinear in the observations.
SDS methodology
$$y_t \mid \mathcal{F}_{t-1} \sim p(y_t \mid f_t, \Theta), \qquad f_{t+1} = \omega + A s_t + B f_t$$
1. Estimation of static parameters:
$$\tilde\Theta = \arg\max_\Theta \sum_{t=1}^n \log p(y_t \mid f_t, \Theta)$$
2. Forward filter:
$$f_{t+1} = \tilde\omega + \tilde A s_t + \tilde B f_t$$
3. Backward smoother:
$$r_{t-1} = s_t + (\tilde B - \tilde A S_t \mathcal{I}_{t|t-1})' r_t, \qquad \hat f_t = f_t + \tilde B^{-1} \tilde A\, r_{t-1}$$
◮ The SDS is computationally simple (maximization of a closed-form likelihood plus one forward and one backward recursion)
◮ The SDS is general, in that it can handle any observation density $p(y_t \mid f_t, \Theta)$, with a potentially large number of time-varying parameters
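Steps 2 and 3 are a single forward/backward pass. A minimal sketch, assuming a scalar time-varying parameter, $\alpha = 1$ (so $S_t = \mathcal{I}_{t|t-1}^{-1}$ and $S_t \mathcal{I}_{t|t-1} = 1$), and user-supplied score and information functions; the function and argument names are ours, for illustration only:

```python
import numpy as np

def sds(y, omega, A, B, score, info, f1):
    """Score-driven filter (eq. 4) plus SDS backward pass (eq. 5-6)
    for a scalar parameter f_t, with alpha = 1; assumes info(.) > 0."""
    n = len(y)
    f = np.zeros(n + 1); s = np.zeros(n)
    f[0] = f1
    # Forward filter: s_t = grad_t / I_t, f_{t+1} = omega + A s_t + B f_t.
    for t in range(n):
        s[t] = score(y[t], f[t]) / info(y[t], f[t])
        f[t + 1] = omega + A * s[t] + B * f[t]
    # Backward smoother: r_{t-1} = s_t + (B - A S_t I_t) r_t reduces to
    # r_{t-1} = s_t + (B - A) r_t since S_t I_t = 1; r_n = 0.
    f_hat = np.zeros(n)
    r = 0.0
    for t in range(n - 1, -1, -1):
        r = s[t] + (B - A) * r
        f_hat[t] = f[t] + A / B * r   # f_hat_t = f_t + B^{-1} A r_{t-1}
    return f[:-1], f_hat
```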
Example: GARCH-SDS
Consider the model $y_t = \sigma_t \epsilon_t$, $\epsilon_t \sim \mathrm{NID}(0, 1)$. The predictive density is thus:
$$p(y_t \mid \sigma_t^2) = \frac{1}{\sqrt{2\pi}\,\sigma_t}\, e^{-\frac{y_t^2}{2\sigma_t^2}}$$
Setting $f_t = \sigma_t^2$ and $S_t = \mathcal{I}_{t|t-1}^{-1}$, eq. (4) reduces to:
$$f_{t+1} = \omega + a (y_t^2 - f_t) + b f_t,$$
i.e. the standard GARCH(1,1) model. The smoothing recursions (5), (6) reduce to:
$$r_{t-1} = y_t^2 - f_t + (b - a) r_t, \qquad \hat f_t = f_t + b^{-1} a\, r_{t-1}$$
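For illustration, these quantities drop straight into the hypothetical `sds` helper sketched earlier (the data and parameter values below are placeholders, not estimates):

```python
import numpy as np
rng = np.random.default_rng(0)
y = 0.5 * rng.standard_normal(1000)   # placeholder return series

# Gaussian score and Fisher information in f_t = sigma_t^2; the scaled
# score I_t^{-1} grad_t equals y_t^2 - f_t, recovering the GARCH update.
score = lambda y_t, f: (y_t**2 - f) / (2 * f**2)
info  = lambda y_t, f: 1.0 / (2 * f**2)

f_filt, f_smooth = sds(y, omega=0.01, A=0.05, B=0.90,
                       score=score, info=info, f1=0.25)
```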
Example: GARCH-SDS
Figure: Filtered (blue dotted) and smoothed (red) estimates of the GARCH(1,1) model
Other examples
◮ MEM (Engle, 2002):
$$y_t = \mu_t \epsilon_t, \quad \epsilon_t \sim \mathrm{Gamma}(\alpha),$$
with density $\Gamma(\alpha)^{-1} \epsilon_t^{\alpha-1} \alpha^\alpha e^{-\alpha \epsilon_t}$
◮ AR(1) with a time-varying coefficient (score and information for this case are sketched after the list):
$$y_t = c + \alpha_t y_{t-1} + \epsilon_t, \quad \epsilon_t \sim N(0, q^2)$$
◮ Wishart-GARCH (Gorgi et al. 2018):
$$r_t \mid \mathcal{F}_{t-1} \sim N_k(0, V_t), \qquad X_t \mid \mathcal{F}_{t-1} \sim W_k(V_t/\nu, \nu),$$
where $N_k(0, V_t)$ is a multivariate zero-mean normal distribution with covariance matrix $V_t$ and $W_k(V_t/\nu, \nu)$ is a Wishart distribution with mean $V_t$ and degrees of freedom $\nu \ge k$
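For the time-varying AR(1) coefficient, the ingredients of eq. (4)-(6) follow directly from the Gaussian likelihood. A hedged sketch in our own notation (each observation is passed as the pair $(y_t, y_{t-1})$, and the information is regularized away from zero, a choice of ours that need not match the paper's scaling):

```python
# Score and information for y_t = c + f_t y_{t-1} + eps_t, eps_t ~ N(0, q2),
# written for observations supplied as pairs obs = (y_t, y_prev).
c, q2, eps = 0.0, 1.0, 1e-6   # placeholder statics + small regularizer
score = lambda obs, f: obs[1] * (obs[0] - c - f * obs[1]) / q2
info  = lambda obs, f: max(obs[1]**2 / q2, eps)   # guard against y_prev = 0
```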
Other example: MEM-SDS
Figure: Filtered (blue dotted) and smoothed (red) estimates of the MEM(1,1) model
Other example: t.v. AR(1)-SDS
Figure: Filtered (blue dotted) and smoothed (red) estimates of the autoregressive coefficient of the AR(1) model
Other example: Wishart-GARCH-SDS
Figure: Comparison among simulated observations of X_t (grey lines), simulated true covariances V_t (black lines), and filtered (blue dotted lines) and smoothed (red lines) (co)variances of the realized Wishart-GARCH model in the case k = 5. Left: the variance of the first asset; right: the covariance between the first and second assets.
Kalman smoother vs SDS in linear non-Gaussian models
AR(1) plus noise process:
$$y_t = \alpha_t + \epsilon_t, \quad \epsilon_t \sim t(0, \sigma_\epsilon^2, \nu), \qquad \alpha_{t+1} = c + \phi \alpha_t + \eta_t, \quad \eta_t \sim N(0, \sigma_\eta^2)$$
Signal-to-noise ratio $\delta = \sigma_\eta^2 / \sigma_\epsilon^2$, with $c = 0.01$, $\phi = 0.95$.
◮ Observation density:
$$p(y_t \mid f_t; \varphi, \beta) = \frac{\Gamma[(\beta+1)/2]}{\Gamma(\beta/2)\, \varphi \sqrt{\pi\beta}} \left[ 1 + \frac{(y_t - f_t)^2}{\beta \varphi^2} \right]^{-(\beta+1)/2}$$

                SDF-KF                     SDS-KS
       δ        0.1      1        10       0.1      1        10
ν = 3  MSE      0.8610   0.9522   0.9991   0.8093   0.8876   0.9618
       MAE      0.9389   0.9859   1.0036   0.9128   0.9634   1.0169
ν = 5  MSE      0.9552   0.9912   1.0032   0.9376   0.9880   1.0058
       MAE      0.9792   0.9973   0.9999   0.9698   0.9949   1.0112
ν = 8  MSE      0.9877   0.9981   1.0029   0.9844   0.9954   1.0117
       MAE      0.9939   0.9992   1.0039   0.9917   0.9982   1.0136
Nonlinear models
1. Stochastic volatility with Gaussian measurement density:
$$r_t = e^{\theta_t/2} \epsilon_t, \quad \epsilon_t \sim N(0, 1), \qquad \theta_{t+1} = \gamma + \phi \theta_t + \eta_t, \quad \eta_t \sim N(0, \sigma_\eta^2)$$
2. Stochastic volatility with non-Gaussian measurement density:
$$r_t = e^{\theta_t/2} \epsilon_t, \quad \epsilon_t \sim t(0, 1, \nu), \qquad \theta_{t+1} = \gamma + \phi \theta_t + \eta_t, \quad \eta_t \sim N(0, \sigma_\eta^2)$$
3. Stochastic intensity:
$$p(y_t \mid \lambda_t) = \frac{\lambda_t^{y_t} e^{-\lambda_t}}{y_t!}, \qquad \theta_t = \log \lambda_t, \qquad \theta_{t+1} = \gamma + \phi \theta_t + \eta_t, \quad \eta_t \sim N(0, \sigma_\eta^2)$$

Two estimation methods:
◮ the NAIS method of Koopman et al. (2014), based on importance sampling, for models 1, 2, 3
◮ the approximate QML method of Harvey et al. (1994) for models 1, 2
Loss measures are computed on N = 1000 simulations of n = 2000 observations each. (A simulation sketch for the stochastic-intensity model follows.)
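To make the setup concrete, a minimal simulation sketch for model 3, reusing the hypothetical `sds` helper from the methodology slide (parameter values are placeholders, not those used in the paper):

```python
import numpy as np
rng = np.random.default_rng(1)

# Simulate the stochastic-intensity DGP (placeholder parameter values).
n, gamma, phi, sig_eta = 2000, 0.0, 0.95, 0.10
theta = np.zeros(n)
for t in range(n - 1):
    theta[t + 1] = gamma + phi * theta[t] + sig_eta * rng.standard_normal()
y = rng.poisson(np.exp(theta))

# Poisson score and Fisher information in f_t = log(lambda_t).
score = lambda y_t, f: y_t - np.exp(f)
info  = lambda y_t, f: np.exp(f)

f_filt, f_smooth = sds(y, omega=0.0, A=0.10, B=0.95,
                       score=score, info=info, f1=0.0)
```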
Nonlinear models - SV with Gaussian disturbances
◮ Observation density:
$$p(y_t \mid f_t) = \frac{\Gamma\left(\frac{\beta+1}{2}\right)}{\Gamma\left(\frac{\beta}{2}\right)\sqrt{\pi\beta\, e^{f_t}}} \left[ 1 + \frac{1}{\beta} \left( \frac{y_t}{e^{f_t/2}} \right)^2 \right]^{-\frac{\beta+1}{2}}$$
◮ Similar to the Beta-t-EGARCH model of Harvey and Chakravarty (2008)
◮ Coefficient of variation: $\mathrm{CV} = \exp\left( \frac{\sigma_\eta^2}{1-\phi^2} \right) - 1$

                        MSE                                   MAE
CV               0.1      1        5        10        0.1      1        5        10
φ = 0.98  NAIS   1.0000   1.0000   1.0000   1.0000    1.0000   1.0000   1.0000   1.0000
          SDS    0.9988   1.0050   1.0001   1.0162    1.0004   1.0043   1.0017   1.0097
          QML    1.4153   1.3880   1.3333   1.3138    1.1797   1.1739   1.1564   1.1475
φ = 0.95  NAIS   1.0000   1.0000   1.0000   1.0000    1.0000   1.0000   1.0000   1.0000
          SDS    1.0057   0.9983   0.9988   1.0059    1.0034   1.0023   1.0024   1.0059
          QML    1.3131   1.3737   1.3246   1.3168    1.1450   1.1758   1.1567   1.1524
φ = 0.90  NAIS   1.0000   1.0000   1.0000   1.0000    1.0000   1.0000   1.0000   1.0000
          SDS    1.0076   0.9956   0.9974   1.0086    1.0044   1.0010   1.0033   1.0093
          QML    1.2371   1.3157   1.2893   1.2750    1.1109   1.1508   1.1422   1.1370
Nonlinear models - SV with non-Gaussian disturbances
◮ Observation density:
$$p(y_t \mid f_t) = \frac{\Gamma\left(\frac{\beta+1}{2}\right)}{\Gamma\left(\frac{\beta}{2}\right)\sqrt{\pi\beta\, e^{f_t}}} \left[ 1 + \frac{1}{\beta} \left( \frac{y_t}{e^{f_t/2}} \right)^2 \right]^{-\frac{\beta+1}{2}}$$
◮ Similar to the Beta-t-EGARCH model of Harvey and Chakravarty (2008)
◮ Coefficient of variation: $\mathrm{CV} = \exp\left( \frac{\sigma_\eta^2}{1-\phi^2} \right) - 1$

                               MSE                                   MAE
CV                      0.1      1        5        10        0.1      1        5        10
φ = 0.98, ν = 3  NAIS   1.0000   1.0000   1.0000   1.0000    1.0000   1.0000   1.0000   1.0000
                 SDS    1.0026   0.9950   1.0140   1.0169    1.0015   0.9997   1.0077   1.0098
                 QML    1.3962   1.2553   1.2125   1.1998    1.1735   1.1184   1.1013   1.0939
φ = 0.95, ν = 3  NAIS   1.0000   1.0000   1.0000   1.0000    1.0000   1.0000   1.0000   1.0000
                 SDS    1.0014   1.0049   1.0121   1.0200    1.0008   1.0031   1.0064   1.0105
                 QML    1.3058   1.2639   1.2447   1.2246    1.1354   1.1230   1.1158   1.1056
φ = 0.90, ν = 3  NAIS   1.0000   1.0000   1.0000   1.0000    1.0000   1.0000   1.0000   1.0000
                 SDS    1.0020   1.0033   1.0149   1.0221    1.0016   1.0023   1.0081   1.0117
                 QML    1.2306   1.2325   1.2262   1.2200    1.1026   1.1075   1.1062   1.1034
Nonlinear models - Stochastic intensity
◮ Observation density: $p(y_t \mid f_t) = e^{-e^{f_t}} e^{f_t y_t} / y_t!$

                        MSE                        MAE
σ²η × 100        0.1      0.5      1        0.1      0.5      1
φ = 0.98  NAIS   1.0000   1.0000   1.0000   1.0000   1.0000   1.0000
          SDS    1.0149   1.0281   1.0521   1.0067   1.0132   1.0244
φ = 0.95  NAIS   1.0000   1.0000   1.0000   1.0000   1.0000   1.0000
          SDS    1.0203   1.0176   1.0254   1.0097   1.0083   1.0120
φ = 0.90  NAIS   1.0000   1.0000   1.0000   1.0000   1.0000   1.0000
          SDS    1.0310   1.0160   1.0205   1.0142   1.0079   1.0099
Comparison with exact smoothers
Figure: Simulated series together with NAIS and SDS smoothed estimates for the three models (SV Gaussian, SV non-Gaussian, stochastic intensity)
Conclusions
The SDS extends score-driven filtered estimates to include the effect of present and future observations.

Main features of the SDS
◮ Useful for off-line signal reconstruction and analysis
◮ General, as it maintains the same form for any observation density
◮ Computationally simple, being a backward recursion
◮ Can easily handle models with a high-dimensional vector of time-varying parameters
◮ Very close to exact smoothers based on importance sampling (in the simulations above, the MSE loss of the SDS stays within a few percent of that of NAIS)