A General Class of Score-Driven Smoothers
Giuseppe Buccheri, Scuola Normale Superiore
Joint work with Giacomo Bormetti (University of Bologna), Fulvio Corsi (University of Pisa and City University of London) and Fabrizio Lillo (University of Bologna)
IAAE 2018
Key facts
Following Cox (1981), we divide time-varying parameter models into two classes:
1. Parameter-driven models: parameters evolve in time based on idiosyncratic innovations (e.g. local level, stochastic volatility, stochastic intensity)
2. Observation-driven models: parameters evolve in time based on nonlinear functions of past observations (e.g. GARCH, MEM, DCC, score-driven models)

We shall see that there is a trade-off between:
1. Estimation complexity and computational speed
◮ Here observation-driven models are superior
2. Flexibility
◮ Here parameter-driven models are superior

Why a difference in flexibility?
◮ Observation-driven: $\mathrm{Var}[f_{t+1} \mid \mathcal{F}_t] = 0$ but $\mathrm{Var}[f_{t+1}] > 0$
◮ Parameter-driven: $\mathrm{Var}[f_{t+1} \mid \mathcal{F}_t] > 0$ and $\mathrm{Var}[f_{t+1}] > 0$
A different interpretation
Consider a standard GARCH(1,1) model:
$$r_t = \sigma_t \epsilon_t, \quad \epsilon_t \sim N(0, 1), \qquad \sigma^2_{t+1} = c + a r_t^2 + b \sigma_t^2$$
There are two possible interpretations of the dynamic equation for $\sigma^2_{t+1}$:
1. It is the true DGP of volatility
2. Since $\sigma^2_{t+1}$ is $\mathcal{F}_t$-measurable, it can be seen as a filter, i.e. $\sigma^2_{t+1} = \mathrm{E}[\zeta^2_{t+1} \mid \mathcal{F}_t]$, where $\zeta_{t+1}$ is the volatility of the true, parameter-driven DGP (e.g. an SV model)

Interpretation 1 is more common in the financial econometrics literature, while interpretation 2 is closer to the filtering literature.
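To make interpretation 2 concrete, a small self-contained simulation sketch (ours, with arbitrary parameter values): a fixed-parameter GARCH(1,1) recursion run on returns generated by an SV model is not the DGP, yet it still tracks the latent SV variance, i.e. it acts as a filter.

```python
import numpy as np
rng = np.random.default_rng(42)

# Simulate an SV DGP: the log-variance follows a Gaussian AR(1).
n, phi, sig_eta = 2000, 0.97, 0.15
log_var = np.zeros(n)
for t in range(n - 1):
    log_var[t + 1] = phi * log_var[t] + sig_eta * rng.standard_normal()
true_var = np.exp(log_var)
r = np.sqrt(true_var) * rng.standard_normal(n)

# Run a fixed-parameter GARCH(1,1) recursion on the SV returns: it is
# misspecified as a DGP, but it filters the latent SV variance.
c, a, b = 0.02, 0.08, 0.90
sigma2 = np.full(n, true_var.mean())
for t in range(n - 1):
    sigma2[t + 1] = c + a * r[t]**2 + b * sigma2[t]

print("corr(GARCH filter, latent SV variance):",
      round(np.corrcoef(sigma2, true_var)[0, 1], 2))
```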
An example: ARCH filtering and smoothing
◮ "Filtering and forecasting with misspecified ARCH models I: Getting the right variance with the wrong model", Nelson (1992), JoE
◮ "Asymptotic filtering theory for univariate ARCH models", Nelson & Foster (1994), Ecta
◮ "Filtering and forecasting with misspecified ARCH models II: Making the right forecast with the wrong model", Nelson & Foster (1995), JoE
◮ "Asymptotically optimal smoothing with ARCH models", Nelson (1996), Ecta

Quoting Nelson (1992): "Note that our use of the term 'estimate' corresponds to its use in the filtering literature rather than the statistics literature; that is, an ARCH model with (given) fixed parameters produces 'estimates' of the true underlying conditional covariance matrix at each point in time, in the same sense that a Kalman filter produces 'estimates' of unobserved state variables in a linear system."
Motivations and Objectives
A key observation
◮ Observation-driven models as DGPs → all relevant information is contained in past observations → no room for smoothing
◮ Observation-driven models as filters → can benefit from using all observations → smoothing is useful

Related literature
◮ Little attention has been paid to the problem of smoothing with misspecified observation-driven models. Harvey (2013) proposed a smoothing algorithm for a dynamic Student-t location model.

Objective of this paper
◮ Fill the gap by proposing a methodology to smooth the filtered estimates of a general class of observation-driven models, namely the score-driven models of Creal et al. (2013) and Harvey (2013)
Filtering and smoothing in linear Gaussian models
Consider the general linear Gaussian model:
$$y_t = Z \alpha_t + \epsilon_t, \quad \epsilon_t \sim N(0, H), \qquad \alpha_{t+1} = c + T \alpha_t + \eta_t, \quad \eta_t \sim N(0, Q)$$
Kalman forward filter, $a_{t+1} = \mathrm{E}[\alpha_{t+1} \mid \mathcal{F}_t]$, $P_{t+1} = \mathrm{Var}[\alpha_{t+1} \mid \mathcal{F}_t]$:
$$v_t = y_t - Z a_t, \qquad F_t = Z P_t Z' + H,$$
$$a_{t+1} = c + T a_t + K_t v_t, \qquad P_{t+1} = T P_t (T - K_t Z)' + Q,$$
$$K_t = T P_t Z' F_t^{-1}, \qquad t = 1, \ldots, n$$
Kalman backward smoother, $\hat\alpha_t = \mathrm{E}[\alpha_t \mid \mathcal{F}_n]$, $\hat P_t = \mathrm{Var}[\alpha_t \mid \mathcal{F}_n]$, $t \le n$:
$$r_{t-1} = Z' F_t^{-1} v_t + L_t' r_t, \qquad N_{t-1} = Z' F_t^{-1} Z + L_t' N_t L_t,$$
$$\hat\alpha_t = a_t + P_t r_{t-1}, \qquad \hat P_t = P_t - P_t N_{t-1} P_t,$$
with $L_t = T - K_t Z$, $r_n = 0$, $N_n = 0$ and $t = n, \ldots, 1$.
◮ The conditional log-density is $\log p(y_t \mid \mathcal{F}_{t-1}) = -\frac{1}{2} \log |F_t| - \frac{1}{2} v_t' F_t^{-1} v_t$ (up to an additive constant)
◮ As $Z, H, T, Q$ are constant, the variance recursion has a fixed-point solution $\bar P$ that is referred to as the steady state of the Kalman filter
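As a concreteness check, a minimal scalar NumPy sketch of the mean recursions above (the function name and the univariate restriction are ours, for illustration; the variance smoother $N_t$ is omitted for brevity):

```python
import numpy as np

def kalman_filter_smoother(y, Z, H, c, T, Q, a1, P1):
    """Forward filter and backward (mean) smoother for the scalar
    linear Gaussian model y_t = Z a_t + eps_t, a_{t+1} = c + T a_t + eta_t."""
    n = len(y)
    a = np.zeros(n + 1); P = np.zeros(n + 1)
    v = np.zeros(n); F = np.zeros(n); K = np.zeros(n)
    a[0], P[0] = a1, P1
    # Forward pass: v_t = y_t - Z a_t, F_t = Z P_t Z' + H,
    # K_t = T P_t Z' / F_t, a_{t+1} = c + T a_t + K_t v_t.
    for t in range(n):
        v[t] = y[t] - Z * a[t]
        F[t] = Z * P[t] * Z + H
        K[t] = T * P[t] * Z / F[t]
        a[t + 1] = c + T * a[t] + K[t] * v[t]
        P[t + 1] = T * P[t] * (T - K[t] * Z) + Q
    # Backward pass: r_{t-1} = Z F_t^{-1} v_t + L_t r_t with L_t = T - K_t Z,
    # then alpha_hat_t = a_t + P_t r_{t-1}; r_n = 0.
    alpha_hat = np.zeros(n)
    r = 0.0
    for t in range(n - 1, -1, -1):
        r = Z / F[t] * v[t] + (T - K[t] * Z) * r
        alpha_hat[t] = a[t] + P[t] * r
    return a[:-1], alpha_hat
```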
A more general representation
Introduce the score and information matrix of the conditional density:
$$\nabla_t = \left( \frac{\partial \log p(y_t \mid \mathcal{F}_{t-1})}{\partial a_t'} \right)', \qquad \mathcal{I}_{t|t-1} = \mathrm{E}_{t-1}[\nabla_t \nabla_t']$$
After some simple algebra, we can rewrite the Kalman filtering and smoothing recursions for the mean in the steady state as:
$$a_{t+1} = c + T a_t + R \nabla_t, \quad (1)$$
where $R = T \bar P$, and:
$$r_{t-1} = \nabla_t + L_t' r_t, \quad (2) \qquad \hat\alpha_t = a_t + T^{-1} R\, r_{t-1}, \quad (3)$$
where $L_t = T - R\, \mathcal{I}_{t|t-1}$.
◮ Kalman recursions for the mean re-parametrized in terms of $\nabla_t$ and $\mathcal{I}_{t|t-1}$
◮ The new representation is more general, as it only relies on the conditional density $p(y_t \mid \mathcal{F}_{t-1})$, which is defined for any observation-driven model
◮ The Kalman filter is a score-driven process
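The "simple algebra" is worth spelling out. Since $F_t = Z P_t Z' + H$ does not depend on $a_t$, differentiating the Gaussian log-density gives

$$\nabla_t = Z' F_t^{-1} v_t, \qquad \mathcal{I}_{t|t-1} = \mathrm{E}_{t-1}[\nabla_t \nabla_t'] = Z' F_t^{-1} Z,$$

so that in the steady state ($P_t = \bar P$) the gain term becomes $K_t v_t = T \bar P Z' F_t^{-1} v_t = R \nabla_t$ with $R = T \bar P$, while $L_t = T - K_t Z = T - R\, \mathcal{I}_{t|t-1}$. Equations (2) and (3) then follow from $Z' F_t^{-1} v_t = \nabla_t$ and $P_t = \bar P = T^{-1} R$.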
The Score-Driven Smoother (SDS)
◮ Based on eq. (1), score-driven models can be viewed as approximate filters for nonlinear non-Gaussian state-space models
◮ By analogy, we can regard eq. (2), (3) as an approximate smoother for nonlinear non-Gaussian models

Assume $y_t \mid \mathcal{F}_{t-1} \sim p(y_t \mid f_t, \Theta)$, where $f_t$ is a vector of time-varying parameters and $\Theta$ collects all static parameters. In score-driven models:
$$f_{t+1} = \omega + A s_t + B f_t, \quad (4)$$
where $s_t = S_t \nabla_t$, $\nabla_t = \partial \log p(y_t \mid f_t, \Theta) / \partial f_t$ and $S_t = \mathcal{I}_{t|t-1}^{-\alpha}$, $\alpha \in [0, 1]$. We generalize eq. (2), (3) as:
$$r_{t-1} = s_t + (B - A S_t \mathcal{I}_{t|t-1})' r_t, \quad (5) \qquad \hat f_t = f_t + B^{-1} A\, r_{t-1}, \quad (6)$$
for $t = n, \ldots, 1$, with $r_n = 0$. We name the smoother (5), (6) the "Score-Driven Smoother" (SDS). It has the same structure as the Kalman backward smoothing recursions, but uses the score of the non-Gaussian density and is nonlinear in the observations.
SDS methodology
$$y_t \mid \mathcal{F}_{t-1} \sim p(y_t \mid f_t, \Theta), \qquad f_{t+1} = \omega + A s_t + B f_t$$
1. Estimation of static parameters:
$$\tilde\Theta = \arg\max_\Theta \sum_{t=1}^n \log p(y_t \mid f_t, \Theta)$$
2. Forward filter:
$$f_{t+1} = \tilde\omega + \tilde A s_t + \tilde B f_t$$
3. Backward smoother:
$$r_{t-1} = s_t + (\tilde B - \tilde A S_t \mathcal{I}_{t|t-1})' r_t, \qquad \hat f_t = f_t + \tilde B^{-1} \tilde A\, r_{t-1}$$
◮ The SDS is computationally simple (maximization of a closed-form likelihood plus one forward and one backward recursion)
◮ The SDS is general, in that it can handle any observation density $p(y_t \mid f_t, \Theta)$, with a potentially large number of time-varying parameters
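Steps 2 and 3 are a single forward/backward pass. A minimal sketch, assuming a scalar time-varying parameter, $\alpha = 1$ (so $S_t = \mathcal{I}_{t|t-1}^{-1}$ and $S_t \mathcal{I}_{t|t-1} = 1$), and user-supplied score and information functions; the function and argument names are ours, for illustration only:

```python
import numpy as np

def sds(y, omega, A, B, score, info, f1):
    """Score-driven filter (eq. 4) plus SDS backward pass (eq. 5-6)
    for a scalar parameter f_t, with alpha = 1; assumes info(.) > 0."""
    n = len(y)
    f = np.zeros(n + 1); s = np.zeros(n)
    f[0] = f1
    # Forward filter: s_t = grad_t / I_t, f_{t+1} = omega + A s_t + B f_t.
    for t in range(n):
        s[t] = score(y[t], f[t]) / info(y[t], f[t])
        f[t + 1] = omega + A * s[t] + B * f[t]
    # Backward smoother: r_{t-1} = s_t + (B - A S_t I_t) r_t reduces to
    # r_{t-1} = s_t + (B - A) r_t since S_t I_t = 1; r_n = 0.
    f_hat = np.zeros(n)
    r = 0.0
    for t in range(n - 1, -1, -1):
        r = s[t] + (B - A) * r
        f_hat[t] = f[t] + A / B * r   # f_hat_t = f_t + B^{-1} A r_{t-1}
    return f[:-1], f_hat
```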
Example: GARCH-SDS
Consider the model $y_t = \sigma_t \epsilon_t$, $\epsilon_t \sim \mathrm{NID}(0, 1)$. The predictive density is thus:
$$p(y_t \mid \sigma_t^2) = \frac{1}{\sqrt{2\pi}\,\sigma_t}\, e^{-\frac{y_t^2}{2\sigma_t^2}}$$
Setting $f_t = \sigma_t^2$ and $S_t = \mathcal{I}_{t|t-1}^{-1}$, eq. (4) reduces to:
$$f_{t+1} = \omega + a (y_t^2 - f_t) + b f_t,$$
i.e. the standard GARCH(1,1) model. The smoothing recursions (5), (6) reduce to:
$$r_{t-1} = y_t^2 - f_t + (b - a) r_t, \qquad \hat f_t = f_t + b^{-1} a\, r_{t-1}$$
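For illustration, these quantities drop straight into the hypothetical `sds` helper sketched earlier (the data and parameter values below are placeholders, not estimates):

```python
import numpy as np
rng = np.random.default_rng(0)
y = 0.5 * rng.standard_normal(1000)   # placeholder return series

# Gaussian score and Fisher information in f_t = sigma_t^2; the scaled
# score I_t^{-1} grad_t equals y_t^2 - f_t, recovering the GARCH update.
score = lambda y_t, f: (y_t**2 - f) / (2 * f**2)
info  = lambda y_t, f: 1.0 / (2 * f**2)

f_filt, f_smooth = sds(y, omega=0.01, A=0.05, B=0.90,
                       score=score, info=info, f1=0.25)
```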
Example: GARCH-SDS
Figure: Filtered (blue dotted) and smoothed (red) estimates of the GARCH(1,1) model
Other examples
◮ MEM (Engle, 2002):
$$y_t = \mu_t \epsilon_t, \quad \epsilon_t \sim \mathrm{Gamma}(\alpha),$$
with density $\Gamma(\alpha)^{-1} \epsilon_t^{\alpha-1} \alpha^\alpha e^{-\alpha \epsilon_t}$
◮ AR(1) with a time-varying coefficient (score and information for this case are sketched after the list):
$$y_t = c + \alpha_t y_{t-1} + \epsilon_t, \quad \epsilon_t \sim N(0, q^2)$$
◮ Wishart-GARCH (Gorgi et al. 2018):
$$r_t \mid \mathcal{F}_{t-1} \sim N_k(0, V_t), \qquad X_t \mid \mathcal{F}_{t-1} \sim W_k(V_t/\nu, \nu),$$
where $N_k(0, V_t)$ is a multivariate zero-mean normal distribution with covariance matrix $V_t$ and $W_k(V_t/\nu, \nu)$ is a Wishart distribution with mean $V_t$ and degrees of freedom $\nu \ge k$
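For the time-varying AR(1) coefficient, the ingredients of eq. (4)-(6) follow directly from the Gaussian likelihood. A hedged sketch in our own notation (each observation is passed as the pair $(y_t, y_{t-1})$, and the information is regularized away from zero, a choice of ours that need not match the paper's scaling):

```python
# Score and information for y_t = c + f_t y_{t-1} + eps_t, eps_t ~ N(0, q2),
# written for observations supplied as pairs obs = (y_t, y_prev).
c, q2, eps = 0.0, 1.0, 1e-6   # placeholder statics + small regularizer
score = lambda obs, f: obs[1] * (obs[0] - c - f * obs[1]) / q2
info  = lambda obs, f: max(obs[1]**2 / q2, eps)   # guard against y_prev = 0
```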
Other example: MEM-SDS
Figure: Filtered (blue dotted) and smoothed (red) estimates of the MEM(1,1) model
Other example: t.v. AR(1)-SDS
Figure: Filtered (blue dotted) and smoothed (red) estimates of the autoregressive coefficient of the AR(1) model
Other example: Wishart-GARCH-SDS
Figure: Comparison among simulated observations of X_t (grey lines), simulated true covariances V_t (black lines), and filtered (blue dotted lines) and smoothed (red lines) (co)variances of the realized Wishart-GARCH model in the case k = 5. Left: the variance of the first asset; right: the covariance between the first and second assets.
Kalman smoother vs SDS in linear non-Gaussian models
AR(1) plus noise process:
$$y_t = \alpha_t + \epsilon_t, \quad \epsilon_t \sim t(0, \sigma_\epsilon^2, \nu), \qquad \alpha_{t+1} = c + \phi \alpha_t + \eta_t, \quad \eta_t \sim N(0, \sigma_\eta^2)$$
Signal-to-noise ratio $\delta = \sigma_\eta^2 / \sigma_\epsilon^2$, with $c = 0.01$, $\phi = 0.95$.
◮ Observation density:
$$p(y_t \mid f_t; \varphi, \beta) = \frac{\Gamma[(\beta+1)/2]}{\Gamma(\beta/2)\, \varphi \sqrt{\pi\beta}} \left[ 1 + \frac{(y_t - f_t)^2}{\beta \varphi^2} \right]^{-(\beta+1)/2}$$

                SDF-KF                     SDS-KS
       δ        0.1      1        10       0.1      1        10
ν = 3  MSE      0.8610   0.9522   0.9991   0.8093   0.8876   0.9618
       MAE      0.9389   0.9859   1.0036   0.9128   0.9634   1.0169
ν = 5  MSE      0.9552   0.9912   1.0032   0.9376   0.9880   1.0058
       MAE      0.9792   0.9973   0.9999   0.9698   0.9949   1.0112
ν = 8  MSE      0.9877   0.9981   1.0029   0.9844   0.9954   1.0117
       MAE      0.9939   0.9992   1.0039   0.9917   0.9982   1.0136
Nonlinear models
1. Stochastic volatility with Gaussian measurement density:
$$r_t = e^{\theta_t/2} \epsilon_t, \quad \epsilon_t \sim N(0, 1), \qquad \theta_{t+1} = \gamma + \phi \theta_t + \eta_t, \quad \eta_t \sim N(0, \sigma_\eta^2)$$
2. Stochastic volatility with non-Gaussian measurement density:
$$r_t = e^{\theta_t/2} \epsilon_t, \quad \epsilon_t \sim t(0, 1, \nu), \qquad \theta_{t+1} = \gamma + \phi \theta_t + \eta_t, \quad \eta_t \sim N(0, \sigma_\eta^2)$$
3. Stochastic intensity:
$$p(y_t \mid \lambda_t) = \frac{\lambda_t^{y_t} e^{-\lambda_t}}{y_t!}, \qquad \theta_t = \log \lambda_t, \qquad \theta_{t+1} = \gamma + \phi \theta_t + \eta_t, \quad \eta_t \sim N(0, \sigma_\eta^2)$$

Two estimation methods:
◮ the NAIS method of Koopman et al. (2014), based on importance sampling, for models 1, 2, 3
◮ the approximate QML method of Harvey et al. (1994) for models 1, 2
Loss measures are computed on N = 1000 simulations of n = 2000 observations each. (A simulation sketch for the stochastic-intensity model follows.)
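To make the setup concrete, a minimal simulation sketch for model 3, reusing the hypothetical `sds` helper from the methodology slide (parameter values are placeholders, not those used in the paper):

```python
import numpy as np
rng = np.random.default_rng(1)

# Simulate the stochastic-intensity DGP (placeholder parameter values).
n, gamma, phi, sig_eta = 2000, 0.0, 0.95, 0.10
theta = np.zeros(n)
for t in range(n - 1):
    theta[t + 1] = gamma + phi * theta[t] + sig_eta * rng.standard_normal()
y = rng.poisson(np.exp(theta))

# Poisson score and Fisher information in f_t = log(lambda_t).
score = lambda y_t, f: y_t - np.exp(f)
info  = lambda y_t, f: np.exp(f)

f_filt, f_smooth = sds(y, omega=0.0, A=0.10, B=0.95,
                       score=score, info=info, f1=0.0)
```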
Nonlinear models - SV with Gaussian disturbances
◮ Observation density:
$$p(y_t \mid f_t) = \frac{\Gamma\left(\frac{\beta+1}{2}\right)}{\Gamma\left(\frac{\beta}{2}\right)\sqrt{\pi\beta\, e^{f_t}}} \left[ 1 + \frac{1}{\beta} \left( \frac{y_t}{e^{f_t/2}} \right)^2 \right]^{-\frac{\beta+1}{2}}$$
◮ Similar to the Beta-t-EGARCH model of Harvey and Chakravarty (2008)
◮ Coefficient of variation: $\mathrm{CV} = \exp\left( \frac{\sigma_\eta^2}{1-\phi^2} \right) - 1$

                        MSE                                   MAE
CV               0.1      1        5        10        0.1      1        5        10
φ = 0.98  NAIS   1.0000   1.0000   1.0000   1.0000    1.0000   1.0000   1.0000   1.0000
          SDS    0.9988   1.0050   1.0001   1.0162    1.0004   1.0043   1.0017   1.0097
          QML    1.4153   1.3880   1.3333   1.3138    1.1797   1.1739   1.1564   1.1475
φ = 0.95  NAIS   1.0000   1.0000   1.0000   1.0000    1.0000   1.0000   1.0000   1.0000
          SDS    1.0057   0.9983   0.9988   1.0059    1.0034   1.0023   1.0024   1.0059
          QML    1.3131   1.3737   1.3246   1.3168    1.1450   1.1758   1.1567   1.1524
φ = 0.90  NAIS   1.0000   1.0000   1.0000   1.0000    1.0000   1.0000   1.0000   1.0000
          SDS    1.0076   0.9956   0.9974   1.0086    1.0044   1.0010   1.0033   1.0093
          QML    1.2371   1.3157   1.2893   1.2750    1.1109   1.1508   1.1422   1.1370
Nonlinear models - SV with non-Gaussian disturbances
◮ Observation density:
$$p(y_t \mid f_t) = \frac{\Gamma\left(\frac{\beta+1}{2}\right)}{\Gamma\left(\frac{\beta}{2}\right)\sqrt{\pi\beta\, e^{f_t}}} \left[ 1 + \frac{1}{\beta} \left( \frac{y_t}{e^{f_t/2}} \right)^2 \right]^{-\frac{\beta+1}{2}}$$
◮ Similar to the Beta-t-EGARCH model of Harvey and Chakravarty (2008)
◮ Coefficient of variation: $\mathrm{CV} = \exp\left( \frac{\sigma_\eta^2}{1-\phi^2} \right) - 1$

                               MSE                                   MAE
CV                      0.1      1        5        10        0.1      1        5        10
φ = 0.98, ν = 3  NAIS   1.0000   1.0000   1.0000   1.0000    1.0000   1.0000   1.0000   1.0000
                 SDS    1.0026   0.9950   1.0140   1.0169    1.0015   0.9997   1.0077   1.0098
                 QML    1.3962   1.2553   1.2125   1.1998    1.1735   1.1184   1.1013   1.0939
φ = 0.95, ν = 3  NAIS   1.0000   1.0000   1.0000   1.0000    1.0000   1.0000   1.0000   1.0000
                 SDS    1.0014   1.0049   1.0121   1.0200    1.0008   1.0031   1.0064   1.0105
                 QML    1.3058   1.2639   1.2447   1.2246    1.1354   1.1230   1.1158   1.1056
φ = 0.90, ν = 3  NAIS   1.0000   1.0000   1.0000   1.0000    1.0000   1.0000   1.0000   1.0000
                 SDS    1.0020   1.0033   1.0149   1.0221    1.0016   1.0023   1.0081   1.0117
                 QML    1.2306   1.2325   1.2262   1.2200    1.1026   1.1075   1.1062   1.1034
Nonlinear models - Stochastic intensity
◮ Observation density: $p(y_t \mid f_t) = e^{-e^{f_t}} e^{f_t y_t} / y_t!$

                        MSE                        MAE
σ²η × 100        0.1      0.5      1        0.1      0.5      1
φ = 0.98  NAIS   1.0000   1.0000   1.0000   1.0000   1.0000   1.0000
          SDS    1.0149   1.0281   1.0521   1.0067   1.0132   1.0244
φ = 0.95  NAIS   1.0000   1.0000   1.0000   1.0000   1.0000   1.0000
          SDS    1.0203   1.0176   1.0254   1.0097   1.0083   1.0120
φ = 0.90  NAIS   1.0000   1.0000   1.0000   1.0000   1.0000   1.0000
          SDS    1.0310   1.0160   1.0205   1.0142   1.0079   1.0099
Comparison with exact smoothers
Figure: Simulated series together with NAIS and SDS smoothed estimates for the three models (SV Gaussian, SV non-Gaussian, stochastic intensity)
Conclusions
The SDS extends score-driven filtered estimates to include the effect of present and future observations.

Main features of the SDS
◮ Useful for off-line signal reconstruction and analysis
◮ General, as it maintains the same form for any observation density
◮ Computationally simple, being a backward recursion
◮ Can easily handle models with a high-dimensional vector of time-varying parameters
◮ Very close to exact smoothers based on importance sampling (in the simulations above, the MSE loss of the SDS stays within a few percent of that of NAIS)