Analysing geoadditive regression data: a mixed model approach

SLIDE 1

Analysing geoadditive regression data: a mixed model approach

Thomas Kneib
Institut für Statistik, Ludwig-Maximilians-Universität München
Joint work with Ludwig Fahrmeir & Stefan Lang
25.11.2005

SLIDE 2


Spatio-temporal regression data

  • Regression in a general sense:

    – Generalised linear models,
    – Multivariate (categorical) generalised linear models,
    – Regression models for survival times (Cox-type models, AFT models).

  • Common structure:

Model a quantity of interest in terms of categorical and continuous covariates, e.g.

$$E(y \mid u) = h(u'\gamma) \quad \text{(GLM)}$$

or

$$\lambda(t \mid u) = \lambda_0(t)\exp(u'\gamma) \quad \text{(Cox model)}$$

  • Spatio-temporal data: Temporal and spatial information as additional covariates.
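Both models share the same structure: a known function acting on the linear predictor $u'\gamma$. A minimal Python sketch, assuming a logistic response function $h$ for the GLM and a user-supplied baseline hazard $\lambda_0$ for the Cox model; all function names and numbers are hypothetical:

```python
import math
from typing import Callable, Sequence


def linear_predictor(u: Sequence[float], gamma: Sequence[float]) -> float:
    """Parametric predictor u'gamma."""
    return sum(ui * gi for ui, gi in zip(u, gamma))


def glm_mean(u: Sequence[float], gamma: Sequence[float]) -> float:
    """E(y | u) = h(u'gamma), here with the logistic response function h (an assumption)."""
    eta = linear_predictor(u, gamma)
    return 1.0 / (1.0 + math.exp(-eta))


def cox_hazard(t: float, u: Sequence[float], gamma: Sequence[float],
               baseline: Callable[[float], float]) -> float:
    """lambda(t | u) = lambda_0(t) * exp(u'gamma)."""
    return baseline(t) * math.exp(linear_predictor(u, gamma))


if __name__ == "__main__":
    u, gamma = [0.5, 1.0], [0.4, -0.2]                # hypothetical covariates and coefficients
    print(glm_mean(u, gamma))                          # u'gamma = 0, so the mean is 0.5
    print(cox_hazard(2.0, u, gamma, lambda t: 0.1))    # hypothetical constant baseline hazard
```

In the Cox model the covariates act multiplicatively on the baseline hazard, so the ratio of two individuals' hazards at the same $t$ does not depend on $t$.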

SLIDE 3

Spatio-temporal regression data

  • Spatio-temporal regression models should

    – account for spatial and temporal correlations,
    – allow for time- and space-varying effects,
    – allow for non-linear effects of continuous covariates,
    – allow for flexible interactions,
    – account for unobserved heterogeneity.

SLIDE 4


Example I: Forest health data

  • Yearly forest health inventories carried out from 1983 to 2004.
  • 83 beeches within a 15 km × 10 km area.
  • Response: defoliation degree of beech i in year t, measured in three ordered categories:

$y_{it} = 1$: no defoliation,
$y_{it} = 2$: defoliation of 25% or less,
$y_{it} = 3$: defoliation above 25%.

  • Covariates:

$t$: calendar time,
$s_i$: site of beech $i$,
$\text{age}_{it}$: age of the tree in years,
$u_{it}$: further (mostly categorical) covariates.

SLIDE 5

Example I: Forest health data

[Figure panels: Empirical time trends (shares of trees with no, medium and severe damage against calendar time). Empirical spatial effect.]

SLIDE 6

Example I: Forest health data

  • Cumulative probit model:

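$$P(y_{it} \le r) = \Phi\left(\theta^{(r)} - \eta_{it}\right)$$

with standard normal cdf $\Phi$, thresholds $-\infty = \theta^{(0)} < \theta^{(1)} < \theta^{(2)} < \theta^{(3)} = \infty$ and

$$\eta_{it} = f_1(t) + f_2(\text{age}_{it}) + f_3(t, \text{age}_{it}) + f_{spat}(s_i) + u_{it}'\gamma.$$

[Figure: thresholds θ(1), θ(2), θ(3) and predictor η on the latent scale, two panels.]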
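The cumulative probit model yields category probabilities by differencing the cumulative probabilities $\Phi(\theta^{(r)} - \eta_{it})$. A minimal Python sketch, assuming the finite thresholds are already known; the threshold and predictor values are hypothetical, not estimates from the forest health data:

```python
import math
from typing import List, Sequence


def std_normal_cdf(x: float) -> float:
    """Standard normal cdf Phi(x) via the error function (also works for +/-inf)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def category_probabilities(eta: float, thresholds: Sequence[float]) -> List[float]:
    """P(y = r), r = 1..k, for the cumulative probit model P(y <= r) = Phi(theta_r - eta).

    thresholds holds the finite, strictly increasing theta_1 < ... < theta_{k-1};
    theta_0 = -inf and theta_k = +inf are added here.
    """
    theta = [-math.inf, *thresholds, math.inf]
    cumulative = [std_normal_cdf(th - eta) for th in theta]
    return [cumulative[r] - cumulative[r - 1] for r in range(1, len(theta))]


if __name__ == "__main__":
    # hypothetical thresholds theta_1, theta_2 and two values of the predictor eta_it
    for eta in (-0.5, 1.0):
        print(eta, category_probabilities(eta, [0.0, 1.2]))
```

Since $\Phi(\theta^{(r)} - \eta_{it})$ decreases in $\eta_{it}$, larger predictor values shift probability mass towards the higher defoliation categories.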

SLIDE 7

Example I: Forest health data

SLIDE 8

Example I: Forest health data

[Figure panels: effect of age in years; time trend over calendar time (1983 to 2004); interaction surface of calendar time and age in years.]

SLIDE 9

Example I: Forest health data

  • Category-specific trends:

$$P(y_{it} \le r) = \Phi\left(\theta^{(r)} - f_1^{(r)}(t) - f_2(\text{age}_{it}) - f_{spat}(s_i) - u_{it}'\gamma\right)$$

  • More complicated constraints:

$$-\infty < \theta^{(1)} - f_1^{(1)}(t) < \theta^{(2)} - f_1^{(2)}(t) < \infty \quad \text{for all } t.$$

[Figure panels: time trend 1 and time trend 2 against calendar time, 1983 to 2004.]
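With category-specific trends the effective thresholds $\theta^{(r)} - f_1^{(r)}(t)$ move over time, and the model is only valid while they stay strictly ordered. A minimal sketch of checking this constraint on a grid of years, assuming the trend functions are given as Python callables; the linear trends and threshold values are hypothetical:

```python
import math
from typing import Callable, Iterable, List, Sequence

Trend = Callable[[float], float]


def shifted_thresholds(thresholds: Sequence[float], trends: Sequence[Trend], t: float) -> List[float]:
    """theta^(r) - f_1^(r)(t) for each finite threshold r."""
    return [th - f(t) for th, f in zip(thresholds, trends)]


def constraints_hold(thresholds: Sequence[float], trends: Sequence[Trend],
                     times: Iterable[float]) -> bool:
    """Check -inf < theta^(1) - f^(1)(t) < theta^(2) - f^(2)(t) < inf at every t in times."""
    for t in times:
        shifted = shifted_thresholds(thresholds, trends, t)
        if not all(math.isfinite(v) for v in shifted):
            return False
        if not all(a < b for a, b in zip(shifted, shifted[1:])):
            return False
    return True


if __name__ == "__main__":
    years = range(1983, 2005)
    thresholds = [0.0, 1.0]                                  # hypothetical theta^(1), theta^(2)
    mild = [lambda t: 0.05 * (t - 1983), lambda t: 0.02 * (t - 1983)]
    steep = [lambda t: 0.05 * (t - 1983), lambda t: 0.10 * (t - 1983)]
    print(constraints_hold(thresholds, mild, years))         # ordering kept in every year
    print(constraints_hold(thresholds, steep, years))        # ordering violated in later years
```

This is only a diagnostic on given time points; it says nothing about how the constraint is enforced during estimation.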

SLIDE 10


Structured additive regression

  • General idea:

Replace the usual parametric predictor with a flexible semiparametric predictor containing
    – nonparametric effects of time scales and continuous covariates,
    – spatial effects,
    – interaction surfaces,
    – varying coefficient terms (continuous and spatial effect modifiers),
    – random intercepts and random slopes.

  • All effects can be cast into one general framework.

SLIDE 11

Structured additive regression

  • Penalised splines.

    – Approximate f(x) by a weighted sum of B-spline basis functions.
    – Employ a large number of basis functions to enable flexibility.
    – Penalise differences between parameters of adjacent basis functions to ensure smoothness.
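The two ingredients, a B-spline basis and a difference penalty on adjacent coefficients, can be sketched in a few lines of Python. This assumes equidistant knots, cubic B-splines and a second-order difference penalty (common defaults, not stated on the slide); the fit itself, which would minimise a criterion such as $\|y - B\xi\|^2 + \lambda\,\xi'D'D\xi$ with smoothing parameter $\lambda$, is omitted:

```python
from typing import List, Sequence


def equidistant_knots(a: float, b: float, intervals: int, degree: int) -> List[float]:
    """Equidistant knots on [a, b], extended by `degree` extra knots on each side."""
    h = (b - a) / intervals
    return [a + j * h for j in range(-degree, intervals + degree + 1)]


def bspline_basis(x: float, knots: Sequence[float], degree: int) -> List[float]:
    """All B-spline basis functions of the given degree at x (Cox-de Boor recursion).

    Assumes distinct knots; returns len(knots) - degree - 1 values.
    """
    # degree 0: indicator functions of the knot intervals
    basis = [1.0 if knots[i] <= x < knots[i + 1] else 0.0 for i in range(len(knots) - 1)]
    for d in range(1, degree + 1):
        basis = [
            (x - knots[i]) / (knots[i + d] - knots[i]) * basis[i]
            + (knots[i + d + 1] - x) / (knots[i + d + 1] - knots[i + 1]) * basis[i + 1]
            for i in range(len(knots) - d - 1)
        ]
    return basis


def spline(x: float, knots: Sequence[float], degree: int, coef: Sequence[float]) -> float:
    """f(x) = sum_j coef_j * B_j(x)."""
    return sum(c * b for c, b in zip(coef, bspline_basis(x, knots, degree)))


def difference_penalty(coef: Sequence[float], order: int = 2) -> float:
    """Sum of squared order-th differences of adjacent coefficients, i.e. coef' D'D coef."""
    diffs = list(coef)
    for _ in range(order):
        diffs = [diffs[i + 1] - diffs[i] for i in range(len(diffs) - 1)]
    return sum(d * d for d in diffs)


if __name__ == "__main__":
    knots = equidistant_knots(0.0, 1.0, intervals=10, degree=3)  # 13 cubic basis functions
    smooth = [0.1 * j for j in range(13)]                        # hypothetical coefficients
    wiggly = [(-1.0) ** j for j in range(13)]
    print(spline(0.37, knots, 3, smooth), difference_penalty(smooth), difference_penalty(wiggly))
```

Coefficients lying on a straight line incur (up to rounding) no second-order penalty, so strong penalisation pulls the fit towards a linear function.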

SLIDE 12

Structured additive regression

  • Bivariate penalised splines.


  • Varying coefficient models.

    – Effect of covariate $x$ varies smoothly over the domain of a second covariate $z$: $f(x, z) = x \cdot g(z)$.
    – Spatial effect modifier ⇒ geographically weighted regression.
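In a varying coefficient term the design-matrix row is simply the basis of $g(z)$ multiplied by $x$. A minimal sketch, assuming degree-1 B-splines (hat functions) for $g$ purely to keep it short; in practice higher-degree penalised splines, or a spatial basis when $z$ is a location, would typically be used. The knots and coefficients are hypothetical:

```python
from typing import List, Sequence


def hat_basis(z: float, knots: Sequence[float]) -> List[float]:
    """Degree-1 B-splines (hat functions, chosen for brevity) on strictly increasing knots."""
    values = []
    for j, k in enumerate(knots):
        left = knots[j - 1] if j > 0 else None
        right = knots[j + 1] if j + 1 < len(knots) else None
        if left is not None and left <= z <= k:
            values.append((z - left) / (k - left))       # rising edge of the j-th hat
        elif right is not None and k <= z <= right:
            values.append((right - z) / (right - k))     # falling edge of the j-th hat
        else:
            values.append(0.0)
    return values


def varying_coefficient_row(x: float, z: float, knots: Sequence[float]) -> List[float]:
    """Design-matrix row for f(x, z) = x * g(z): the basis of g at z, scaled by x."""
    return [x * b for b in hat_basis(z, knots)]


def varying_coefficient(x: float, z: float, knots: Sequence[float], coef: Sequence[float]) -> float:
    """f(x, z) = x * g(z) with g(z) = sum_j coef_j B_j(z)."""
    return sum(r * c for r, c in zip(varying_coefficient_row(x, z, knots), coef))


if __name__ == "__main__":
    knots, coef = [0.0, 1.0, 2.0], [1.0, 3.0, 5.0]      # hypothetical knots and coefficients of g
    print(varying_coefficient(2.0, 0.5, knots, coef))    # x = 2 times g(0.5) = 2
```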

SLIDE 13

Structured additive regression

  • Spatial effect for regional data: Markov random fields.

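    – Bivariate extension of a first order random walk on the real line.
    – Define appropriate neighbourhoods for the regions.
    – Assume that the conditional expected value of $f_{spat}(s)$, given all other regions, is the average of the function evaluations at adjacent sites.

[Figure: first order random walk on the real line; E[f(t) | f(t−1), f(t+1)] lies midway between f(t−1) and f(t+1), with conditional variance τ²/2.]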
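For regional data the neighbourhood structure determines the penalty matrix of the Markov random field. A minimal sketch, assuming the usual first order specification in which the conditional mean of a region is the average over its $N_s$ neighbours and the conditional variance is $\tau^2/N_s$; the four-region map is hypothetical:

```python
from typing import List, Mapping, Set, Tuple


def mrf_penalty(neighbours: Mapping[str, Set[str]]) -> Tuple[List[str], List[List[float]]]:
    """Penalty matrix K of a first order Markov random field.

    K[r][r] is the number of neighbours of region r and K[r][s] = -1 if r and s are adjacent.
    """
    regions = sorted(neighbours)
    index = {r: i for i, r in enumerate(regions)}
    K = [[0.0] * len(regions) for _ in regions]
    for r, adjacent in neighbours.items():
        K[index[r]][index[r]] = float(len(adjacent))
        for s in adjacent:
            K[index[r]][index[s]] = -1.0
    return regions, K


def conditional_mean(f: Mapping[str, float], neighbours: Mapping[str, Set[str]], region: str) -> float:
    """E[f(s) | all other regions] = average of f over the neighbours of s."""
    adjacent = neighbours[region]
    return sum(f[s] for s in adjacent) / len(adjacent)


if __name__ == "__main__":
    # hypothetical map: four regions in a row, A - B - C - D
    nb = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
    regions, K = mrf_penalty(nb)
    print(regions, K)
    print(conditional_mean({"A": 1.0, "B": 0.0, "C": 3.0, "D": 2.0}, nb, "B"))  # (1 + 3) / 2
```

A chain of regions reproduces the first order random walk. Every row of $K$ sums to zero, so $K$ is rank-deficient: adding a constant to all regions leaves the penalty unchanged.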

SLIDE 14

Structured additive regression

  • Spatial effect for point-referenced data: Stationary Gaussian random fields.

    – Well-known as Kriging in the geostatistics literature.
    – Spatial effect follows a zero mean stationary Gaussian stochastic process.
    – Correlation of two arbitrary sites is defined by an intrinsic correlation function.
    – Can be interpreted as a basis function approach with radial basis functions.
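A minimal sketch of the correlation structure, assuming an isotropic Matérn correlation function with smoothness 3/2 and range parameter $\phi$ (a common choice, not specified on the slide). Evaluating the same function between a site and a set of knots gives the radial basis function representation. The sites are hypothetical; `math.dist` requires Python 3.8 or later:

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def matern_32(r: float, phi: float) -> float:
    """Matern correlation with smoothness 3/2 (assumed choice): (1 + r/phi) * exp(-r/phi)."""
    u = r / phi
    return (1.0 + u) * math.exp(-u)


def correlation_matrix(sites: Sequence[Point], phi: float) -> List[List[float]]:
    """Correlations of a zero mean stationary, isotropic Gaussian field at the given sites."""
    return [[matern_32(math.dist(a, b), phi) for b in sites] for a in sites]


def radial_basis_row(site: Point, knots: Sequence[Point], phi: float) -> List[float]:
    """Basis function view: one radial basis function per knot, centred at that knot."""
    return [matern_32(math.dist(site, k), phi) for k in knots]


if __name__ == "__main__":
    sites = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]        # hypothetical coordinates
    for row in correlation_matrix(sites, phi=2.0):
        print([round(v, 3) for v in row])
```

The correlation depends only on the distance between two sites and decays as the distance grows.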

SLIDE 15


Mixed model based inference

  • Each term in the predictor is associated with a vector of regression coefficients $\xi_j$ with a multivariate Gaussian prior / random effects distribution:

$$p(\xi_j \mid \tau_j^2) \propto \exp\left(-\frac{1}{2\tau_j^2}\,\xi_j' K_j \xi_j\right)$$

  • $K_j$ is a penalty matrix, $\tau_j^2$ a smoothing parameter.

  • In most cases $K_j$ is rank-deficient.

⇒ Reparametrise the model to obtain a mixed model with proper distributions.
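The rank deficiency of $K_j$ means that some directions of $\xi_j$ are not penalised at all, so the prior is improper. A minimal sketch, assuming random walk / P-spline penalties of the form $K = D'D$ with a difference matrix $D$: for first differences constant vectors, and for second differences linear trends, get a log prior kernel of zero whatever the value of $\tau^2$. The coefficient vectors are hypothetical:

```python
from typing import List, Sequence


def rw_penalty(n: int, order: int) -> List[List[float]]:
    """K = D'D, where D takes order-th differences of n coefficients."""
    D = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(order):
        D = [[D[i + 1][j] - D[i][j] for j in range(n)] for i in range(len(D) - 1)]
    return [[sum(D[k][i] * D[k][j] for k in range(len(D))) for j in range(n)] for i in range(n)]


def log_prior_kernel(xi: Sequence[float], K: Sequence[Sequence[float]], tau2: float) -> float:
    """log p(xi | tau^2) up to an additive constant: -xi' K xi / (2 tau^2)."""
    n = len(xi)
    quad = sum(xi[i] * K[i][j] * xi[j] for i in range(n) for j in range(n))
    return -quad / (2.0 * tau2)


if __name__ == "__main__":
    K1 = rw_penalty(4, 1)
    print(K1)                                                  # tridiagonal first order random walk penalty
    print(log_prior_kernel([2.0, 2.0, 2.0, 2.0], K1, 0.5))     # hypothetical constant vector: not penalised
```

These unpenalised directions are what the reparametrisation moves into the fixed effects $\beta_j$.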

SLIDE 16

Mixed model based inference

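  • Decompose

$$\xi_j = X_j\beta_j + Z_j b_j, \quad \text{where } p(\beta_j) \propto \text{const} \text{ and } b_j \sim N(0, \tau_j^2 I).$$

⇒ $\beta_j$ is a fixed effect and $b_j$ is an i.i.d. random effect.

  • This yields the variance components model

$$\eta = x'\beta + z'b, \quad \text{where in turn } p(\beta) \propto \text{const} \text{ and } b \sim N(0, Q).$$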
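For a first order random walk penalty $K_j = D'D$ the decomposition can be made concrete: $X_j$ is a column of ones (the unpenalised constant) and $Z_j = D'(DD')^{-1}$, so that $DZ_j = I$ and $\xi_j'K_j\xi_j = b_j'b_j$. A minimal pure Python sketch under this assumption; other penalties need other choices of $X_j$ and $Z_j$, for example via a spectral decomposition of $K_j$. The values of $\beta$ and $b$ are hypothetical:

```python
from typing import List, Sequence

Matrix = List[List[float]]


def difference_matrix(n: int) -> Matrix:
    """First order difference matrix D ((n-1) x n) with rows e_{i+1} - e_i."""
    return [[1.0 if j == i + 1 else -1.0 if j == i else 0.0 for j in range(n)] for i in range(n - 1)]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inverse(A: Matrix) -> Matrix:
    """Gauss-Jordan inverse of a small non-singular matrix, with partial pivoting."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def mixed_model_design(n: int):
    """X = column of ones (unpenalised part), Z = D'(DD')^{-1} (penalised part) for K = D'D."""
    D = difference_matrix(n)
    Dt = transpose(D)
    Z = matmul(Dt, inverse(matmul(D, Dt)))
    X = [[1.0] for _ in range(n)]
    return X, Z, D


def recompose(beta: float, b: Sequence[float], X: Matrix, Z: Matrix) -> List[float]:
    """xi = X beta + Z b."""
    return [x[0] * beta + sum(z * bk for z, bk in zip(zrow, b)) for x, zrow in zip(X, Z)]


def penalty(xi: Sequence[float], D: Matrix) -> float:
    """xi' D'D xi = squared norm of the differences D xi."""
    return sum(sum(d * x for d, x in zip(row, xi)) ** 2 for row in D)


if __name__ == "__main__":
    X, Z, D = mixed_model_design(5)
    b = [0.3, -1.2, 0.5, 2.0]                      # hypothetical random effects
    xi = recompose(0.7, b, X, Z)                    # hypothetical fixed effect beta = 0.7
    print(penalty(xi, D), sum(v * v for v in b))    # equal up to rounding
```

The penalty $\xi_j'K_j\xi_j/\tau_j^2$ thus becomes $b_j'b_j/\tau_j^2$, which is exactly the i.i.d. $N(0, \tau_j^2 I)$ random effects distribution.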

SLIDE 17

Mixed model based inference

  • Obtain empirical Bayes estimates / penalised likelihood estimates by iterating

    – penalised maximum likelihood for the regression coefficients $\beta$ and $b$,
    – restricted maximum / marginal likelihood for the variance parameters in $Q$:

$$L(Q) = \int L(\beta, b, Q)\, p(b)\, d\beta\, db \to \max_Q.$$
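For the simplest Gaussian case, a random intercept model with known error variance and a balanced design, the two alternating steps fit in a few lines. A minimal sketch under these assumptions; note that it replaces the REML step by a simple EM-type update, which maximises the ordinary marginal likelihood of $\tau^2$ (with $\beta$ at its estimate) rather than the restricted one. The data are hypothetical:

```python
from typing import List, Sequence, Tuple


def fit_random_intercept(groups: Sequence[Sequence[float]], sigma2: float,
                         tau2: float = 1.0, iterations: int = 500) -> Tuple[float, List[float], float]:
    """Alternate penalised estimation of (beta, b) with an EM-type update of tau^2.

    Model: y_jk = beta + b_j + e_jk, b_j ~ N(0, tau^2), e_jk ~ N(0, sigma2), sigma2 known.
    A balanced design (equal group sizes m) is assumed; EM stands in for the REML step.
    """
    m = len(groups[0])
    means = [sum(g) / m for g in groups]
    # In the balanced case the penalised estimate of beta is the grand mean for every tau^2.
    beta = sum(means) / len(means)
    b: List[float] = []
    for _ in range(iterations):
        shrink = m * tau2 / (sigma2 + m * tau2)
        b = [shrink * (mu - beta) for mu in means]            # penalised estimates of b_j
        post_var = sigma2 * tau2 / (sigma2 + m * tau2)        # posterior variance of each b_j
        tau2 = sum(bj * bj for bj in b) / len(b) + post_var   # EM step for the variance component
    return beta, b, tau2


if __name__ == "__main__":
    data = [[-3.5, -2.5], [-1.5, -0.5], [0.5, 1.5], [2.5, 3.5]]   # hypothetical data, 4 groups of 2
    print(fit_random_intercept(data, sigma2=1.0))
```

For these data the iteration settles at $\tau^2 = 4.5$: the mean squared deviation of the group means (5) minus $\sigma^2/m = 0.5$. If that difference were negative, $\tau^2$ would shrink towards zero.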

SLIDE 18


Software

  • Implemented in the software package BayesX.
  • Available from

http://www.stat.uni-muenchen.de/~bayesx

SLIDE 19


Childhood mortality in Nigeria

  • Data from the 2003 Demographic and Health Survey (DHS) in Nigeria.
  • Retrospective questionnaire on the health status of women of reproductive age and their children.

  • Survival time of n = 5323 children.
  • Numerous covariates including spatial information.
  • Analysis based on the Cox model:

$$\lambda(t \mid u) = \lambda_0(t)\exp(u'\gamma).$$

SLIDE 20

Childhood mortality in Nigeria

  • Limitations of the classical Cox model:

    – Restricted to right censored observations.
    – Post-estimation of the baseline hazard.
    – Proportional hazards assumption.
    – Parametric form of the predictor.
    – No spatial correlations.

⇒ Geoadditive hazard regression.

SLIDE 21


Interval censored survival times

  • In theory, survival times should be available in days.
  • Retrospective questionnaire ⇒ most uncensored survival times are rounded (heaping).

[Figure: distribution of the reported uncensored survival times, illustrating the heaping.]

  • In contrast, censoring times are given in days.

⇒ Treat survival times as interval censored.

SLIDE 22

Interval censored survival times

[Figure: uncensored, right censored and interval censored observations, with survival time T, censoring time C and interval bounds Tlower, Tupper.]

SLIDE 23

Interval censored survival times

  • Likelihood contributions:

$$P(T > C) = S(C) = \exp\left(-\int_0^{C} \lambda(t)\,dt\right),$$

$$P(T \in [T_{lower}, T_{upper}]) = S(T_{lower}) - S(T_{upper}) = \exp\left(-\int_0^{T_{lower}} \lambda(t)\,dt\right) - \exp\left(-\int_0^{T_{upper}} \lambda(t)\,dt\right).$$

  • Derivatives of the log-likelihood become much more complicated for interval censored survival times.

  • Numerical integration techniques have to be used in both cases.
  • Piecewise constant time-varying covariates and left truncation can easily be included.
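Both contributions only need the survivor function $S(t) = \exp\left(-\int_0^t \lambda(u)\,du\right)$, which in general has to be computed numerically. A minimal sketch, assuming a composite trapezoidal rule on a fixed grid (a simple stand-in; the slides do not say which quadrature is used) and an individual hazard supplied as a Python function; the hazard in the usage example is hypothetical:

```python
import math
from typing import Callable

Hazard = Callable[[float], float]


def cumulative_hazard(hazard: Hazard, t: float, steps: int = 1000) -> float:
    """Lambda(t) = integral of the hazard from 0 to t, by the composite trapezoidal rule (assumed)."""
    if t <= 0.0:
        return 0.0
    h = t / steps
    inner = sum(hazard(k * h) for k in range(1, steps))
    return h * (0.5 * hazard(0.0) + inner + 0.5 * hazard(t))


def survivor(hazard: Hazard, t: float) -> float:
    """S(t) = exp(-Lambda(t))."""
    return math.exp(-cumulative_hazard(hazard, t))


def right_censored(hazard: Hazard, c: float) -> float:
    """Likelihood contribution P(T > C) = S(C)."""
    return survivor(hazard, c)


def interval_censored(hazard: Hazard, t_lower: float, t_upper: float) -> float:
    """Likelihood contribution P(T in [T_lower, T_upper]) = S(T_lower) - S(T_upper)."""
    return survivor(hazard, t_lower) - survivor(hazard, t_upper)


if __name__ == "__main__":
    def hazard(t: float) -> float:
        # hypothetical individual hazard lambda_0(t) * exp(eta): linear baseline, eta = 0.3
        return 0.002 * t * math.exp(0.3)

    print(right_censored(hazard, 30.0))
    print(interval_censored(hazard, 10.0, 20.0))
```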

SLIDE 24

Interval censored survival times

[Figure panels: spatial effect without covariates (scale −0.6 to 0.5); spatial effect including covariates (scale −0.05 to 0.05).]

SLIDE 25

Interval censored survival times

[Figure panels: effect of the age of the mother at birth (15 to 45 years); effect of the body mass index of the mother (10 to 50).]

SLIDE 26

Interval censored survival times

[Figure: log(baseline) against survival time in days, estimated without and with interval censoring.]

SLIDE 27


Discussion

  • Empirical Bayesian treatment of complex geoadditive regression models:

    – Based on the mixed model representation.
    – Applicable for a wide range of regression models.
    – Does not rely on MCMC simulation techniques.
      ⇒ No questions on convergence and mixing of Markov chains, no hyperpriors.
    – Closely related to penalised likelihood estimation in a frequentist setting.

  • Future work:

    – Extended modelling for categorical responses, e.g. based on correlated latent utilities.
    – Multi-state models.
    – Interval censoring for multi-state models.

SLIDE 28


  • Software and preprints:

http://www.stat.uni-muenchen.de/~kneib
