
From Yates Algorithm to Multidimensional Smoothing with GLAM

Iain Currie, Heriot-Watt University
with James Kirkby, Maria Durban & Paul Eilers

University of Edinburgh, December 2006

1

Plan of talk

  • Classical regression
  • Factorial designs and Yates algorithm
  • Yates algorithm as an array computation
  • Generalised linear models
  • Array regression
  • Examples

Themes

  • Structure of data and models
  • Algorithms

2

Classical regression

Structure

  • Data: a data vector y, n × 1
  • Model: a model matrix X, n × c
    a parameter vector θ, c × 1
    µ = E(y) = Xθ

  • Error distribution: normal

Algorithm

  • Normal equations

X′X θ̂ = X′y
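
In R the normal equations can be solved directly; a minimal sketch, assuming y and X are already set up:

theta.hat <- solve(crossprod(X), crossprod(X, y))  # solves X'X theta = X'y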

3

Classical regression

Universal recipe

Fit <- lm(y ~ X - 1)

A model algebra

Fit <- lm(y ~ x + x2)
Fit <- lm(y ~ A + B + A:B)
Fit <- lm(y ~ x + x2 + A*B)

4

slide-2
SLIDE 2

Factorial designs

Example: 2³ design, John & Quenouille, p64

Treatment   Yield   Effect   Estimate × 8
1           94.1    1        867.6
n           107.5   N        68.8
p           97.0    P        24.8
np          113.5   NP       −10.0
k           96.9    K        43.4
nk          122.9   NK       9.0
pk          111.4   PK       7.0
npk         124.3   NPK      −16.2

Model: yᵢⱼₖ = µ + αᵢ + βⱼ + γₖ + (αβ)ᵢⱼ + (αγ)ᵢₖ + (βγ)ⱼₖ + (αβγ)ᵢⱼₖ,   i, j, k = 1, 2

5

Analysis in R

Fit <- lm(y ~ N*P*K)
X <- model.matrix(y ~ N*P*K)

With ±1 coding of the factors, X has columns 1, N, P, NP, K, NK, PK, NPK:

        1   N   P  NP   K  NK  PK NPK
1       1  −1  −1   1  −1   1   1  −1
n       1   1  −1  −1  −1  −1   1   1
p       1  −1   1  −1  −1   1  −1   1
np      1   1   1   1  −1  −1  −1  −1
k       1  −1  −1   1   1  −1  −1   1
nk      1   1  −1  −1   1   1  −1  −1
pk      1  −1   1  −1   1  −1   1  −1
npk     1   1   1   1   1   1   1   1

X′X = 8 I₈  ⇒  θ̂ = (1/8) X′y

6

Yates Algorithm

Treatment   Yield   (1)     (2)     (3)
1           94.1    201.6   412.1   867.6    1
n           107.5   210.5   455.5   68.8     N
p           97.0    219.8   29.9    24.8     P
np          113.5   235.7   38.9    −10.0    NP
k           96.9    13.4    8.9     43.4     K
nk          122.9   16.5    15.9    9.0      NK
pk          111.4   26.0    3.1     7.0      PK
npk         124.3   12.9    −13.1   −16.2    NPK

7
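
Yates' algorithm is three passes of "sums of adjacent pairs, then differences of adjacent pairs". A minimal R sketch (my code, not from the talk) that reproduces column (3) above:

yates.step <- function(v) {
  odd  <- v[seq(1, length(v), 2)]           # first member of each pair
  even <- v[seq(2, length(v), 2)]           # second member of each pair
  c(even + odd, even - odd)                 # pair sums, then pair differences
}
y <- c(94.1, 107.5, 97.0, 113.5, 96.9, 122.9, 111.4, 124.3)  # standard order
yates.step(yates.step(yates.step(y)))
# 867.6 68.8 24.8 -10.0 43.4 9.0 7.0 -16.2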

Kronecker products

The 2³ model matrix again:

     1   N   P  NP   K  NK  PK NPK
     1  −1  −1   1  −1   1   1  −1
     1   1  −1  −1  −1  −1   1   1
     1  −1   1  −1  −1   1  −1   1
     1   1   1   1  −1  −1  −1  −1
     1  −1  −1   1   1  −1  −1   1
     1   1  −1  −1   1   1  −1  −1
     1  −1   1  −1   1  −1   1  −1
     1   1   1   1   1   1   1   1

8


Kronecker products

X =   X1 −X1 X1 X1   ⇒ X =   1 −1 1 1   ⊗ X1 where X1 =   1 −1 1 1   ⊗ X2 ⇒ X =   1 −1 1 1   ⊗   1 −1 1 1   ⊗   1 −1 1 1  

9

Restatement of problem

Evaluate X′y where X = Z ⊗ Z ⊗ Z and Z = ( 1  −1 ).
                                          ( 1   1 )

Model structure: Kronecker product Data structure: ??

10

[Figure: the yields arranged as a 2 × 2 × 2 array, indexed by N, P and K]

K low:   1   94.1    p   97.0      K high:  k   96.9    pk  111.4
         n  107.5    np 113.5               nk 122.9    npk 124.3

11

Restatement of problem

Evaluate X′y where

Data structure: Y, a 3-d array
Model structure: X = Z ⊗ Z ⊗ Z

Evaluate X′y where

Data structure: Y, n₁ × n₂
Model structure: X = X₂ ⊗ X₁, with Xᵢ, nᵢ × cᵢ

(X₂ ⊗ X₁)′y ≡ X₁′Y X₂

12


Efficient computation

(X₂ ⊗ X₁)′y ≡ X₁′Y X₂ = (X₂′(X₁′Y)′)′

Example: X₁, X₂, 10³ × 10²

  • No need to compute X
  • LHS: 10¹⁰ multiplications; RHS: 10⁸ multiplications
  • Reformat the RHS (see the check below)

13
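
The identity is easy to verify numerically; a sketch on small random matrices (the point being that X itself is never formed on the right-hand side):

set.seed(1)
X1 <- matrix(rnorm(4 * 2), 4, 2)            # n1 x c1
X2 <- matrix(rnorm(3 * 2), 3, 2)            # n2 x c2
Y  <- matrix(rnorm(4 * 3), 4, 3)            # data arranged as n1 x n2
y  <- c(Y)                                  # vec(Y), column-major
lhs <- t(X2 %x% X1) %*% y                   # forms the Kronecker product
rhs <- c(t(X1) %*% Y %*% X2)                # never forms X
all.equal(c(lhs), rhs)                      # TRUE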

The rotated H-transform

(X₂ ⊗ X₁)′y ≡ (X₂′(X₁′Y)′)′

(X₃ ⊗ X₂ ⊗ X₁)′y ≡ (X₃′(X₂′(X₁′Y)′)′)′

???  (X₃ ⊗ X₂ ⊗ X₁)′y ≡ ρ(X₃′, ρ(X₂′, ρ(X₁′, Y)))

Definition: X, n₁ × c₁ matrix; A, c₁ × c₂ × c₃ array.
ρ(X, A):  X A(c₁ × c₂c₃) = A(n₁ × c₂c₃) → A(n₁ × c₂ × c₃) → A(c₂ × c₃ × n₁)

Example: 2³ factorial design.
Evaluate X′y, X = Z ⊗ Z ⊗ Z, Z, 2 × 2, Y, 2 × 2 × 2.
ρ(Z′, ρ(Z′, ρ(Z′, Y))) = Θ̂, 2 × 2 × 2

14
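
The definition of ρ translates almost line for line into R. A minimal sketch (my implementation of the definition, not the authors' code), checked on the 2³ example:

rho <- function(X, A) {
  d <- dim(A)
  B <- X %*% matrix(A, nrow = d[1])         # X A, n1 x (c2 c3)
  B <- array(B, c(nrow(X), d[-1]))          # reshape to n1 x c2 x c3
  aperm(B, c(seq_along(d)[-1], 1))          # rotate to c2 x c3 x n1
}

Z <- matrix(c(1, -1, 1, 1), 2, 2, byrow = TRUE)
Y <- array(c(94.1, 107.5, 97.0, 113.5, 96.9, 122.9, 111.4, 124.3), c(2, 2, 2))
rho(t(Z), rho(t(Z), rho(t(Z), Y)))          # Yates' column (3) as a 2 x 2 x 2 array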

Yates Algorithm

Treatment   Yield   (1)     (2)     (3)
1           94.1    201.6   412.1   867.6    1
n           107.5   210.5   455.5   68.8     N
p           97.0    219.8   29.9    24.8     P
np          113.5   235.7   38.9    −10.0    NP
k           96.9    13.4    8.9     43.4     K
nk          122.9   16.5    15.9    9.0      NK
pk          111.4   26.0    3.1     7.0      PK
npk         124.3   12.9    −13.1   −16.2    NPK

15

[Figure: the estimates arranged as a 2 × 2 × 2 array]

K low:   1  867.6    P   24.8      K high:  K  43.4    PK    7.0
         N   68.8    NP −10.0               NK  9.0    NPK −16.2

16


Generalised linear models

Structure

  • Data: a data vector y
  • Model: a model matrix X = X₃ ⊗ X₂ ⊗ X₁
    a parameter vector θ
    a link function: µ = E(y), g(µ) = Xθ

  • Error distribution: normal, Poisson, binomial, . . .

Algorithm

  • Scoring algorithm

X′W̃δX θ̂ = X′W̃δ z̃   where   z̃ = Xθ̃ + W̃δ⁻¹(y − µ̃).

17

Computation

We need Xθ̃, X′W̃δz̃ and X′W̃δX.

In 1-d, let X = (x₁, . . . , x_c), n × c. Then

(X′WδX)ⱼₖ = Σᵢ₌₁ⁿ wᵢxᵢⱼxᵢₖ = (x₁ⱼx₁ₖ, . . . , xₙⱼxₙₖ)w = (xⱼ ∗ xₖ)′w

Definition: the row tensor of X, n × c, is

G(X) = (X ⊗ 1′) ∗ (1′ ⊗ X), n × c²,

where 1 is a vector of 1’s of length c. Then

X′WδX, c × c ≡ G(X)′w, c² × 1

18
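
A sketch of the row tensor and the identity X′WδX ≡ G(X)′w in R (my code, following the definition above):

row.tensor <- function(X) {
  one <- matrix(1, 1, ncol(X))
  (X %x% one) * (one %x% X)                 # n x c^2, row i holds x_ij * x_ik
}

set.seed(2)
X <- matrix(rnorm(5 * 3), 5, 3)
w <- runif(5)
m1 <- t(X) %*% diag(w) %*% X                # X'WX directly, c x c
m2 <- matrix(t(row.tensor(X)) %*% w, 3, 3)  # G(X)'w, c^2 x 1, reshaped
all.equal(m1, m2)                           # TRUE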

Computation of X′WδX

Xᵢ, nᵢ × cᵢ, i = 1, 2, 3.  X = X₃ ⊗ X₂ ⊗ X₁, n₁n₂n₃ × c₁c₂c₃
Wδ is diagonal, n₁n₂n₃ × n₁n₂n₃
W is the corresponding array, n₁ × n₂ × n₃

X′WδX, c₁c₂c₃ × c₁c₂c₃ ≡ ρ(G(X₃)′, ρ(G(X₂)′, ρ(G(X₁)′, W))), c₁² × c₂² × c₃²

19
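
The 3-d identity can be checked the same way (a sketch using rho() and row.tensor() from above; the aperm step only rearranges the c₁² × c₂² × c₃² array into the usual c₁c₂c₃ × c₁c₂c₃ matrix):

set.seed(3)
X1 <- matrix(rnorm(4 * 2), 4, 2)
X2 <- matrix(rnorm(3 * 2), 3, 2)
X3 <- matrix(rnorm(5 * 3), 5, 3)
W  <- array(runif(4 * 3 * 5), c(4, 3, 5))   # weight array, n1 x n2 x n3
X  <- X3 %x% X2 %x% X1                      # explicit model matrix, 60 x 12
m1 <- t(X) %*% diag(c(W)) %*% X             # standard X'WX, 12 x 12
m2 <- rho(t(row.tensor(X3)), rho(t(row.tensor(X2)),
          rho(t(row.tensor(X1)), W)))       # c1^2 x c2^2 x c3^2 array
A6 <- array(m2, c(2, 2, 2, 2, 3, 3))        # split each squared dimension
m2 <- matrix(aperm(A6, c(1, 3, 5, 2, 4, 6)), 12)  # rearrange to 12 x 12
all.equal(m1, m2)                           # TRUE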

Standard errors of Xθ̂

We need diag X(X′WδX)⁻¹X′ = diag XSmX′, where Sm, c₁c₂c₃ × c₁c₂c₃.
Let S, c₁² × c₂² × c₃², be the array form of Sm.

diag XSmX′, n₁n₂n₃ × 1 ≡ ρ(G(X₃), ρ(G(X₂), ρ(G(X₁), S))), n₁ × n₂ × n₃.

20


Computation: a summary

Linear functions: X′y, c₁c₂c₃ × 1
  ρ(X₃′, ρ(X₂′, ρ(X₁′, Y))), c₁ × c₂ × c₃

Xθ, n₁n₂n₃ × 1
  ρ(X₃, ρ(X₂, ρ(X₁, Θ))), n₁ × n₂ × n₃

Inner products: X′WδX, c₁c₂c₃ × c₁c₂c₃
  ρ(G(X₃)′, ρ(G(X₂)′, ρ(G(X₁)′, W))), c₁² × c₂² × c₃²

Diagonal function: diag XSmX′, n₁n₂n₃ × 1
  ρ(G(X₃), ρ(G(X₂), ρ(G(X₁), S))), n₁ × n₂ × n₃.

21

Generalised linear array models

Structure

  • Data: a data array Y, n₁ × n₂ × n₃
  • Model: a model matrix X = X₃ ⊗ X₂ ⊗ X₁, with Xᵢ, nᵢ × cᵢ
    a parameter array Θ, c₁ × c₂ × c₃
    a link function: M = E(Y), g(M) = ρ(X₃, ρ(X₂, ρ(X₁, Θ))).

  • Error distribution: normal, Poisson, binomial, . . .

Algorithm

  • Efficient scoring algorithm

Array computation of X′W̃δX θ̂ = X′W̃δ z̃ where z̃ = Xθ̃ + W̃δ⁻¹(y − µ̃).

22
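
Putting the pieces together, one scoring step for a Poisson GLAM with log link might look like this (an illustrative sketch built on rho() and row.tensor() above, not the authors' code):

glam.step <- function(X1, X2, X3, Y, Theta) {
  cs  <- c(ncol(X1), ncol(X2), ncol(X3))
  Eta <- rho(X3, rho(X2, rho(X1, Theta)))   # X theta in array form
  Mu  <- exp(Eta)                           # inverse log link
  W   <- Mu                                 # Poisson working weights
  Z   <- Eta + (Y - Mu) / Mu                # working variable z
  XWz <- rho(t(X3), rho(t(X2), rho(t(X1), W * Z)))   # X'Wz, c1 x c2 x c3
  XWX <- rho(t(row.tensor(X3)), rho(t(row.tensor(X2)),
             rho(t(row.tensor(X1)), W)))    # c1^2 x c2^2 x c3^2
  A6  <- array(XWX, c(cs[1], cs[1], cs[2], cs[2], cs[3], cs[3]))
  XWXm <- matrix(aperm(A6, c(1, 3, 5, 2, 4, 6)), prod(cs))
  array(solve(XWXm, c(XWz)), cs)            # updated parameter array Theta
}

Iterating glam.step to convergence gives the scoring estimates; the penalised version (slide 27) would add the penalty matrix P to XWXm before solving.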

LM, GLM & GLAM

            LM          GLM              GLAM
Data        y           y                Y, a d-dimensional array
Model       µ = Xθ      g(µ) = Xθ        g(M) = ρ(X_d, . . . , ρ(X₁, Θ) . . .)
Error       N           N, P, B, . . .   N, P, B, . . .
Algorithm   N-equns     Scoring          Efficient scoring

23

Examples

  • Factorial designs
  • Contingency tables
  • Smooth models with P-splines

24


[Figure: a single cubic B-spline, plotted against Age (20–80)]

25

[Figure: a 1-d B-spline basis, plotted against Age (20–80)]

26

P-splines in 2-d

Structure

  • Data: deaths matrix, D, exposure matrix, E
  • Model: a model matrix B = By ⊗ Ba, with Ba, 80 × 20, By, 100 × 25
    a parameter array Θ, 20 × 25
    a link function: M = E(D), log(M) = log E + BaΘBy′.

  • Error distribution: Poisson.

Algorithm

  • Efficient penalised scoring algorithm

Array computation of (B′W̃δB + P)θ̂ = B′W̃δ z̃ where z̃ = Bθ̃ + W̃δ⁻¹(d − µ̃) and P = P(λa, λy) is the penalty matrix.

27
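
The slide leaves P(λa, λy) unspecified; a common choice (as in Currie, Durban & Eilers, 2004) is a sum of Kronecker-structured difference penalties. A sketch, with illustrative smoothing parameters and second-order differences:

ca <- 20; cy <- 25                          # basis sizes for age and year
Da <- diff(diag(ca), differences = 2)       # second differences, age
Dy <- diff(diag(cy), differences = 2)       # second differences, year
lambda.a <- 10; lambda.y <- 100             # smoothing parameters (illustrative)
P <- lambda.a * (diag(cy) %x% crossprod(Da)) +
     lambda.y * (crossprod(Dy) %x% diag(ca))  # penalty, (ca*cy) x (ca*cy)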

[Figure: fitted mortality surface, Year 1947–1999 by Age 11–100]

28


[Figure: Swedish females, ages 20 to 90; log(mortality) against Year, 1900–2000]

29

Smooth Lee-Carter models

Structure

  • Data: deaths matrix, D, exposure matrix, E
  • Model: Dᵢⱼ ∼ P(µᵢⱼ), log µᵢⱼ = log Eᵢⱼ + αᵢ + βᵢκⱼ

For given κ̂:

M = E(D), log(M) = log E + BaΘK̂′

where α = Ba a, β = Ba b, Θ = [a : b], K̂ = [1 : κ̂].

For given α̂ and β̂:

M = E(D), log(M) = log E + α̂1′ + β̂κ′ = log E + α̂1′ + β̂k′Bk′

where κ = Bk k.

Algorithm: Efficient penalised scoring algorithm

30

[Figure: four panels: α̂ against Age, β̂ against Age, κ̂ against Year, and log(mortality) against Year at age 65]

31

Microarray data

Data are arranged on a 72 × 128 grid in 8 blocks of 2 rows and 4 columns.

E(Y) = B₃AB̆₃′ + B₀CB̆₀′

32


[Figure: four panels on the 72 × 128 grid: microarray background, fitted surface (smooth + pin), pin component, smooth component]

33

Respiratory data

Deaths, D, and exposures, E, from a respiratory disease, classified by age (1 to 105), year (1959 to 1998) and month (1 to 12), are arranged in a 105 × 40 × 12 array.

E(D) = M, log M = log E + ρ(Bm, ρ(By, ρ(Ba, Θ)))

Ba, 105 × 15;  By, 40 × 10;  Bm, 12 × 7

B = Bm ⊗ By ⊗ Ba, 50400 × 1050

34
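
At this scale the array algorithm never builds B. A sketch of the linear predictor computation (stand-in bases in place of the B-splines, using rho() from above):

Ba <- matrix(rnorm(105 * 15), 105, 15)      # age basis, 105 x 15
By <- matrix(rnorm(40 * 10), 40, 10)        # year basis, 40 x 10
Bm <- matrix(rnorm(12 * 7), 12, 7)          # month basis, 12 x 7
Theta <- array(rnorm(15 * 10 * 7), c(15, 10, 7))
Eta <- rho(Bm, rho(By, rho(Ba, Theta)))     # linear predictor, 105 x 40 x 12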

[Figure: observed and fitted log(Deaths/Day) against Age (20–100), against Year (1960–2000), and against Month (1–12)]

35

Times (seconds) to calculate B′WδB

Array size   npar   Standard regression   Array regression   Ratio
6 × 6 × 6    216    20                    1                  20:1
7 × 7 × 7    343    200                   2                  100:1
8 × 8 × 8    512    2000                  4                  500:1
9 × 9 × 9    729    −                     20                 −

36


References

Smoothing with P-splines

  • Eilers & Marx (1996). Statistical Science, 11, 89–121.
  • Currie & Durban (2002). Statistical Modelling, 2, 333–349.
  • Currie, Durban & Eilers (2004). Statistical Modelling, 4, 279–298.

GLAM

  • Eilers, Currie & Durban (2006). Computational Statistics and Data Analysis, 50, 61–76.
  • Currie, Durban & Eilers (2006). Journal of the Royal Statistical Society, Series B, 68, 259–280.

Web site

www.ma.hw.ac.uk/~iain/research/papers.html

37