
From Yates Algorithm to Multidimensional Smoothing with GLAM

Iain Currie, Heriot-Watt University
with James Kirkby, Maria Durban & Paul Eilers

University of Edinburgh, December 2006

1

Plan of talk

  • Classical regression
  • Factorial designs and Yates algorithm
  • Yates algorithm as an array computation
  • Generalised linear models
  • Array regression
  • Examples

Themes

  • Structure of data and models
  • Algorithms

2

Classical regression

Structure

  • Data: a data vector y, n × 1
  • Model: a model matrix X, n × c
    a parameter vector θ, c × 1
    µ = E(y) = Xθ

  • Error distribution: normal

Algorithm

  • Normal equations

X′X θ̂ = X′y
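
In R the normal equations can be solved directly; a minimal sketch, assuming y and X are already set up:

theta.hat <- solve(crossprod(X), crossprod(X, y))  # solves X'X theta = X'y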

3

Classical regression

Universal recipe

Fit <- lm(y ~ X - 1)

A model algebra

Fit <- lm(y ~ x + x2)
Fit <- lm(y ~ A + B + A:B)
Fit <- lm(y ~ x + x2 + A*B)

4

slide-2
SLIDE 2

Factorial designs

Example: 2³ design, John & Quenouille, p64

Treatment   Yield   Effect   Estimate × 8
1           94.1    1        867.6
n           107.5   N        68.8
p           97.0    P        24.8
np          113.5   NP       −10.0
k           96.9    K        43.4
nk          122.9   NK       9.0
pk          111.4   PK       7.0
npk         124.3   NPK      −16.2

Model: yᵢⱼₖ = µ + αᵢ + βⱼ + γₖ + (αβ)ᵢⱼ + (αγ)ᵢₖ + (βγ)ⱼₖ + (αβγ)ᵢⱼₖ,   i, j, k = 1, 2

5

Analysis in R

Fit <- lm(y ~ N*P*K)
X <- model.matrix(y ~ N*P*K)

With ±1 coding of the factors, X has columns 1, N, P, NP, K, NK, PK, NPK:

        1   N   P  NP   K  NK  PK NPK
1       1  −1  −1   1  −1   1   1  −1
n       1   1  −1  −1  −1  −1   1   1
p       1  −1   1  −1  −1   1  −1   1
np      1   1   1   1  −1  −1  −1  −1
k       1  −1  −1   1   1  −1  −1   1
nk      1   1  −1  −1   1   1  −1  −1
pk      1  −1   1  −1   1  −1   1  −1
npk     1   1   1   1   1   1   1   1

X′X = 8 I₈  ⇒  θ̂ = (1/8) X′y

6

Yates Algorithm

Treatment   Yield   (1)     (2)     (3)
1           94.1    201.6   412.1   867.6    1
n           107.5   210.5   455.5   68.8     N
p           97.0    219.8   29.9    24.8     P
np          113.5   235.7   38.9    −10.0    NP
k           96.9    13.4    8.9     43.4     K
nk          122.9   16.5    15.9    9.0      NK
pk          111.4   26.0    3.1     7.0      PK
npk         124.3   12.9    −13.1   −16.2    NPK

7
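
Yates' algorithm is three passes of "sums of adjacent pairs, then differences of adjacent pairs". A minimal R sketch (my code, not from the talk) that reproduces column (3) above:

yates.step <- function(v) {
  odd  <- v[seq(1, length(v), 2)]           # first member of each pair
  even <- v[seq(2, length(v), 2)]           # second member of each pair
  c(even + odd, even - odd)                 # pair sums, then pair differences
}
y <- c(94.1, 107.5, 97.0, 113.5, 96.9, 122.9, 111.4, 124.3)  # standard order
yates.step(yates.step(yates.step(y)))
# 867.6 68.8 24.8 -10.0 43.4 9.0 7.0 -16.2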

Kronecker products

The 2³ model matrix again:

     1   N   P  NP   K  NK  PK NPK
     1  −1  −1   1  −1   1   1  −1
     1   1  −1  −1  −1  −1   1   1
     1  −1   1  −1  −1   1  −1   1
     1   1   1   1  −1  −1  −1  −1
     1  −1  −1   1   1  −1  −1   1
     1   1  −1  −1   1   1  −1  −1
     1  −1   1  −1   1  −1   1  −1
     1   1   1   1   1   1   1   1

8


Kronecker products

X =   X1 −X1 X1 X1   ⇒ X =   1 −1 1 1   ⊗ X1 where X1 =   1 −1 1 1   ⊗ X2 ⇒ X =   1 −1 1 1   ⊗   1 −1 1 1   ⊗   1 −1 1 1  

9

Restatement of problem

Evaluate X′y where X = Z ⊗ Z ⊗ Z and Z = ( 1  −1 ).
                                          ( 1   1 )

Model structure: Kronecker product Data structure: ??

10

[Figure: the yields arranged as a 2 × 2 × 2 array, indexed by N, P and K]

K low:   1   94.1    p   97.0      K high:  k   96.9    pk  111.4
         n  107.5    np 113.5               nk 122.9    npk 124.3

11

Restatement of problem

Evaluate X′y where

Data structure: Y, a 3-d array
Model structure: X = Z ⊗ Z ⊗ Z

Evaluate X′y where

Data structure: Y, n₁ × n₂
Model structure: X = X₂ ⊗ X₁, with Xᵢ, nᵢ × cᵢ

(X₂ ⊗ X₁)′y ≡ X₁′Y X₂

12


Efficient computation

(X₂ ⊗ X₁)′y ≡ X₁′Y X₂ = (X₂′(X₁′Y)′)′

Example: X₁, X₂, 10³ × 10²

  • No need to compute X
  • LHS: 10¹⁰ multiplications; RHS: 10⁸ multiplications
  • Reformat the RHS (see the check below)

13
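
The identity is easy to verify numerically; a sketch on small random matrices (the point being that X itself is never formed on the right-hand side):

set.seed(1)
X1 <- matrix(rnorm(4 * 2), 4, 2)            # n1 x c1
X2 <- matrix(rnorm(3 * 2), 3, 2)            # n2 x c2
Y  <- matrix(rnorm(4 * 3), 4, 3)            # data arranged as n1 x n2
y  <- c(Y)                                  # vec(Y), column-major
lhs <- t(X2 %x% X1) %*% y                   # forms the Kronecker product
rhs <- c(t(X1) %*% Y %*% X2)                # never forms X
all.equal(c(lhs), rhs)                      # TRUE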

The rotated H-transform

(X₂ ⊗ X₁)′y ≡ (X₂′(X₁′Y)′)′

(X₃ ⊗ X₂ ⊗ X₁)′y ≡ (X₃′(X₂′(X₁′Y)′)′)′

???  (X₃ ⊗ X₂ ⊗ X₁)′y ≡ ρ(X₃′, ρ(X₂′, ρ(X₁′, Y)))

Definition: X, n₁ × c₁ matrix; A, c₁ × c₂ × c₃ array.
ρ(X, A):  X A(c₁ × c₂c₃) = A(n₁ × c₂c₃) → A(n₁ × c₂ × c₃) → A(c₂ × c₃ × n₁)

Example: 2³ factorial design.
Evaluate X′y, X = Z ⊗ Z ⊗ Z, Z, 2 × 2, Y, 2 × 2 × 2.
ρ(Z′, ρ(Z′, ρ(Z′, Y))) = Θ̂, 2 × 2 × 2

14
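
The definition of ρ translates almost line for line into R. A minimal sketch (my implementation of the definition, not the authors' code), checked on the 2³ example:

rho <- function(X, A) {
  d <- dim(A)
  B <- X %*% matrix(A, nrow = d[1])         # X A, n1 x (c2 c3)
  B <- array(B, c(nrow(X), d[-1]))          # reshape to n1 x c2 x c3
  aperm(B, c(seq_along(d)[-1], 1))          # rotate to c2 x c3 x n1
}

Z <- matrix(c(1, -1, 1, 1), 2, 2, byrow = TRUE)
Y <- array(c(94.1, 107.5, 97.0, 113.5, 96.9, 122.9, 111.4, 124.3), c(2, 2, 2))
rho(t(Z), rho(t(Z), rho(t(Z), Y)))          # Yates' column (3) as a 2 x 2 x 2 array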

Yates Algorithm

Treatment   Yield   (1)     (2)     (3)
1           94.1    201.6   412.1   867.6    1
n           107.5   210.5   455.5   68.8     N
p           97.0    219.8   29.9    24.8     P
np          113.5   235.7   38.9    −10.0    NP
k           96.9    13.4    8.9     43.4     K
nk          122.9   16.5    15.9    9.0      NK
pk          111.4   26.0    3.1     7.0      PK
npk         124.3   12.9    −13.1   −16.2    NPK

15

[Figure: the estimates arranged as a 2 × 2 × 2 array]

K low:   1  867.6    P   24.8      K high:  K  43.4    PK    7.0
         N   68.8    NP −10.0               NK  9.0    NPK −16.2

16


Generalised linear models

Structure

  • Data: a data vector y
  • Model: a model matrix X = X₃ ⊗ X₂ ⊗ X₁
    a parameter vector θ
    a link function: µ = E(y), g(µ) = Xθ

  • Error distribution: normal, Poisson, binomial, . . .

Algorithm

  • Scoring algorithm

X′W̃δX θ̂ = X′W̃δ z̃   where   z̃ = Xθ̃ + W̃δ⁻¹(y − µ̃).

17

Computation

We need Xθ̃, X′W̃δz̃ and X′W̃δX.

In 1-d, let X = (x₁, . . . , x_c), n × c. Then

(X′WδX)ⱼₖ = Σᵢ₌₁ⁿ wᵢxᵢⱼxᵢₖ = (x₁ⱼx₁ₖ, . . . , xₙⱼxₙₖ)w = (xⱼ ∗ xₖ)′w

Definition: the row tensor of X, n × c, is

G(X) = (X ⊗ 1′) ∗ (1′ ⊗ X), n × c²,

where 1 is a vector of 1’s of length c. Then

X′WδX, c × c ≡ G(X)′w, c² × 1

18
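
A sketch of the row tensor and the identity X′WδX ≡ G(X)′w in R (my code, following the definition above):

row.tensor <- function(X) {
  one <- matrix(1, 1, ncol(X))
  (X %x% one) * (one %x% X)                 # n x c^2, row i holds x_ij * x_ik
}

set.seed(2)
X <- matrix(rnorm(5 * 3), 5, 3)
w <- runif(5)
m1 <- t(X) %*% diag(w) %*% X                # X'WX directly, c x c
m2 <- matrix(t(row.tensor(X)) %*% w, 3, 3)  # G(X)'w, c^2 x 1, reshaped
all.equal(m1, m2)                           # TRUE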

Computation of X′WδX

Xᵢ, nᵢ × cᵢ, i = 1, 2, 3.  X = X₃ ⊗ X₂ ⊗ X₁, n₁n₂n₃ × c₁c₂c₃
Wδ is diagonal, n₁n₂n₃ × n₁n₂n₃
W is the corresponding array, n₁ × n₂ × n₃

X′WδX, c₁c₂c₃ × c₁c₂c₃ ≡ ρ(G(X₃)′, ρ(G(X₂)′, ρ(G(X₁)′, W))), c₁² × c₂² × c₃²

19
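
The 3-d identity can be checked the same way (a sketch using rho() and row.tensor() from above; the aperm step only rearranges the c₁² × c₂² × c₃² array into the usual c₁c₂c₃ × c₁c₂c₃ matrix):

set.seed(3)
X1 <- matrix(rnorm(4 * 2), 4, 2)
X2 <- matrix(rnorm(3 * 2), 3, 2)
X3 <- matrix(rnorm(5 * 3), 5, 3)
W  <- array(runif(4 * 3 * 5), c(4, 3, 5))   # weight array, n1 x n2 x n3
X  <- X3 %x% X2 %x% X1                      # explicit model matrix, 60 x 12
m1 <- t(X) %*% diag(c(W)) %*% X             # standard X'WX, 12 x 12
m2 <- rho(t(row.tensor(X3)), rho(t(row.tensor(X2)),
          rho(t(row.tensor(X1)), W)))       # c1^2 x c2^2 x c3^2 array
A6 <- array(m2, c(2, 2, 2, 2, 3, 3))        # split each squared dimension
m2 <- matrix(aperm(A6, c(1, 3, 5, 2, 4, 6)), 12)  # rearrange to 12 x 12
all.equal(m1, m2)                           # TRUE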

Standard errors of Xθ̂

We need diag X(X′WδX)⁻¹X′ = diag XSmX′, where Sm, c₁c₂c₃ × c₁c₂c₃.
Let S, c₁² × c₂² × c₃², be the array form of Sm.

diag XSmX′, n₁n₂n₃ × 1 ≡ ρ(G(X₃), ρ(G(X₂), ρ(G(X₁), S))), n₁ × n₂ × n₃.

20


Computation: a summary

Linear functions: X′y, c₁c₂c₃ × 1
  ρ(X₃′, ρ(X₂′, ρ(X₁′, Y))), c₁ × c₂ × c₃

Xθ, n₁n₂n₃ × 1
  ρ(X₃, ρ(X₂, ρ(X₁, Θ))), n₁ × n₂ × n₃

Inner products: X′WδX, c₁c₂c₃ × c₁c₂c₃
  ρ(G(X₃)′, ρ(G(X₂)′, ρ(G(X₁)′, W))), c₁² × c₂² × c₃²

Diagonal function: diag XSmX′, n₁n₂n₃ × 1
  ρ(G(X₃), ρ(G(X₂), ρ(G(X₁), S))), n₁ × n₂ × n₃.

21

Generalised linear array models

Structure

  • Data: a data array Y, n₁ × n₂ × n₃
  • Model: a model matrix X = X₃ ⊗ X₂ ⊗ X₁, with Xᵢ, nᵢ × cᵢ
    a parameter array Θ, c₁ × c₂ × c₃
    a link function: M = E(Y), g(M) = ρ(X₃, ρ(X₂, ρ(X₁, Θ))).

  • Error distribution: normal, Poisson, binomial, . . .

Algorithm

  • Efficient scoring algorithm

Array computation of X′W̃δX θ̂ = X′W̃δ z̃ where z̃ = Xθ̃ + W̃δ⁻¹(y − µ̃).

22
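
Putting the pieces together, one scoring step for a Poisson GLAM with log link might look like this (an illustrative sketch built on rho() and row.tensor() above, not the authors' code):

glam.step <- function(X1, X2, X3, Y, Theta) {
  cs  <- c(ncol(X1), ncol(X2), ncol(X3))
  Eta <- rho(X3, rho(X2, rho(X1, Theta)))   # X theta in array form
  Mu  <- exp(Eta)                           # inverse log link
  W   <- Mu                                 # Poisson working weights
  Z   <- Eta + (Y - Mu) / Mu                # working variable z
  XWz <- rho(t(X3), rho(t(X2), rho(t(X1), W * Z)))   # X'Wz, c1 x c2 x c3
  XWX <- rho(t(row.tensor(X3)), rho(t(row.tensor(X2)),
             rho(t(row.tensor(X1)), W)))    # c1^2 x c2^2 x c3^2
  A6  <- array(XWX, c(cs[1], cs[1], cs[2], cs[2], cs[3], cs[3]))
  XWXm <- matrix(aperm(A6, c(1, 3, 5, 2, 4, 6)), prod(cs))
  array(solve(XWXm, c(XWz)), cs)            # updated parameter array Theta
}

Iterating glam.step to convergence gives the scoring estimates; the penalised version (slide 27) would add the penalty matrix P to XWXm before solving.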

LM, GLM & GLAM

            LM          GLM              GLAM
Data        y           y                Y, a d-dimensional array
Model       µ = Xθ      g(µ) = Xθ        g(M) = ρ(X_d, . . . , ρ(X₁, Θ) . . .)
Error       N           N, P, B, . . .   N, P, B, . . .
Algorithm   N-equns     Scoring          Efficient scoring

23

Examples

  • Factorial designs
  • Contingency tables
  • Smooth models with P-splines

24


[Figure: a single cubic B-spline, plotted against Age (20–80)]

25

[Figure: a 1-d B-spline basis, plotted against Age (20–80)]

26

P-splines in 2-d

Structure

  • Data: deaths matrix, D, exposure matrix, E
  • Model: a model matrix B = By ⊗ Ba, with Ba, 80 × 20, By, 100 × 25
    a parameter array Θ, 20 × 25
    a link function: M = E(D), log(M) = log E + BaΘBy′.

  • Error distribution: Poisson.

Algorithm

  • Efficient penalised scoring algorithm

Array computation of (B′W̃δB + P)θ̂ = B′W̃δ z̃ where z̃ = Bθ̃ + W̃δ⁻¹(d − µ̃) and P = P(λa, λy) is the penalty matrix.

27
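
The slide leaves P(λa, λy) unspecified; a common choice (as in Currie, Durban & Eilers, 2004) is a sum of Kronecker-structured difference penalties. A sketch, with illustrative smoothing parameters and second-order differences:

ca <- 20; cy <- 25                          # basis sizes for age and year
Da <- diff(diag(ca), differences = 2)       # second differences, age
Dy <- diff(diag(cy), differences = 2)       # second differences, year
lambda.a <- 10; lambda.y <- 100             # smoothing parameters (illustrative)
P <- lambda.a * (diag(cy) %x% crossprod(Da)) +
     lambda.y * (crossprod(Dy) %x% diag(ca))  # penalty, (ca*cy) x (ca*cy)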

[Figure: fitted mortality surface, Year 1947–1999 by Age 11–100]

28


[Figure: Swedish females, ages 20 to 90; log(mortality) against Year, 1900–2000]

29

Smooth Lee-Carter models

Structure

  • Data: deaths matrix, D, exposure matrix, E
  • Model: Dᵢⱼ ∼ P(µᵢⱼ), log µᵢⱼ = log Eᵢⱼ + αᵢ + βᵢκⱼ

For given κ̂:

M = E(D), log(M) = log E + BaΘK̂′

where α = Ba a, β = Ba b, Θ = [a : b], K̂ = [1 : κ̂].

For given α̂ and β̂:

M = E(D), log(M) = log E + α̂1′ + β̂κ′ = log E + α̂1′ + β̂k′Bk′

where κ = Bk k.

Algorithm: Efficient penalised scoring algorithm

30

[Figure: four panels: α̂ against Age, β̂ against Age, κ̂ against Year, and log(mortality) against Year at age 65]

31

Microarray data

Data are arranged on a 72 × 128 grid in 8 blocks of 2 rows and 4 columns.

E(Y) = B₃AB̆₃′ + B₀CB̆₀′

32


[Figure: four panels on the 72 × 128 grid: microarray background, fitted surface (smooth + pin), pin component, smooth component]

33

Respiratory data

Deaths, D, and exposures, E, from a respiratory disease, classified by age (1 to 105), year (1959 to 1998) and month (1 to 12), are arranged in a 105 × 40 × 12 array.

E(D) = M, log M = log E + ρ(Bm, ρ(By, ρ(Ba, Θ)))

Ba, 105 × 15;  By, 40 × 10;  Bm, 12 × 7

B = Bm ⊗ By ⊗ Ba, 50400 × 1050

34
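
At this scale the array algorithm never builds B. A sketch of the linear predictor computation (stand-in bases in place of the B-splines, using rho() from above):

Ba <- matrix(rnorm(105 * 15), 105, 15)      # age basis, 105 x 15
By <- matrix(rnorm(40 * 10), 40, 10)        # year basis, 40 x 10
Bm <- matrix(rnorm(12 * 7), 12, 7)          # month basis, 12 x 7
Theta <- array(rnorm(15 * 10 * 7), c(15, 10, 7))
Eta <- rho(Bm, rho(By, rho(Ba, Theta)))     # linear predictor, 105 x 40 x 12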

[Figure: observed and fitted log(Deaths/Day) against Age (20–100), against Year (1960–2000), and against Month (1–12)]

35

Times (seconds) to calculate B′WδB

Array size   npar   Standard regression   Array regression   Ratio
6 × 6 × 6    216    20                    1                  20:1
7 × 7 × 7    343    200                   2                  100:1
8 × 8 × 8    512    2000                  4                  500:1
9 × 9 × 9    729    −                     20                 −

36


References

Smoothing with P-splines

  • Eilers & Marx (1996). Statistical Science, 11, 89–121.
  • Currie & Durban (2002). Statistical Modelling, 2, 333–349.
  • Currie, Durban & Eilers (2004). Statistical Modelling, 4, 279–298.

GLAM

  • Eilers, Currie & Durban (2006). Computational Statistics and Data Analysis, 50, 61–76.
  • Currie, Durban & Eilers (2006). Journal of the Royal Statistical Society, Series B, 68, 259–280.

Web site

www.ma.hw.ac.uk/~iain/research/papers.html

37