The Power of Unbiased Recursive Partitioning: A Unifying View of CTree, MOB, and GUIDE (PowerPoint PPT Presentation)


SLIDE 1

The Power of Unbiased Recursive Partitioning: A Unifying View of CTree, MOB, and GUIDE

Lisa Schlosser, Torsten Hothorn, Achim Zeileis

http://www.partykit.org/partykit

SLIDE 2

Motivation


SLIDE 3

Motivation

Other covariates Z1, . . . , Zp?


SLIDE 4

Motivation

[Tree figure: the data are split at Zj ≤ ξ vs. Zj > ξ]

SLIDE 5

Motivation

[Tree figure: the full-sample model M(Y, X; β̂) is split at Zj ≤ ξ vs. Zj > ξ into the subgroup models M(Y1, X1; β̂1) and M(Y2, X2; β̂2)]

SLIDE 6

Motivation

[Tree figure: the full-sample model M(Y, X; β̂) is split at Zj ≤ ξ vs. Zj > ξ into the subgroup models M(Y1, X1; β̂1) and M(Y2, X2; β̂2)]

M can also be a more general model (possibly without X).


SLIDE 7

Unbiased recursive partitioning

GUIDE: Loh (2002, Statistica Sinica).

  • First unbiased algorithm for recursive partitioning of linear models.
  • Separation of split variable and split point selection.
  • Based on χ2 tests.

CTree: Hothorn, Hornik, Zeileis (2006, JCGS).

  • Proposed as unbiased recursive partitioning for nonparametric modeling.
  • Based on conditional inference (or permutation tests).
  • Can be model-based via model scores as the response transformation.

MOB: Zeileis, Hothorn, Hornik (2008, JCGS).

  • Model-based recursive partitioning using M-estimation (ML, OLS, CRPS, . . . ).
  • Based on parameter instability tests.
  • Adapted to various psychometric models: Rasch, PCM, Bradley-Terry, MPT, SEM, networks, . . .


SLIDE 8

Unbiased recursive partitioning

Basic tree algorithm:

1 Fit a model M(Y, X; β̂) to the response Y and possible covariates X.

2 Assess the association of M(Y, X; β̂) with each possible split variable Zj and select the split variable Zj∗ showing the strongest association.

3 Choose the corresponding split point leading to the highest improvement of model fit and split the data.

4 Repeat steps 1–3 recursively in each of the resulting subgroups until some stopping criterion is met.

Here: Focus on split variable selection (step 2).
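The four steps above can be sketched as a toy recursive-partitioning loop. This is an illustration only, not partykit's implementation: the per-node model is a simple OLS fit, step 2 uses a residual/Z correlation test as a stand-in for the algorithms' actual association tests, and all function names are ours.

```python
import numpy as np
from scipy import stats

def fit_ols(y, x):
    """Step 1: fit y = b0 + b1*x by OLS; return coefficients and residuals."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def select_split_variable(resid, Z):
    """Step 2: test association of residuals with each Z_j; pick the smallest p-value."""
    pvals = [stats.pearsonr(resid, Z[:, j])[1] for j in range(Z.shape[1])]
    j = int(np.argmin(pvals))
    return j, pvals[j]

def best_split_point(y, x, z):
    """Step 3: exhaustive search for the cutpoint minimizing the summed RSS."""
    best_rss, best_xi = np.inf, None
    for xi in np.unique(z)[:-1]:
        left, right = z <= xi, z > xi
        if left.sum() < 10 or right.sum() < 10:
            continue
        rss = sum(np.sum(fit_ols(y[m], x[m])[1] ** 2) for m in (left, right))
        if rss < best_rss:
            best_rss, best_xi = rss, xi
    return best_xi

def grow_tree(y, x, Z, alpha=0.05, min_n=40, depth=0, max_depth=3):
    """Step 4: recurse until no (Bonferroni-corrected) significant split remains."""
    beta, resid = fit_ols(y, x)
    if len(y) < min_n or depth >= max_depth:
        return {"beta": beta}
    j, p = select_split_variable(resid, Z)
    if p * Z.shape[1] > alpha:          # simple Bonferroni pre-pruning
        return {"beta": beta}
    xi = best_split_point(y, x, Z[:, j])
    if xi is None:
        return {"beta": beta}
    left = Z[:, j] <= xi
    return {"split_var": j, "split_point": xi,
            "left": grow_tree(y[left], x[left], Z[left], alpha, min_n, depth + 1, max_depth),
            "right": grow_tree(y[~left], x[~left], Z[~left], alpha, min_n, depth + 1, max_depth)}

# Demo: one true split on Z_1 at 0, two noise split variables.
rng = np.random.default_rng(1)
Z = rng.uniform(-1, 1, size=(400, 3))
x = rng.uniform(-1, 1, 400)
y = np.where(Z[:, 0] <= 0, 1 + x, -1 - x) + 0.1 * rng.normal(size=400)
tree = grow_tree(y, x, Z)
print(tree["split_var"], round(tree["split_point"], 2))
```

The demo recovers the true split variable (index 0) and a split point near the true ξ = 0.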

SLIDE 9

Split variable selection

General testing strategy:

1 Evaluate a discrepancy measure capturing the observation-wise goodness of fit of M(Y, X; β̂).

2 Apply a statistical test assessing dependency of the discrepancy measure to each possible split variable Zj.

3 Select the split variable Zj∗ showing the smallest p-value.

Discrepancy measures: (Model-based) transformations of Y (and X, if any), possibly for each model parameter.

  • (Ranks of) Y.
  • (Absolute) deviations Y − Ȳ.
  • Residuals of M(Y, X; β̂).
  • Score matrix of M(Y, X; β̂).
  • . . .
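The listed discrepancy measures can be computed side by side for a toy sample. The data and the simple OLS model below are our own illustration (the score formula anticipates the OLS example on the following slides):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, 8)
Y = 1.0 + 0.5 * X + rng.normal(size=8)

ranks = Y.argsort().argsort() + 1                # (ranks of) Y, 1-based
abs_dev = np.abs(Y - Y.mean())                   # (absolute) deviations Y - Ybar
b1, b0 = np.polyfit(X, Y, 1)                     # OLS fit of M(Y, X; beta)
residuals = Y - b0 - b1 * X                      # residuals of the fitted model
scores = np.column_stack([-2 * residuals,        # score matrix: one column
                          -2 * residuals * X])   # per model parameter

print(scores.shape)   # (8, 2): n observations x 2 parameters
```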

SLIDE 10

Discrepancy measures

Example: Simple linear regression M(Y, X; β0, β1), fitted via ordinary least squares (OLS).

Residuals: r(Y, X, β̂0, β̂1) = Y − β̂0 − β̂1 · X

SLIDE 11

Discrepancy measures

Example: Simple linear regression M(Y, X; β0, β1), fitted via ordinary least squares (OLS).

Residuals: r(Y, X, β̂0, β̂1) = Y − β̂0 − β̂1 · X

Model scores: Based on log-likelihood or residual sum of squares.

s(Y, X, β̂0, β̂1) = ( ∂r²(Y, X, β̂0, β̂1)/∂β0 , ∂r²(Y, X, β̂0, β̂1)/∂β1 )
SLIDE 12

Discrepancy measures

Example: Simple linear regression M(Y, X; β0, β1), fitted via ordinary least squares (OLS).

Residuals: r(Y, X, β̂0, β̂1) = Y − β̂0 − β̂1 · X

Model scores: Based on log-likelihood or residual sum of squares.

s(Y, X, β̂0, β̂1) = ( ∂r²(Y, X, β̂0, β̂1)/∂β0 , ∂r²(Y, X, β̂0, β̂1)/∂β1 )

⇓

s(Y, X, β̂0, β̂1) = ( −2 · r(Y, X, β̂0, β̂1) , −2 · r(Y, X, β̂0, β̂1) · X )
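A quick numerical check of the score formulas: at the OLS solution, each per-observation score column (−2·r and −2·r·X) must sum to (numerically) zero, since these are exactly the first-order conditions of least squares. The data and variable names are our own toy example:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, 200)
Y = 0.5 + 2.0 * X + rng.normal(size=200)

# OLS fit of Y = b0 + b1*X (polyfit returns highest degree first).
b1, b0 = np.polyfit(X, Y, 1)
r = Y - b0 - b1 * X                      # residuals r(Y, X, b0, b1)

# Score matrix: the derivatives of r^2 w.r.t. b0 and b1 per observation.
scores = np.column_stack([-2 * r, -2 * r * X])

print(scores.sum(axis=0))                # both columns ~ 0 at the OLS solution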

SLIDE 13

A unifying view

Algorithms: CTree, MOB, GUIDE are all ‘flavors’ of the general framework.

Building blocks: For standard setup.

         Scores        Binarization  Categorization  Statistic
  CTree  Model scores  –             –               Sum of squares
  MOB    Model scores  –             –               Maximally selected
  GUIDE  Residuals     yes           yes             Sum of squares

Remarks:

  • All three algorithms allow for certain modifications of the standard setup.
  • Further differences, e.g., null distribution, pruning strategy, etc.

SLIDE 14

General framework

Building blocks:

  • Residuals vs. full model scores.
  • Binarization of residuals/scores.
  • Categorization of possible split variables.


SLIDE 15

General framework

Building blocks:

  • Residuals vs. full model scores.
  • Binarization of residuals/scores.
  • Categorization of possible split variables.

s(Y, X, β̂0, β̂1) = −2 ·
  [ r(Y1, X1, β̂0, β̂1)   r(Y1, X1, β̂0, β̂1) · X1 ]
  [ r(Y2, X2, β̂0, β̂1)   r(Y2, X2, β̂0, β̂1) · X2 ]
  [ ...                   ...                      ]
  [ r(Yn, Xn, β̂0, β̂1)   r(Yn, Xn, β̂0, β̂1) · Xn ]

SLIDE 16

General framework

Building blocks:

  • Residuals vs. full model scores.
  • Binarization of residuals/scores.
  • Categorization of possible split variables.

s(Y, X, β̂0, β̂1) = −2 ·
  [ r(Y1, X1, β̂0, β̂1)   r(Y1, X1, β̂0, β̂1) · X1 ]
  [ r(Y2, X2, β̂0, β̂1)   r(Y2, X2, β̂0, β̂1) · X2 ]
  [ ...                   ...                      ]
  [ r(Yn, Xn, β̂0, β̂1)   r(Yn, Xn, β̂0, β̂1) · Xn ]

SLIDE 17

General framework

Building blocks:

  • Residuals vs. full model scores.
  • Binarization of residuals/scores.
  • Categorization of possible split variables.

r(Y, X, β̂0, β̂1) =
  [ r(Y1, X1, β̂0, β̂1) ]
  [ r(Y2, X2, β̂0, β̂1) ]
  [ ...                 ]
  [ r(Yn, Xn, β̂0, β̂1) ]

SLIDE 18

General framework

Building blocks:

  • Residuals vs. full model scores.
  • Binarization of residuals/scores.
  • Categorization of possible split variables.

r(Y, X, β̂0, β̂1) =
  [ r(Y1, X1, β̂0, β̂1) ]       [ > 0 ]
  [ r(Y2, X2, β̂0, β̂1) ]   ⇒   [ ≤ 0 ]
  [ ...                 ]       [ ... ]
  [ r(Yn, Xn, β̂0, β̂1) ]       [ > 0 ]
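The binarization step shown above maps each residual to a binary class by its sign. A minimal sketch (our own toy residuals):

```python
import numpy as np

residuals = np.array([0.8, -0.3, 1.2, -0.1, 0.05, -2.0])
binarized = np.where(residuals > 0, "> 0", "<= 0")
print(binarized.tolist())   # ['> 0', '<= 0', '> 0', '<= 0', '> 0', '<= 0']
```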

SLIDE 19

General framework

Building blocks:

  • Residuals vs. full model scores.
  • Binarization of residuals/scores.
  • Categorization of possible split variables.

Zj =
  [ Zj1 ]
  [ Zj2 ]
  [ ... ]
  [ Zjn ]

SLIDE 20

General framework

Building blocks:

  • Residuals vs. full model scores.
  • Binarization of residuals/scores.
  • Categorization of possible split variables.

Zj =
  [ Zj1 ]       [ Q3 ]
  [ Zj2 ]   ⇒   [ Q1 ]
  [ ... ]       [ ... ]
  [ Zjn ]       [ Q2 ]
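Combining the two building blocks, GUIDE-style: residuals are binarized by sign, the split variable Zj is categorized into quartiles (Q1–Q4), and a chi-squared contingency test measures their association. This is a toy illustration under our own naming and data, not GUIDE's actual code:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 400
z = rng.uniform(-1, 1, n)                     # candidate split variable Z_j
# Toy residuals whose sign depends on z: a strong, detectable association.
resid = np.where(z <= 0, 1.0, -1.0) + 0.5 * rng.normal(size=n)

binarized = resid > 0                                              # binarization
quartile = np.searchsorted(np.quantile(z, [0.25, 0.5, 0.75]), z)   # categories 0..3

# Contingency table of sign class x quartile, then the chi-squared test.
table = np.zeros((2, 4))
for b, q in zip(binarized, quartile):
    table[int(b), q] += 1
chi2, p, dof, _ = stats.chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, p = {p:.3g}")       # strongly significant here
```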

SLIDE 21

Pruning

Goal: Avoid overfitting. Two strategies:

  • Pre-pruning: Internal stopping criterion based on Bonferroni-corrected p-values of the underlying tests. Stop splitting when there is no significant association.
  • Post-pruning: First grow a very large tree and afterwards prune splits that do not improve the model fit, either via cross-validation (e.g., cost-complexity pruning as in CART) or based on information criteria (e.g., AIC or BIC).
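The pre-pruning rule can be stated in a few lines: splitting stops when even the smallest p-value, Bonferroni-corrected for the number of candidate split variables, is no longer significant. A minimal sketch with an illustrative function name:

```python
def should_stop(pvalues, alpha=0.05):
    """Return True if no Bonferroni-corrected p-value is significant."""
    m = len(pvalues)
    return min(p * m for p in pvalues) > alpha

print(should_stop([0.001, 0.20, 0.55]))  # False: 0.001*3 = 0.003 < 0.05, keep splitting
print(should_stop([0.03, 0.20, 0.55]))   # True: 0.03*3 = 0.09 > 0.05, stop
```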

SLIDE 22

Simulation

Name                        Notation              Specification

Variables:
  Response                  Y                     = β0(Z1) + β1(Z1) · X + ε
  Regressor                 X                     U([−1, 1])
  Error                     ε                     N(0, 1)
  True split variable       Z1                    U([−1, 1]) or N(0, 1)
  Noise split variables     Z2, Z3, . . . , Z10   U([−1, 1]) or N(0, 1)

Parameters/functions:
  Intercept                 β0                    0 or ±δ
  Slope                     β1                    1 or ±δ
  True split point          ξ ∈ {0, 0.2, 0.5, 0.8}
  Effect size               δ ∈ {0, 0.1, 0.2, . . . , 1}
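The design in the table can be sketched as a data generator. This is our own parameterization of one scenario ("varying β0 and β1", with β0 = ∓δ and β1 = ±δ across the split); function and variable names are illustrative:

```python
import numpy as np

def simulate(n=500, xi=0.0, delta=0.5, seed=0):
    rng = np.random.default_rng(seed)
    Z = rng.uniform(-1, 1, size=(n, 10))   # Z1 is column 0; Z2..Z10 are noise
    X = rng.uniform(-1, 1, n)
    eps = rng.normal(size=n)
    left = Z[:, 0] <= xi                   # true subgroup membership
    b0 = np.where(left, -delta, +delta)    # intercept jumps by 2*delta at xi
    b1 = np.where(left, +delta, -delta)    # slope flips sign at xi
    Y = b0 + b1 * X + eps
    return Y, X, Z, left

Y, X, Z, left = simulate()
print(Y.shape, left.mean())                # (500,) and roughly 0.5 for xi = 0
```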

SLIDE 23

Simulation 1: True tree structure

[Figure: true tree with split z1 ≤ ξ vs. z1 > ξ and the implied regression lines of Y on X, for three scenarios:

  • varying β0: β0 = −δ vs. +δ across the split, β1 = 1 in both subgroups.
  • varying β1: β0 = 0 in both subgroups, β1 = +δ vs. −δ across the split.
  • varying β0 and β1: (β0, β1) = (−δ, +δ) vs. (+δ, −δ) across the split.]

SLIDE 24

Simulation 1: Residuals vs. full model scores

[Figure: selection probability of Z1 as a function of effect size δ, for ξ = 0 (50%) and ξ = 0.8 (90%), in the scenarios varying β0, varying β1, and varying β0 and β1. Lines: CTree, MOB, GUIDE+scores, GUIDE.]

SLIDE 25

Simulation 1: Maximum vs. linear selection

[Figure: selection probability of Z1 as a function of effect size δ, for ξ = 0 (50%) and ξ = 0.8 (90%), in the scenarios varying β0, varying β1, and varying β0 and β1. Lines: CTree, CTree+max, MOB, GUIDE+scores, GUIDE.]

SLIDE 26

Simulation 1: Continuously changing parameters

[Figure: selection probability of Z1 as a function of effect size δ under continuously changing parameters, in the scenarios varying β0, varying β1, and varying β0 and β1. Lines: CTree, CTree+max, MOB, GUIDE+scores, GUIDE.]

SLIDE 27

Simulation 2: True tree structure

[Figure: true tree for simulation 2: first split at z2 ≤ ξ vs. z2 > ξ; the z2 > ξ branch is split again at z1 ≤ ξ vs. z1 > ξ. True parameters: β0 = 0, β1 = +δ (z2 ≤ ξ); β0 = −δ, β1 = −δ (z2 > ξ and z1 ≤ ξ); β0 = +δ, β1 = −δ (z2 > ξ and z1 > ξ). The right panel shows the implied regression lines of Y on X.]

SLIDE 28

Simulation 2: Residuals vs. full model scores

[Figure: adjusted Rand index as a function of effect size δ, for ξ ∈ {0, 0.2, 0.5, 0.8}. Lines: CTree, MOB, GUIDE+scores, GUIDE.]

SLIDE 29

Simulation 2: Pre-pruning vs. post-pruning

[Figure: adjusted Rand index as a function of effect size δ, for ξ ∈ {0, 0.2, 0.5, 0.8}, comparing pre- and post-pruning. Lines: CTree, MOB, GUIDE+scores, GUIDE.]

SLIDE 30

Recommendations

In this setting:

  • Full model scores better than residuals only.
  • Original values of scores/residuals better than binarized values.
  • Categorization is simpler, but less powerful in the margins.
  • Maximally-selected statistics (as in MOB) more powerful for abrupt shifts.
  • Linear statistics (default in CTree) more powerful for linear changes.
  • If the significance tests perform well, pre-pruning works well; otherwise post-pruning might be needed.


SLIDE 31

References

Schlosser L, Hothorn T, Zeileis A (2019). “The Power of Unbiased Recursive Partitioning: A Unifying View of CTree, MOB, and GUIDE.” arXiv:1906.10179, arXiv.org E-Print Archive. https://arxiv.org/abs/1906.10179

Loh W-Y (2002). “Regression Trees with Unbiased Variable Selection and Interaction Detection.” Statistica Sinica, 12(2), 361–386. http://www.jstor.org/stable/24306967

Hothorn T, Hornik K, Zeileis A (2006). “Unbiased Recursive Partitioning: A Conditional Inference Framework.” Journal of Computational and Graphical Statistics, 15(3), 651–674. doi:10.1198/106186006X133933

Zeileis A, Hothorn T, Hornik K (2008). “Model-Based Recursive Partitioning.” Journal of Computational and Graphical Statistics, 17(2), 492–514. doi:10.1198/106186008X319331