Linear models: Analysis of Covariance, Confounding, Interactions


SLIDE 1

Esben Budtz-Jørgensen April 22, 2008

Linear models Analysis of Covariance

  • Confounding
  • Interactions
  • Parameterizations
SLIDE 2

Analysis of Covariance

  • Group comparisons can become biased if an important predictor of the response is distributed differently in the groups
  • An unbiased analysis can be obtained in a multiple regression analysis with the group variable and the predictor as independent variables

Examples:

  • Comparison of blood pressure level in men and women, when they are not equally 'fat'
  • Comparison of lung capacity in men and women, when they are not of the same height

SLIDE 3

Lung Capacity, TLC

  • 32 patients are scheduled for a heart/lung transplantation
  • TLC (Total Lung Capacity) is determined by means of whole body plethysmography
  • Is there a difference in lung capacity between men and women?

OBS  SEX  AGE  HEIGHT   TLC
  1   F    35     149  3.40
  2   F    11     138  3.41
  3   M    12     148  3.80
  .   .     .       .     .
 29   F    20     162  8.05
 30   M    25     180  8.10
 31   M    22     173  8.70
 32   M    25     171  9.45

SLIDE 4

Box plots: total lung capacity and height, each shown for females and males

SLIDE 5

Marginal comparisons

TTEST PROCEDURE

Variable: TLC

SEX     N            Mean        Std Dev      Std Error
F      16      5.19812500     1.30082138     0.32520534
M      16      6.97687500     1.43801585     0.35950396

Variances          T       DF    Prob>|T|
Unequal      -3.6693     29.7      0.0009
Equal        -3.6693     30.0      0.0009

For H0: Variances are equal, F' = 1.22   DF = (15,15)   Prob>F' = 0.7028

Variable: HEIGHT

SEX     N            Mean        Std Dev      Std Error
F      16    160.81250000     9.36816417     2.34204104
M      16    174.06250000    10.66126165     2.66531541

Variances          T       DF    Prob>|T|
Unequal      -3.7344     29.5      0.0008
Equal        -3.7344     30.0      0.0008

For H0: Variances are equal, F' = 1.30   DF = (15,15)   Prob>F' = 0.6228

Clear difference for both TLC and HEIGHT
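The T statistics in the TTEST output can be retraced from the reported group means and standard deviations. The following is a minimal sketch in Python rather than SAS (the slides themselves use SAS only); `two_sample_t` is a hypothetical helper name, and the formula used is the standard unequal-variance (Welch) form, taking the difference as F minus M.

```python
import math

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, Welch df) for group 1 minus group 2, unequal-variance form."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2          # squared standard errors
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Summary statistics copied from the TTEST output (F first, then M)
t_tlc, df_tlc = two_sample_t(5.198125, 1.30082138, 16,
                             6.976875, 1.43801585, 16)
t_height, df_height = two_sample_t(160.8125, 9.36816417, 16,
                                   174.0625, 10.66126165, 16)

print(f"TLC:    t = {t_tlc:.4f}, df = {df_tlc:.1f}")
print(f"HEIGHT: t = {t_height:.4f}, df = {df_height:.1f}")
```

With equal group sizes the pooled (equal-variance) T coincides with the unequal-variance T, which is why both rows of the output show the same value; the sign is negative because the female mean is the smaller one.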

SLIDE 6

Analysis of covariance

Comparison of parallel regression lines

MODEL: $Y_{gi} = \alpha_g + \beta x_{gi} + \epsilon_{gi}$, $g = 1, 2$; $i = 1, \ldots, n_g$

SLIDE 7

What happens if we ’forget’ about x?

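MODEL: $Y_{gi} = \alpha_g + \beta x_{gi} + \epsilon_{gi}$, $g = 1, 2$; $i = 1, \ldots, n_g$

If $\bar{x}_1 \neq \bar{x}_2$ (and $\beta \neq 0$), the difference in group means $(\bar{Y}_2 - \bar{Y}_1)$ is a biased estimate of $\alpha_2 - \alpha_1$.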
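To see where the bias comes from, the model can be averaged within each group. A short derivation, assuming only that the error terms have mean zero:

```latex
\begin{align*}
\bar{Y}_g &= \alpha_g + \beta \bar{x}_g + \bar{\epsilon}_g, \qquad g = 1, 2, \\
E(\bar{Y}_2 - \bar{Y}_1) &= (\alpha_2 - \alpha_1) + \beta(\bar{x}_2 - \bar{x}_1).
\end{align*}
```

The second term is the bias of the marginal comparison: it vanishes only when $\beta = 0$ or when the covariate has the same mean in both groups.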

SLIDE 8

Interaction

The two lines can have different slopes. More general model:

$Y_{gi} = \alpha_g + \beta_g x_{gi} + \epsilon_{gi}$, $g = 1, 2$; $i = 1, \ldots, n_g$

If $\beta_1 \neq \beta_2$, the two covariates interact:

  • Effect of height depends on sex
  • Difference between males and females depends on height

SLIDE 9

Relationship between TLC and HEIGHT:

SLIDE 10

Relationship between log-transformed TLC and height, HEIGHT

SLIDE 11

Model specification: Model with interaction

proc glm;
  class sex;
  model ltlc = sex height sex*height / solution;
run;

Or in SAS Analyst: ANOVA/Linear models

  • choose ltlc as dependent
  • choose height as a quantitative variable
  • choose sex as a class variable
  • under the Model button insert the “cross”-term

SLIDE 12

Output

Dependent Variable: LTLC

                             Sum of           Mean
Source             DF       Squares         Square    F Value    Pr > F
Model               3    0.27230446     0.09076815      13.05    0.0001
Error              28    0.19478293     0.00695653
Corrected Total    31    0.46708739

R-Square        C.V.    Root MSE    LTLC Mean
0.582984    10.85524     0.08341      0.76835

Source             DF     Type I SS    Mean Square    F Value    Pr > F
SEX                 1    0.13626303     0.13626303      19.59    0.0001
HEIGHT              1    0.13451291     0.13451291      19.34    0.0001
HEIGHT*SEX          1    0.00152852     0.00152852       0.22    0.6429

Source             DF   Type III SS    Mean Square    F Value    Pr > F
SEX                 1    0.00210426     0.00210426       0.30    0.5867
HEIGHT              1    0.13597107     0.13597107      19.55    0.0001
HEIGHT*SEX          1    0.00152852     0.00152852       0.22    0.6429

                                      T for H0:                Std Error of
Parameter               Estimate    Parameter=0    Pr > |T|        Estimate
INTERCEPT          -0.2190181620 B        -0.62      0.5391      0.35221658
SEX          F     -0.2810587157 B        -0.55      0.5867      0.51102682
             M      0.0000000000 B          .         .           .
HEIGHT              0.0060473650 B         2.99      0.0057      0.00201996
HEIGHT*SEX   F      0.0014344422 B         0.47      0.6429      0.00306016
             M      0.0000000000 B          .         .           .
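The summary figures at the top of this output follow from the sums of squares by simple arithmetic. As a minimal sketch (in Python rather than SAS, with variable names chosen here for illustration), the overall F value, R-Square and Root MSE can be recomputed from the Model and Error lines:

```python
import math

# Values copied from the GLM output for the model with interaction
ss_model, df_model = 0.27230446, 3
ss_error, df_error = 0.19478293, 28

ms_model = ss_model / df_model              # mean square for the model
ms_error = ss_error / df_error              # residual variance estimate
f_model = ms_model / ms_error               # overall F value
r_square = ss_model / (ss_model + ss_error) # share of variation explained
root_mse = math.sqrt(ms_error)              # residual SD on the log10 scale

print(f"F = {f_model:.2f}, R-Square = {r_square:.6f}, Root MSE = {root_mse:.5f}")
```

The same division by the error mean square gives every F value in the Type I and Type III tables.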

SLIDE 13

Relationship between log-transformed TLC and height, HEIGHT

SLIDE 14

Reduction of the model

The interaction term was excluded

Dependent Variable: LTLC

                             Sum of           Mean
Source             DF       Squares         Square    F Value    Pr > F
Model               2    0.27077594     0.13538797      20.00    0.0001
Error              29    0.19631145     0.00676936
Corrected Total    31    0.46708739

R-Square        C.V.    Root MSE    LTLC Mean
0.579712    10.70821     0.08228      0.76835

Source             DF     Type I SS    Mean Square    F Value    Pr > F
SEX                 1    0.13626303     0.13626303      20.13    0.0001
HEIGHT              1    0.13451291     0.13451291      19.87    0.0001

Source             DF   Type III SS    Mean Square    F Value    Pr > F
SEX                 1    0.00968023     0.00968023       1.43    0.2415
HEIGHT              1    0.13451291     0.13451291      19.87    0.0001

                                      T for H0:                Std Error of
Parameter               Estimate    Parameter=0    Pr > |T|        Estimate
INTERCEPT          -0.3278068826 B        -1.25      0.2198      0.26135206
SEX          F     -0.0421012632 B        -1.20      0.2415      0.03520676
             M      0.0000000000 B          .         .           .
HEIGHT              0.0066723630           4.46      0.0001      0.00149683

Note: Now the effect of sex has disappeared!
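Dropping the interaction term can also be viewed as a comparison of two nested models. A minimal sketch in Python (not part of the original SAS analysis; variable names are illustrative) computes the extra-sum-of-squares F statistic from the Error lines of the two outputs:

```python
# Error sums of squares and degrees of freedom copied from the two GLM outputs
sse_full, df_full = 0.19478293, 28        # ltlc = sex height sex*height
sse_reduced, df_reduced = 0.19631145, 29  # ltlc = sex height

# Increase in residual variation when the interaction is removed
extra_ss = sse_reduced - sse_full

# F statistic: extra SS per dropped parameter, relative to the full-model MSE
f_drop = (extra_ss / (df_reduced - df_full)) / (sse_full / df_full)

print(f"extra SS = {extra_ss:.8f}, F = {f_drop:.2f}")
```

The extra sum of squares equals the HEIGHT*SEX sum of squares in the output of the model with interaction, and the resulting F is the value reported there for the interaction test.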

SLIDE 15

Interpretation

In this example we saw that

  • The observed difference in (log10) lung capacity between females and males could be attributed to the difference in height

An approximate 95% confidence interval for the log10-difference (male minus female) is 0.0421 ± 2 × 0.0352 = (−0.0283, 0.1125), corresponding to the interval (0.94, 1.30) for the male/female ratio of lung capacity, i.e. at equal height the data are compatible with men having up to 30% larger lung capacity than women, or up to 6% smaller.
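The interval arithmetic on this slide can be retraced in a few lines. This is a minimal sketch in Python (not part of the original SAS analysis); `ratio_interval` is a hypothetical helper name, and the multiplier 2 is the slide's approximation to the t-quantile.

```python
def ratio_interval(estimate, std_error, multiplier=2.0):
    """Approximate 95% CI on the log10 scale, back-transformed to a ratio."""
    lower = estimate - multiplier * std_error
    upper = estimate + multiplier * std_error
    # Differences of log10 values correspond to ratios on the original scale
    return (lower, upper), (10 ** lower, 10 ** upper)

# Male minus female difference in log10(TLC), adjusted for height,
# with its standard error, both taken from the reduced-model output
log_ci, ratio_ci = ratio_interval(0.0421, 0.0352)

print("log10 scale:", [round(x, 4) for x in log_ci])
print("ratio scale:", [round(x, 2) for x in ratio_ci])
```

Because the interval for the ratio contains 1, the adjusted analysis gives no clear evidence of a sex difference in lung capacity.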

It is also possible that

  • Groups that appear to be equal in a marginal analysis (e.g. blood pressure in men and women) show a difference after adjustment for important covariates (such as obesity)

All variables with potential influence should be considered!

SLIDE 16

Example: Blood pressure vs. obesity and sex

Marginal analysis indicates no difference in blood pressure level between males and females. However, when we adjust for the degree of obesity, a sex difference appears.

SLIDE 17

Model

with interaction:

proc glm;
  class sex;
  model lbp = lobese sex sex*lobese / solution;
run;

SLIDE 18

Output

General Linear Models Procedure

Dependent Variable: LBP

                             Sum of           Mean
Source             DF       Squares         Square    F Value    Pr > F
Model               3    0.05583810     0.01861270       6.30    0.0006
Error              98    0.28952497     0.00295434
Corrected Total   101    0.34536306

Source             DF     Type I SS    Mean Square    F Value    Pr > F
LOBESE              1    0.03809379     0.03809379      12.89    0.0005
SEX                 1    0.01597238     0.01597238       5.41    0.0221
LOBESE*SEX          1    0.00177193     0.00177193       0.60    0.4405

Source             DF   Type III SS    Mean Square    F Value    Pr > F
LOBESE              1    0.03920980     0.03920980      13.27    0.0004
SEX                 1    0.01252714     0.01252714       4.24    0.0421
LOBESE*SEX          1    0.00177193     0.00177193       0.60    0.4405

                                       T for H0:                Std Error of
Parameter                 Estimate   Parameter=0    Pr > |T|        Estimate
INTERCEPT              2.087171366 B      165.93      0.0001      0.01257865
SEX          female   -0.039290663 B       -2.06      0.0421      0.01908066
             male      0.000000000 B         .         .           .
LOBESE                 0.227981122 B        1.73      0.0863      0.13158758
LOBESE*SEX   female    0.123097524 B        0.77      0.4405      0.15894836
             male      0.000000000 B         .         .           .

SLIDE 19

Re-parametrization

proc glm;
  class sex;
  model lbp = sex sex*lobese / noint solution;
run;

General Linear Models Procedure

Dependent Variable: LBP

                               Sum of           Mean
Source              DF        Squares         Square     F Value    Pr > F
Model                4     449.803216     112.450804    38062.97    0.0001
Error               98       0.289525       0.002954
Uncorrected Total  102     450.092741
...

Source              DF    Type III SS    Mean Square     F Value    Pr > F
SEX                  2     141.530202      70.765101    23952.96    0.0001
LOBESE*SEX           2       0.054676       0.027338        9.25    0.0002

                                       T for H0:                Std Error of
Parameter                 Estimate   Parameter=0    Pr > |T|        Estimate
SEX          female    2.047880703        142.73      0.0001      0.01434744
             male      2.087171366        165.93      0.0001      0.01257865
LOBESE*SEX   female    0.351078645          3.94      0.0002      0.08915879
             male      0.227981122          1.73      0.0863      0.13158758

SLIDE 20

The model is the same, with 2 different parameterizations:

1. model lbp = lobese sex sex*lobese

  • An intercept for the reference group (sex=1)
  • An intercept difference from sex=0 to sex=1
  • An effect of lobese (slope) for the reference group
  • A slope difference from sex=0 to sex=1

2. model lbp=sex sex*lobese / noint

  • An intercept for each group (sex)
  • A slope (lobese effect) for each group (sex)
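The correspondence between the two parameterizations can be checked directly on the reported estimates. A minimal sketch in Python (not SAS; the dictionary names are illustrative) follows. It takes male as the reference level and the SEX female estimate as negative, which is what the per-group intercepts in the noint output imply (2.0479 for females against 2.0872 for males).

```python
# Estimates copied from the output of: model lbp = lobese sex sex*lobese
intercept_ref = 2.087171366    # INTERCEPT (male, the reference level)
intercept_diff = -0.039290663  # SEX female: intercept difference to reference
slope_ref = 0.227981122        # LOBESE: slope in the reference group
slope_diff = 0.123097524       # LOBESE*SEX female: slope difference

# Per-group intercepts and slopes, i.e. the noint parameterization
intercepts = {"male": intercept_ref, "female": intercept_ref + intercept_diff}
slopes = {"male": slope_ref, "female": slope_ref + slope_diff}

for sex in ("female", "male"):
    print(f"{sex}: intercept = {intercepts[sex]:.9f}, slope = {slopes[sex]:.9f}")
```

Up to rounding in the last printed digit, these sums reproduce the SEX and LOBESE*SEX estimates of the second parameterization; the fitted lines, and hence the error sum of squares, are identical in both.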

SLIDE 21

Reduced model: no interaction (equal slopes)

proc glm;
  class sex;
  model lbp = lobese sex / solution;
run;

SLIDE 22

Reduced model, output

General Linear Models Procedure

Dependent Variable: LBP

                             Sum of           Mean
Source             DF       Squares         Square    F Value    Pr > F
Model               2    0.05406617     0.02703308       9.19    0.0002
Error              99    0.29129690     0.00294239
Corrected Total   101    0.34536306
...

Source             DF     Type I SS    Mean Square    F Value    Pr > F
SEX                 1    0.00116215     0.00116215       0.39    0.5311
LOBESE              1    0.05290402     0.05290402      17.98    0.0001

Source             DF   Type III SS    Mean Square    F Value    Pr > F
SEX                 1    0.01597238     0.01597238       5.43    0.0218
LOBESE              1    0.05290402     0.05290402      17.98    0.0001

                                       T for H0:                Std Error of
Parameter                 Estimate   Parameter=0    Pr > |T|        Estimate
INTERCEPT              2.081052655 B      213.05      0.0001      0.00976800
SEX          female   -0.027765105 B       -2.33      0.0218      0.01191694
             male      0.000000000 B         .         .           .
LOBESE                 0.312347032          4.24      0.0001      0.07366198

NOTE: The X'X matrix has been found to be singular and a generalized inverse was used to solve the normal equations. Estimates followed by the letter 'B' are biased, and are not unique estimators of the parameters.

SLIDE 23

Conclusion

  • The male level is 0.0278 higher than the female level (for a fixed level of obesity), but remember this is on a log10-scale
  • Confidence interval: 0.0278 ± 2 × 0.0119 = (0.0040, 0.0516)
  • Back-transformed: (1.009, 1.126), i.e. the male level is between 1% and 12.6% above the female level
