Adding Factors and Interactions, Danielle Quinn, PhD Candidate

SLIDE 1

Regression Modeling in R: Case Studies

Adding Factors and Interactions

REGRESSION MODELING IN R: CASE STUDIES

Danielle Quinn

PhD Candidate, Memorial University

SLIDE 2

Identifying additional factors

pr_fac(poisson_glm, dragonflies$season, xlabel = "season", modeltype = "poisson")
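`pr_fac()` does not appear to be part of base R or a standard package; it is presumably a helper supplied with the course. Assuming it plots the residuals of a fitted model against the levels of a factor, the base R sketch below does something similar. It fits the stream_flow-only Poisson GLM to hypothetical simulated data (`sim`, with made-up coefficients in which autumn counts are lower) and summarizes the Pearson residuals by season. A systematic shift between seasons suggests that season should be added to the model.

```r
# Hypothetical simulated data standing in for the course's dragonflies set
set.seed(42)
n <- 200
sim <- data.frame(
  stream_flow = runif(n, min = 0.5, max = 5),
  season = factor(sample(c("summer", "autumn"), n, replace = TRUE),
                  levels = c("summer", "autumn"))
)
# Made-up effects: counts fall with stream_flow and are lower in autumn
sim$abundance <- rpois(n, lambda = exp(4.5 - 0.5 * sim$stream_flow -
                                         0.4 * (sim$season == "autumn")))

# Poisson GLM that ignores season
poisson_glm <- glm(abundance ~ stream_flow, data = sim, family = "poisson")

# Pearson residuals, summarized by the candidate factor
pearson_res <- resid(poisson_glm, type = "pearson")
season_means <- tapply(pearson_res, sim$season, mean)
print(season_means)

# Residuals-by-factor plot using base graphics
boxplot(split(pearson_res, sim$season), xlab = "season",
        ylab = "Pearson residuals")
abline(h = 0, lty = 2)
```

Because the simulated summer counts are higher than the model expects, their residuals sit above zero and the autumn residuals below it.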

SLIDE 3

Factors and interactions

Multiple predictor variables (factors) may influence the response variable.
Interaction: the effect of one predictor may depend on the level of another predictor variable.
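In an R model formula, an interaction is requested with `*`, which expands to both main effects plus their interaction term (written with `:`). The short check below uses only `terms()` from base R and needs no data; the formula mirrors the one used for the dragonflies model.

```r
# '*' expands to main effects plus the interaction
f <- abundance ~ stream_flow * season
term_labels <- attr(terms(f), "term.labels")
# Contains "stream_flow", "season" and "stream_flow:season"
print(term_labels)

# The same model written out in full gives the same terms
f_long <- abundance ~ stream_flow + season + stream_flow:season
stopifnot(identical(term_labels, attr(terms(f_long), "term.labels")))
```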

SLIDE 4

Adding a factor

# Poisson GLM
poisson_glm <- glm(abundance ~ stream_flow, data = dragonflies, family = "poisson")

# Poisson GLM with season and the stream_flow:season interaction added
poisson_glm_factor <- glm(abundance ~ stream_flow * season, data = dragonflies, family = "poisson")
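To see what the interaction adds, the sketch below fits both models to hypothetical simulated data (`sim`, made-up coefficients) and reads off the per-season slopes. It assumes summer is the reference level of `season`; in the real data the reference level depends on how the factor was created, so the coefficient names may differ (for example `seasonsummer` instead of `seasonautumn`).

```r
# Hypothetical simulated data; summer is set as the reference level
set.seed(1)
n <- 200
sim <- data.frame(
  stream_flow = runif(n, min = 0.5, max = 5),
  season = factor(sample(c("summer", "autumn"), n, replace = TRUE),
                  levels = c("summer", "autumn"))
)
sim$abundance <- rpois(n, lambda = exp(4.5 - 0.5 * sim$stream_flow -
                                         0.4 * (sim$season == "autumn")))

poisson_glm <- glm(abundance ~ stream_flow, data = sim, family = "poisson")
poisson_glm_factor <- glm(abundance ~ stream_flow * season,
                          data = sim, family = "poisson")

b <- coef(poisson_glm_factor)
print(b)

# Log-scale slope of stream_flow in each season:
# summer uses the main effect; autumn adds the interaction coefficient
slopes <- c(summer = unname(b["stream_flow"]),
            autumn = unname(b["stream_flow"] + b["stream_flow:seasonautumn"]))
print(slopes)
```

The interaction coefficient is therefore the difference between the two seasonal slopes on the log scale.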

SLIDE 5

Generating predicted values

# Create pred_df data frame
pred_df <- expand.grid(stream_flow = seq(from = 1, to = 5, length.out = 10),
                       season = c("summer", "autumn"))

# Add predictions to pred_df
pred_df$predicted <- predict(poisson_glm_factor, pred_df, type = "response")
pred_df

stream_flow  season  predicted
       1.00  summer      75.61
       1.44  summer      60.98
       1.89  summer      49.17
        ...     ...        ...
       1.00  autumn      64.88
       1.44  autumn      51.21
       1.89  autumn      40.41
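`type = "response"` returns predictions on the count scale, which for a log-link model is `exp()` of the linear predictor (`type = "link"`). The sketch below checks this on hypothetical simulated data (`sim` and its coefficients are made up) using the same prediction grid as above.

```r
# Hypothetical simulated data
set.seed(1)
n <- 200
sim <- data.frame(
  stream_flow = runif(n, min = 0.5, max = 5),
  season = factor(sample(c("summer", "autumn"), n, replace = TRUE),
                  levels = c("summer", "autumn"))
)
sim$abundance <- rpois(n, lambda = exp(4.5 - 0.5 * sim$stream_flow -
                                         0.4 * (sim$season == "autumn")))
poisson_glm_factor <- glm(abundance ~ stream_flow * season,
                          data = sim, family = "poisson")

# 10 stream_flow values x 2 seasons = 20 rows
pred_df <- expand.grid(stream_flow = seq(from = 1, to = 5, length.out = 10),
                       season = c("summer", "autumn"))
pred_df$predicted <- predict(poisson_glm_factor, pred_df, type = "response")

# Linear predictor (log scale); exp() maps it back to counts
link_pred <- predict(poisson_glm_factor, pred_df, type = "link")
head(cbind(pred_df, back_transformed = exp(link_pred)))
```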

SLIDE 6

Visualizing predicted values

library(ggplot2)

ggplot(dragonflies) +
  geom_point(aes(x = stream_flow, y = abundance)) +
  geom_line(aes(x = stream_flow, y = predicted, col = season), data = pred_df)

SLIDE 7

Model diagnostics: residuals

# resid() returns deviance residuals for a GLM by default
diag <- data.frame(residuals = resid(poisson_glm_factor),
                   fitted = fitted(poisson_glm_factor))
ggplot(diag) +
  geom_point(aes(x = fitted, y = residuals))
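`resid()` on a GLM returns deviance residuals unless another `type` is requested. The sketch below, on hypothetical simulated data (`sim`, `diag_df` and the coefficients are illustrative), compares them with Pearson residuals, which for a Poisson model are (y - mu) / sqrt(mu) because the variance equals the mean.

```r
# Hypothetical simulated data
set.seed(1)
n <- 200
sim <- data.frame(
  stream_flow = runif(n, min = 0.5, max = 5),
  season = factor(sample(c("summer", "autumn"), n, replace = TRUE),
                  levels = c("summer", "autumn"))
)
sim$abundance <- rpois(n, lambda = exp(4.5 - 0.5 * sim$stream_flow -
                                         0.4 * (sim$season == "autumn")))
poisson_glm_factor <- glm(abundance ~ stream_flow * season,
                          data = sim, family = "poisson")

mu_hat <- fitted(poisson_glm_factor)
# Default residuals are deviance residuals
dev_res <- resid(poisson_glm_factor)
# Pearson residuals, from R and by hand (Poisson variance = mu)
pear_res <- resid(poisson_glm_factor, type = "pearson")
manual_pearson <- (sim$abundance - mu_hat) / sqrt(mu_hat)

diag_df <- data.frame(deviance = dev_res, pearson = pear_res, fitted = mu_hat)
plot(diag_df$fitted, diag_df$deviance,
     xlab = "fitted", ylab = "deviance residuals")
abline(h = 0, lty = 2)
```

A residual cloud that fans out as the fitted values grow is the heterogeneity this plot is meant to reveal.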

SLIDE 8

Model diagnostics: dispersion

dispersion(poisson_glm_factor, modeltype = "poisson")
20.65
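`dispersion()` also looks like a course helper rather than a base R function. A common overdispersion check, and plausibly what it computes, is the Pearson chi-square statistic divided by the residual degrees of freedom: values near 1 are consistent with a Poisson model, while values well above 1 (such as the 20.65 above) indicate overdispersion. The sketch assumes that definition; `pearson_dispersion()` is a hypothetical name, and the data are simulated from a negative binomial distribution with made-up parameters so the Poisson fit is deliberately overdispersed.

```r
# Hypothetical overdispersed data: negative binomial counts
set.seed(1)
n <- 200
sim <- data.frame(
  stream_flow = runif(n, min = 0.5, max = 5),
  season = factor(sample(c("summer", "autumn"), n, replace = TRUE),
                  levels = c("summer", "autumn"))
)
mu <- exp(4.5 - 0.5 * sim$stream_flow - 0.4 * (sim$season == "autumn"))
sim$abundance <- rnbinom(n, size = 2, mu = mu)

poisson_glm_factor <- glm(abundance ~ stream_flow * season,
                          data = sim, family = "poisson")

# Hypothetical helper: Pearson chi-square / residual degrees of freedom
pearson_dispersion <- function(model) {
  sum(resid(model, type = "pearson")^2) / df.residual(model)
}
disp <- pearson_dispersion(poisson_glm_factor)
print(disp)
```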

SLIDE 9

Time to practice!

SLIDE 10

Adding an offset to the model

Danielle Quinn

PhD Candidate, Memorial University

SLIDE 11

Reviewing the data

head(dragonflies, n = 1)
  abundance feeding_events  area stream_flow time season
1        16             69 3.671    1.288379  day summer

SLIDE 12

Dealing with unequal sampling effort

# Birds per square meter
15 / 1
15

# Birds per square meter
44 / 3
14.67
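The reasoning behind an offset can be written as a short derivation. Here $\mu_i$ is the expected count at site $i$, $A_i$ the area sampled and $x_i$ the stream flow (symbols introduced only for this sketch). Modelling the rate per unit area on the log scale and rearranging shows that $\log(A_i)$ enters the linear predictor with its coefficient fixed at 1, which is what `offset()` does in an R formula:

$$
\log\!\left(\frac{\mu_i}{A_i}\right) = \beta_0 + \beta_1 x_i
\quad\Longleftrightarrow\quad
\log(\mu_i) = \log(A_i) + \beta_0 + \beta_1 x_i
$$

With the numbers above, 15 birds in 1 square meter and 44 birds in 3 square meters give rates of 15 and about 14.67 per square meter: the raw counts differ by almost a factor of three, while the rates are nearly equal.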

SLIDE 13

Adding an offset

# Create column containing natural log of area
dragonflies$logarea <- log(dragonflies$area)
head(dragonflies)
  abundance feeding_events  area stream_flow  time season  logarea
1        16             69 3.671   1.2883787   day summer 1.300464
2        32            153 4.574   1.2787605 night autumn 1.520388
3        88            408 5.100   0.5956905   day summer 1.629241
4       140            691 3.188   1.4999930   day summer 1.159394
5        62            355 3.830   1.1653945   day summer 1.342865
6       143            678 3.826   1.4268238   day summer 1.341820

# Add offset to the model
poisson_glm_offset <- glm(abundance ~ stream_flow * season + offset(logarea),
                          data = dragonflies, family = "poisson")
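Because the offset's coefficient is fixed at 1, predicted counts scale in direct proportion to area: doubling the area doubles the prediction while the estimated rate stays the same. The sketch below checks this on hypothetical simulated data (`sim`, the `area` range and the coefficients are made up); note that `predict()` reads the offset variable from `newdata`.

```r
# Hypothetical simulated data with unequal sampled areas
set.seed(1)
n <- 200
sim <- data.frame(
  stream_flow = runif(n, min = 0.5, max = 5),
  season = factor(sample(c("summer", "autumn"), n, replace = TRUE),
                  levels = c("summer", "autumn")),
  area = runif(n, min = 2, max = 6)
)
sim$logarea <- log(sim$area)
# Expected count = area * rate, with a made-up log rate
sim$abundance <- rpois(n, lambda = sim$area * exp(3 - 0.5 * sim$stream_flow))

poisson_glm_offset <- glm(abundance ~ stream_flow * season + offset(logarea),
                          data = sim, family = "poisson")

# Same site conditions, area 2 versus area 4
new1 <- data.frame(stream_flow = 2,
                   season = factor("summer", levels = c("summer", "autumn")),
                   logarea = log(2))
new2 <- transform(new1, logarea = log(4))
ratio <- predict(poisson_glm_offset, new2, type = "response") /
  predict(poisson_glm_offset, new1, type = "response")
print(ratio)
```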

SLIDE 14

Try it out!

SLIDE 15

Negative Binomial models and model selection

Danielle Quinn

PhD Candidate, Memorial University

SLIDE 16

Negative Binomial GLMs

Have an extra parameter, theta, which relaxes the assumption that the mean and the variance are equal

This improves upon the Poisson GLM by addressing the issue of overdispersion

Use the same link function as Poisson GLMs (the log link)
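For reference, the negative binomial parameterization fitted by `MASS::glm.nb` (often called NB2) keeps the log link of the Poisson GLM and adds a quadratic term to the variance. As $\theta \to \infty$ the extra term vanishes and the Poisson variance $\mu_i$ is recovered:

$$
\log(\mu_i) = \eta_i, \qquad
\operatorname{Var}(Y_i) = \mu_i + \frac{\mu_i^2}{\theta}
$$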

SLIDE 17

Negative Binomial GLMs in R

library(MASS)
neg_binom_glm <- glm.nb(abundance ~ stream_flow * season + offset(logarea),
                        data = dragonflies)
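As a sanity check on what `glm.nb()` estimates, the sketch below simulates hypothetical negative binomial counts (`sim`, made-up coefficients, true theta of 2) and fits the same model formula. In `rnbinom()`, the `size` argument plays the role of theta. The fitted object stores the estimate in `$theta` and its standard error in `$SE.theta`.

```r
library(MASS)

# Hypothetical simulated data with unequal areas and extra-Poisson variation
set.seed(1)
n <- 500
sim <- data.frame(
  stream_flow = runif(n, min = 0.5, max = 5),
  season = factor(sample(c("summer", "autumn"), n, replace = TRUE),
                  levels = c("summer", "autumn")),
  area = runif(n, min = 2, max = 6)
)
sim$logarea <- log(sim$area)
# True theta (rnbinom's size) is 2
mu <- sim$area * exp(3 - 0.5 * sim$stream_flow)
sim$abundance <- rnbinom(n, size = 2, mu = mu)

neg_binom_glm <- glm.nb(abundance ~ stream_flow * season + offset(logarea),
                        data = sim)

# Estimated theta and its standard error
print(c(theta = neg_binom_glm$theta, se = neg_binom_glm$SE.theta))
```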

SLIDE 18

Dropping terms

# Use drop1 to test the influence of each term
drop1(neg_binom_glm, test = "Chisq")
Single term deletions

Model:
abundance ~ stream_flow * season + offset(logarea)
                   Df Deviance    AIC     LRT Pr(>Chi)
<none>                  159.93 1361.1
stream_flow:season  1   160.10 1359.3 0.16363   0.6858

# Remove the interaction from the model
neg_binom_glm_small <- glm.nb(abundance ~ stream_flow + season + offset(logarea),
                              data = dragonflies)
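Two details of `drop1()` are worth seeing directly. It respects marginality, so with `stream_flow * season` only the interaction is offered for deletion; and with `test = "Chisq"` the LRT column is the change in deviance when that term is removed, compared with a chi-square distribution. To keep the comparison exact, the sketch swaps in a Poisson GLM on hypothetical simulated data (`sim`, made-up coefficients), where the dispersion is fixed at 1.

```r
# Hypothetical simulated data
set.seed(1)
n <- 200
sim <- data.frame(
  stream_flow = runif(n, min = 0.5, max = 5),
  season = factor(sample(c("summer", "autumn"), n, replace = TRUE),
                  levels = c("summer", "autumn"))
)
sim$abundance <- rpois(n, lambda = exp(4.5 - 0.5 * sim$stream_flow -
                                         0.4 * (sim$season == "autumn")))

full <- glm(abundance ~ stream_flow * season, data = sim, family = "poisson")
reduced <- glm(abundance ~ stream_flow + season, data = sim, family = "poisson")

# Only the interaction can be dropped from the full model
d <- drop1(full, test = "Chisq")
print(d)

# Likelihood-ratio test by hand: change in deviance on 1 df
lrt <- deviance(reduced) - deviance(full)
p_value <- pchisq(lrt, df = 1, lower.tail = FALSE)
print(c(LRT = lrt, p = p_value))
```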

SLIDE 19

Dropping (more?) terms

drop1(neg_binom_glm_small, test = "Chisq")
Single term deletions

Model:
abundance ~ stream_flow + season + offset(logarea)
            Df Deviance    AIC     LRT Pr(>Chi)
<none>           160.04 1359.3
stream_flow  1   351.37 1548.6 191.323   <2e-16 ***
season       1   161.48 1358.7   1.434   0.0485 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
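If a term were to be dropped at this stage, `update()` refits the model with the formula edited in place: `. ~ . - season` keeps every other term, including the offset. The sketch uses hypothetical simulated data (`sim`, made-up coefficients), and `neg_binom_glm_smaller` is an illustrative name, not a model from the slides.

```r
library(MASS)

# Hypothetical simulated data
set.seed(1)
n <- 300
sim <- data.frame(
  stream_flow = runif(n, min = 0.5, max = 5),
  season = factor(sample(c("summer", "autumn"), n, replace = TRUE),
                  levels = c("summer", "autumn")),
  area = runif(n, min = 2, max = 6)
)
sim$logarea <- log(sim$area)
mu <- sim$area * exp(3 - 0.5 * sim$stream_flow)
sim$abundance <- rnbinom(n, size = 2, mu = mu)

neg_binom_glm_small <- glm.nb(abundance ~ stream_flow + season + offset(logarea),
                              data = sim)

# Drop season; the offset stays in the formula
neg_binom_glm_smaller <- update(neg_binom_glm_small, . ~ . - season)
print(formula(neg_binom_glm_smaller))
print(AIC(neg_binom_glm_small, neg_binom_glm_smaller))
```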

SLIDE 20

Model diagnostics: residuals

[Figure: residuals vs. fitted values, panels neg_binom_glm and neg_binom_glm_small]

SLIDE 21

Model diagnostics: dispersion

dispersion(neg_binom_glm, modeltype = "nb")
1.11
dispersion(neg_binom_glm_small, modeltype = "nb")
1.10
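Assuming the course's `dispersion()` again computes a Pearson chi-square / residual df ratio, the sketch below contrasts a Poisson fit and a negative binomial fit on the same hypothetical overdispersed data (`sim` and `pearson_dispersion()` are illustrative). For the negative binomial model, `resid(type = "pearson")` uses its own variance function, mu + mu^2 / theta, so the ratio should fall back toward 1.

```r
library(MASS)

# Hypothetical overdispersed data
set.seed(1)
n <- 200
sim <- data.frame(
  stream_flow = runif(n, min = 0.5, max = 5),
  season = factor(sample(c("summer", "autumn"), n, replace = TRUE),
                  levels = c("summer", "autumn"))
)
mu <- exp(4.5 - 0.5 * sim$stream_flow - 0.4 * (sim$season == "autumn"))
sim$abundance <- rnbinom(n, size = 2, mu = mu)

# Hypothetical helper: Pearson chi-square / residual degrees of freedom
pearson_dispersion <- function(model) {
  sum(resid(model, type = "pearson")^2) / df.residual(model)
}

pois <- glm(abundance ~ stream_flow + season, data = sim, family = "poisson")
nb <- glm.nb(abundance ~ stream_flow + season, data = sim)

disp <- c(poisson = pearson_dispersion(pois),
          negative_binomial = pearson_dispersion(nb))
print(disp)
```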

SLIDE 22

Apply your first Negative Binomial GLM!

SLIDE 23

Model selection and visualization

Danielle Quinn

PhD Candidate, Memorial University

SLIDE 24

How complex should it get?

Larger model: all terms included (neg_binom_glm)
Smaller model: interaction effect removed (neg_binom_glm_small)

"Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful." George Box

SLIDE 25

Model selection: what we know so far

Do both models make sense?
Is there heterogeneity in the residuals?
Is there overdispersion?

SLIDE 26

Model selection: Akaike Information Criterion (AIC)

Compares models fitted to the same response data (here, two nested models)
Lower values indicate a better balance between fit and complexity
As a rule of thumb, a difference of three or more indicates that one model fits better than the other
When all else is similar, the less complex model is usually a good choice

AIC(neg_binom_glm, neg_binom_glm_small)
                    df      AIC
neg_binom_glm        5 1363.135
neg_binom_glm_small  4 1361.299
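AIC is -2 times the maximized log-likelihood plus 2 for every estimated parameter. For a negative binomial GLM the parameter count includes theta, which is why the `df` column above reads 5 (four coefficients plus theta) and 4. The sketch recomputes AIC by hand on hypothetical simulated data; `aic_by_hand()` and `sim` are illustrative names.

```r
library(MASS)

# Hypothetical overdispersed data
set.seed(1)
n <- 200
sim <- data.frame(
  stream_flow = runif(n, min = 0.5, max = 5),
  season = factor(sample(c("summer", "autumn"), n, replace = TRUE),
                  levels = c("summer", "autumn"))
)
mu <- exp(4.5 - 0.5 * sim$stream_flow - 0.4 * (sim$season == "autumn"))
sim$abundance <- rnbinom(n, size = 2, mu = mu)

neg_binom_glm <- glm.nb(abundance ~ stream_flow * season, data = sim)
neg_binom_glm_small <- glm.nb(abundance ~ stream_flow + season, data = sim)

# AIC = -2 * log-likelihood + 2 * (number of estimated parameters)
aic_by_hand <- function(model) {
  ll <- logLik(model)
  -2 * as.numeric(ll) + 2 * attr(ll, "df")
}

print(AIC(neg_binom_glm, neg_binom_glm_small))
print(c(full = aic_by_hand(neg_binom_glm), small = aic_by_hand(neg_binom_glm_small)))
```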

SLIDE 27

Generating predicted values

# Create data frame pred_df
pred_df <- expand.grid(stream_flow = seq(from = 1, to = 5, length.out = 10),
                       season = c("summer", "autumn"),
                       logarea = mean(dragonflies$logarea))

# Add predicted values to pred_df
pred_df$predicted <- predict(neg_binom_glm_small, pred_df, type = "response")
head(pred_df)
  stream_flow season logarea predicted
1    1.000000 summer 1.73009 126.37291
2    1.444444 summer 1.73009  85.21510
3    1.888889 summer 1.73009  57.46179
4    2.333333 summer 1.73009  38.74732
5    2.777778 summer 1.73009  26.12789
6    3.222222 summer 1.73009  17.61842
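Because the offset variable is part of the model formula, it must be supplied in the prediction data; here it is held at its mean. Without an interaction, the log link also makes the season effect multiplicative: at every stream_flow value, the autumn prediction is the summer prediction times exp(season coefficient). The sketch checks this on hypothetical simulated data (`sim`, made-up coefficients, summer as reference level).

```r
library(MASS)

# Hypothetical simulated data
set.seed(1)
n <- 300
sim <- data.frame(
  stream_flow = runif(n, min = 0.5, max = 5),
  season = factor(sample(c("summer", "autumn"), n, replace = TRUE),
                  levels = c("summer", "autumn")),
  area = runif(n, min = 2, max = 6)
)
sim$logarea <- log(sim$area)
mu <- sim$area * exp(3 - 0.5 * sim$stream_flow - 0.3 * (sim$season == "autumn"))
sim$abundance <- rnbinom(n, size = 2, mu = mu)

neg_binom_glm_small <- glm.nb(abundance ~ stream_flow + season + offset(logarea),
                              data = sim)

# The offset variable is held at its mean in the prediction grid
pred_df <- expand.grid(stream_flow = seq(from = 1, to = 5, length.out = 10),
                       season = c("summer", "autumn"),
                       logarea = mean(sim$logarea))
pred_df$predicted <- predict(neg_binom_glm_small, pred_df, type = "response")

# No interaction: the autumn/summer ratio is constant across stream_flow
ratio <- pred_df$predicted[pred_df$season == "autumn"] /
  pred_df$predicted[pred_df$season == "summer"]
print(ratio)
```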

SLIDE 28

Visualizing the model

library(ggplot2)

ggplot(dragonflies) +
  geom_point(aes(x = stream_flow, y = abundance)) +
  geom_line(aes(x = stream_flow, y = predicted, col = season), data = pred_df)

SLIDE 29

SLIDE 30

Generating standard errors

# Extract fitted values on the link (log) scale
raw_fit <- predict(neg_binom_glm_small, pred_df, type = "link")

# Extract standard errors on the link scale
raw_se <- predict(neg_binom_glm_small, pred_df, type = "link", se.fit = TRUE)$se.fit

# Calculate approximate 95% confidence limits on the link scale,
# back-transform them with exp(), and add them to pred_df
pred_df$upper <- exp(raw_fit + 1.96 * raw_se)
pred_df$lower <- exp(raw_fit - 1.96 * raw_se)
head(pred_df)
  stream_flow season logarea predicted     upper     lower
1    1.000000 summer 1.73009 126.37291 153.46921 104.06068
2    1.444444 summer 1.73009  85.21510 101.30002  71.68422
3    1.888889 summer 1.73009  57.46179  67.56242  48.87121
4    2.333333 summer 1.73009  38.74732  45.62704  32.90494
5    2.777778 summer 1.73009  26.12789  31.19052  21.88699
6    3.222222 summer 1.73009  17.61842  21.52939  14.41790
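The interval is built on the link scale and then back-transformed, which keeps the limits positive and makes them asymmetric around the prediction on the count scale. A single `predict(..., se.fit = TRUE)` call returns both `fit` and `se.fit`, and `qnorm(0.975)` gives the 1.96 multiplier exactly. The sketch uses hypothetical simulated data (`sim`, made-up coefficients).

```r
library(MASS)

# Hypothetical simulated data
set.seed(1)
n <- 300
sim <- data.frame(
  stream_flow = runif(n, min = 0.5, max = 5),
  season = factor(sample(c("summer", "autumn"), n, replace = TRUE),
                  levels = c("summer", "autumn")),
  area = runif(n, min = 2, max = 6)
)
sim$logarea <- log(sim$area)
mu <- sim$area * exp(3 - 0.5 * sim$stream_flow - 0.3 * (sim$season == "autumn"))
sim$abundance <- rnbinom(n, size = 2, mu = mu)

neg_binom_glm_small <- glm.nb(abundance ~ stream_flow + season + offset(logarea),
                              data = sim)

pred_df <- expand.grid(stream_flow = seq(from = 1, to = 5, length.out = 10),
                       season = c("summer", "autumn"),
                       logarea = mean(sim$logarea))

# One call returns the link-scale fit and its standard error
link_pred <- predict(neg_binom_glm_small, pred_df, type = "link", se.fit = TRUE)
z <- qnorm(0.975)  # about 1.96

# Back-transform the fit and the 95% limits to the count scale
pred_df$predicted <- exp(link_pred$fit)
pred_df$upper <- exp(link_pred$fit + z * link_pred$se.fit)
pred_df$lower <- exp(link_pred$fit - z * link_pred$se.fit)
head(pred_df)
```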

SLIDE 31

Visualizing the standard error

library(ggplot2)

ggplot(dragonflies) +
  geom_point(aes(x = stream_flow, y = abundance)) +
  geom_line(aes(x = stream_flow, y = predicted, col = season), data = pred_df) +
  geom_line(aes(x = stream_flow, y = upper, col = season), linetype = "dashed", data = pred_df) +
  geom_line(aes(x = stream_flow, y = lower, col = season), linetype = "dashed", data = pred_df)

SLIDE 32

SLIDE 33

Time to practice!