Online Tutorial: UCLAstats
Readings Recommended by Lee for this script:
Rogosa, D. (1980). Comparing nonparallel regression lines. Psychological Bulletin, 88, 307–321.
Bauer, D. J., & Curran, P. J. (2005). Probing interactions in fixed and multilevel regression: Inferential and graphical techniques. Multivariate Behavioral Research, 40(3), 373–400.
Set working directory.
setwd("...your_working_directory/")
Load the following libraries. If not yet installed, use
install.packages("package_name")
library("car")
library("data.table")
library("tidyverse")
library("pals")
library("jtools")
library("ggplot2")
library("ggiraphExtra")
library("emmeans")
library("corrgram")
library("Matrix")
In last week’s lab, we explored linear models that contained only main effects. In other words, we focused on models that evaluated the distinct impact of two predictor variables on the response. However, it may be the case that the magnitude of the effect of a predictor variable (x1) on the response y is modified by a second predictor variable (x2; or some grouping factor). For example, a large difference in y may be observed between two levels of x2 at lower values of x1, but that difference may disappear at higher values of x1. Interaction terms test whether the effect of one predictor on the response (Y ~ X1) changes across levels of a second predictor (X2). Visually, interactions can be illustrated in many ways, but examples frequently draw on non-parallel lines in xy plots (e.g., intersecting lines).
What exactly are we trying to evaluate when we ask “Is there an interaction?”
With categorical predictor variables, as is the case with ANOVA, we are asking: "How does one factor affect the 'performance' of the other factor in explaining variance in the response y?".
Similarly, in linear regression where at least one predictor is continuous, we are asking: “How does the effect of the continuous predictor variable on the response differ across levels of the other (continuous or categorical) predictor?”.
Note that in this lab, we will focus on effect sizes when interpreting the outputs, not null hypothesis testing (e.g., p-values/significance testing). The first section focuses on interpreting the `coef` output; in other words, understanding the effect sizes as they relate to linear models with interactions. In the second section, you will manipulate interactions through simulations.
Note: This section covers ONLY multiplicative interactions. We will explore non-multiplicative interactions using simulated data in Section 2.
We will use the `Salaries` data set for the example. `Salaries` comes with the `car` package and has the salary for professors of different rank (assistant professor, associate professor, and professor), from different disciplines (two levels: A and B), with binary gender data (male, female), among other variables. Below we explore whether `sex` modifies the relationship between `salary` and `discipline`. In other words, perhaps men earn a higher salary in one discipline and women in the other.
salary_model <- lm(salary ~ discipline*sex, data = Salaries)
coef(salary_model)
## (Intercept) disciplineB sexMale disciplineB:sexMale
## 89064.94 22169.58 21635.04 -14109.19
interaction.plot(Salaries$discipline, Salaries$sex, Salaries$salary)
Remember, `coef` returns the effect sizes (i.e., differences in means). The interpretation of `coef` is as follows:

- `(Intercept)` or β0: the mean salary for females in discipline A. Females in discipline A make on average $89,064.94 (refer to eq. A below).
- `disciplineB` or β1: the effect (or the change in mean salary) of discipline B for females. On average, females make $22,169.58 more in discipline B than they do in discipline A. Mean salary for females in discipline B is $111,234.52 (89,064.94 + 22,169.58) (refer to eq. B below).
- `sexMale` or β2: the effect (or the change in mean salary) of being male in discipline A. Males in discipline A make on average $21,635.04 more than females in discipline A. Average salary for males in discipline A is $110,699.98 (89,064.94 + 21,635.04) (refer to eq. C below).
- `disciplineB:sexMale` or β3: the change in the discipline effect if sex is male, OR the change in the sex effect if the discipline is B. The mean salary for males in discipline B can be calculated by adding all the coefficients together (89,064.94 + 22,169.58 + 21,635.04 - 14,109.19) (refer to eq. D below).

If those interpretations make perfect sense to you, skip ahead to the next section. I like to remind myself what is happening under the hood to get those outputs, and I find it a helpful exercise, especially if you are newer to stats, to write out the mathematical formula. The way R codes a model can sometimes detach us from the mathematical model.
First a quick review of model components:
Now we will create a regression equation that solves for each of the conclusions above.
Calculating `(Intercept)` / salary for females in discipline A:
Remember, using the `lm()` function with two categorical variables is the same as performing a 2x2 ANOVA (since ANOVAs are linear models, you can get R to summarize `aov()` and `lm()` models in the same way using `summary.aov()`). With categorical variables, R dummy codes the variables into the structure of the regression formula. Here are the dummy codes relevant to our model above:

`discipline`: A = 0; B = 1
`sex`: female = 0; male = 1

How do I know that is the way R dummy codes these variables? By default, R treats the first factor level (alphabetical order) as the reference level, coded 0, and each remaining level gets its own indicator coded 1. Also, the `coef` headings `disciplineB` and `sexMale` let you know those two levels were assigned dummy code = 1 for each of those variables.
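One way to confirm the coding is to ask R directly; a quick sketch using the fitted model:

contrasts(Salaries$discipline) #reference level A is coded 0, B gets the indicator
contrasts(Salaries$sex) #reference level Female is coded 0, Male gets the indicator
head(model.matrix(salary_model)) #the dummy-coded design matrix R actually fits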
Let’s input the coefficients and dummy code accordingly. If we are interested in the salary for females in discipline A, this implies dummy codes discipline = 0 and sexMale = 0. All terms cancel except the intercept. Yay! It makes sense that the intercept represents the mean salary for females in discipline A.
Calculating salary for females in discipline B:
If we are interested in the salary for females in discipline B, this implies dummy codes discipline = 1 and sexMale = 0. Only the sex term and the interaction term cancel this time. Hence, we add the effect of discipline to the intercept.
Calculating salary for males in discipline A:
Calculating salary for males in discipline B:
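These last two cases follow the same substitution pattern (sexMale = 1, with disciplineB = 0 or 1). As a minimal sketch, we can let R do all four substitutions from the fitted coefficients and check them against the observed cell means:

b <- coef(salary_model)
c(female_A = unname(b[1]), #β0
  female_B = unname(b[1] + b[2]), #β0 + β1
  male_A   = unname(b[1] + b[3]), #β0 + β2
  male_B   = unname(b[1] + b[2] + b[3] + b[4])) #β0 + β1 + β2 + β3
aggregate(salary ~ sex + discipline, data = Salaries, FUN = mean) #observed cell means, for comparison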
In summary, the effect of sex on salary is modified by the discipline to which the professors belong. Male professors generally make a higher salary than female professors (ugh), and this disparity is greater in discipline A than in discipline B.
Let’s grab some example data from online.
autompg = read.table(
"http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data",
quote = "\"",
comment.char = "",
stringsAsFactors = FALSE) |>
setNames(c("mpg", "cyl", "disp", "hp", "wt", "acc", "year", "origin", "name")) |> #name the columns
subset(hp != "?") |> #remove missing data, which is stored as "?"
subset(name != "plymouth reliant") |> #remove the plymouth reliant, as it causes some issues
subset(!(cyl %in% c(3, 5))) |> #remove 3 and 5 cylinder cars (which are very rare.)
select(-name) |> #remove the "name" column
within({
hp <- as.numeric(hp) #change horsepower from character to numeric
domestic <- as.numeric(origin == 1) #create a dummy variable: domestic = 1, foreign = 0
cyl <- as.factor(cyl) #change cyl from numeric to factor
})
In `autompg`:

- `mpg`: miles per gallon (continuous dependent variable)
- `disp`: engine displacement (continuous predictor variable)
- `domestic`: domestic/foreign (categorical predictor variable)

#There are two ways to code the same interaction in R using the lm() function. One uses the `*` operator and the other the `:` operator.
model.disp_dom1 <- lm(mpg ~ disp*domestic, data = autompg)
model.disp_dom2 <- lm(mpg ~ disp + domestic + disp:domestic, data = autompg)
#R interprets * as a combination of + (addition) and : (interaction). R should see these models as exactly the same
#Yay! outputs yield same results-just as we'd expect.
summary(model.disp_dom1)
##
## Call:
## lm(formula = mpg ~ disp * domestic, data = autompg)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.8332 -2.8956 -0.8332 2.2828 18.7749
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 46.05484 1.80582 25.504 < 2e-16 ***
## disp -0.15692 0.01668 -9.407 < 2e-16 ***
## domestic -12.57547 1.95644 -6.428 3.90e-10 ***
## disp:domestic 0.10252 0.01692 6.060 3.29e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.308 on 379 degrees of freedom
## Multiple R-squared: 0.7011, Adjusted R-squared: 0.6987
## F-statistic: 296.3 on 3 and 379 DF, p-value: < 2.2e-16
summary(model.disp_dom2)
##
## Call:
## lm(formula = mpg ~ disp + domestic + disp:domestic, data = autompg)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.8332 -2.8956 -0.8332 2.2828 18.7749
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 46.05484 1.80582 25.504 < 2e-16 ***
## disp -0.15692 0.01668 -9.407 < 2e-16 ***
## domestic -12.57547 1.95644 -6.428 3.90e-10 ***
## disp:domestic 0.10252 0.01692 6.060 3.29e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.308 on 379 degrees of freedom
## Multiple R-squared: 0.7011, Adjusted R-squared: 0.6987
## F-statistic: 296.3 on 3 and 379 DF, p-value: < 2.2e-16
Let’s look at this visually:
#Define intercepts.
int_for = coef(model.disp_dom1)[1]
int_dom = coef(model.disp_dom1)[1] + coef(model.disp_dom1)[3] #Consider the formula above that we worked out by hand. The calculation for int_dom should make sense now.
#Define slopes
slope_for = coef(model.disp_dom1)[2]
slope_dom = coef(model.disp_dom1)[2] + coef(model.disp_dom1)[4]#Consider the formula above that we worked out by hand. The calculation for slope_dom should make sense now.
#plot the coefficients and slopes using base R functions
plot(mpg ~ disp, data = autompg, col = domestic + 1, pch = domestic + 1, xlim = c(0, 500), ylim = c(0, 50))
abline(int_for, slope_for, col = 1, lty = 1, lwd = 2) # line for foreign cars
abline(int_dom, slope_dom, col = 2, lty = 2, lwd = 2) # line for domestic cars
legend("topright", c("Foreign", "Domestic"), pch = c(1, 2), col = c(1, 2))
#or if you don't want to do the math
ggplot(autompg, aes(disp, mpg, color = as.factor(domestic)))+
geom_point()+
geom_smooth(method = lm)
## `geom_smooth()` using formula = 'y ~ x'
Again, since we are emphasizing effect sizes (not significance testing), let's just look at the `coef` output:
coef(model.disp_dom1)
## (Intercept) disp domestic disp:domestic
## 46.0548423 -0.1569239 -12.5754714 0.1025184
Interpreting Model Outputs:

Remember from the ANOVA example: we cannot interpret the main effects without respect to a given level of the other variable. `disp = -0.16` is not interpreted as an overall effect on `mpg`. Instead, to interpret `disp = -0.16` we must take into account the two levels of `domestic`.

- `(Intercept)`: mean `mpg` of foreign cars with 0 `disp`.
- `disp`: effect of `disp` on `mpg` for foreign cars.
- `domestic`: the effect of being a domestic car on mean `mpg`.
- `disp:domestic`: change in the `disp` effect on `mpg` if the origin of the car is domestic, OR the change in the `domestic` effect on `mpg` per unit of `disp`.

Means and slopes for each group:

- Mean `mpg` of foreign cars with 0 `disp` = 46.05 mpg.
- In foreign cars, for every one-unit increase in `disp`, average fuel efficiency (mpg) decreases by 0.16.
- Mean `mpg` of domestic cars when `disp` = 0 is 33.48 mpg.
- In domestic cars, the change in fuel efficiency (`mpg`) is a reduction of 0.05 for every one-unit change in engine displacement (`disp`).

Skip ahead if you can interpret the `coef` output easily and those interpretations make perfect sense. We will now write out the model like before and show how we came to the conclusions above.
Let’s interpret the coefficients by implementing dummy coding. `domestic`: 0 = foreign; 1 = domestic.

Let’s first consider the distinct effect of engine displacement `disp` on `mpg` for foreign cars. To do so, we dummy code `domestic` with 0. Solving the equation reduces it to `mpg = 46.05 + (-0.16*disp)`. We can now interpret the coefficient estimate for `disp` in the summary output above as: in foreign cars, the estimated change in average fuel efficiency (mpg) is a reduction of 0.16 for every one-unit change in engine displacement. The mean mpg for foreign cars with no displacement is 46.05.

Now let’s solve for the distinct effect of engine displacement `disp` on `mpg` for domestic cars. To do so, we dummy code `domestic` with 1. Solving the equation, you will see we end up doing the following: `β1 + β3 = -0.05` and `β0 + β2 = 33.48`. We can now interpret the coefficient estimate for `disp` in the summary output above as: in domestic cars, the estimated change in average fuel efficiency (mpg) is a reduction of 0.05 for every one-unit change in engine displacement. We also see that the intercept for domestic cars is 33.48.
Summary: Foreign cars have a stronger negative relationship between engine displacement and miles per gallon than domestic cars. At lower values of displacement, foreign cars have greater fuel efficiency than domestic cars. However, as displacement increases (above ~120), domestic cars have greater fuel efficiency.
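A quick sanity check on that crossover value (a sketch reusing the intercepts and slopes computed above): the two fitted lines intersect where int_for + slope_for*disp equals int_dom + slope_dom*disp.

(int_dom - int_for)/(slope_for - slope_dom) #~123 units of displacement, i.e. just above 120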
Review: Video 1

Check for Understanding

- Try an interaction with the three-level factor `cyl`: `lm(mpg ~ disp*cyl)`.

Now let's consider two continuous predictor variables (`disp` and `hp`, engine displacement and horsepower) and their effect on miles per gallon.
# hp = horsepower
model.disp_hp <- lm(mpg ~ disp*hp, data = autompg)
coef(model.disp_hp)
## (Intercept) disp hp disp:hp
## 52.4081997848 -0.1001737655 -0.2198199720 0.0005658269
As we did before, let's write out the equation (eq. a). First, we can rearrange terms by factoring out x1 (eq. b). Once we input the `coef` values (eq. c), we see that for a one-unit increase in x1 (`disp`), the mean of Y (`mpg`) changes by the term `β1 + β3*x2`. This indicates that the change in mean Y for each unit increase in x1 will differ across values of x2 (`hp`) (eq. d).
Let's solve for x2 (eq. e–g). For a one-unit increase in x2 (`hp`), the mean of Y (`mpg`) changes by the term `β2 + β3*x1` (eq. g). This indicates that the change in mean Y for each unit increase in x2 will differ across values of x1 (`disp`).
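Since the equation images (eq. a–g) did not survive the export, here is a plain-text reconstruction from the surrounding definitions, with x1 = `disp` and x2 = `hp` (my reconstruction, not the original figures):

eq. a: y = β0 + β1*x1 + β2*x2 + β3*x1*x2
eq. b: y = β0 + (β1 + β3*x2)*x1 + β2*x2
eq. c: mpg = 52.41 + (-0.1002 + 0.000566*hp)*disp + (-0.2198*hp)
eq. d: slope of x1 = β1 + β3*x2
eq. e–g: factoring out x2 instead gives y = β0 + (β2 + β3*x1)*x2 + β1*x1, so the slope of x2 = β2 + β3*x1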
To plot this, we can bin the data; you will see an example of this later in this script using simulated data where values are grouped by quantiles. For now, let's interpret the `coef`:
coef(model.disp_hp)
## (Intercept) disp hp disp:hp
## 52.4081997848 -0.1001737655 -0.2198199720 0.0005658269
- `(Intercept)`: mean `mpg` for a car with 0 `disp` and 0 `hp`.
- `disp`: change in `mpg` for an increase of 1 `disp`, for a car with 0 `hp`. When `hp` increases to any value other than zero, the influence of `disp` on `mpg` increases by 0.00057 for every unit of `hp`.
- `hp`: change in `mpg` for an increase of 1 `hp`, for a car with 0 `disp`. When `disp` takes on values above zero, the influence of `hp` on `mpg` increases by 0.00057 for every unit of `disp`.
- `disp:hp`: change in the effect of `disp` on `mpg` for a car of a certain `hp` (or vice versa).

Need Review? Video1
You may now be asking: surely there's an automated way to get those slopes for interactions? Of course there is! But our approach above hopefully encouraged a better understanding. Use `emmeans::emtrends()` for a quick output of the slopes at each level of the modifier variable. For example:
emtrends(model.disp_dom1, ~ domestic, var = "disp")
## domestic disp.trend SE df lower.CL upper.CL
## 0 -0.1569 0.01670 379 -0.1897 -0.1241
## 1 -0.0544 0.00282 379 -0.0599 -0.0489
##
## Confidence level used: 0.95
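`emtrends()` also works when the moderator is continuous; you tell it where to evaluate the slope via `at`. A sketch (the hp values 50/100/150 are arbitrary illustration points, not from the original script):

emtrends(model.disp_hp, ~ hp, var = "disp", at = list(hp = c(50, 100, 150))) #disp slope at each chosen hp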
These are some functions that will come in handy later. `split_data()` takes a variable `var` in `data` and breaks it into a specified number of quantiles, `n_breaks`. It adds a column `break_num` to `data`, which is the quantile to which each observation belongs. `ggint()` makes an interaction plot where colors represent quantiles, which will save us from copy-pasting this code over and over.
split_data <- function(data, n_breaks, var){
breaks <- quantile(data[,var], seq(0, 1, by = 1/n_breaks)) #named vector of quantiles for the input column
data$break_num <- as.factor(sapply(data[,var], \(i) max(which(breaks <= i)))) #the quantile for each point
#fixes max value returning n_breaks + 1 when it should be n_breaks
return(within(data,{break_num[data[,var] == max(data[,var])] = n_breaks}))
}
ggint <- function(data){
return(
ggplot(data) + #use the function's `data` argument, not a global object
geom_point(aes(x1, y, fill = break_num), pch = 21, color = "black") +
geom_smooth(aes(x1, y, group = break_num), color = "black", se = FALSE, method = lm, linewidth = 1.5)+
geom_smooth(aes(x1, y, color = break_num), se = FALSE, method = "lm", linewidth = 1) +
scale_fill_manual(values = ocean.haline(6)) + #pals::ocean contains colorblind friendly palettes
scale_color_manual(values = ocean.haline(6)) +
labs(fill = "x2 quantile", color = "x2 quantile") +
theme_classic())
}
Main Effects Model- No Interactions
Before we examine interactions, let's start with a dataset with no interactions, only additive effects. The code below creates that data frame. It is important to remember that we are simulating data; therefore, we are defining what predicts the response variable (y). This is in contrast to when you have data in hand and fit a model to find what predicts y. Here we define y as `y <- 5*x1 + -2*x2 + rnorm(1000, 0, 2)`, which is to say we simulate y to depend on two main effects, `x1` and `x2`, with an error term (noise) that is normally distributed. Notice that by making our errors normally distributed we are "forcing" our model to meet the assumptions of linear models.
set.seed(335)
# simulate two uncorrelated predictor variables: x1 and x2. If we wanted to simulate correlated variables,
# we would do something like this: x2 <- x1 + rnorm(1000, 0, 0.2)
additive <- data.frame('x1' = rnorm(1000, 0, 1), 'x2' = rnorm(1000, 0, 1)) |>
within({y <- 5*x1 + -2 *x2 + rnorm(1000, 0, 2)}) #generate a response variable that is predicted by x1 and x2, with noise.
To look for interactions in each data set, we will examine whether the relationship between `x1` and `y` changes for different subsets of the data. Since `x2` is a continuous variable, we will make it categorical using the `split_data` function, which breaks `x2` into 6 parts (because we specify `n_breaks = 6`). The new column `break_num` is created in a data frame `slopes_plot`. A `break_num = 1` represents the lowest values of `x2`, while `break_num = 6` represents the highest values of `x2`.
slopes_plot <- split_data(additive, 6, "x2")
Let's plot the data. We see that the relationship between `x1` and `y` is independent of `x2`. We know this because the slope of the line does not change whether we look exclusively at the low values of `x2` or the high values of `x2`. No interactions here.
ggint(slopes_plot)
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
Let's view this relationship another way by plotting the slope estimates for each `x2` quantile. Note that here we use `n_breaks = 20` rather than `n_breaks = 6` from above; the latter would leave us with only 6 points on the graph.
# `coef_plot_data` is a data frame of slope estimates (aka. effect sizes or coefficients) for the effect of X1 on Y for each group (here, quantile of X2).
slopes_plot <- split_data(additive, 20, "x2")
coef_plot_data <- data.frame(
'x1' = sapply(1:20, \(i) coef(lm(y ~ x1, data = subset(slopes_plot, break_num == i)))[[2]]),
'x2' = 1:20)
ggplot(coef_plot_data, aes(x1, x2)) +
geom_point() +
labs(x = "Effect size of x1 on y", y = "Quantile of x2") +
theme_classic()
Here we see how the slope estimate of the line varies by subset of the data. Again, no pattern here. In other words, there is no interaction between `x1` and `x2`. The effect of `x1` on `y` is not modified by `x2`. As we simulated, these predictor variables are independent of each other.
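A complementary check (a quick sketch, not in the original script): fit the interaction directly on the raw simulated data; the x1:x2 estimate should be near zero.

coef(summary(lm(y ~ x1*x2, data = additive))) #the x1:x2 estimate should be ~0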
While the plot above shows compelling visual evidence that the `x1` slope does not depend on the `x2` quantile, let's verify using the effect size from a linear model regressing quantile on slope. The effect size is basically zero (`x1` = 0.21 +/- 4.1).
summary(lm(x2 ~ x1, data = coef_plot_data))
##
## Call:
## lm(formula = x2 ~ x1, data = coef_plot_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.5538 -4.8155 0.0106 4.6486 9.5761
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.4441 20.6794 0.457 0.653
## x1 0.2101 4.1059 0.051 0.960
##
## Residual standard error: 6.078 on 18 degrees of freedom
## Multiple R-squared: 0.0001455, Adjusted R-squared: -0.0554
## F-statistic: 0.002619 on 1 and 18 DF, p-value: 0.9598
Model With Interaction Effect
Now we add a multiplicative interaction, still with uncorrelated predictors. Same steps as above, but note the interaction added to the simulated response via the term `2*x1*x2`:
multiplicative <- data.frame('x1' = rnorm(1000, 0, 1), 'x2' = rnorm(1000, 0, 1)) |>
within({y <- 5*x1 + 2*x2 + 2*x1*x2 + rnorm(1000, 0, 2)}) #here you can see the interaction (2*x1*x2)
slopes_plot <- split_data(multiplicative, 6, "x2")
#Now we see an interaction. As the values of x2 increase, so does the strength
#of the relationship between x1 and y.
ggint(slopes_plot)
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
We can see here that for different levels of `x2`, the effect (or slope) of `x1` on `y` differs.
Now, when we plot the slopes for different quantiles of `x2`, we should expect to see some relationship. A pattern indicates that the effect of one variable on the response varies depending on the value of the other predictor.
#Now this coefficient plot might be more interesting. Let's examine the relationship
#between the values of x2 and the slope between x1 and y.
slopes_plot <- split_data(multiplicative, 20, "x2")
coef_plot_data <- data.frame(
'x1' = sapply(1:20, \(i) coef(lm(y ~ x1, data = subset(slopes_plot, break_num == i)))[[2]]),
'x2' = 1:20)
#We see a linear relationship, like we simulated!
ggplot(coef_plot_data, aes(x1, x2)) +
geom_point() +
labs(x = "Effect size of x1 on y", y = "Quantile of x2") +
theme_classic()
#Here we see a slope of around 3, not quite what we simulated, but close.
summary(lm(x2 ~ x1, data = coef_plot_data))
##
## Call:
## lm(formula = x2 ~ x1, data = coef_plot_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.2501 -0.6889 -0.1723 0.5711 2.5269
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.5392 0.7689 -5.903 1.37e-05 ***
## x1 2.9956 0.1434 20.894 4.52e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.21 on 18 degrees of freedom
## Multiple R-squared: 0.9604, Adjusted R-squared: 0.9582
## F-statistic: 436.6 on 1 and 18 DF, p-value: 4.522e-14
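Of course, the most direct check is to fit the interaction model itself; a quick sketch, where the x1:x2 coefficient should land near the simulated value of 2:

coef(lm(y ~ x1*x2, data = multiplicative)) #x1:x2 should be close to the simulated 2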
Probing interactions
By evaluating the slopes at different levels of the modifier variable, we have "probed the interaction". By binning `x2`, we can see whether high or low values of `x2` are more important in modifying the effect of `x1` on `y`. A novice may limit their excitement to a significant interaction, but it is possible to further explore the pattern of that interaction by asking which levels of the modifier variable matter most. For example, we could say something like: "At high values of the moderator `x2`, the effect of `x1` on the response was very important, but at lower levels, not so much". Hypothesis tests can be applied to assess the statistical significance of the coefficient of the modifier. For some reason the term "probing" is more common in the psychology literature, where it is often referred to more generally as "moderator analysis". To explore the concept of probing, Lee recommends reading Rogosa 1980 & Bauer and Curran 2005. For a video explanation using point-and-click software, Click here
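We can also probe this interaction in code with `emmeans::emtrends()`, as we did earlier for the auto data. A minimal sketch (the refit object `m_int` and the 10th/50th/90th percentile probe values are illustrative choices, not from the original script):

m_int <- lm(y ~ x1*x2, data = multiplicative) #refit the full interaction model
emtrends(m_int, ~ x2, var = "x1",
         at = list(x2 = unname(quantile(multiplicative$x2, c(0.1, 0.5, 0.9))))) #x1 slope at low/median/high x2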
What if the interaction is not multiplicative?
Now, you will notice that when we simulate the response variable `y`, the interaction effect is exponential: `exp(x1 + x2)`.
exponential <- data.frame('x1' = rnorm(1000, 0, 1), 'x2' = rnorm(1000, 0, 1)) |>
within({y <- 5*x1 + 2*x2 + exp(x1 + x2) + rnorm(1000, 0, 2)}) #Here we simulate an exponential interaction
slopes_plot <- split_data(exponential, 6, "x2")
ggint(slopes_plot)
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
Interesting: for the highest level of `x2` (red points), we see what looks like a non-linear increase at higher values of `x1`. Let's look at the plot of coefficients.
slopes_plot <- split_data(exponential, 20, "x2")
coef_plot_data <- data.frame(
'x1' = sapply(1:20, \(i) coef(lm(y ~ x1, data = subset(slopes_plot, break_num == i)))[[2]]),
'x2' = 1:20)
#Here we see that the effect of x1 on y explodes at high values of x2.
ggplot(data = coef_plot_data) +
geom_point(aes(x1, x2), pch = 21, fill = "gray65", color = "black") +
labs(x = "Effect size of x1 on y", y = "Quantile of x2") +
theme_classic()
summary(lm(x2 ~ x1, data = coef_plot_data))
##
## Call:
## lm(formula = x2 ~ x1, data = coef_plot_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.005 -3.173 -0.456 3.657 5.346
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.0323 2.0750 0.497 0.625
## x1 1.2163 0.2417 5.033 8.64e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.917 on 18 degrees of freedom
## Multiple R-squared: 0.5846, Adjusted R-squared: 0.5615
## F-statistic: 25.33 on 1 and 18 DF, p-value: 8.641e-05
Very cool! Much clearer. For an exponential interaction, the effect of x1 on y is modified by x2, and this effect is very strong at higher values of x2.
Let's compare some models using AIC. Model 1 uses typical interaction coding; notice that the slope is close to the slope we got a few lines up. Basically, this model draws a straight line through that exponential relationship.
model_linear <- lm(y ~ x1 + x2 + x1*x2, data = exponential)
summary(model_linear)
##
## Call:
## lm(formula = y ~ x1 + x2 + x1 * x2, data = exponential)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.806 -2.403 -0.548 1.594 66.070
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.8241 0.1561 18.09 <2e-16 ***
## x1 7.8708 0.1573 50.03 <2e-16 ***
## x2 4.7530 0.1589 29.92 <2e-16 ***
## x1:x2 3.1270 0.1587 19.70 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.937 on 996 degrees of freedom
## Multiple R-squared: 0.7993, Adjusted R-squared: 0.7987
## F-statistic: 1322 on 3 and 996 DF, p-value: < 2.2e-16
#model 2 adds an exponential interaction term (note: the simulation used exp(x1 + x2), while this model fits exp(x1*x2))
model_exp <- lm(y ~ x1 + x2 + exp(x1*x2), data = exponential)
summary(model_exp)
##
## Call:
## lm(formula = y ~ x1 + x2 + exp(x1 * x2), data = exponential)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.3037 -1.8098 -0.3427 1.6274 29.8417
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.64514 0.11323 14.53 <2e-16 ***
## x1 7.44491 0.11121 66.95 <2e-16 ***
## x2 4.54927 0.11177 40.70 <2e-16 ***
## exp(x1 * x2) 0.56825 0.01336 42.54 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.468 on 996 degrees of freedom
## Multiple R-squared: 0.901, Adjusted R-squared: 0.9007
## F-statistic: 3021 on 3 and 996 DF, p-value: < 2.2e-16
#An AIC difference of ~700 is pretty compelling. The exponential model fits the data much better
AIC(model_linear, model_exp)
## df AIC
## model_linear 5 6037.378
## model_exp 5 5330.858
What if the interactive variables are correlated?
Remember: Interaction terms are not incorporated into models because you suspect that the two predictors are correlated. They are incorporated because you suspect that one variable modifies the other variable's effect on y. Nevertheless, to satisfy curiosity, let's generate a response variable that is predicted by correlated variables x1 and x2, and see how models with and without the interaction compare.
Generate correlated variable.
#correlated predictors
x1 <- rnorm(1000, 0, 1)
x2 <- x1 + rnorm(1000, 0, 0.2) # here we correlate x2 to x1
cor(x1, x2) #Cool! We successfully correlated x2 to x1
## [1] 0.9804142
Generate a response variable that is predicted by `x1` and `x2`, with no interaction.
y <- 5*x1 + -2*x2 + rnorm(1000, 0, 2)
We can use the VIF (variance inflation factor) to detect multicollinearity, which we expect in this case because we correlated the simulated variables. The value tells us by how much the variance of a coefficient estimate is inflated because of multicollinearity. The lowest value, 1, means that the variance is the same whether or not the other variables are in the model (absence of multicollinearity). Higher values indicate correlated variables. What counts as high? Some say values > 5.
model <- lm(y ~ x1 + x2)
vif(model)
## x1 x2
## 25.78121 25.78121
summary(model)
##
## Call:
## lm(formula = y ~ x1 + x2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.3850 -1.3645 0.0228 1.2694 6.7666
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.006565 0.063611 -0.103 0.918
## x1 5.351415 0.320067 16.720 < 2e-16 ***
## x2 -2.273416 0.314393 -7.231 9.53e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.011 on 997 degrees of freedom
## Multiple R-squared: 0.7102, Adjusted R-squared: 0.7096
## F-statistic: 1222 on 2 and 997 DF, p-value: < 2.2e-16
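As a sanity check on `vif()`: with two predictors, the VIF is just 1 / (1 - R²) from regressing one predictor on the other. A minimal sketch using the objects above:

r2 <- summary(lm(x1 ~ x2))$r.squared #R-squared from regressing one predictor on the other
1/(1 - r2) #should reproduce the ~25.78 reported by vif() above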
Model y with an interaction
y <- 5*x1 + -2*x2 + 2*x1*x2 + rnorm(1000, 0, 2)
model_int <- lm(y ~ x1 + x2 + x1*x2)
vif(model_int)
## there are higher-order terms (interactions) in this model
## consider setting type = 'predictor'; see ?vif
## x1 x2 x1:x2
## 25.784239 25.787843 1.000873
summary(model_int)
##
## Call:
## lm(formula = y ~ x1 + x2 + x1 * x2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.1678 -1.3044 -0.0509 1.2498 5.7364
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.007115 0.074566 -0.095 0.924
## x1 4.819437 0.308132 15.641 < 2e-16 ***
## x2 -1.788074 0.302691 -5.907 4.77e-09 ***
## x1:x2 1.994971 0.041897 47.616 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.936 on 996 degrees of freedom
## Multiple R-squared: 0.8315, Adjusted R-squared: 0.831
## F-statistic: 1638 on 3 and 996 DF, p-value: < 2.2e-16
Whether or not the model has an interaction term, multicollinearity is detected between the two variables.
First we will simulate data. Your job later will be to play around and modify the values of the coefficients and the equations for the simulated responses and predictors, to see how they change the interpretation.

We'll first set coefficients. We are establishing the raw effect sizes here (`alpha` & `beta1` through `beta4`). Second, we'll generate covariates and some error terms that will be used to define (or "cause") the various response variables (`A`, `B`, `e1`, `e2`, `e3`, `int_err`, `int`, `int_add`, `e`). You will notice that covariates A (two levels) and B (two levels) will serve to generate a data set with a traditional ANOVA design. We'll then generate response and predictor variables (`temp`, `precip`, `interact`, `drought`, `abundance1`, `abundance2`, `mortal`, `f`). The code chunk below generates all sorts of variables so that later you can modify them to test all sorts of questions.
set.seed(345)
simulation <- data.frame(
'alpha' = rep(1, 1000),
'beta1' = 2,
'beta2' = -1,
'beta3' = -2,
'beta4' = 1)|>
within({
## Factorial variables
A = rep(0:1, each = 500) # '0' 500 times, '1' 500 times
B = rep(c(0,1,0,1), each = 250) # '0'x250, '1'x250, '0'x250, '1'x250
## Error variables
e1 = rnorm(1000, 10, sd = 2)
e2 = rnorm(1000, 5, sd = 1)
e3 = rnorm(1000, 1, sd = 0.5)
## Interaction variables
int_err = A*B + e1
int = A*B #multiplicative interaction
int_add = (A + 1) + (B*3) #additive interaction
e = rnorm(1000, 10, sd = 2)
## Simulated Continuous PREDICTOR Variables
temp = A + e1 #This makes a continuous variable where each level of A will have a different overall mean
precip = B + e2 #This makes a continuous variable where each level of B, will have a different overall mean
interact = temp*precip #simulating an interaction variable
drought = temp + precip + e3 #This is also an additive interaction if added to one of the response vars
## Simulated RESPONSE Variables
#`abundance1` models abundance on the two categorical predictors and their interaction
#`abundance2` models abundance on two observed continuous predictors, `precip` and `temp`, and their interaction
abundance1 = alpha + beta1*A + beta2*B + beta3*A*B + e
abundance2 = alpha + beta1*temp + beta2*precip + beta4*temp*precip + e
abundance1.z = (abundance1 - mean(abundance1)) / sd(abundance1) #standardized from`abundance1` above
abundance2.z = (abundance2 - mean(abundance2)) / sd(abundance2) #standardized from`abundance2` above
mortal = beta2*abundance2 + e # mortality is modeled on `abundance2`
mortal.z = (mortal - mean(mortal)) / sd(mortal) #standardized response variable of `mortal`
#this is intended to represent a correlated predictor variable. It's correlated with `abundance1`
f <- ifelse(A == 0, abundance1 + 2 , abundance1 - 2)
})
Exploring 2-Categorical Predictors
Let’s now apply our simulated data. We will start by modeling the unstandardized response variable. Can we estimate our original linear model parameters? We should be able to! No matter how much you mess with the beta coefficients we established, when the unstandardized response is used, the linear model with the interaction should yield coefficients that match them. Recall, we set the coefficients above to be:
beta1 = 2, beta2 = -1, beta3 = -2
model1 <- lm(abundance1 ~ A + B + A*B , data = simulation) #modeling with the unstandardized variable `abundance1`
summary(model1)
##
## Call:
## lm(formula = abundance1 ~ A + B + A * B, data = simulation)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.6068 -1.3093 0.0053 1.2336 6.3523
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.1887 0.1218 91.845 < 2e-16 ***
## A 1.7287 0.1723 10.034 < 2e-16 ***
## B -1.1881 0.1723 -6.896 9.46e-12 ***
## A:B -1.6173 0.2436 -6.638 5.22e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.926 on 996 degrees of freedom
## Multiple R-squared: 0.2707, Adjusted R-squared: 0.2685
## F-statistic: 123.3 on 3 and 996 DF, p-value: < 2.2e-16
model2 <- lm(abundance1.z ~ A + B + A*B , data = simulation) #modeling with the standardized variable `abundance1.z`
summary(model2)
##
## Call:
## lm(formula = abundance1.z ~ A + B + A * B, data = simulation)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.48954 -0.58136 0.00235 0.54774 2.82058
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.05951 0.05409 1.100 0.271
## A 0.76758 0.07650 10.034 < 2e-16 ***
## B -0.52755 0.07650 -6.896 9.46e-12 ***
## A:B -0.71811 0.10818 -6.638 5.22e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8553 on 996 degrees of freedom
## Multiple R-squared: 0.2707, Adjusted R-squared: 0.2685
## F-statistic: 123.3 on 3 and 996 DF, p-value: < 2.2e-16
Yep, as expected. Key takeaways here:

- Modeling the unstandardized response `abundance1`, we should expect to recover the predetermined beta coefficients (beta1 = 2, beta2 = -1, beta3 = -2), since `abundance1` was simulated from those coefficients: `abundance1 = alpha + beta1*A + beta2*B + beta3*A*B + e`.
- By standardizing the response variable to `abundance1.z`, coefficients are now expressed in standard deviations from the mean, so we would expect the coefficients to change. More specifically, they should be smaller than the coefficients from the unstandardized model, since a standard-deviation unit here is much larger than a raw unit. The proportion of each effect size relative to the others is the same in both cases (e.g., in `model1`, A = 1.7287 and B = -1.1881, so the ratio A:B (1.7287 / 1.1881 = 1.455) matches the ratio of A:B in `model2` (0.76758 / 0.52755 = 1.455)).
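A quick sketch verifying that ratio claim with the `model1` and `model2` objects fitted above:

coef(model1)["A"]/abs(coef(model1)["B"]) #~1.455 (unstandardized response)
coef(model2)["A"]/abs(coef(model2)["B"]) #~1.455 (standardized response)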
Explore Two Continuous Predictors

model <- lm(abundance2 ~ precip + precip*temp, data = simulation)
summary(model)
##
## Call:
## lm(formula = abundance2 ~ precip + precip * temp, data = simulation)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.5528 -1.2886 0.0206 1.2534 6.4130
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.10114 1.61194 8.128 1.29e-15 ***
## precip -1.35980 0.28865 -4.711 2.82e-06 ***
## temp 1.80881 0.15157 11.934 < 2e-16 ***
## precip:temp 1.03355 0.02709 38.148 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.927 on 996 degrees of freedom
## Multiple R-squared: 0.9901, Adjusted R-squared: 0.9901
## F-statistic: 3.327e+04 on 3 and 996 DF, p-value: < 2.2e-16
ggPredict(model, se = TRUE, interactive = TRUE)
Check for Understanding

- 2b) Now try using the standardized response variable `abundance2.z`. Do the coefficients remain the same?
- 2c) Modify how we simulate `abundance2`, develop models, and compare outputs for model fits and coefficients. Substitute the `abundance2` variable in the code above with these options (non-multiplicative interactions and no main effects) and rerun the code:
  - `abundance2 = alpha + beta1*temp + beta2*precip + beta3*log(temp + 1)*precip + e`
  - `abundance2 = alpha + beta1*temp + beta2*temp + beta3*(log(temp + 1) + precip) + e`
  - `abundance2 = alpha + beta1*(temp + precip) + e`
  - `abundance2 = alpha + beta1*(temp*precip) + e`

For example, let's say I reran `abun = alpha + beta1*temp + beta2*precip + beta3*log(temp + 1)*precip + e`. I would then compare `summary(model1)` and `summary(model2)` for the models:

model1 <- lm(abun ~ temp + precip + log(temp + 1)*precip , data = simulation)
model2 <- lm(abun ~ temp + precip + temp*precip , data = simulation)
model1 <- aov(abundance1.z ~ A + B + int, data = simulation) # Fit an ANOVA with interaction (int)
summary(model1)
## Df Sum Sq Mean Sq F value Pr(>F)
## A 1 41.7 41.72 57.04 9.64e-14 ***
## B 1 196.5 196.52 268.66 < 2e-16 ***
## int 1 32.2 32.23 44.06 5.22e-11 ***
## Residuals 996 728.5 0.73
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model2 <-aov(abundance1.z ~ A*B, data = simulation) #Fit an ANOVA with interaction (A*B)
summary(model2)
## Df Sum Sq Mean Sq F value Pr(>F)
## A 1 41.7 41.72 57.04 9.64e-14 ***
## B 1 196.5 196.52 268.66 < 2e-16 ***
## A:B 1 32.2 32.23 44.06 5.22e-11 ***
## Residuals 996 728.5 0.73
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Do the outputs of `model1` and `model2` differ? d) Does including an interaction with a correlated variable reduce its effect on the response?
model1 <- aov(abundance1.z ~ A + B + f, data = simulation) #Fit an ANOVA with 3 variables, no interaction
summary(model1)
## Df Sum Sq Mean Sq F value Pr(>F)
## A 1 41.7 41.7 3.183e+30 <2e-16 ***
## B 1 196.5 196.5 1.499e+31 <2e-16 ***
## f 1 760.8 760.8 5.804e+31 <2e-16 ***
## Residuals 996 0.0 0.0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model2 <- aov(abundance1.z ~ A + B + f + A*B + A*f, data = simulation)#Fit an ANOVA with 3 variables, and some interactions
summary(model2)
## Df Sum Sq Mean Sq F value Pr(>F)
## A 1 41.7 41.7 3.184e+30 <2e-16 ***
## B 1 196.5 196.5 1.500e+31 <2e-16 ***
## f 1 760.8 760.8 5.806e+31 <2e-16 ***
## A:B 1 0.0 0.0 2.321e+00 0.128
## A:f 1 0.0 0.0 1.220e-01 0.727
## Residuals 994 0.0 0.0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model1 <- aov(abundance1.z ~ A + B + f + A*f, data = simulation)#Fit an ANOVA with 3 variables, and some interactions
summary(model1)
## Df Sum Sq Mean Sq F value Pr(>F)
## A 1 41.7 41.7 3.180e+30 <2e-16 ***
## B 1 196.5 196.5 1.498e+31 <2e-16 ***
## f 1 760.8 760.8 5.799e+31 <2e-16 ***
## A:f 1 0.0 0.0 1.580e-01 0.691
## Residuals 995 0.0 0.0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model1 <- lm(abundance2.z ~ temp + precip + int, data = simulation)
summary(model1)
##
## Call:
## lm(formula = abundance2.z ~ temp + precip + int, data = simulation)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.68610 -0.09342 -0.00247 0.09089 0.66298
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.732234 0.035585 -189.2 <2e-16 ***
## temp 0.385957 0.002360 163.6 <2e-16 ***
## precip 0.485373 0.004687 103.5 <2e-16 ***
## int 0.022693 0.011942 1.9 0.0577 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1559 on 996 degrees of freedom
## Multiple R-squared: 0.9758, Adjusted R-squared: 0.9757
## F-statistic: 1.337e+04 on 3 and 996 DF, p-value: < 2.2e-16
model2 <- lm(abundance2 ~ temp + precip + int, data = simulation)
summary(model2)
##
## Call:
## lm(formula = abundance2 ~ temp + precip + int, data = simulation)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.2836 -1.8087 -0.0479 1.7597 12.8361
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -45.88649 0.68896 -66.6 <2e-16 ***
## temp 7.47255 0.04569 163.6 <2e-16 ***
## precip 9.39735 0.09075 103.5 <2e-16 ***
## int 0.43935 0.23121 1.9 0.0577 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.018 on 996 degrees of freedom
## Multiple R-squared: 0.9758, Adjusted R-squared: 0.9757
## F-statistic: 1.337e+04 on 3 and 996 DF, p-value: < 2.2e-16
Check for Understanding (from earlier): try an interaction with the three-level factor `cyl`: `lm(mpg ~ disp*cyl)`.

model.disp_cyl <- lm(mpg ~ disp*as.factor(cyl), data = autompg)
coef(model.disp_cyl)
## (Intercept) disp as.factor(cyl)6 as.factor(cyl)8 disp:as.factor(cyl)6
## 43.59051971 -0.13069347 -13.20026081 -20.85706424 0.08298924
## disp:as.factor(cyl)8
## 0.10817135
cs <- coef(model.disp_cyl)
ggplot(autompg, aes(disp, mpg, color = as.factor(cyl)))+
geom_point()+
geom_abline(intercept = cs[1], slope = cs[2], color = "#F8766D") +
geom_abline(intercept = cs[1] + cs[3], slope = cs[2] + cs[5], color = "#00BA38") +
geom_abline(intercept = cs[1] + cs[4], slope = cs[2] + cs[6], color = "#619CFF")
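As with the two-level example, `emtrends()` can return these three slopes without hand-adding coefficients. A sketch; since `cyl` is already a factor in `autompg`, we can refit without the `as.factor()` wrapper (the name `model.disp_cyl2` is my addition):

model.disp_cyl2 <- lm(mpg ~ disp*cyl, data = autompg) #cyl is already a factor here
emtrends(model.disp_cyl2, ~ cyl, var = "disp") #disp slope for 4-, 6-, and 8-cylinder cars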
Interpretation: A 4-cylinder car with zero displacement has an average mpg of 43.59. In a 4-cylinder car, the change in mpg for 1 unit of engine displacement is -0.13.

Effect of displacement on mpg in 6-cylinder cars. Interpretation: A 6-cylinder car with zero displacement has an average mpg of 30.39. In a 6-cylinder car, the change in mpg for 1 unit of engine displacement is -0.05.

Effect of displacement on mpg in 8-cylinder cars. Interpretation: An 8-cylinder car with zero displacement has an average mpg of 22.74. In an 8-cylinder car, the change in mpg for 1 unit of engine displacement is -0.02.
Below I explored 3 examples/possibilities to draw attention to some important points.
Example 1: Excluding an interaction term
# original model
model <- lm(abundance2 ~ precip + temp + precip*temp, data = simulation)
summary(model)
##
## Call:
## lm(formula = abundance2 ~ precip + temp + precip * temp, data = simulation)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.5528 -1.2886 0.0206 1.2534 6.4130
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.10114 1.61194 8.128 1.29e-15 ***
## precip -1.35980 0.28865 -4.711 2.82e-06 ***
## temp 1.80881 0.15157 11.934 < 2e-16 ***
## precip:temp 1.03355 0.02709 38.148 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.927 on 996 degrees of freedom
## Multiple R-squared: 0.9901, Adjusted R-squared: 0.9901
## F-statistic: 3.327e+04 on 3 and 996 DF, p-value: < 2.2e-16
# modified model (interaction removed)
model <- lm(abundance2 ~ precip + temp, data = simulation)
summary(model)
##
## Call:
## lm(formula = abundance2 ~ precip + temp, data = simulation)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.3096 -1.8105 0.0029 1.8156 12.9973
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -46.16159 0.67446 -68.44 <2e-16 ***
## precip 9.44342 0.08757 107.84 <2e-16 ***
## temp 7.48504 0.04527 165.34 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.022 on 997 degrees of freedom
## Multiple R-squared: 0.9757, Adjusted R-squared: 0.9756
## F-statistic: 2e+04 on 2 and 997 DF, p-value: < 2.2e-16
Let's take `lm(abundance2 ~ precip + temp + precip*temp, data = simulation)` to be the truth, since we simulated `abundance2` this way. Let's see how deviations from the "truth" impact how we would understand the relationships:

Observation: In the model without the interaction term, the coefficient estimate for `precip` changed in magnitude and direction.
Discussion: Main effects will change in value, and sometimes direction, when an interaction term is removed. Interaction terms change how we interpret main effects. Interaction terms can be thought of as modifier terms: when they are included in the model, they modify how we interpret the main effect. The interpretation becomes a CONDITIONAL statement; main effects become CONDITIONAL effects: the effect of a predictor variable becomes conditional on the value of the other predictor variable (e.g., the effect of x1 on y depends on x2). Let's first focus on directionality. In the model with the interaction term, the `precip` coefficient (-1.35980) makes sense, given that when `temp` = 0 the effect of `precip` on `abundance2` was negative (refer to the dark blue line on the graph generated from Ln 639). In the model without the interaction, we no longer need to interpret `precip` as conditional on `temp`. Now, taken on its own, `precip` has a positive relationship with `abundance2` (9.44342). That makes sense too! Consider the overall trend between `precip` and `abundance2` in the graph below:
model <- lm(abundance2 ~ precip + temp, data = simulation)
ggPredict(model, se = TRUE, interactive = TRUE)
Observation: In the model without the interaction, the standard errors decreased (certainty around the estimate changed).

Discussion: Let's consider `precip`. In the model with the interaction, the standard error and p-value (se = 0.28865, p-value = 2.82e-06) represent a decrease in certainty in the parameter estimate for `precip` compared to the model without the interaction term (se = 0.08757, p-value < 2e-16). This is a result of the correlation between `precip` and the interaction term `precip*temp`: there is multicollinearity. The more correlated the predictors are, the more difficult it is to estimate how much variation in the response each predictor is responsible for. If two predictors are highly correlated, it is difficult to ascribe that variation to one over the other. As a result, standard errors increase for variables when they are in a model with variables they may be correlated with. We also see this reflected in the p-value for `precip`: while still "significant" in both cases, the significance decreased in the model with the interaction term. In summary, multicollinearity reduces the power of the significance test.
Observation: There are fewer degrees of freedom (DF) in the model without the interaction term.

Discussion: There are 3 model degrees of freedom in the model with the interaction term and 2 in the model without it. Degrees of freedom here refer specifically to the number of parameters to be estimated: df = 3 refers to the `precip`, `temp`, and `precip*temp` terms, and df = 2 refers to the `precip` and `temp` terms. Therefore, it should also make sense that the F-statistic changed between the two models, since the DF enter into the calculation of the F-statistic.
Observation: In the model without the interaction term, the y-intercept was -46.16159, compared to 13.10114.

Discussion: The y-intercept is the point on the y-axis where all predictors are zero; in this case, where `precip` and `temp` = 0. While an abundance of -46.16 is not possible (so the y-intercept is not biologically meaningful here), -46.16 is the point on the y-axis when `precip` and `temp` = 0. Refer to the dark blue line (the line that represents `temp` = 0) and where it intersects the y-axis when `precip` is also zero.
Example 2: The inclusion of an uncorrelated fixed effect
# original model
model <- lm(abundance2 ~ precip, data = simulation)
summary(model)
##
## Call:
## lm(formula = abundance2 ~ precip, data = simulation)
##
## Residuals:
## Min 1Q Median 3Q Max
## -53.938 -10.859 0.037 11.157 56.209
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30.1847 2.6195 11.52 <2e-16 ***
## precip 9.8516 0.4664 21.12 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16.1 on 998 degrees of freedom
## Multiple R-squared: 0.3089, Adjusted R-squared: 0.3082
## F-statistic: 446.1 on 1 and 998 DF, p-value: < 2.2e-16
# modified model (uncorrelated fixed effect is included)
model <- lm(abundance2 ~ precip + temp, data = simulation)
summary(model)
##
## Call:
## lm(formula = abundance2 ~ precip + temp, data = simulation)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.3096 -1.8105 0.0029 1.8156 12.9973
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -46.16159 0.67446 -68.44 <2e-16 ***
## precip 9.44342 0.08757 107.84 <2e-16 ***
## temp 7.48504 0.04527 165.34 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.022 on 997 degrees of freedom
## Multiple R-squared: 0.9757, Adjusted R-squared: 0.9756
## F-statistic: 2e+04 on 2 and 997 DF, p-value: < 2.2e-16
Observation: The inclusion of an UNCORRELATED predictor variable (e.g., `temp`) increased the certainty around the parameter estimate for `precip`.

Discussion: We know from our specification of `abundance2` that `temp` explains a lot of the variation in `abundance2`. We also know `temp` is NOT correlated with `precip`, because they were simulated independently. Thus, we can see that the inclusion of another fixed effect, `temp`, reduced the standard error of the `precip` variable: our estimate of it became more certain as a lot of the previously unaccounted-for variation became accounted for by `temp`. Had these variables been correlated (multicollinear), the magnitude of the effect could be markedly different, and in some cases it can even flip from + to - and vice versa. Secondly, holding all else constant, had `abundance2` been simulated with another variable that wasn't `temp`, the inclusion of `temp` in the model would not have accounted for residual variation in `abundance2`, and therefore would not have changed the coefficient for `precip` in the same way.
Observation: The range of the residuals is much greater in the model with `precip` as the only predictor, and smaller in the model with both `temp` and `precip`.

Discussion: We would expect more residual variation in the `precip`-only model compared to the model that includes `temp`, because more variation in `abundance2` is accounted for by adding the other variable that we designed `abundance2` to depend on.
Observation: DF changed
Discussion: same explanation as in previous example
Example 3: include an interaction but exclude a main effect
#original model
model <- lm(abundance2 ~ precip + temp + precip:temp, data = simulation)
summary(model)
##
## Call:
## lm(formula = abundance2 ~ precip + temp + precip:temp, data = simulation)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.5528 -1.2886 0.0206 1.2534 6.4130
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.10114 1.61194 8.128 1.29e-15 ***
## precip -1.35980 0.28865 -4.711 2.82e-06 ***
## temp 1.80881 0.15157 11.934 < 2e-16 ***
## precip:temp 1.03355 0.02709 38.148 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.927 on 996 degrees of freedom
## Multiple R-squared: 0.9901, Adjusted R-squared: 0.9901
## F-statistic: 3.327e+04 on 3 and 996 DF, p-value: < 2.2e-16
# modified model (main effect in the interaction removed)
model <- lm(abundance2 ~ precip + precip:temp, data = simulation)
summary(model)
##
## Call:
## lm(formula = abundance2 ~ precip + precip:temp, data = simulation)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.582 -1.374 -0.001 1.403 7.208
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 31.970242 0.335108 95.40 <2e-16 ***
## precip -4.673908 0.084109 -55.57 <2e-16 ***
## precip:temp 1.350961 0.005515 244.98 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.06 on 997 degrees of freedom
## Multiple R-squared: 0.9887, Adjusted R-squared: 0.9887
## F-statistic: 4.364e+04 on 2 and 997 DF, p-value: < 2.2e-16
Observation: The intercept increased from 13.10114 in the model with both main effects, to 31.970242 in the model without `temp` as a main effect.

Discussion: That is because 31.970242 marks the intersection of the y-axis when `precip` = 0. Since that model does not include a `temp` main effect, the interpretation of the intercept is only with regard to `precip`. In the other model, the y-intercept represents the intersection where both `precip` and `temp` = 0. Compare the intercepts. You will also notice that this causes the intercept to be the same for all levels of `temp`: all the lines will intersect the `abundance2` axis at 31.970242. If there is reason to believe that the average `abundance2` at `precip` = 0 should be the same for all levels, then it would make sense to model this way. For example, in some cases where the effect of a treatment variable is tested (e.g., multiple dosages) for a treatment and control group, it may be reasonable to assume that the baseline of the response is properly represented as the average response of all samples, regardless of treatment, at dosage = 0.
model <- lm(abundance2 ~ precip + precip:temp, data = simulation)
ggPredict(model)
Observation: Standard errors decreased in the model without `temp`.

Discussion: `temp` is correlated with the interaction term `precip*temp`, so by removing the `temp` term we become more confident about the variation in `abundance2` attributed to the remaining terms.
Observation: DF changed
Discussion: same explanation as in previous examples
2b) Now try using the standardized response variable `abundance2.z`. Do the coefficients remain the same?

Below I repeat what I did above, but replace `abundance2` with `abundance2.z`. I won't go into detail with each, because the patterns are the same. Coefficients are now understood as standard deviations from the mean, since `abundance2.z` is `abundance2` standardized. So we would expect the coefficients, standard errors, ranges of the residuals, etc. to be much smaller in magnitude, but the relative contribution of each term to explaining variance in `abundance2` remains the same.
Example 1: Excluding an interaction term
model <- lm(abundance2.z ~ precip + temp + precip*temp, data = simulation)
summary(model)
##
## Call:
## lm(formula = abundance2.z ~ precip + temp + precip * temp, data = simulation)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.28680 -0.06655 0.00106 0.06474 0.33123
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.685526 0.083257 -44.267 < 2e-16 ***
## precip -0.070234 0.014909 -4.711 2.82e-06 ***
## temp 0.093425 0.007829 11.934 < 2e-16 ***
## precip:temp 0.053383 0.001399 38.148 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.09955 on 996 degrees of freedom
## Multiple R-squared: 0.9901, Adjusted R-squared: 0.9901
## F-statistic: 3.327e+04 on 3 and 996 DF, p-value: < 2.2e-16
# modified model (interaction removed)
model <- lm(abundance2.z ~ precip + temp, data = simulation)
summary(model)
##
## Call:
## lm(formula = abundance2.z ~ precip + temp, data = simulation)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.68744 -0.09351 0.00015 0.09377 0.67131
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.746442 0.034836 -193.7 <2e-16 ***
## precip 0.487752 0.004523 107.8 <2e-16 ***
## temp 0.386602 0.002338 165.3 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1561 on 997 degrees of freedom
## Multiple R-squared: 0.9757, Adjusted R-squared: 0.9756
## F-statistic: 2e+04 on 2 and 997 DF, p-value: < 2.2e-16
Example 2: Including an uncorrelated main effect
model <- lm(abundance2.z ~ precip, data = simulation)
summary(model)
##
## Call:
## lm(formula = abundance2.z ~ precip, data = simulation)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.7859 -0.5609 0.0019 0.5762 2.9032
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.80316 0.13530 -20.72 <2e-16 ***
## precip 0.50884 0.02409 21.12 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8317 on 998 degrees of freedom
## Multiple R-squared: 0.3089, Adjusted R-squared: 0.3082
## F-statistic: 446.1 on 1 and 998 DF, p-value: < 2.2e-16
# modified model (uncorrelated fixed effect is included)
model <- lm(abundance2.z ~ precip + temp, data = simulation)
summary(model)
##
## Call:
## lm(formula = abundance2.z ~ precip + temp, data = simulation)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.68744 -0.09351 0.00015 0.09377 0.67131
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.746442 0.034836 -193.7 <2e-16 ***
## precip 0.487752 0.004523 107.8 <2e-16 ***
## temp 0.386602 0.002338 165.3 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1561 on 997 degrees of freedom
## Multiple R-squared: 0.9757, Adjusted R-squared: 0.9756
## F-statistic: 2e+04 on 2 and 997 DF, p-value: < 2.2e-16
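The "uncorrelated" label is worth checking directly rather than assuming; a quick sketch:
# a near-zero correlation justifies calling `temp` an uncorrelated main effect
with(simulation, cor(precip, temp))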
Example 3: Excluding a variable as a main effect but keeping it in the interaction term
#original model
model <- lm(abundance2.z ~ precip + temp + precip:temp, data = simulation)
summary(model)
##
## Call:
## lm(formula = abundance2.z ~ precip + temp + precip:temp, data = simulation)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.28680 -0.06655 0.00106 0.06474 0.33123
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.685526 0.083257 -44.267 < 2e-16 ***
## precip -0.070234 0.014909 -4.711 2.82e-06 ***
## temp 0.093425 0.007829 11.934 < 2e-16 ***
## precip:temp 0.053383 0.001399 38.148 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.09955 on 996 degrees of freedom
## Multiple R-squared: 0.9901, Adjusted R-squared: 0.9901
## F-statistic: 3.327e+04 on 3 and 996 DF, p-value: < 2.2e-16
# modified model (main effect in the interaction removed)
model <- lm(abundance2.z ~ precip + precip:temp, data = simulation)
summary(model)
##
## Call:
## lm(formula = abundance2.z ~ precip + precip:temp, data = simulation)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.28832 -0.07099 -0.00005 0.07246 0.37230
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.7109373 0.0173083 -156.63 <2e-16 ***
## precip -0.2414071 0.0043442 -55.57 <2e-16 ***
## precip:temp 0.0697771 0.0002848 244.98 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1064 on 997 degrees of freedom
## Multiple R-squared: 0.9887, Adjusted R-squared: 0.9887
## F-statistic: 4.364e+04 on 2 and 997 DF, p-value: < 2.2e-16
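As a side note, `update()` offers an equivalent way to drop the main effect without retyping the formula; a minimal sketch:
# drop `temp` from the full model while keeping the interaction term
full <- lm(abundance2.z ~ precip + temp + precip:temp, data = simulation)
reduced <- update(full, . ~ . - temp)
formula(reduced) # abundance2.z ~ precip + precip:temp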
Next, modify how `abundance2` is simulated, develop models, and compare the outputs for model fits and coefficients. Substitute the `abundance2` variable in the code above with these options (non-multiplicative interactions and no main effects) and rerun the code:
Example 1: A model with a non-linear interaction term
# modify how we model `abundance2`. Recall that `abundance2` was originally simulated as: `abundance2 = alpha + beta1*temp + beta2*precip + beta4*temp*precip + e`
simulation <- transform(simulation, abundance2.new = alpha + beta1*temp + beta2*precip + beta3*log(temp + 1)*precip + e)
# Let's first run the model that we know properly reflects how `abundance2` was simulated
model <- lm(abundance2.new ~ precip + temp + precip:log(temp + 1), data = simulation)
summary(model)
##
## Call:
## lm(formula = abundance2.new ~ precip + temp + precip:log(temp +
## 1), data = simulation)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.5863 -1.2880 0.0078 1.2405 6.3998
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.9179 1.3458 9.598 < 2e-16 ***
## precip -1.8215 0.5978 -3.047 0.00237 **
## temp 1.8267 0.1253 14.582 < 2e-16 ***
## precip:log(temp + 1) -1.6646 0.2457 -6.775 2.13e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.927 on 996 degrees of freedom
## Multiple R-squared: 0.9236, Adjusted R-squared: 0.9234
## F-statistic: 4014 on 3 and 996 DF, p-value: < 2.2e-16
# Now let's run a model with all relevant terms, but leave `temp` untransformed in the interaction term
model <- lm(abundance2 ~ precip + temp + precip:temp, data = simulation)
summary(model)
##
## Call:
## lm(formula = abundance2 ~ precip + temp + precip:temp, data = simulation)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.5528 -1.2886 0.0206 1.2534 6.4130
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.10114 1.61194 8.128 1.29e-15 ***
## precip -1.35980 0.28865 -4.711 2.82e-06 ***
## temp 1.80881 0.15157 11.934 < 2e-16 ***
## precip:temp 1.03355 0.02709 38.148 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.927 on 996 degrees of freedom
## Multiple R-squared: 0.9901, Adjusted R-squared: 0.9901
## F-statistic: 3.327e+04 on 3 and 996 DF, p-value: < 2.2e-16
Observation: Whether the response was simulated with a linear interaction (`abundance2`) or a non-linear one (`abundance2.new`), the fitted interaction term comes out strongly significant in both models, and the two summaries are otherwise nearly indistinguishable (identical residual standard errors).
Discussion: This alludes to the idea that we may include an interaction term yet fail to properly characterize its functional form; a significant coefficient does not confirm that we chose the right form.
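Significance alone will not flag the misspecification, but comparing the two characterizations of the same response with an information criterion can; a hedged sketch:
# fit `abundance2.new` with the correct (log) and incorrect (linear) interaction forms
m_log <- lm(abundance2.new ~ precip + temp + precip:log(temp + 1), data = simulation)
m_lin <- lm(abundance2.new ~ precip + temp + precip:temp, data = simulation)
AIC(m_log, m_lin) # lower AIC = better-supported functional form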
model <- aov(abundance1.z ~ A + B + int, data = simulation) # Fit an ANOVA with interaction (int)
summary(model)
## Df Sum Sq Mean Sq F value Pr(>F)
## A 1 41.7 41.72 57.04 9.64e-14 ***
## B 1 196.5 196.52 268.66 < 2e-16 ***
## int 1 32.2 32.23 44.06 5.22e-11 ***
## Residuals 996 728.5 0.73
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model <-aov(abundance1.z ~ A*B, data = simulation) #Fit an ANOVA with interaction (A*B)
summary(model)
## Df Sum Sq Mean Sq F value Pr(>F)
## A 1 41.7 41.72 57.04 9.64e-14 ***
## B 1 196.5 196.52 268.66 < 2e-16 ***
## A:B 1 32.2 32.23 44.06 5.22e-11 ***
## Residuals 996 728.5 0.73
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Recall that `int` is the multiplicative interaction between `A` and `B`; in other words, `int = A*B`. Also recall that in a model formula, `A*B` is shorthand for both main effects plus their interaction:
aov(abundance1.z ~ A*B, data = simulation)
is equivalent to
aov(abundance1.z ~ A + B + A:B, data = simulation)
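You can confirm that the two specifications are literally the same model; a quick sketch:
m1 <- aov(abundance1.z ~ A*B, data = simulation)
m2 <- aov(abundance1.z ~ A + B + A:B, data = simulation)
all.equal(fitted(m1), fitted(m2)) # TRUE: identical fitted values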
model2? d) Does including an interaction with a correlated variable reduce its effect on the response?
model <- aov(abundance1.z ~ A + B + A*B, data = simulation)
summary(model)
## Df Sum Sq Mean Sq F value Pr(>F)
## A 1 41.7 41.72 57.04 9.64e-14 ***
## B 1 196.5 196.52 268.66 < 2e-16 ***
## A:B 1 32.2 32.23 44.06 5.22e-11 ***
## Residuals 996 728.5 0.73
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model <- aov(abundance1.z ~ A + B + f, data = simulation)
summary(model)
## Df Sum Sq Mean Sq F value Pr(>F)
## A 1 41.7 41.7 3.183e+30 <2e-16 ***
## B 1 196.5 196.5 1.499e+31 <2e-16 ***
## f 1 760.8 760.8 5.804e+31 <2e-16 ***
## Residuals 996 0.0 0.0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model <- aov(abundance1.z ~ A + B + A*B + A*f, data = simulation)
summary(model)
## Df Sum Sq Mean Sq F value Pr(>F)
## A 1 41.7 41.7 3.184e+30 <2e-16 ***
## B 1 196.5 196.5 1.500e+31 <2e-16 ***
## f 1 760.8 760.8 5.806e+31 <2e-16 ***
## A:B 1 0.0 0.0 2.321e+00 0.128
## A:f 1 0.0 0.0 1.220e-01 0.727
## Residuals 994 0.0 0.0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model <- aov(abundance1.z ~ A + B + A*f, data = simulation)
summary(model)
## Df Sum Sq Mean Sq F value Pr(>F)
## A 1 41.7 41.7 3.180e+30 <2e-16 ***
## B 1 196.5 196.5 1.498e+31 <2e-16 ***
## f 1 760.8 760.8 5.799e+31 <2e-16 ***
## A:f 1 0.0 0.0 1.580e-01 0.691
## Residuals 995 0.0 0.0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Because of the specifications for `f` and `abundance1.z`, all the variation in `abundance1.z` is accounted for by `A` and `f`; the model fit is perfect (R^2 = 1). Try `summary(lm(abundance1.z ~ A + f, data = simulation))` and `summary(lm(abundance1.z ~ A + f + A*B, data = simulation))`.
model1 <- lm(abundance2.z ~ temp + precip + int, data = simulation)
summary(model1)
##
## Call:
## lm(formula = abundance2.z ~ temp + precip + int, data = simulation)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.68610 -0.09342 -0.00247 0.09089 0.66298
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.732234 0.035585 -189.2 <2e-16 ***
## temp 0.385957 0.002360 163.6 <2e-16 ***
## precip 0.485373 0.004687 103.5 <2e-16 ***
## int 0.022693 0.011942 1.9 0.0577 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1559 on 996 degrees of freedom
## Multiple R-squared: 0.9758, Adjusted R-squared: 0.9757
## F-statistic: 1.337e+04 on 3 and 996 DF, p-value: < 2.2e-16
model2 <- lm(abundance2 ~ temp + precip + int, data = simulation)
summary(model2)
##
## Call:
## lm(formula = abundance2 ~ temp + precip + int, data = simulation)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.2836 -1.8087 -0.0479 1.7597 12.8361
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -45.88649 0.68896 -66.6 <2e-16 ***
## temp 7.47255 0.04569 163.6 <2e-16 ***
## precip 9.39735 0.09075 103.5 <2e-16 ***
## int 0.43935 0.23121 1.9 0.0577 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.018 on 996 degrees of freedom
## Multiple R-squared: 0.9758, Adjusted R-squared: 0.9757
## F-statistic: 1.337e+04 on 3 and 996 DF, p-value: < 2.2e-16
Notice that the significance of the coefficients does not change between the models with the standardized and unstandardized response variables.
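You can see why by putting the t values from the two fits side by side; a sketch using the models above:
# identical t values: standardizing the response rescales estimates and
# standard errors by the same factor, so their ratio is unchanged
cbind(standardized = summary(model1)$coefficients[, "t value"],
      unstandardized = summary(model2)$coefficients[, "t value"])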
Interpretation of Outputs:
lm(abundance2.z ~ temp + precip + int, data = simulation)
temp: the effect of `temp` on `abundance2.z` when `precip` = 0. When `precip` takes any value other than zero, the influence of `temp` on `abundance2.z` increases by 0.023 standard deviations for every unit of `precip`.
precip: the effect of `precip` on `abundance2.z` when `temp` = 0. When `temp` takes any value other than zero, the influence of `precip` on `abundance2.z` increases by 0.023 standard deviations for every unit of `temp`.
lm(abundance2 ~ temp + precip + int, data = simulation)
temp: the effect of `temp` on `abundance2` when `precip` = 0. When `precip` takes any value other than zero, the influence of `temp` on `abundance2` increases by 0.439 for every unit of `precip`.
precip: the effect of `precip` on `abundance2` when `temp` = 0. When `temp` takes any value other than zero, the influence of `precip` on `abundance2` increases by 0.439 for every unit of `temp`.
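To probe these conditional slopes directly rather than reading them off the coefficients, `emtrends()` from the `emmeans` package (already loaded at the top of the script) can report the slope of one predictor at chosen values of the other; a sketch, with the `precip` values picked arbitrarily:
# refit with an explicit formula interaction: emtrends() cannot see the
# precomputed `int` column, so the interaction must live in the formula
m <- lm(abundance2 ~ temp * precip, data = simulation)
emtrends(m, ~ precip, var = "temp", at = list(precip = c(0, 1, 2)))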