Chapter 5 Moderation and mediation

5.1 Moderation in linear models

Including an interaction in a linear model in R is straightforward. If you have two predictors, x1 and x2, and want to include both the “simple slopes” as well as the slope for the “product predictor” (i.e. x1 \(\times\) x2), then the model with y as dependent variable can be specified in formula form as

y ~ x1 * x2

which evaluates to

y ~ 1 + x1 + x2 + x1:x2

As discussed previously, 1 represents the intercept, which is automatically included in a model specification unless you explicitly remove it by adding -1 to the formula. x1 represents the “simple effect” of x1, x2 the corresponding “simple effect” of x2, whilst x1:x2 represents the product-predictor for the interaction between x1 and x2 (perhaps a little confusing for those of you who know : as a division operator). Because you would generally want to include the simple effects of the predictors as well as the interaction, the authors of R have chosen to save you from typing out the full model by expanding x1 * x2 in this way. The same notation can be used to specify more complicated models with three-way (and higher-order) interactions. For instance,

y ~ x1 * x2 * x3

evaluates to

y ~ 1 + x1 + x2 + x3 + x1:x2 + x1:x3 + x2:x3 + x1:x2:x3

which is a model with all “simple effects”, all pairwise product-predictors, as well as a three-way product predictor which is x1 \(\times\) x2 \(\times\) x3. We won’t discuss such higher-order interactions until later, but it is good to be aware of this in advance.

As a linear model with product-predictors is just another linear model, that is really all there is to say about specifying linear models with interactions (i.e. moderation) in R. Through the formula interface, R will create the necessary additional product predictor(s), and then estimate the parameters of the resulting linear model.
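
To see that a product-predictor really is just another predictor, you can also construct it by hand and compare. Below is a minimal sketch with simulated (made-up) data; the variable names and parameter values are chosen purely for illustration:

set.seed(1234)
dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
dat$y <- 2 + 0.5*dat$x1 - 0.3*dat$x2 + 0.2*dat$x1*dat$x2 + rnorm(100)

# let the formula interface construct the product predictor ...
mod_auto <- lm(y ~ x1*x2, data = dat)

# ... or construct the product predictor by hand
dat$x1x2 <- dat$x1 * dat$x2
mod_manual <- lm(y ~ x1 + x2 + x1x2, data = dat)

# apart from the name of the last term, the estimates are identical
coef(mod_auto)
coef(mod_manual)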

Let’s have a look at how this works with the speeddate data which was also analysed in the SDAM book. The data is included in the sdamr package, and can be loaded and (partially) inspected as usual:

library(sdamr)
data("speeddate")
head(speeddate)
##   iid pid gender age date_like other_like date_want other_want match self_attr
## 1 132 137 female  27         8          7         0          1     0         8
## 2 132 138 female  27         8          8         0          1     0         8
## 3 132 139 female  27         5          8         0          1     0         8
## 4 132 140 female  27         9          7         1          1     1         8
## 5 132 141 female  27         7          7         0          1     0         8
## 6 133 137 female  24         7          7         0          0     0         6
##   self_sinc self_intel self_fun self_amb other_attr other_sinc other_intel
## 1        10          9       10       10          8          8           9
## 2        10          9       10       10          8          7           4
## 3        10          9       10       10          8         NA           8
## 4        10          9       10       10          7          7           7
## 5        10          9       10       10          8          7           7
## 6         8          8        7        7          6         10          10
##   other_fun other_amb other_shar date_attr date_sinc date_intel date_fun
## 1         8         9          7         7         9          7        9
## 2         6         6          4         8         8          8        8
## 3         8         7         NA         5         7          9        5
## 4         7         7          7         8         9          9        9
## 5         7         7          8         5         8          8        8
## 6         6         7          5         7         7          7        8
##   date_amb date_shar self_imp_attr self_imp_sinc self_imp_intel self_imp_fun
## 1        6         9         16.67         16.67          16.67        16.67
## 2        8         8         16.67         16.67          16.67        16.67
## 3        9         5         16.67         16.67          16.67        16.67
## 4        9         8         16.67         16.67          16.67        16.67
## 5        7         7         16.67         16.67          16.67        16.67
## 6        6         8         12.77         19.15          17.02        17.02
##   self_imp_amb self_imp_shar other_imp_attr other_imp_sinc other_imp_intel
## 1        16.67         16.67          17.39          17.39           15.22
## 2        16.67         16.67          20.00          20.00           20.00
## 3        16.67         16.67          18.75          16.67           18.75
## 4        16.67         16.67          18.60          16.28           18.60
## 5        16.67         16.67          20.83          20.83           16.67
## 6        14.89         19.15          17.39          17.39           15.22
##   other_imp_fun other_imp_amb other_imp_shar
## 1         17.39         13.04          19.57
## 2         20.00          6.67          13.33
## 3         20.83         12.50          12.50
## 4         18.60         11.63          16.28
## 5         16.67          6.25          18.75
## 6         17.39         13.04          19.57

There is a rather large number of variables in the dataset. You can obtain more information about each variable in the documentation of the dataset, by calling ?speeddate. In my humble opinion, the (generally quite good) documentation of R packages is a real benefit of R over some other systems, and I strongly recommend that you check out and read the documentation of functions and datasets before you use them. Functions in R are generally quite flexible and it is infeasible to discuss all their nuances and possibilities in introductory notes like these. In the book, we mainly focused on the variables starting with other_, which are the perceptions of the participant by their dating partner. For example, the model \[\begin{align} \texttt{like}_i =& \beta_0 + \beta_{\texttt{attr}} \times \texttt{attr}_i + \beta_{\texttt{intel}} \times \texttt{intel}_i + \beta_{\texttt{fun}} \times \texttt{fun}_i \\ &+ \beta_{\texttt{attr} \times \texttt{intel}} \times (\texttt{attr} \times \texttt{intel})_i + \beta_{\texttt{fun} \times \texttt{intel}} \times (\texttt{fun} \times \texttt{intel})_i + \epsilon_i \end{align}\] referred to in the SDAM book can be estimated by calling:

modg <- lm(other_like ~ other_attr*other_intel + other_fun*other_intel, data=speeddate)

Hypothesis tests with the \(t\) statistic are obtained as usual through

summary(modg)
## 
## Call:
## lm(formula = other_like ~ other_attr * other_intel + other_fun * 
##     other_intel, data = speeddate)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.2380 -0.6632  0.0239  0.6583  4.6484 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            -0.91118    0.44269  -2.058 0.039740 *  
## other_attr              0.67123    0.09027   7.436 1.76e-13 ***
## other_intel             0.32393    0.05994   5.405 7.57e-08 ***
## other_fun               0.14331    0.08734   1.641 0.101045    
## other_attr:other_intel -0.04333    0.01171  -3.700 0.000224 ***
## other_intel:other_fun   0.03186    0.01139   2.798 0.005209 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.133 on 1472 degrees of freedom
##   (84 observations deleted due to missingness)
## Multiple R-squared:  0.6227, Adjusted R-squared:  0.6214 
## F-statistic: 485.8 on 5 and 1472 DF,  p-value: < 2.2e-16

The equivalent tests with the \(F\) statistic are easily obtained through

car::Anova(modg, type = 3)
## Anova Table (Type III tests)
## 
## Response: other_like
##                         Sum Sq   Df F value    Pr(>F)    
## (Intercept)               5.43    1  4.2365 0.0397398 *  
## other_attr               70.93    1 55.2931 1.755e-13 ***
## other_intel              37.47    1 29.2094 7.568e-08 ***
## other_fun                 3.45    1  2.6923 0.1010448    
## other_attr:other_intel   17.56    1 13.6898 0.0002236 ***
## other_intel:other_fun    10.04    1  7.8287 0.0052093 ** 
## Residuals              1888.24 1472                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
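
Because the model includes the product-predictor other_attr:other_intel, the slope of other_attr is not a single number, but a function of other_intel. As a sketch, the conditional slope can be computed from the estimated coefficients at a few values of other_intel (the values 4, 6 and 8 are chosen arbitrarily for illustration):

b <- coef(modg)
# the slope of other_attr at a given level of other_intel is
# beta_attr + beta_(attr x intel) * other_intel
other_intel <- c(4, 6, 8)
b["other_attr"] + b["other_attr:other_intel"] * other_intel
# the negative interaction means this slope decreases as other_intel increases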

5.1.1 A reminder about namespaces

In the code above, I’m using car:: to refer to a function in the car package. Technically, a statement like package_name:: denotes the namespace of the package with the name package_name (i.e., car in this case). This allows you to use functions from a package without loading the package completely (i.e. without loading all the functions of the package in memory). This can be (and often is!) better than first loading the package through e.g. library(car) and then calling Anova. The issue is that different packages can use the same name for a function, and when you call a function by its bare name, you will get the version from the package that was loaded last. When packages use the same name for functions, the function with that name from a package that was loaded earlier is “masked”, and R will print a message about this in the console. For example, if you load the dplyr package (a rather useful package for data manipulation, which is a little too much to discuss here in detail), you will see the following messages:

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

The line The following objects are masked from 'package:stats' indicates that the functions filter and lag are “masked” from the stats package. Whenever you call these functions by their bare names, you will get the corresponding functions from the dplyr package, and not those from the stats package. This does not break the functionality of the packages themselves: a properly written package that needs the filter function from the stats package will still use the function from that namespace, and not the one from dplyr. But when you call a function yourself, R searches the global environment first, and then the attached packages in reverse order of loading, so you get the most recently defined (or loaded) version. Just like you can overwrite an R object by assigning to a name that already exists (as I often do deliberately through e.g. mod <- lm()), a function in R is just another object. So if I specify a function like

# this definition is stored in the global environment and masks utils::head
head <- function(x, na.rm = FALSE) {
  return("Where's your head? It's almost Halloween!")
}

Then, the next time I call that function, I will get as a result

head(speeddate)
## [1] "Where's your head? It's almost Halloween!"

and not the result from

utils::head(speeddate)
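
If you are ever unsure which version of a function a bare call will resolve to, one way to check is to inspect the environment (namespace) the function comes from. A quick sketch, assuming dplyr is still loaded:

environment(filter)          # <environment: namespace:dplyr>
environment(stats::filter)   # <environment: namespace:stats>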

5.2 Centering

As discussed in the SDAM book, sometimes you might want to center variables by subtracting their (sample) mean from each value. This can be done in a number of ways. You can create a new variable in the dataframe by subtracting the single value obtained through mean() from the vector of values, as in

speeddate$other_like_c <- speeddate$other_like - mean(speeddate$other_like, na.rm = TRUE)

and then using the new variable other_like_c in your lm model. Alternatively, you can use the scale function. By default, the scale function creates \(Z\)-transformed variables by subtracting the mean and then dividing this mean-deviation by the standard deviation: \[\text{scale}(Y_i) = \frac{Y_i - \overline{Y}}{S_Y}\] The scale function has three arguments:

  • x: the variable (or matrix of variables) which you want to scale
  • center: a logical value indicating whether you want to subtract the mean
  • scale: a logical value indicating whether you want to divide by the standard deviation

Centering (subtracting the mean, but not dividing by the standard deviation) is thus obtained by calling scale(x, scale=FALSE). Personally, I find calling a function named scale with the argument scale = FALSE a little confusing. The sdamr package therefore provides the function center, which is essentially just a version of scale with scale = FALSE as the default:

center <- function(x) {
  scale(x, center = TRUE, scale = FALSE)
}
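
A quick check of what center does, using a small made-up vector:

x <- c(2, 4, 6, 8)
as.numeric(center(x))  # scale() returns a matrix, hence as.numeric()
x - mean(x)            # both give -3 -1  1  3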

Because R is a functional programming language, you can call the scale or center function directly within the call to the lm function. This saves you having to create centered variables in a dataframe first. For instance, if you have loaded the sdamr package (or if you ran the code above defining the center function), you can obtain the results of the model with centered predictors by calling

modg_c <- lm(other_like ~ center(other_attr)*center(other_intel) + center(other_fun)*center(other_intel), data=speeddate)
summary(modg_c)
## 
## Call:
## lm(formula = other_like ~ center(other_attr) * center(other_intel) + 
##     center(other_fun) * center(other_intel), data = speeddate)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.2380 -0.6632  0.0239  0.6583  4.6484 
## 
## Coefficients:
##                                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                             6.19613    0.03151 196.609  < 2e-16 ***
## center(other_attr)                      0.34539    0.01922  17.967  < 2e-16 ***
## center(other_intel)                     0.25791    0.02351  10.970  < 2e-16 ***
## center(other_fun)                       0.38292    0.02094  18.287  < 2e-16 ***
## center(other_attr):center(other_intel) -0.04333    0.01171  -3.700 0.000224 ***
## center(other_intel):center(other_fun)   0.03186    0.01139   2.798 0.005209 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.133 on 1472 degrees of freedom
##   (84 observations deleted due to missingness)
## Multiple R-squared:  0.6227, Adjusted R-squared:  0.6214 
## F-statistic: 485.8 on 5 and 1472 DF,  p-value: < 2.2e-16

5.3 Mediation analysis

In the SDAM book, we looked at mediation with a different dataset called legacy2015. This is also part of the sdamr package, and we can make it available and look at the initial cases as usual:

data("legacy2015")
head(legacy2015)
## [1] "Where's your head? It's almost Halloween!"

Oops! We had overwritten (“masked”) the head function before. We can get rid of our new definition of the head function by removing it from the global environment with the rm() function (the name is short for remove):

rm("head")

After this, we can try again:

head(legacy2015)
##   id    sex age legacy belief intention education income donation
## 2  2   male  23   4.38    5.4      3.57         5      5        0
## 3  3 female  59   4.75    5.2      2.43         5      6        0
## 4  4   male  30   5.00    5.2      2.43         5      3        1
## 5  5 female  25   2.88    5.0      2.00         4      3        0
## 6  6 female  32   4.88    4.4      2.57         2      1        0
## 7  7   male  38   5.12    4.6      3.71         3      2        7

5.3.1 Causal steps

The causal steps approach to assessing mediation involves testing the significance of effects in three regression models: the effect of the predictor on the dependent variable (the total effect, \(c\)), the effect of the predictor on the mediator (path \(a\)), and the effect of the mediator on the dependent variable whilst controlling for the predictor (path \(b\)), where the remaining effect of the predictor in this last model is the direct effect (\(c'\)). This can be done straightforwardly with the lm() function, which we have used quite a bit already. For instance, to assess whether the relation between legacy and donation is mediated by intention, we would estimate and test the parameters of the following models:

mod1 <- lm(donation ~ legacy, data = legacy2015)
mod2 <- lm(intention ~ legacy, data = legacy2015)
mod3 <- lm(donation ~ intention + legacy, data = legacy2015)
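
Rather than scanning three full summary() outputs, you could extract just the coefficient tests that the causal steps logic relies on. A sketch:

summary(mod1)$coefficients["legacy", ]    # total effect (c)
summary(mod2)$coefficients["legacy", ]    # effect of predictor on mediator (a)
summary(mod3)$coefficients["intention", ] # effect of mediator on outcome (b)
summary(mod3)$coefficients["legacy", ]    # direct effect (c')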

5.3.2 Investigating the mediated (indirect) effect with a bootstrap test

There are quite a few packages in R which allow you to test the mediated effect of a predictor on the dependent variable “via” the mediator. I have chosen the mediate function from the mediation package (Tingley et al. 2019), as it is a versatile option and doesn’t require too much additional explanation. Another very good option for conducting mediation analysis is to specify the mediation model as a Structural Equation Model (SEM), a multivariate generalization of linear models which we will cover later on. For present purposes, an SEM can be seen as a way to link different regression models (i.e. the dependent variable of one regression model becomes a predictor in another regression model) into what are conventionally called path models. The current “go-to” and most comprehensive SEM package in R is called lavaan, and we will consider that package in a later chapter.

Let’s focus on the mediation package for now. If you haven’t done so already, you will need to install it with

install.packages("mediation")

If you check the documentation of mediate in this package (i.e. type ?mediation::mediate), you will see there are lots of arguments to specify. We will only focus on the ones important for present purposes:

  • model.m: the name of the R object which contains the linear model predicting the mediator from the predictor, e.g. mod2 above
  • model.y: the name of the R object which contains the linear model predicting the dependent variable from both the mediator and predictor, e.g. mod3 above
  • sims: the number of simulations to use for the bootstrap test
  • boot: (logical) whether to use a bootstrap test. You should set this to TRUE
  • boot.ci.type: the type of bootstrap confidence interval to compute. This can either be perc, which stands for percentile: a simple method where the empirical 2.5th and 97.5th percentiles are calculated from the ordered bootstrap estimates. This is the default option. The other option is bca, which stands for bias-corrected and accelerated: this corrects the percentile method to try and reduce bias, and is generally recommended in mediation analysis. Hence, you should set boot.ci.type = "bca"
  • treat: the name of the predictor in the linear models specified under model.m and model.y. This would be legacy in the models above
  • mediator: the name of the mediator in the linear models specified under model.m and model.y. This would be intention in the models above

To run a bootstrap mediation test with 2000 simulations, we can run the following command:

set.seed(20201027)
med <- mediation::mediate(model.m = mod2, model.y = mod3, sims = 2000, boot = TRUE, boot.ci.type = "bca", treat = "legacy", mediator = "intention")
## Running nonparametric bootstrap
summary(med)
## 
## Causal Mediation Analysis 
## 
## Nonparametric Bootstrap Confidence Intervals with the BCa Method
## 
##                Estimate 95% CI Lower 95% CI Upper p-value    
## ACME              0.245        0.104         0.45  <2e-16 ***
## ADE               0.488        0.186         0.82  <2e-16 ***
## Total Effect      0.733        0.425         1.07  <2e-16 ***
## Prop. Mediated    0.334        0.155         0.67  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Sample Size Used: 237 
## 
## 
## Simulations: 2000

Note that rather than loading the mediation package with e.g. library(mediation), I’m calling the mediate function through mediation::mediate. Loading the mediation package results in loading quite a few other R packages, not all of which are necessary to perform a mediation analysis with linear models. Calling the mediate function directly through the appropriate namespace avoids loading these other add-on packages (which would otherwise mask some functions that I don’t want masked).

The output from calling the summary function on the results of the bootstrap procedure (the R object I named med) has four rows:

  • ACME: this is the average causal mediation effect, i.e. the average of \(\hat{a} \times \hat{b}\) over the simulations. The value under Estimate is this average, and you will also see the lower and upper bounds of the 95% confidence interval under 95% CI Lower and 95% CI Upper respectively. Finally, the p-value is the probability of obtaining an ACME at least as extreme as the one found, given that in reality the ACME equals 0. I.e., this is the p-value of the hypothesis test that the true mediated effect equals 0.
  • ADE: this is the average direct effect, and reflects the effect of the predictor which is not mediated. It is the average of \(\hat{c}'\) in the simulations.
  • Total Effect is the total effect of the predictor on the dependent variable, which is the sum of the ACME and ADE.
  • Prop. Mediated is the proportion of the effect of the predictor on the dependent variable which is mediated. This is ACME divided by Total Effect.

The results under ACME show that the bootstrap confidence interval of the mediated effect does not include 0. The p-value for this effect is also smaller than \(\alpha = .05\). As such, the null hypothesis that \(a \times b = 0\) is rejected, and we have found evidence that the effect of legacy on donation is mediated by intention. Because the confidence interval of the ADE also does not include 0, this analysis indicates that the mediation is partial. There is also a significant direct effect of legacy on donation. About 33.4% of the effect of legacy on donation is mediated by intention, so the residual direct effect is quite substantial.
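
Because both mod2 and mod3 are linear models, you can sanity-check the reported point estimates against the regression coefficients themselves. A sketch; the values should match the ACME and ADE above up to rounding and simulation error:

coef(mod2)["legacy"] * coef(mod3)["intention"]  # mediated effect, approx. 0.245 (ACME)
coef(mod3)["legacy"]                            # direct effect, approx. 0.488 (ADE)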

References

Tingley, Dustin, Teppei Yamamoto, Kentaro Hirose, Luke Keele, Kosuke Imai, Minh Trinh, and Weihuang Wong. 2019. mediation: Causal Mediation Analysis. https://imai.princeton.edu/projects/mechanisms.html.