# Chapter 7 Factorial ANOVA

In this chapter, we will look at factorial ANOVA, different ways of model comparisons (different SS types), and some ways to perform multiple comparisons and post-hoc tests. We will introduce these with the same data – from a study investigating the role of experimenter belief in social priming – used in Chapter 7 of SDAM. The dataset is available in the `sdamr`

package as `expBelief`

. We can load it from there, and inspect the first six cases, as usual:

```
library(sdamr)
data("expBelief")
head(expBelief)
```

```
## pid exptrNum age gender yearInUni ethnicity englishFluency
## 1 1001 1 18 F 1 caucasian 7
## 2 1003 1 18 F 1 CAUCASIAN 7
## 3 1004 1 18 F 1 MiddleEastern 7
## 4 1005 1 19 M 1 asian 5
## 5 1006 1 21 M 1 china 5
## 6 1007 1 19 F 1 caucasion 7
## experimenterBelief primeCond powerPRE powerPOST ApproachAdvantage attractive
## 1 H LPP 56.1 68.8 -149.14290 6
## 2 H LPP 98.2 89.0 -171.48810 1
## 3 H HPP 46.5 70.2 -27.33083 5
## 4 H LPP 58.8 44.8 374.15660 5
## 5 L HPP 64.7 53.6 68.55114 5
## 6 H HPP 61.4 50.3 -92.96860 5
## competent friendly trustworthy
## 1 7 7 7
## 2 3 5 6
## 3 7 7 7
## 4 5 6 5
## 5 6 6 6
## 6 7 7 7
```

## 7.1 Rainclouds for factorial designs

The experiment had a 2 (social prime: low-power bs high power prime) by 2 (experimenter belief: low-power vs high-power manipulation). The two experimental factors are called `primeCond`

and `experimenterBelief`

in the data frame. The dependent variable we looked at is called `ApproachAdvantage`

. We can use the `plot_raincloud`

function from `sdamr`

for plotting the data. The function has an argument `groups`

which allows you to plot separate rainclouds for different levels of a grouping variable. In this case, we need four rainclouds. Because there is no variable to reflect the combinations of the levels of `primeCond`

and `experimenterBelief`

, we should create one first. The `interaction()`

function is a useful function to create a new factor from the combinations of a set of given factors. As I’m going to make changes to the original dataset, I like to first create a new copy of the data fro this, so that I can later still use the original data set.

```
# create a copy of expBelief and call it "dat"
<- expBelief
dat $condition <- interaction(dat$primeCond, dat$experimenterBelief)
dat# show the levels of the newly created factor:
levels(dat$condition)
```

`## [1] "HPP.H" "LPP.H" "HPP.L" "LPP.L"`

You can see that the `condition`

factor has four levels, which concatenate the levels of `primeCond`

(which are `LPP`

for low-power prime, and `HPP`

for high-power prime) and `experimenterBelief`

(which as `L`

for when the experimenter is made to believe the participant received the low-power prime, and `H`

for when the experimenter believed this was the high-power prime). We can now create a raincloud plot for the four conditions as follows:

`plot_raincloud(dat, ApproachAdvantage, groups=condition)`

In the book, I first changed the labels of the two variables before calling the interaction function. If you want the same plot as in the book, you could run the following code (which is not evaluated here):

```
# turn primeCond and experimenterBelief in factors and change the labels
$primeCond <- factor(dat$primeCond, labels=c("PH","PL"))
dat$experimenterBelief <- factor(dat$experimenterBelief, labels=c("EH","EL"))
dat# now create an interaction factor, and change the separation sign to "-" instead of "."
$condition <- interaction(dat$primeCond, dat$experimenterBelief, sep="-")
datplot_raincloud(dat, ApproachAdvantage, groups=condition)
```

The raincloud plot above effectively treats the design as a oneway design. If we want the plot to more directly reflect the factorial design, we can add some functionality from the `ggplot2`

package. In particular, we can use so-called *facets*, which basically allow you to repeatedly draw a similar plot for different levels of an independent variable. Because the `plot_raincloud`

produces a raincloud plot by calling underlying `ggplot2`

functions, and the result is a `ggplot`

, you can use any function from `ggplot2`

to make changes to the resulting plot. For instance, we can, within a plot, separate the levels of the `experimenterBelief`

manipulation, and then create two panels (facets) for the levels of the `primeCond`

condition. This is done as follows:

`plot_raincloud(dat, ApproachAdvantage, groups = experimenterBelief) + facet_grid(~primeCond)`

As usual, it pays to read the documentation for the `facet_grid`

function (try calling `?facet_grid`

). There is an alternative for `facet_grid`

, called `facet_wrap`

, which provides slightly different labelling to the panels. `facet_grid`

is particularly useful when you have *two* independent variables in a factorial design for which you would like to create different panels (we will show an example of this later). As we are considering factorial designs here, I chose to use `facet_grid`

, but you can try `facet_wrap`

as well.

## 7.2 Formulating, estimating, and testing a factorial ANOVA

Formulating a factorial ANOVA model, where we distinguish between main effects and interactions, is not any different from formulating a moderated regression model. We can use the formula interface to indicate that we want to include predictors, as well as the product predictors required to assess interactions. In this case, however, we will enter nominal independent variables into the formula. When these are defined as `factors`

with associated contrast codes, R will automatically expand the model to include the contrast-coded predictors, as well as all relevant product-predictors.

The first thing to do is to make sure that the variables are defined as factors

```
# check what type the two IVs are
class(dat$primeCond)
```

`## [1] "character"`

`class(dat$experimenterBelief)`

`## [1] "character"`

```
# turn each into a factor
$primeCond <- as.factor(dat$primeCond)
dat$experimenterBelief <- as.factor(dat$experimenterBelief)
dat# let's check the class for one of them to make sure
class(dat$primeCond)
```

`## [1] "factor"`

`# that worked :-)`

Now let’s define appropriate contrast codes. As usual, it is a good idea to first check the existing contrast, as this shows the order of the factor levels:

`contrasts(dat$primeCond)`

```
## LPP
## HPP 0
## LPP 1
```

We can see that we need to define a single contrast, with the value for `HPP`

(high-power prime) first and then the value for `LPP`

(low-power prime) second. As the social priming hypothesis would predict the ApproachAdvantage score to be higher for `LPP`

than for `HPP`

, the following contrast makes sense:

```
contrasts(dat$primeCond) <- c(1/2, -1/2)
contrasts(dat$primeCond)
```

```
## [,1]
## HPP 0.5
## LPP -0.5
```

We define the contrast for `experimenterBelief`

in the same way:

`contrasts(dat$experimenterBelief)`

```
## L
## H 0
## L 1
```

```
# H comes before l
contrasts(dat$experimenterBelief) <- c(1/2, -1/2)
contrasts(dat$experimenterBelief)
```

```
## [,1]
## H 0.5
## L -0.5
```

Now we are ready to estimate the linear model. To estimate a model with the main effects and interaction, we would use:

`<- lm(ApproachAdvantage ~ primeCond*experimenterBelief, data=dat) modg `

Remember that this notation will expand the formula to

`ApproachAdvantage ~ 1 + primeCond + experimenterBelief + primeCond:experimenterBelief`

i.e. to a model with an intercept, a main effect of `primeCond`

, a main effect of `experimenterBelief`

, and an interaction `primeCond:experimenterBelief`

. The easiest way to obtain the parameter estimates (and t-tests for those) is to use the `summary`

function on this fitted linear model:

`summary(modg)`

```
##
## Call:
## lm(formula = ApproachAdvantage ~ primeCond * experimenterBelief,
## data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -556.7 -176.3 10.6 169.8 605.6
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 21.46 10.72 2.002 0.046 *
## primeCond1 11.98 21.44 0.559 0.577
## experimenterBelief1 90.51 21.44 4.221 3.02e-05 ***
## primeCond1:experimenterBelief1 37.64 42.88 0.878 0.381
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 214.4 on 396 degrees of freedom
## Multiple R-squared: 0.04565, Adjusted R-squared: 0.03842
## F-statistic: 6.314 on 3 and 396 DF, p-value: 0.0003425
```

Alternatively, we can use the `Anova`

function from the `car`

package to obtain Type 3 (omnibus) tests:

`::Anova(modg, type=3) car`

```
## Anova Table (Type III tests)
##
## Response: ApproachAdvantage
## Sum Sq Df F value Pr(>F)
## (Intercept) 184285 1 4.0084 0.04596 *
## primeCond 14343 1 0.3120 0.57678
## experimenterBelief 819158 1 17.8175 3.018e-05 ***
## primeCond:experimenterBelief 35410 1 0.7702 0.38069
## Residuals 18206049 396
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

You can see that while the `Anova`

function reports \(F\) statistics, the tests and the corresponding \(p\)-values are identical. In this case, each factor only has two levels, and hence one contrast code. As there is only one parameter associated to each main effect and interaction (the slope of the single contrast-coded predictor for that effect), the omnibus test is a single parameter test. We will see an example where this is not the case shortly.

### 7.2.1 A threeway ANOVA

We can also try to assess the experimenter effects by including this as an additional factor. Experimenter has four levels, so we’ll need three contrast codes for this variable. In the data.frame, Experimenter is included as `exptrNum`

, which is a numerical variable. So we will first convert it into a factor

`class(dat$exptrNum)`

`## [1] "integer"`

`$exptrNum <- factor(dat$exptrNum, labels=paste0("E",1:4)) dat`

I’m using `factor`

here rather than `as.factor`

, because the former allows me to add labels to the levels, through the `labels`

argument. Note that I’m using the `paste0`

function to create a vector with labels. This function can create combinations of (character) vectors, and is quite handy. The `paste0`

function is very similar to the `paste`

function, but doesn’t include a space between the combinations:

`paste0("E",1:4)`

`## [1] "E1" "E2" "E3" "E4"`

`paste("E",1:4)`

`## [1] "E 1" "E 2" "E 3" "E 4"`

Right, so let’s define a contrast for `exptrNum`

.

`contrasts(dat$exptrNum)`

```
## E2 E3 E4
## E1 0 0 0
## E2 1 0 0
## E3 0 1 0
## E4 0 0 1
```

```
contrasts(dat$exptrNum) <- cbind(c(-1/2,1/2,0,0),
c(-1/3,-1/3,2/3,0),
c(-1/4,-1/4,-1/4,3/4))
contrasts(dat$exptrNum)
```

```
## [,1] [,2] [,3]
## E1 -0.5 -0.3333333 -0.25
## E2 0.5 -0.3333333 -0.25
## E3 0.0 0.6666667 -0.25
## E4 0.0 0.0000000 0.75
```

Before conducting the analysis, it is always a good idea to look at the data. Let’s create a slightly different raincloud plot than the one in Chapter 8 of SDAMR, now more explicitly reflecting the factorial nature of the design:

`plot_raincloud(dat, ApproachAdvantage, groups = experimenterBelief) + facet_grid(primeCond ~ exptrNum)`

This plot is not necessarily better than the one in SDAM. It does quite clearly highlight that experimenter belief does not seem to have an effect for Experimenter 4. However, personally, I find it more difficult to assess the effect of prime condition. For that, we could create a second plot to show the effect of `primeCond`

within each panel:

`plot_raincloud(dat, ApproachAdvantage, groups = primeCond) + facet_grid(experimenterBelief ~ exptrNum)`

This indicates quite clearly that priming condition does not seem to have much of an effect for any experimenter of experimenter belief condition.

Back to the analysis, then. We can estimate a threeway factorial ANOVA by simply adding another independent variable to the formula:

`<- lm(ApproachAdvantage ~ primeCond*experimenterBelief*exptrNum, data=dat) modg_exp `

This formula is expanded to

`ApproachAdvantage ~ 1 + primeCond + experimenterBelief + exptrNum + primeCond:experimenterBelief + primeCond:exptrNum + experimenterBelief:exptrNum + primeCond:experimenterBelief:exptrNum`

In other words, the model includes all the main effects, all pairwise interactions between the factors, as well as the threeway interaction. We can see the parameter estimates and associated \(t\)-tests as usual through the summary function:

`summary(modg_exp)`

```
##
## Call:
## lm(formula = ApproachAdvantage ~ primeCond * experimenterBelief *
## exptrNum, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -507.20 -179.77 11.34 167.78 589.70
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 21.401 10.742 1.992 0.0471
## primeCond1 11.908 21.485 0.554 0.5797
## experimenterBelief1 90.441 21.485 4.210 3.19e-05
## exptrNum1 -7.411 30.387 -0.244 0.8075
## exptrNum2 -5.076 26.312 -0.193 0.8471
## exptrNum3 42.309 24.807 1.706 0.0889
## primeCond1:experimenterBelief1 37.381 42.969 0.870 0.3849
## primeCond1:exptrNum1 -60.323 60.774 -0.993 0.3215
## primeCond1:exptrNum2 1.765 52.625 0.034 0.9733
## primeCond1:exptrNum3 -21.445 49.613 -0.432 0.6658
## experimenterBelief1:exptrNum1 -17.453 60.774 -0.287 0.7741
## experimenterBelief1:exptrNum2 11.056 52.625 0.210 0.8337
## experimenterBelief1:exptrNum3 -116.864 49.613 -2.356 0.0190
## primeCond1:experimenterBelief1:exptrNum1 48.352 121.548 0.398 0.6910
## primeCond1:experimenterBelief1:exptrNum2 -8.115 105.249 -0.077 0.9386
## primeCond1:experimenterBelief1:exptrNum3 68.237 99.227 0.688 0.4921
##
## (Intercept) *
## primeCond1
## experimenterBelief1 ***
## exptrNum1
## exptrNum2
## exptrNum3 .
## primeCond1:experimenterBelief1
## primeCond1:exptrNum1
## primeCond1:exptrNum2
## primeCond1:exptrNum3
## experimenterBelief1:exptrNum1
## experimenterBelief1:exptrNum2
## experimenterBelief1:exptrNum3 *
## primeCond1:experimenterBelief1:exptrNum1
## primeCond1:experimenterBelief1:exptrNum2
## primeCond1:experimenterBelief1:exptrNum3
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 214.8 on 384 degrees of freedom
## Multiple R-squared: 0.07105, Adjusted R-squared: 0.03476
## F-statistic: 1.958 on 15 and 384 DF, p-value: 0.01719
```

Wow, there are a *lot* of estimates and tests here (16 in total)! While these tests are informative, it is common to (at least also) consider omnibus tests. Experimenter has four levels, so three associated contrasts, and we can’t find a test of the “overall” main effect of Experimenter in the output above. For these omnibus tests, we can (as before) use the `Anova`

function from the `car`

package:

`::Anova(modg_exp,type=3) car`

```
## Anova Table (Type III tests)
##
## Response: ApproachAdvantage
## Sum Sq Df F value Pr(>F)
## (Intercept) 183165 1 3.9689 0.04705 *
## primeCond 14178 1 0.3072 0.57972
## experimenterBelief 817797 1 17.7205 3.19e-05 ***
## exptrNum 138582 3 1.0010 0.39238
## primeCond:experimenterBelief 34927 1 0.7568 0.38487
## primeCond:exptrNum 54288 3 0.3921 0.75875
## experimenterBelief:exptrNum 262131 3 1.8933 0.13018
## primeCond:experimenterBelief:exptrNum 29490 3 0.2130 0.88738
## Residuals 17721491 384
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

If you’d just consider the output from this function, which does not provide a significant Experimenter by Belief interaction, you probably would have missed the potentially interesting `experimenterBelief1:exptrNum3`

interaction, which was discussed in SDAM.

## 7.3 Type 1, 2, and 3 Sums of Squares

Whilst intended (I think!) as a factorial experiment with an equal sample size for each priming condition, experimenter belief, and experimenter combination, the sample sizes are actually slightly unbalanced. One way to count the number of cases for each combination of factor levels is through the `ftable`

function (which stands for frequency table). The function has various interfaces, and I find the `formula`

interface easiest to work with. On the left-hand side of the formula, you can provide the name of a factor which you want to place in the columns of the table, and on the right-hand side you can include multiple factors which make up the rows, separated by a “+” sign:

`ftable(exptrNum ~ primeCond + experimenterBelief, data=dat)`

```
## exptrNum E1 E2 E3 E4
## primeCond experimenterBelief
## HPP H 26 25 25 25
## L 25 25 25 25
## LPP H 25 25 25 25
## L 25 24 25 25
```

As you can see, Experimenter 1 tested 26 participants in the high-power prime and high experimenter belief condition, whilst experimenter 2 tested 24 participants in the low-power prime and low experimenter belief condition.

The result of unbalanced data is that the contrast-coded predictors are no longer orthogonal. As a result, different ways of performing model comparisons will give different results. The differences are likely to be rather subtle here, because the sample sizes are mostly equal. Nevertheless, let’s consider how we can obtain results for the Type 1 and Type 2 SS procedures.

A Type 2 procedure is easily obtained by using the `car::Anova`

function, now with argument `type=2`

:

`::Anova(modg_exp,type=2) car`

```
## Anova Table (Type II tests)
##
## Response: ApproachAdvantage
## Sum Sq Df F value Pr(>F)
## primeCond 14682 1 0.3181 0.5731
## experimenterBelief 819687 1 17.7615 3.125e-05 ***
## exptrNum 138507 3 1.0004 0.3926
## primeCond:experimenterBelief 34707 1 0.7520 0.3864
## primeCond:exptrNum 53901 3 0.3893 0.7608
## experimenterBelief:exptrNum 261854 3 1.8913 0.1305
## primeCond:experimenterBelief:exptrNum 29490 3 0.2130 0.8874
## Residuals 17721491 384
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

If you compare the results to those of the Type 3 procedure used earlier, you can see some subtle differences. You can see that (apart from the threeway interaction), the SSR terms are slightly different, leading to small differences in the \(F\) statistic and associated \(p\)-value.

Unfortunately, the `car::Anova`

function will not work with `type=1`

. To get the results of a Type 1 procedure, you can use the `anova`

function from the default `stats`

package:

`anova(modg_exp)`

```
## Analysis of Variance Table
##
## Response: ApproachAdvantage
## Df Sum Sq Mean Sq F value Pr(>F)
## primeCond 1 14565 14565 0.3156 0.5746
## experimenterBelief 1 820883 820883 17.7874 3.084e-05
## exptrNum 3 138657 46219 1.0015 0.3921
## primeCond:experimenterBelief 1 35260 35260 0.7640 0.3826
## primeCond:exptrNum 3 54707 18236 0.3951 0.7566
## experimenterBelief:exptrNum 3 261854 87285 1.8913 0.1305
## primeCond:experimenterBelief:exptrNum 3 29490 9830 0.2130 0.8874
## Residuals 384 17721491 46150
##
## primeCond
## experimenterBelief ***
## exptrNum
## primeCond:experimenterBelief
## primeCond:exptrNum
## experimenterBelief:exptrNum
## primeCond:experimenterBelief:exptrNum
## Residuals
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

again, comparing this to the results of those obtained previously, you can see differences in the SSR terms. It is important to realise that the Type 1 procedure depends on the order of the factors in the formula. For instance, if we change this order as follows:

`anova(lm(ApproachAdvantage ~ experimenterBelief*exptrNum*primeCond, data=dat))`

```
## Analysis of Variance Table
##
## Response: ApproachAdvantage
## Df Sum Sq Mean Sq F value Pr(>F)
## experimenterBelief 1 820878 820878 17.7873 3.084e-05
## exptrNum 3 138443 46148 1.0000 0.3928
## primeCond 1 14784 14784 0.3204 0.5717
## experimenterBelief:exptrNum 3 263137 87712 1.9006 0.1290
## experimenterBelief:primeCond 1 34782 34782 0.7537 0.3859
## exptrNum:primeCond 3 53901 17967 0.3893 0.7608
## experimenterBelief:exptrNum:primeCond 3 29490 9830 0.2130 0.8874
## Residuals 384 17721491 46150
##
## experimenterBelief ***
## exptrNum
## primeCond
## experimenterBelief:exptrNum
## experimenterBelief:primeCond
## exptrNum:primeCond
## experimenterBelief:exptrNum:primeCond
## Residuals
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

you get slightly different results. I should also mention the `aov`

function from the `stats`

package, which will also provide Type 1 ANOVA tests. I have wrestled with this function often when I started using R a long time ago. I’ve happily not used it for some time now, so will only mention its existence here.

While the differences are very subtle here, this should not lead you to believe that the methods generally provide the same results. When the design is more unbalanced, the results can change quite dramatically.

Finally, I want to point out again that all three procedures test the SSR terms against the same SSE term (the SS given under `Residuals`

). This is the SSE of the full model (the model with all effects) and this is exactly the same for all three procedures. The procedures differ in how they compute the SSR terms for the different effects, as you can see.

## 7.4 Planned comparisons and post-hoc tests with `emmeans`

The `emmeans`

package (Lenth 2021) is very useful when you want to do more comparisons than can be implemented in the contrast codes within a single model, whether these are planned comparisons or post-hoc tests. The name of the package stands for *estimated marginal means*. One part of the functionality of the package is to compute the (unweighted) marginal means according to different types of models, including linear models. In the SDAM book, we discussed how these marginal means can be computed using contrast codes. For example, when using orthogonal contrasts, the intercept represents the grand mean, which is a simple average of averages, where the sample means of all groups are added and then divided by the number of groups. If the groups have unequal sample sizes, this is not taken into account in computing the grand mean. That is what is meant by *unweighted* marginal means.

You can think of the marginal means as the estimated population means assuming all groups have an equal sample size. The `emmeans`

function (from the `emmeans`

package with the same name) provides a simple way to compute the estimated marginal means for each condition, but also for the levels of one factor (averaging over the levels of other factors). The `emmeans`

requires at least two arguments: an `object`

, which is an estimated model, and `specs`

, which is either a character vector with the names for the predictors for which the estimated marginal predictors should be computed, or a formula. Here, we will use the formula interface, as it is flexible and intuitive. Going back to our simpler 2 by 2 design (ignoring Experimenter), the estimated marginal means of the four conditions can be computed with `emmeans`

as follows:

```
# load the package. If you don't have it installed, you will need to run
# install.packages("emmeans") first!
library(emmeans)
# call emmeans with modg as the object
emmeans(modg, specs = ~ primeCond:experimenterBelief)
```

```
## primeCond experimenterBelief emmean SE df lower.CL upper.CL
## HPP H 82.1 21.3 396 40.17 124.1
## LPP H 51.3 21.4 396 9.17 93.5
## HPP L -27.2 21.4 396 -69.36 14.9
## LPP L -20.4 21.5 396 -62.74 22.0
##
## Confidence level used: 0.95
```

You can see that the `emmeans`

function computes an estimated marginal mean for each combination of `primeCond`

and `experimenterBelief`

. For each mean, we get an estimate, a standard error of that estimate, the degrees of freedom (\(n - \text{npar}(G)\)), and a confidence interval. The marginal means for each group in the design are just the sample means in the groups in this case, but things become more complicated when we add additional metric predictors to the design, as we will see when we discuss ANCOVA in another chapter. You can also obtain estimated marginal means for the levels of one factor, averaging over the levels of the others. These are the marginal means that are compared in the main effects of that factor. For instance, for the `primeCond`

factor, the marginal means are

`emmeans(modg, specs = ~ primeCond)`

`## NOTE: Results may be misleading due to involvement in interactions`

```
## primeCond emmean SE df lower.CL upper.CL
## HPP 27.5 15.1 396 -2.28 57.2
## LPP 15.5 15.2 396 -14.41 45.4
##
## Results are averaged over the levels of: experimenterBelief
## Confidence level used: 0.95
```

and for `experimenterBelief`

they are:

`emmeans(modg, specs = ~ experimenterBelief)`

`## NOTE: Results may be misleading due to involvement in interactions`

```
## experimenterBelief emmean SE df lower.CL upper.CL
## H 66.7 15.1 396 37.0 96.45
## L -23.8 15.2 396 -53.7 6.09
##
## Results are averaged over the levels of: primeCond
## Confidence level used: 0.95
```

In addition to computing marginal means and providing confidence intervals for each, the package has a reasonably straightforward interface for testing *differences* between estimated marginal means. Such differences are effectively the contrasts that we have specified with contrast codes, and tested with Type 3 tests. A benefit of using `emmeans`

is that you can test more of these contrasts than the required number of contrast codes (i.e. 3 in this example).

If you want all pairwise comparisons between the means, you can get these by entering `pairwise`

as the left-hand side of the formula:

`emmeans(modg, specs = pairwise ~ primeCond:experimenterBelief)`

```
## $emmeans
## primeCond experimenterBelief emmean SE df lower.CL upper.CL
## HPP H 82.1 21.3 396 40.17 124.1
## LPP H 51.3 21.4 396 9.17 93.5
## HPP L -27.2 21.4 396 -69.36 14.9
## LPP L -20.4 21.5 396 -62.74 22.0
##
## Confidence level used: 0.95
##
## $contrasts
## contrast estimate SE df t.ratio p.value
## HPP H - LPP H 30.79 30.2 396 1.018 0.7390
## HPP H - HPP L 109.33 30.2 396 3.614 0.0019
## HPP H - LPP L 102.49 30.3 396 3.380 0.0044
## LPP H - HPP L 78.53 30.3 396 2.590 0.0487
## LPP H - LPP L 71.69 30.4 396 2.358 0.0870
## HPP L - LPP L -6.84 30.4 396 -0.225 0.9960
##
## P value adjustment: tukey method for comparing a family of 4 estimates
```

which automatically uses the Tukey HSD procedure to adjust the significance level of each test to obtain a family-wise significance level of \(\alpha_\text{FW} = .05\). You can obtain other corrections through the `adjust`

argument. Some options to enter there, which were discussed in SDAM, are:

`tukey`

`scheffe`

`bonferroni`

`holm`

There are other possibilities (see `?summary.emmGrid`

for details). For instance, we can apply the Scheffé adjustment with:

`emmeans(modg, specs = pairwise ~ primeCond:experimenterBelief, adjust="scheffe")`

```
## $emmeans
## primeCond experimenterBelief emmean SE df lower.CL upper.CL
## HPP H 82.1 21.3 396 40.17 124.1
## LPP H 51.3 21.4 396 9.17 93.5
## HPP L -27.2 21.4 396 -69.36 14.9
## LPP L -20.4 21.5 396 -62.74 22.0
##
## Confidence level used: 0.95
##
## $contrasts
## contrast estimate SE df t.ratio p.value
## HPP H - LPP H 30.79 30.2 396 1.018 0.7924
## HPP H - HPP L 109.33 30.2 396 3.614 0.0049
## HPP H - LPP L 102.49 30.3 396 3.380 0.0103
## LPP H - HPP L 78.53 30.3 396 2.590 0.0836
## LPP H - LPP L 71.69 30.4 396 2.358 0.1369
## HPP L - LPP L -6.84 30.4 396 -0.225 0.9970
##
## P value adjustment: scheffe method with rank 3
```

### 7.4.1 Adjusted p-values

One thing I should mention is that rather than showing you the corrected significance level \(\alpha\) for each test, `emmeans`

provides you with an **adjusted p-value**. A benefit of this is that you can just compare each \(p-value\) to the usual criterion level and call each test significant when \(p<.05\). However, I personally don’t find the resulting \(p\) value easy to interpret as a probability. Remember that the conventional \(p\)-value is the probability of obtaining a test result as large or more extreme, assuming that the null hypothesis is true. This is itself already a tricky concept, but with some experience with statistical distributions and the conceptual foundations of null-hypothesis significance testing, it is a valid probability that is interpretable as such. Wright (1992) defines the adjusted p-value as *the smallest family-wise significance level at which the tested null-hypothesis would be rejected*. This isn’t really a probability any more, as far as I can see it. It is true that the conventional \(p\)-value is, by definition, also equal to the smallest significance level at which the null-hypothesis would be rejected. For instance, if a particular test provides a \(p\)-value of \(p = .07\), then you would reject the null-hypothesis by setting \(\alpha \geq .07\). Hence, \(\alpha = .07\) is the smallest value of \(\alpha\) which would provide a significant test result. Similarly, if the test provided a \(p\)-value of \(p=.004\), then \(\alpha = .004\) is the smallest significance level for which the test would provide a significant result.

Although the \(p\)-value is equivalent to this “minimum \(\alpha\)” value, it is also a valid probability, and when you move to the domain of corrections for multiple comparisons, defining the \(p\)-value as the minimum family-wise significance level \(\alpha_\text{FW}\) for which the individual test would provide a significant test result, the correspondence with a proper probability is lost.

For the Bonferroni correction, the adjusted \(p\)-value is easy to compute. Remember that the Bonferroni correction for a total of \(q\) tests is to set the significance level of each individual test to \(\alpha = \frac{\alpha}{q}\). We can adjust the \(p\)-values correspondingly as \(p_\text{adj} = q \times p\). But if you’d perform \(q=100\) tests, and \(p=.2\), then \(p_\text{adj} = 100 \times .2 = 20\), which is obviously not a valid probability! Adjusted \(p\)-values are in this sense just convenience values which can be compared to e.g. \(\alpha_\text{FW} = .05\), but nothing more.

## 7.5 Testing general contrasts with `emmeans`

When you have computed the required estimated marginal means, you can then use these to define a set of general contrasts that you want to test. This set can include more contrasts than \(g-1\), but each contrast is defined in a way that we are used to. Let’s consider an example.

```
<- emmeans(modg, specs = ~ primeCond:experimenterBelief)
ems ems
```

```
## primeCond experimenterBelief emmean SE df lower.CL upper.CL
## HPP H 82.1 21.3 396 40.17 124.1
## LPP H 51.3 21.4 396 9.17 93.5
## HPP L -27.2 21.4 396 -69.36 14.9
## LPP L -20.4 21.5 396 -62.74 22.0
##
## Confidence level used: 0.95
```

We have four estimated marginal means, and the order that these are presented in is seen above: `HPP,H`

, `LPP,H`

, `HPP,L`

, and `LPP,L`

. We can now use the `contrast`

function from `emmeans`

(note that there is no “s” at the end, so this is a different function than `contrasts`

!) to supply the `ems`

object and a list of (named) contrasts. Suppose we want to test the following set of (somewhat arbitrary) contrasts:

\(c_1\) | \(c_2\) | \(c_3\) | \(c_4\) | \(c_5\) | \(c_6\) | \(c_7\) | \(c_8\) | |
---|---|---|---|---|---|---|---|---|

HPP,H | \(\tfrac{1}{2}\) | \(\tfrac{1}{2}\) | \(\tfrac{1}{2}\) | \(0\) | \(\tfrac{1}{2}\) | \(0\) | \(\tfrac{2}{3}\) | \(0\) |

LPP,H | \(-\tfrac{1}{2}\) | \(\tfrac{1}{2}\) | \(0\) | \(\tfrac{1}{2}\) | \(-\tfrac{1}{2}\) | \(0\) | \(-\tfrac{1}{3}\) | \(-\tfrac{1}{3}\) |

HPP,L | \(\tfrac{1}{2}\) | \(-\tfrac{1}{2}\) | \(-\tfrac{1}{2}\) | \(0\) | \(0\) | \(\tfrac{1}{2}\) | \(0\) | \(\tfrac{2}{3}\) |

LPP,L | \(-\tfrac{1}{2}\) | \(-\tfrac{1}{2}\) | \(0\) | \(-\tfrac{1}{2}\) | \(0\) | \(-\tfrac{1}{2}\) | \(-\tfrac{1}{3}\) | \(-\tfrac{1}{3}\) |

These can be interpreted as

- \(c_1\): main effect of Prime
- \(c_2\): main effect of Belief
- \(c_3\): comparing high and low belief for low-power prime
- \(c_4\): comparing high and low belief for high-power prime
- \(c_5\): comparing high-power and low-power prime for high belief
- \(c_6\): comparing high-power and low-power prime for low belief
- \(c_7\): comparing high-power prime with high belief to low-power prime conditions
- \(c_8\): comparing high-power prime with low belief to low-power prime conditions

Using the `contrast`

function, all these comparisons can be tested simultaneously, using the Scheffé adjustment, as follows:

```
contrast(ems,
method = list(c1 = c(1/2, -1/2, 1/2, -1/2),
c2 = c(1/2, 1/2, -1/2, -1/2),
c3 = c(1/2, 0, -1/2, 0),
c4 = c(0, 1/2, 0, -1/2),
c5 = c(1/2, -1/2, 0, 0),
c6 = c(0, 0, 1/2, -1/2),
c7 = c(2/3, -1/3, 0, -1/3),
c8 = c(0, -1/3, 2/3, -1/3)),
adjust="scheffe")
```

```
## contrast estimate SE df t.ratio p.value
## c1 11.98 21.4 396 0.559 0.9577
## c2 90.51 21.4 396 4.221 0.0006
## c3 54.66 15.1 396 3.614 0.0049
## c4 35.85 15.2 396 2.358 0.1369
## c5 15.40 15.1 396 1.018 0.7924
## c6 -3.42 15.2 396 -0.225 0.9970
## c7 44.43 17.5 396 2.544 0.0926
## c8 -28.46 17.5 396 -1.624 0.4518
##
## P value adjustment: scheffe method with rank 3
```

If this were a set of planned comparison, and you were confident enough to not apply a correction for multiple comparison, you could leave out the `adjust`

argument, or provide the value `adjust="none"`

.

### References

*Emmeans: Estimated Marginal Means, Aka Least-Squares Means*. https://github.com/rvlenth/emmeans.

*Biometrics*, 1005–13.