Chapter 8 Factorial ANOVA

In this chapter, we will extend our discussion of including nominal predictors in the General Linear Model to the case of multiple nominal predictors, where all the combinations of the levels of each are included in the design of a study. The analysis of such designs, with a factorial combination of all levels, is generally called factorial ANOVA. While the method is, as we will see, not fundamentally different from a oneway ANOVA, treated in the previous chapter, the factorial nature allows one to distinguish between main effects of factors and their interactions. After discussing how to interpret these effects, we turn to issues that arise when the number of observations is not equal across the combinations of the levels.

8.1 Experimenter beliefs and social priming

Research suggests that stimuli that prime social concepts can fundamentally alter people’s behaviour. For instance, in one experiment people were primed with either a high-power or low-power social status. Not only did people differ in how powerful they felt, they also differed in their cognitive functioning, processing information more quickly when it was consistent with the induced status (low or high power). However, it is also known that experimenter expectations may alter participants’ behaviour. As many studies on social priming have not been conducted as double-blind experiments (where neither experimenters nor participants are aware of the actual experimental conditions), it may be the case that some of the results were due to experimenter expectations. To investigate this, Gilder & Heerey (2018) conducted an experiment in which they systematically primed social status, as well as experimenters’ beliefs about which prime was used for which participant.

A total of \(n = 400\) students participated in their experiment. They first performed a priming task in which they were either assigned a high-power (“boss”) or low-power (“employee”) role. Independent of the actual condition, the experimenter (one of four research assistants) was made to believe that half of the participants in each condition were in the other condition (e.g., that people in the high-power condition were in the low-power condition). After the priming task, participants performed a lexical decision task, in which they had to indicate as quickly as possible whether a presented letter string was a word or non-word. Their response was made by pressing a key to move a stick figure closer to (approach) or further away from (avoid) the word. Earlier research found that participants primed with high power were quicker to approach than avoid stimuli, while the reverse was true for those primed with low power. The dependent variable was therefore an “approach advantage”, calculated as \(\texttt{ApproachAdvantage}_i = \overline{\texttt{RT}}_{\text{avoid},i} - \overline{\texttt{RT}}_{\text{approach},i}\).
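
To make the definition of the dependent variable concrete, here is a minimal Python sketch that computes the approach advantage for a single participant. The reaction times are made up for illustration and are not taken from the study; the function name `approach_advantage` is likewise just a label I chose.

```python
from statistics import mean

def approach_advantage(rt_avoid, rt_approach):
    """Mean avoid RT minus mean approach RT (in ms) for one participant."""
    return mean(rt_avoid) - mean(rt_approach)

# Hypothetical reaction times (ms) for one participant; not real data
rt_avoid = [640, 655, 610, 675]
rt_approach = [600, 590, 620, 610]

score = approach_advantage(rt_avoid, rt_approach)
print(score)  # a positive score means approach responses were faster
```

A positive score indicates that the participant was quicker to approach than to avoid; a negative score indicates the reverse.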

According to the “social priming hypothesis”, people will be faster to approach than avoid when they feel they have more power (and be faster to avoid than approach when they feel they have low power). If this hypothesis is true, then the \(\texttt{ApproachAdvantage}\) measure would be positive on average in the high-power condition, and negative on average in the low-power condition. Alternatively, according to the “experimenter belief hypothesis”, it is the experimenter belief about the condition, and not the actual condition, that drives any effects. If this hypothesis is true, then the \(\texttt{ApproachAdvantage}\) measure would be positive whenever the experimenter believes a participant is in the high-power condition, and negative when the experimenter believes a participant is in the low-power condition. Figure 8.1 shows the data for the four conditions (the four possible combinations of prime and experimenter belief). The plot looks like the results are more consistent with the experimenter-belief hypothesis: the averages of \(\texttt{ApproachAdvantage}\) in the two “experimenter low” conditions are lower than in the two “experimenter high” conditions.

Figure 8.1: Approach advantage scores in the four conditions of Experiment 5 of Gilder & Heerey (2018); PL = Low-power prime, PH = high-power prime, EL = experimenter believes condition was PL, EH = experimenter believes condition was PH

8.1.1 A oneway ANOVA

Let’s analyze the data with the tools we already have. There are in total four conditions in the experiment, so we can treat condition as a nominal independent variable with four levels. We then need to use three contrast codes. We are free to choose these in any way we like, as long as the model is able to predict ApproachAdvantage in each condition as the sample average in that condition. What would be the most interesting comparisons to make for this study? Two contrasts of interest follow directly from the two main hypotheses. To test the “social priming hypothesis”, it is of interest to compare the conditions in which participants received a low-power prime to those in which they received a high-power prime. More specifically, if the social-priming hypothesis is false and the power-primes have no effect on the speediness of approach and avoid responses, then the average of the low-power prime conditions would be equal to the average of the high-power prime conditions: \[\frac{\mu_\text{PL,EL} + \mu_\text{PL,EH}}{2} = \frac{\mu_\text{PH,EL} + \mu_\text{PH,EH}}{2}\] If the social-priming hypothesis is true, on the other hand, we would expect the approach advantage scores to be higher in the high-power prime conditions than in the low-power prime conditions, i.e.: \[\frac{\mu_\text{PH,EL} + \mu_\text{PH,EH}}{2} - \frac{\mu_\text{PL,EL} + \mu_\text{PL,EH}}{2} > 0\] The suggested contrast code is then \(c_1 = (-\tfrac{1}{2}, -\tfrac{1}{2}, \tfrac{1}{2}, \tfrac{1}{2})\) for the PL-EL, PL-EH, PH-EL, and PH-EH conditions, respectively.

To test the experimenter-belief hypothesis, a similar comparison would be made between the “experimenter believes low” and “experimenter believes high” conditions. More specifically, if the experimenter-belief hypothesis is false and the experimenter beliefs have no effect on the speediness of approach and avoid responses, then the average of the experimenter-believes-low conditions would be equal to the average of the experimenter-believes-high conditions: \[\frac{\mu_\text{PL,EL} + \mu_\text{PH,EL}}{2} = \frac{\mu_\text{PL,EH} + \mu_\text{PH,EH}}{2}\] If the experimenter-belief hypothesis is true, on the other hand, we would expect the approach advantage scores to be higher in the experimenter-believes-high conditions than in the experimenter-believes-low conditions, i.e.: \[\frac{\mu_\text{PL,EH} + \mu_\text{PH,EH}}{2} - \frac{\mu_\text{PL,EL} + \mu_\text{PH,EL}}{2} > 0\] The suggested contrast code is then \(c_2 = (-\tfrac{1}{2}, \tfrac{1}{2}, -\tfrac{1}{2}, \tfrac{1}{2})\) for the PL-EL, PL-EH, PH-EL, and PH-EH conditions, respectively.

It is straightforward to check that these two contrasts are orthogonal.20 If you don’t have any other burning question to ask about differences between the conditions, it then makes sense to look for a third contrast which completes the set of orthogonal contrasts (i.e. we should look for a contrast \(c_3\) that is orthogonal to both \(c_1\) and \(c_2\)). One such contrast is \(c_3 = (\tfrac{1}{2}, -\tfrac{1}{2}, -\tfrac{1}{2}, \tfrac{1}{2})\). This contrast provides a test of the following equivalence: \[\frac{\mu_\text{PL,EL} + \mu_\text{PH,EH}}{2} = \frac{\mu_\text{PL,EH} + \mu_\text{PH,EL}}{2}\] Our full set of three (orthogonal) contrasts is thus the following:

\(c_1\) \(c_2\) \(c_3\)
PL,EL \(-\tfrac{1}{2}\) \(-\tfrac{1}{2}\) \(\tfrac{1}{2}\)
PL,EH \(-\tfrac{1}{2}\) \(\tfrac{1}{2}\) \(-\tfrac{1}{2}\)
PH,EL \(\tfrac{1}{2}\) \(-\tfrac{1}{2}\) \(-\tfrac{1}{2}\)
PH,EH \(\tfrac{1}{2}\) \(\tfrac{1}{2}\) \(\tfrac{1}{2}\)
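
The orthogonality of these three contrasts can be checked mechanically: each contrast should sum to zero, and the sum of the element-wise products of every pair should be zero. The following Python sketch does this with exact fractions; the variable names are my own.

```python
from fractions import Fraction as F
from itertools import combinations

half = F(1, 2)
# Order of conditions: PL-EL, PL-EH, PH-EL, PH-EH
contrasts = {
    "c1": [-half, -half, half, half],
    "c2": [-half, half, -half, half],
    "c3": [half, -half, -half, half],
}

def dot(u, v):
    """Sum of element-wise products of two contrast codes."""
    return sum(a * b for a, b in zip(u, v))

# Each contrast should sum to zero ...
sums = {name: sum(code) for name, code in contrasts.items()}
# ... and each pair of contrasts should have a zero dot product
dots = {(a, b): dot(contrasts[a], contrasts[b])
        for a, b in combinations(contrasts, 2)}

print(sums)
print(dots)
```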

Estimating a linear regression model with the corresponding contrast-coded predictors gives the results provided in Table 8.1.

Table 8.1: Linear model predicting \(\texttt{ApproachAdvantage}\) by three orthogonal contrast-coded predictors.
\(\hat{\beta}\) \(\text{SS}\) \(\text{df}\) \(F\) \(p(\geq \lvert F \rvert)\)
Intercept 21.5 184285 1 4.008 0.046
Condition 870858 3 6.314 0.000
\(\quad X_1\) 12.0 14343 1 0.312 0.577
\(\quad X_2\) 90.5 819158 1 17.818 0.000
\(\quad X_3\) 18.8 35410 1 0.770 0.381
Error 18206049 396

The estimate of the intercept is positive and differs significantly from 0. Remember that in a model with sum-to-zero contrasts, the intercept reflects the grand mean (average of group means). Thus, we can conclude that, on average, participants were quicker in making approach than in making avoid responses. The scale of the dependent variable (\(\texttt{ApproachAdvantage}\)) is in milliseconds, so approach responses were on average 21.5 milliseconds faster than avoid responses. The omnibus test of Condition is significant, indicating that the mean of at least one condition differs from that of another. Each individual contrast represents the difference between one combination of two conditions and another combination of two conditions. Only the effect of \(X_2\) (the predictor corresponding to contrast \(c_2\)) is significant. This indicates that when the experimenter believed a participant was in the high-power condition, the average \(\texttt{ApproachAdvantage}\) score was 90.5 milliseconds larger than when the experimenter believed a participant was in the low-power condition. Interestingly, the effect of \(X_1\), the contrast-coded predictor corresponding to \(c_1\), is not significant. Hence, we cannot reject the null hypothesis that the difference in approach advantage between the high-power and low-power primes is 0. We have thus not found evidence for the social-priming hypothesis. The results are, however, in line with the experimenter-belief hypothesis.
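
To see why the intercept is the grand mean and why each slope is a difference between averages of conditions, it can help to work through a small example. The sketch below uses hypothetical condition means (not the values from the study). Because the model has as many parameters as conditions, its predictions equal the condition means, and because the contrasts are orthogonal and sum to zero, the intercept is the average of the means and each slope is \(\sum_k c_k \mu_k / \sum_k c_k^2\).

```python
from fractions import Fraction as F

half = F(1, 2)
conditions = ["PL-EL", "PL-EH", "PH-EL", "PH-EH"]
# Hypothetical condition means (ms); NOT the values from the study
means = [F(-30), F(50), F(-20), F(80)]

contrasts = {
    "c1": [-half, -half, half, half],   # Prime: high vs low
    "c2": [-half, half, -half, half],   # Belief: high vs low
    "c3": [half, -half, -half, half],   # remaining orthogonal contrast
}

# Intercept: average of the condition means (the grand mean)
intercept = sum(means) / len(means)

# Slope of each contrast-coded predictor: sum(c * mu) / sum(c^2)
slopes = {
    name: sum(c * m for c, m in zip(code, means)) / sum(c * c for c in code)
    for name, code in contrasts.items()
}

# Model predictions for each condition reproduce the condition means
predicted = [
    intercept + sum(slopes[name] * contrasts[name][k] for name in contrasts)
    for k in range(len(conditions))
]

print(intercept, slopes)
print(predicted == means)
```

With codes of \(-\tfrac{1}{2}\) and \(\tfrac{1}{2}\), the sum of squared contrast values is 1, so each slope is directly the difference between the two averages being compared.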

8.2 Factorial designs

In the above, we have pretty much conducted the analysis that answered the interesting questions for the study. In doing so, we treated the study as consisting of four conditions, without any additional structure. But considering the design of the study, you might realise that the conditions consisted of two manipulations: the prime given to participants (a low-power or high-power prime) and the belief given to the experimenters (whether the participant was supposedly given the low-power or high-power prime). The conditions in the experiment consisted of all four (\(2 \times 2\)) possible combinations of these two manipulations (PL-EL, PL-EH, PH-EL, and PH-EH). An experimental design with all combinations of different manipulations is called a factorial design.

8.2.1 Main effects and interactions

When a study has a factorial design, we can think of the treatment effects in a slightly different manner than when treating the conditions as part of a oneway design. In our oneway contrast coding scheme above, we have actually already done this: in the first contrast (\(c_1\)), we chose to group the two “PL” conditions together, and the “PH” conditions together. Similarly, we chose to group the “EL” conditions together, and the “EH” conditions together, in the second contrast (\(c_2\)). By doing so, we have considered what are called the main effects of the two factors (priming, and experimenter belief). The main effect of the priming manipulation reflects the effect of giving participants a low-power prime compared to a high-power prime, averaging over the levels of experimenter belief. The main effect of experimenter belief reflects the effect of making the experimenter believe participants were given a low-power prime compared to a high-power prime, averaging over the levels of the power prime. These means, where we average over all levels of other factors, are also called marginal means. For example, the marginal mean of the low-power prime conditions, which can be denoted as \(\mu_{\text{PL},\cdot}\) (the \(\cdot\) symbol reflects that we are averaging over the levels of the second factor), is simply \(\mu_{\text{PL},\cdot} = \frac{\mu_{\text{PL}, \text{EL}} + \mu_{\text{PL},\text{EH}}}{2}\). The contrast \(c_1\) reflects the difference between these marginal means: \(\mu_{\text{PH},\cdot} - \mu_{\text{PL},\cdot}\). Similarly, the second contrast, which encodes the main effect of Belief, is the difference between the marginal means \(\mu_{\cdot,\text{EH}} - \mu_{\cdot,\text{EL}}\).
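
As a small illustration of marginal means, the Python sketch below averages hypothetical condition means (made up for this example, not the study's values) over the levels of the other factor, and then takes the differences that correspond to the two main effects. The helper `marginal_mean` is a name I introduce here.

```python
# Hypothetical condition means (ms), keyed by (prime, belief); not real data
cell_means = {
    ("PL", "EL"): -30.0,
    ("PL", "EH"): 50.0,
    ("PH", "EL"): -20.0,
    ("PH", "EH"): 80.0,
}

def marginal_mean(cell_means, factor_index, level):
    """Average the cell means over all levels of the other factor."""
    values = [m for key, m in cell_means.items() if key[factor_index] == level]
    return sum(values) / len(values)

mu_PL = marginal_mean(cell_means, 0, "PL")
mu_PH = marginal_mean(cell_means, 0, "PH")
mu_EL = marginal_mean(cell_means, 1, "EL")
mu_EH = marginal_mean(cell_means, 1, "EH")

# Main effects as differences between marginal means
main_effect_prime = mu_PH - mu_PL
main_effect_belief = mu_EH - mu_EL

print(mu_PL, mu_PH, main_effect_prime)
print(mu_EL, mu_EH, main_effect_belief)
```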

Previously, we defined the remaining contrast code (\(c_3\)) as that code which would complete the set of orthogonal contrast codes. When treating the design as a factorial, rather than oneway design, we can take a different approach. The two contrast codes \(c_1\) and \(c_2\), and their corresponding predictors (\(X_1\) and \(X_2\)), reflect independent effects of the power prime and experimenter belief manipulations, respectively. If we had a model with only these two predictors, we would assume the slope of each predictor is the same, no matter the value of the other predictor. That implies that the effect of the power prime is assumed to be the same, no matter what the belief of the experimenter is. Similarly, the effect of the experimenter belief is assumed to be the same, no matter which power prime was presented. That is quite a strong assumption. What if the effect of the power prime depends on the belief of the experimenter, and what if the effect of the experimenter belief depends on the power prime? This would imply that the effect of power prime is moderated by experimenter belief (and vice versa).

It is important to realise that, once we have constructed contrast-coding predictors, we can treat the model as any other multiple regression model. Hence, we can investigate whether effects are moderated by adding a product predictor to our model, i.e. \((X_1 \times X_2)_i\), just like we would in a multiple regression model. Such a product predictor is exactly the same as defining a contrast code \(c_3' = c_1 \times c_2\)! So, to investigate whether the effect of the power prime is moderated by experimenter belief, we should use a contrast code \(c_3' = c_1 \times c_2 = (\tfrac{1}{4},- \tfrac{1}{4}, -\tfrac{1}{4}, \tfrac{1}{4})\). Note that this is, apart from scaling, the same contrast code as we used before, i.e. \(c_3' = \tfrac{1}{2} \times c_{3}\). And, while scaling affects the value of the slope, it does not change the underlying relation between a predictor and the dependent variable. So the reduction in the Sum of Squared Error and the resulting null-hypothesis test are exactly the same for a model based on \(c_3\) and a model based on \(c_3'\).

Before showing you that this is actually the case, let’s consider what we might expect from the slope of \(X_3'\) (the predictor corresponding to \(c_3'\)). Because \(c_3' = \tfrac{1}{2} \times c_{3}\), every one-unit increase in \(c_3\) corresponds to a half-unit increase in \(c_3'\). Conversely, that implies that every one-unit increase in \(c_3'\) corresponds to a two-unit increase in \(c_3\). So, what do you think the relation between the slope of \(X_3\) and \(X_{3}'\) would be?21

The results of estimating a model with this alternative version of the third contrast code (keeping the others the same) are given in Table 8.2. In the table, I have given the predictors more informative labels: \(X_1\) becomes \(\texttt{P}\) (for the main effect of Prime), \(X_2\) becomes \(\texttt{B}\) (for the main effect of experimenter Belief), and \(X_3'\) becomes \(\texttt{P} \times \texttt{B}\) (for the interaction between Prime and Belief). I have also omitted the omnibus test for Condition. In a factorial ANOVA, main effects and interaction effects can be omnibus tests themselves (we will see an example of this later). Besides these stylistic changes, the only real difference in these new results is the estimate of the slope of \(\texttt{P} \times \texttt{B}\), which is twice the value of the corresponding slope of \(X_3\) in Table 8.1.

Table 8.2: Linear model predicting \(\texttt{ApproachAdvantage}\) by factorial contrast-coded predictors.
\(\hat{\beta}\) \(\text{SS}\) \(\text{df}\) \(F\) \(p(\geq \lvert F \rvert)\)
Intercept 21.5 184285 1 4.008 0.046
\(\texttt{P}\) 12.0 14343 1 0.312 0.577
\(\texttt{B}\) 90.5 819158 1 17.818 0.000
\(\texttt{P} \times \texttt{B}\) 37.6 35410 1 0.770 0.381
Error 18206049 396
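
The doubling of the slope can also be verified with a small numerical sketch. Using hypothetical condition means (made up for this example, not the study's values), the code below builds the product contrast from \(c_1\) and \(c_2\), computes the slopes for \(c_3\) and the product contrast as \(\sum_k c_k \mu_k / \sum_k c_k^2\), and checks that the contribution of the predictor to each predicted condition mean is unchanged by the rescaling.

```python
from fractions import Fraction as F

half = F(1, 2)
# Hypothetical condition means (ms) for PL-EL, PL-EH, PH-EL, PH-EH
means = [F(-30), F(50), F(-20), F(80)]

c1 = [-half, -half, half, half]
c2 = [-half, half, -half, half]
c3 = [half, -half, -half, half]

# Product contrast: element-wise product of c1 and c2
c3_prime = [a * b for a, b in zip(c1, c2)]

def slope(code, means):
    """Slope of an orthogonal contrast-coded predictor from condition means."""
    return sum(c * m for c, m in zip(code, means)) / sum(c * c for c in code)

b3 = slope(c3, means)
b3_prime = slope(c3_prime, means)

# Contribution of the predictor to each predicted mean: slope times code
contribution = [b3 * c for c in c3]
contribution_prime = [b3_prime * c for c in c3_prime]

print(c3_prime)
print(b3, b3_prime)
print(contribution == contribution_prime)
```

Halving the contrast values doubles the slope, but the product of slope and contrast value, which is what enters the model predictions, stays the same.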

At this point, you might wonder what is special about factorial designs and the way they are implemented in linear models. As the previous discussion indicates, in some sense, factorial designs are not special at all. If you have one experimental manipulation \(A\) (e.g. power priming) with a total of \(g_A\) levels \(a_1, \ldots, a_{g_A}\), and another manipulation \(B\) (e.g. experimenter belief) with a total of \(g_B\) levels \(b_1, \ldots, b_{g_B}\), and the experiment crosses all these levels as \((a_1 \text{ and } b_1), (a_1 \text{ and } b_2), \ldots, (a_2 \text{ and } b_1), \ldots, (a_{g_A} \text{ and } b_{g_B})\), then you will end up with an experiment that has a total of \(g = g_A \times g_B\) conditions. It is up to you how you treat these conditions. There is nothing inherently wrong with ignoring the factorial nature of a design, and analysing it as a oneway design. The key is to come up with contrasts that test interesting hypotheses. Often, these contrasts will involve comparisons between levels of one manipulation, whilst averaging over the levels of another manipulation. This then naturally results in treating the design as factorial.

8.3 The factorial ANOVA model

An alternative, more traditional way of specifying a factorial ANOVA is in terms of a grand mean and treatment effects. This is analogous to the oneway ANOVA model of Equation (7.2). Let’s consider a case with two experimental factors, \(A\) and \(B\), and let \(Y_{i,j,k}\) denote an observation for person \(i\) at level \(j\) of the first factor (\(A\)) and level \(k\) of the second factor (\(B\)). We can state the factorial ANOVA model as \[\begin{equation} Y_{i,j,k} = \mu + \tau^{(A)}_j + \tau^{(B)}_k + \tau^{(A \times B)}_{j,k} + \epsilon_{i,j,k} \quad \quad \epsilon_{i,j,k} \sim \textbf{Normal}(0, \sigma_\epsilon) \tag{8.1} \end{equation}\] Here, as usual, \(\mu\) denotes the grand mean, which is the average of all condition means \(\mu_{j,k}\). \(\tau^{(A)}_j\) is the treatment effect of level \(j\) of factor \(A\): \[\tau^{(A)}_j = \mu_{j,\cdot} - \mu\] which is the difference between marginal mean \(\mu_{j,\cdot}\) and the grand mean \(\mu\). Similarly, \(\tau^{(B)}_k\) is the treatment effect of level \(k\) of factor \(B\): \[\tau^{(B)}_k = \mu_{\cdot,k} - \mu\] which is the difference between marginal mean \(\mu_{\cdot,k}\) and the grand mean \(\mu\). Finally, \(\tau^{(A \times B)}_{j,k}\) is the interaction effect: \[\tau^{(A \times B)}_{j,k} = \mu_{j,k} - (\mu + \tau^{(A)}_j + \tau^{(B)}_k)\] i.e. the difference between the true mean \(\mu_{j,k}\) at level \(j\) of factor \(A\) and level \(k\) of factor \(B\), and the mean predicted by adding only the treatment effects \(\tau^{(A)}_j\) and \(\tau^{(B)}_k\) to the grand mean, which is \(\mu + \tau^{(A)}_j + \tau^{(B)}_k\).
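
The definitions of the treatment and interaction effects can be traced step by step in a small Python sketch. The condition means below are hypothetical (not the study's values), with factor \(A\) taken to be Prime and factor \(B\) taken to be Belief; all variable names are my own.

```python
from fractions import Fraction as F

levels_a = ["PL", "PH"]   # factor A: Prime
levels_b = ["EL", "EH"]   # factor B: Belief
# Hypothetical condition means mu_{j,k} (ms); not the study's values
mu = {("PL", "EL"): F(-30), ("PL", "EH"): F(50),
      ("PH", "EL"): F(-20), ("PH", "EH"): F(80)}

# Grand mean: average of all condition means
grand = sum(mu.values()) / len(mu)

# Treatment effects: marginal mean minus grand mean
tau_a = {j: sum(mu[(j, k)] for k in levels_b) / len(levels_b) - grand
         for j in levels_a}
tau_b = {k: sum(mu[(j, k)] for j in levels_a) / len(levels_a) - grand
         for k in levels_b}

# Interaction effects: what remains after the additive prediction
tau_ab = {(j, k): mu[(j, k)] - (grand + tau_a[j] + tau_b[k])
          for j in levels_a for k in levels_b}

# Adding all terms back together recovers the condition means
rebuilt = {(j, k): grand + tau_a[j] + tau_b[k] + tau_ab[(j, k)]
           for j in levels_a for k in levels_b}

print(grand, tau_a, tau_b)
print(tau_ab)
```

Note that the treatment effects of each factor sum to zero over its levels, and the interaction effects sum to zero over each row and each column of the design.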

8.4 A threeway factorial ANOVA

One benefit of treating a factorial design as a factorial design is that you only need to worry about defining contrast codes for the main effects of the different manipulations. If you choose orthogonal contrasts for these, then the remaining contrasts for the full factorial design are simple to work out. They are interactions, computed as products of the main-effect contrasts of different factors.

To see how this works in a more complex situation, let’s investigate whether the identity of the experimenter might also play a role in determining the approach advantage scores. In the study, there were four experimenters (the research assistants). Let’s treat the identity of the experimenter as another factor in the design. A plot of the data, separated by experimenter, is provided in Figure 8.2. As you can see there, while the first three experimenters show the same overall pattern as in Figure 8.1, with higher scores for the “EH” conditions than the “EL” conditions, this is not so obvious for Experimenter 4.

Approach advantage scores separated by condition and experimenter

Figure 8.2: Approach advantage scores separated by condition and experimenter

Treating experimenter as another factor, we now have a 2 (Prime: low power vs high power) by 2 (experimenter Belief: low-power prime vs high-power prime) by 4 (Experimenter: 1, 2, 3, or 4) factorial design, with a total of \(2 \times 2 \times 4 = 16\) conditions. We thus need a total of \(g-1 = 16 - 1 = 15\) contrast codes. That is a lot! We will start by defining suitable contrast codes for the main effect of each of the three factors (manipulations). We have already done this for the first two factors. For prime-condition, we can assign a value of \(-\tfrac{1}{2}\) to all conditions with a low-power prime, and a value of \(\tfrac{1}{2}\) to all conditions with a high-power prime. Thus, using a more informative label than \(c_1\), the contrast code for Prime condition is \(\texttt{P} = (-\tfrac{1}{2},\tfrac{1}{2})\). Similarly, to code for experimenter belief, we can assign a value of \(-\tfrac{1}{2}\) to all conditions where the experimenter believed participants were in the low-power prime condition, and a value of \(\tfrac{1}{2}\) to all conditions where the experimenter believed participants were assigned a high-power prime, i.e. \(\texttt{B} = (-\tfrac{1}{2},\tfrac{1}{2})\). Experimenter is a factor with four levels, and hence we need three contrast codes. As I don’t have a particular a priori hypothesis about which experimenters might differ from others, we can use one of the default orthogonal contrast codes. For instance, we can use a Helmert coding scheme, with contrasts \(\texttt{E}_1 = (-\tfrac{1}{2}, \tfrac{1}{2}, 0, 0)\), \(\texttt{E}_2 = (-\tfrac{1}{3}, -\tfrac{1}{3}, \tfrac{2}{3}, 0)\), and \(\texttt{E}_3 = (-\tfrac{1}{4}, -\tfrac{1}{4}, -\tfrac{1}{4}, \tfrac{3}{4})\). We have now defined \(1 + 1 + 3 = 5\) contrast codes. The remaining 10 contrasts are easy to determine.
First, we will construct codes for the moderation of \(\texttt{P}\) by \(\texttt{B}\), by computing a product contrast (multiplying the values of \(\texttt{P}\) and \(\texttt{B}\) for all 16 conditions). We would do the same for the moderation of the priming effect by experimenter, which consists of the pairwise products between \(\texttt{P}\) and \(\texttt{E}_1\), \(\texttt{P}\) and \(\texttt{E}_2\), \(\texttt{P}\) and \(\texttt{E}_3\). The moderation of the effect of experimenter belief by experimenter consists of the pairwise products of \(\texttt{B}\) and \(\texttt{E}_1\), \(\texttt{B}\) and \(\texttt{E}_2\), and \(\texttt{B}\) and \(\texttt{E}_3\). We now have in total 7 contrast codes, each of which reflects a pairwise interaction. Great! The main effects and pairwise interactions provide us with \(5+7 = 12\) of the required 15 contrast codes. The final three can be constructed as threeway interactions, i.e. as the products of three contrasts: \(\texttt{P} \times \texttt{B} \times \texttt{E}_1\), \(\texttt{P} \times \texttt{B} \times \texttt{E}_2\), and \(\texttt{P} \times \texttt{B} \times \texttt{E}_3\). You can view these threeway interactions as a form of “moderated moderations”. For example, does the moderation of the effect of power prime by experimenter belief depend on the identity of the experimenter? We will come back to the interpretation of this shortly.

The full set of 15 contrast codes, with values for all 16 conditions, is given in the (rather large) Table 8.3.

Table 8.3: A set of 15 orthogonal contrast codes for the Experimenter Belief study.
\(\texttt{P}\) \(\texttt{B}\) \(\texttt{E}_1\) \(\texttt{E}_2\) \(\texttt{E}_3\) \(\texttt{P} \times \texttt{B}\) \(\texttt{P} \times \texttt{E}_1\) \(\texttt{P} \times \texttt{E}_2\) \(\texttt{P} \times \texttt{E}_3\) \(\texttt{B} \times \texttt{E}_1\) \(\texttt{B} \times \texttt{E}_2\) \(\texttt{B} \times \texttt{E}_3\) \(\texttt{P} \times \texttt{B} \times \texttt{E}_1\) \(\texttt{P} \times \texttt{B} \times \texttt{E}_2\) \(\texttt{P} \times \texttt{B} \times \texttt{E}_3\)
PL,EL,E1 \(-\tfrac{1}{2}\) \(-\tfrac{1}{2}\) \(-\tfrac{1}{2}\) \(-\tfrac{1}{3}\) \(-\tfrac{1}{4}\) \(-\tfrac{1}{2}\times-\tfrac{1}{2}\) \(-\tfrac{1}{2}\times-\tfrac{1}{2}\) \(-\tfrac{1}{2}\times-\tfrac{1}{3}\) \(-\tfrac{1}{2}\times-\tfrac{1}{4}\) \(-\tfrac{1}{2}\times-\tfrac{1}{2}\) \(-\tfrac{1}{2}\times-\tfrac{1}{3}\) \(-\tfrac{1}{2}\times-\tfrac{1}{4}\) \(-\tfrac{1}{2}\times-\tfrac{1}{2}\times-\tfrac{1}{2}\) \(-\tfrac{1}{2}\times-\tfrac{1}{2}\times-\tfrac{1}{3}\) \(-\tfrac{1}{2}\times-\tfrac{1}{2}\times-\tfrac{1}{4}\)
PL,EL,E2 \(-\tfrac{1}{2}\) \(-\tfrac{1}{2}\) \(\tfrac{1}{2}\) \(-\tfrac{1}{3}\) \(-\tfrac{1}{4}\) \(-\tfrac{1}{2}\times-\tfrac{1}{2}\) \(-\tfrac{1}{2}\times\tfrac{1}{2}\) \(-\tfrac{1}{2}\times-\tfrac{1}{3}\) \(-\tfrac{1}{2}\times-\tfrac{1}{4}\) \(-\tfrac{1}{2}\times\tfrac{1}{2}\) \(-\tfrac{1}{2}\times-\tfrac{1}{3}\) \(-\tfrac{1}{2}\times-\tfrac{1}{4}\) \(-\tfrac{1}{2}\times-\tfrac{1}{2}\times\tfrac{1}{2}\) \(-\tfrac{1}{2}\times-\tfrac{1}{2}\times-\tfrac{1}{3}\) \(-\tfrac{1}{2}\times-\tfrac{1}{2}\times-\tfrac{1}{4}\)
PL,EL,E3 \(-\tfrac{1}{2}\) \(-\tfrac{1}{2}\) \(0\) \(\tfrac{2}{3}\) \(-\tfrac{1}{4}\) \(-\tfrac{1}{2}\times-\tfrac{1}{2}\) \(-\tfrac{1}{2}\times0\) \(-\tfrac{1}{2}\times\tfrac{2}{3}\) \(-\tfrac{1}{2}\times-\tfrac{1}{4}\) \(-\tfrac{1}{2}\times0\) \(-\tfrac{1}{2}\times\tfrac{2}{3}\) \(-\tfrac{1}{2}\times-\tfrac{1}{4}\) \(-\tfrac{1}{2}\times-\tfrac{1}{2}\times0\) \(-\tfrac{1}{2}\times-\tfrac{1}{2}\times\tfrac{2}{3}\) \(-\tfrac{1}{2}\times-\tfrac{1}{2}\times-\tfrac{1}{4}\)
PL,EL,E4 \(-\tfrac{1}{2}\) \(-\tfrac{1}{2}\) \(0\) \(0\) \(\tfrac{3}{4}\) \(-\tfrac{1}{2}\times-\tfrac{1}{2}\) \(-\tfrac{1}{2}\times0\) \(-\tfrac{1}{2}\times0\) \(-\tfrac{1}{2}\times\tfrac{3}{4}\) \(-\tfrac{1}{2}\times0\) \(-\tfrac{1}{2}\times0\) \(-\tfrac{1}{2}\times\tfrac{3}{4}\) \(-\tfrac{1}{2}\times-\tfrac{1}{2}\times0\) \(-\tfrac{1}{2}\times-\tfrac{1}{2}\times0\) \(-\tfrac{1}{2}\times-\tfrac{1}{2}\times\tfrac{3}{4}\)
PL,EH,E1 \(-\tfrac{1}{2}\) \(\tfrac{1}{2}\) \(-\tfrac{1}{2}\) \(-\tfrac{1}{3}\) \(-\tfrac{1}{4}\) \(-\tfrac{1}{2}\times\tfrac{1}{2}\) \(-\tfrac{1}{2}\times-\tfrac{1}{2}\) \(-\tfrac{1}{2}\times-\tfrac{1}{3}\) \(-\tfrac{1}{2}\times-\tfrac{1}{4}\) \(\tfrac{1}{2}\times-\tfrac{1}{2}\) \(\tfrac{1}{2}\times-\tfrac{1}{3}\) \(\tfrac{1}{2}\times-\tfrac{1}{4}\) \(-\tfrac{1}{2}\times\tfrac{1}{2}\times-\tfrac{1}{2}\) \(-\tfrac{1}{2}\times\tfrac{1}{2}\times-\tfrac{1}{3}\) \(-\tfrac{1}{2}\times\tfrac{1}{2}\times-\tfrac{1}{4}\)
PL,EH,E2 \(-\tfrac{1}{2}\) \(\tfrac{1}{2}\) \(\tfrac{1}{2}\) \(-\tfrac{1}{3}\) \(-\tfrac{1}{4}\) \(-\tfrac{1}{2}\times\tfrac{1}{2}\) \(-\tfrac{1}{2}\times\tfrac{1}{2}\) \(-\tfrac{1}{2}\times-\tfrac{1}{3}\) \(-\tfrac{1}{2}\times-\tfrac{1}{4}\) \(\tfrac{1}{2}\times\tfrac{1}{2}\) \(\tfrac{1}{2}\times-\tfrac{1}{3}\) \(\tfrac{1}{2}\times-\tfrac{1}{4}\) \(-\tfrac{1}{2}\times\tfrac{1}{2}\times\tfrac{1}{2}\) \(-\tfrac{1}{2}\times\tfrac{1}{2}\times-\tfrac{1}{3}\) \(-\tfrac{1}{2}\times\tfrac{1}{2}\times-\tfrac{1}{4}\)
PL,EH,E3 \(-\tfrac{1}{2}\) \(\tfrac{1}{2}\) \(0\) \(\tfrac{2}{3}\) \(-\tfrac{1}{4}\) \(-\tfrac{1}{2}\times\tfrac{1}{2}\) \(-\tfrac{1}{2}\times0\) \(-\tfrac{1}{2}\times\tfrac{2}{3}\) \(-\tfrac{1}{2}\times-\tfrac{1}{4}\) \(\tfrac{1}{2}\times0\) \(\tfrac{1}{2}\times\tfrac{2}{3}\) \(\tfrac{1}{2}\times-\tfrac{1}{4}\) \(-\tfrac{1}{2}\times\tfrac{1}{2}\times0\) \(-\tfrac{1}{2}\times\tfrac{1}{2}\times\tfrac{2}{3}\) \(-\tfrac{1}{2}\times\tfrac{1}{2}\times-\tfrac{1}{4}\)
PL,EH,E4 \(-\tfrac{1}{2}\) \(\tfrac{1}{2}\) \(0\) \(0\) \(\tfrac{3}{4}\) \(-\tfrac{1}{2}\times\tfrac{1}{2}\) \(-\tfrac{1}{2}\times0\) \(-\tfrac{1}{2}\times0\) \(-\tfrac{1}{2}\times\tfrac{3}{4}\) \(\tfrac{1}{2}\times0\) \(\tfrac{1}{2}\times0\) \(\tfrac{1}{2}\times\tfrac{3}{4}\) \(-\tfrac{1}{2}\times\tfrac{1}{2}\times0\) \(-\tfrac{1}{2}\times\tfrac{1}{2}\times0\) \(-\tfrac{1}{2}\times\tfrac{1}{2}\times\tfrac{3}{4}\)
PH,EL,E1 \(\tfrac{1}{2}\) \(-\tfrac{1}{2}\) \(-\tfrac{1}{2}\) \(-\tfrac{1}{3}\) \(-\tfrac{1}{4}\) \(\tfrac{1}{2}\times-\tfrac{1}{2}\) \(\tfrac{1}{2}\times-\tfrac{1}{2}\) \(\tfrac{1}{2}\times-\tfrac{1}{3}\) \(\tfrac{1}{2}\times-\tfrac{1}{4}\) \(-\tfrac{1}{2}\times-\tfrac{1}{2}\) \(-\tfrac{1}{2}\times-\tfrac{1}{3}\) \(-\tfrac{1}{2}\times-\tfrac{1}{4}\) \(\tfrac{1}{2}\times-\tfrac{1}{2}\times-\tfrac{1}{2}\) \(\tfrac{1}{2}\times-\tfrac{1}{2}\times-\tfrac{1}{3}\) \(\tfrac{1}{2}\times-\tfrac{1}{2}\times-\tfrac{1}{4}\)
PH,EL,E2 \(\tfrac{1}{2}\) \(-\tfrac{1}{2}\) \(\tfrac{1}{2}\) \(-\tfrac{1}{3}\) \(-\tfrac{1}{4}\) \(\tfrac{1}{2}\times-\tfrac{1}{2}\) \(\tfrac{1}{2}\times\tfrac{1}{2}\) \(\tfrac{1}{2}\times-\tfrac{1}{3}\) \(\tfrac{1}{2}\times-\tfrac{1}{4}\) \(-\tfrac{1}{2}\times\tfrac{1}{2}\) \(-\tfrac{1}{2}\times-\tfrac{1}{3}\) \(-\tfrac{1}{2}\times-\tfrac{1}{4}\) \(\tfrac{1}{2}\times-\tfrac{1}{2}\times\tfrac{1}{2}\) \(\tfrac{1}{2}\times-\tfrac{1}{2}\times-\tfrac{1}{3}\) \(\tfrac{1}{2}\times-\tfrac{1}{2}\times-\tfrac{1}{4}\)
PH,EL,E3 \(\tfrac{1}{2}\) \(-\tfrac{1}{2}\) \(0\) \(\tfrac{2}{3}\) \(-\tfrac{1}{4}\) \(\tfrac{1}{2}\times-\tfrac{1}{2}\) \(\tfrac{1}{2}\times0\) \(\tfrac{1}{2}\times\tfrac{2}{3}\) \(\tfrac{1}{2}\times-\tfrac{1}{4}\) \(-\tfrac{1}{2}\times0\) \(-\tfrac{1}{2}\times\tfrac{2}{3}\) \(-\tfrac{1}{2}\times-\tfrac{1}{4}\) \(\tfrac{1}{2}\times-\tfrac{1}{2}\times0\) \(\tfrac{1}{2}\times-\tfrac{1}{2}\times\tfrac{2}{3}\) \(\tfrac{1}{2}\times-\tfrac{1}{2}\times-\tfrac{1}{4}\)
PH,EL,E4 \(\tfrac{1}{2}\) \(-\tfrac{1}{2}\) \(0\) \(0\) \(\tfrac{3}{4}\) \(\tfrac{1}{2}\times-\tfrac{1}{2}\) \(\tfrac{1}{2}\times0\) \(\tfrac{1}{2}\times0\) \(\tfrac{1}{2}\times\tfrac{3}{4}\) \(-\tfrac{1}{2}\times0\) \(-\tfrac{1}{2}\times0\) \(-\tfrac{1}{2}\times\tfrac{3}{4}\) \(\tfrac{1}{2}\times-\tfrac{1}{2}\times0\) \(\tfrac{1}{2}\times-\tfrac{1}{2}\times0\) \(\tfrac{1}{2}\times-\tfrac{1}{2}\times\tfrac{3}{4}\)
PH,EH,E1 \(\tfrac{1}{2}\) \(\tfrac{1}{2}\) \(-\tfrac{1}{2}\) \(-\tfrac{1}{3}\) \(-\tfrac{1}{4}\) \(\tfrac{1}{2}\times\tfrac{1}{2}\) \(\tfrac{1}{2}\times-\tfrac{1}{2}\) \(\tfrac{1}{2}\times-\tfrac{1}{3}\) \(\tfrac{1}{2}\times-\tfrac{1}{4}\) \(\tfrac{1}{2}\times-\tfrac{1}{2}\) \(\tfrac{1}{2}\times-\tfrac{1}{3}\) \(\tfrac{1}{2}\times-\tfrac{1}{4}\) \(\tfrac{1}{2}\times\tfrac{1}{2}\times-\tfrac{1}{2}\) \(\tfrac{1}{2}\times\tfrac{1}{2}\times-\tfrac{1}{3}\) \(\tfrac{1}{2}\times\tfrac{1}{2}\times-\tfrac{1}{4}\)
PH,EH,E2 \(\tfrac{1}{2}\) \(\tfrac{1}{2}\) \(\tfrac{1}{2}\) \(-\tfrac{1}{3}\) \(-\tfrac{1}{4}\) \(\tfrac{1}{2}\times\tfrac{1}{2}\) \(\tfrac{1}{2}\times\tfrac{1}{2}\) \(\tfrac{1}{2}\times-\tfrac{1}{3}\) \(\tfrac{1}{2}\times-\tfrac{1}{4}\) \(\tfrac{1}{2}\times\tfrac{1}{2}\) \(\tfrac{1}{2}\times-\tfrac{1}{3}\) \(\tfrac{1}{2}\times-\tfrac{1}{4}\) \(\tfrac{1}{2}\times\tfrac{1}{2}\times\tfrac{1}{2}\) \(\tfrac{1}{2}\times\tfrac{1}{2}\times-\tfrac{1}{3}\) \(\tfrac{1}{2}\times\tfrac{1}{2}\times-\tfrac{1}{4}\)
PH,EH,E3 \(\tfrac{1}{2}\) \(\tfrac{1}{2}\) \(0\) \(\tfrac{2}{3}\) \(-\tfrac{1}{4}\) \(\tfrac{1}{2}\times\tfrac{1}{2}\) \(\tfrac{1}{2}\times0\) \(\tfrac{1}{2}\times\tfrac{2}{3}\) \(\tfrac{1}{2}\times-\tfrac{1}{4}\) \(\tfrac{1}{2}\times0\) \(\tfrac{1}{2}\times\tfrac{2}{3}\) \(\tfrac{1}{2}\times-\tfrac{1}{4}\) \(\tfrac{1}{2}\times\tfrac{1}{2}\times0\) \(\tfrac{1}{2}\times\tfrac{1}{2}\times\tfrac{2}{3}\) \(\tfrac{1}{2}\times\tfrac{1}{2}\times-\tfrac{1}{4}\)
PH,EH,E4 \(\tfrac{1}{2}\) \(\tfrac{1}{2}\) \(0\) \(0\) \(\tfrac{3}{4}\) \(\tfrac{1}{2}\times\tfrac{1}{2}\) \(\tfrac{1}{2}\times0\) \(\tfrac{1}{2}\times0\) \(\tfrac{1}{2}\times\tfrac{3}{4}\) \(\tfrac{1}{2}\times0\) \(\tfrac{1}{2}\times0\) \(\tfrac{1}{2}\times\tfrac{3}{4}\) \(\tfrac{1}{2}\times\tfrac{1}{2}\times0\) \(\tfrac{1}{2}\times\tfrac{1}{2}\times0\) \(\tfrac{1}{2}\times\tfrac{1}{2}\times\tfrac{3}{4}\)
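
Rather than multiplying all these values by hand, the 15 contrast codes can be generated from the five main-effect contrasts. The Python sketch below is one way to do so, using exact fractions; the dictionary keys such as `"P x E_1"` are labels I chose for illustration. It also checks that every contrast sums to zero and that all pairs of contrasts are orthogonal.

```python
from fractions import Fraction as F
from itertools import product, combinations

primes = ["PL", "PH"]
beliefs = ["EL", "EH"]
experimenters = ["E1", "E2", "E3", "E4"]
# All 16 conditions, in the row order of the table (experimenter varies fastest)
cells = list(product(primes, beliefs, experimenters))

# Main-effect contrast codes
p_code = {"PL": F(-1, 2), "PH": F(1, 2)}
b_code = {"EL": F(-1, 2), "EH": F(1, 2)}
e_codes = {  # Helmert contrasts for the four experimenters
    "E_1": dict(zip(experimenters, [F(-1, 2), F(1, 2), F(0), F(0)])),
    "E_2": dict(zip(experimenters, [F(-1, 3), F(-1, 3), F(2, 3), F(0)])),
    "E_3": dict(zip(experimenters, [F(-1, 4), F(-1, 4), F(-1, 4), F(3, 4)])),
}

def times(u, v):
    """Element-wise product of two contrast columns."""
    return [x * y for x, y in zip(u, v)]

columns = {}
columns["P"] = [p_code[p] for p, b, e in cells]
columns["B"] = [b_code[b] for p, b, e in cells]
for name, code in e_codes.items():
    columns[name] = [code[e] for p, b, e in cells]

# Twoway interactions: pairwise products of main-effect contrasts
columns["P x B"] = times(columns["P"], columns["B"])
for name in e_codes:
    columns[f"P x {name}"] = times(columns["P"], columns[name])
for name in e_codes:
    columns[f"B x {name}"] = times(columns["B"], columns[name])
# Threeway interactions: products of three main-effect contrasts
for name in e_codes:
    columns[f"P x B x {name}"] = times(columns["P x B"], columns[name])

n_contrasts = len(columns)
all_sum_zero = all(sum(col) == 0 for col in columns.values())
all_orthogonal = all(
    sum(x * y for x, y in zip(columns[a], columns[b])) == 0
    for a, b in combinations(columns, 2)
)
print(n_contrasts, all_sum_zero, all_orthogonal)
```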

Estimating the corresponding linear model with 15 predictors gives the results in Table 8.4.

Table 8.4: Linear model predicting \(\texttt{ApproachAdvantage}\) by factorial contrast-coded predictors.
\(\hat{\beta}\) \(\text{SS}\) \(\text{df}\) \(F\) \(p(\geq \lvert F \rvert)\)
Intercept -23.82 1.13e+05 1 2.4 0.119
Prime (\(\texttt{P}\)) -6.78 2.29e+03 1 0.0 0.824
Belief (\(\texttt{B}\)) 90.44 8.18e+05 1 17.7 0.000
Experimenter 3.84e+05 3 2.8 0.041
\(\quad \texttt{E}_1\) 1.32 4.29e+01 1 0.0 0.976
\(\quad \texttt{E}_2\) -10.60 3.74e+03 1 0.1 0.776
\(\quad \texttt{E}_3\) 100.74 3.80e+05 1 8.2 0.004
Prime \(\times\) Belief 37.38 3.49e+04 1 0.8 0.385
Prime \(\times\) Experimenter 7.36e+04 3 0.5 0.661
\(\quad \texttt{P} \times \texttt{E}_1\) -84.50 4.42e+04 1 1.0 0.329
\(\quad \texttt{P} \times \texttt{E}_2\) 5.82 2.82e+02 1 0.0 0.938
\(\quad \texttt{P} \times \texttt{E}_3\) -55.56 2.89e+04 1 0.6 0.429
Belief \(\times\) Experimenter 2.62e+05 3 1.9 0.130
\(\quad \texttt{B} \times \texttt{E}_1\) -17.45 3.81e+03 1 0.1 0.774
\(\quad \texttt{B} \times \texttt{E}_2\) 11.06 2.04e+03 1 0.0 0.834
\(\quad \texttt{B} \times \texttt{E}_3\) -116.86 2.56e+05 1 5.5 0.019
Prime \(\times\) Belief \(\times\) Experimenter 2.95e+04 3 0.2 0.887
\(\quad \texttt{P} \times \texttt{B} \times \texttt{E}_1\) 48.35 7.30e+03 1 0.2 0.691
\(\quad \texttt{P} \times \texttt{B} \times \texttt{E}_2\) -8.12 2.74e+02 1 0.0 0.939
\(\quad \texttt{P} \times \texttt{B} \times \texttt{E}_3\) 68.24 2.18e+04 1 0.5 0.492
Error 1.77e+07 384

I admit the number of tested effects is large and the results mind-boggling at first. To start, note that the main effect of “Experimenter” is entered in the table as an omnibus test, as well as the tests for the individual contrasts (\(\texttt{E}_1\), \(\texttt{E}_2\), and \(\texttt{E}_3\)). In a conventional ANOVA table, you would only see the omnibus test. But as the individual contrasts are informative, I like to include the tests for each individual contrast as well. The omnibus test listed under “Experimenter” is a test that the slopes of \(\texttt{E}_1\), \(\texttt{E}_2\), and \(\texttt{E}_3\) are all equal to 0. This is based on a model comparison between a MODEL G which includes all 16 parameters (intercept and the slopes of all 15 predictors) and a MODEL R which fixes the slopes of \(\texttt{E}_1\), \(\texttt{E}_2\), and \(\texttt{E}_3\) to 0. Note that this MODEL R still includes the slopes of the various product-predictors such as \(\texttt{P} \times \texttt{E}_1\). The omnibus test is significant, which indicates that, aggregating over the levels of Prime and Belief, at least one of the experimenters differs from at least one of the others. The tests of the individual contrasts show a significant effect of \(\texttt{E}_3\), which compares experimenter 4 to the average of experimenters 1, 2, and 3. The estimated slope is positive, which indicates that the approach advantage scores are generally higher for this experimenter than for the other ones. When you inspect Figure 8.2, you can see this: whilst for all other experimenters, average scores are negative in the “EL” conditions and positive in the “EH” conditions, the averages for Experimenter 4 are always positive, regardless of the level of Belief or Prime.

8.4.1 Interpreting interactions

Aggregating over experimenters (i.e., looking at the main effect of Belief), we still obtain a significant effect of experimenter Belief, as we did before. However, the significant effect of the \(\texttt{B} \times \texttt{E}_3\) contrast indicates that the effect of experimenter belief is different for Experimenter 4 compared to the other three experimenters. Let’s consider this more carefully. The slope of \(\texttt{B}\) is estimated as \(\hat{\beta}_\texttt{B} = 90.4\). This simple slope represents the difference between the “experimenter believes high prime” and “experimenter believes low prime” conditions, when all other predictors which moderate this effect equal 0. When using orthogonal “sum-to-zero” contrasts, the simple slope can be thought of as reflecting the average effect of belief over all levels of the other factors in the design (prime condition, and experimenter). So, aggregating over all other conditions, the approach advantage score is 90.4 milliseconds larger when the experimenter believed a participant was given a high-power prime compared to when the experimenter believed the participant received a low-power prime.

As an example of how to interpret an interaction, we will focus on the significant \(\texttt{B} \times \texttt{E}_3\) contrast. Using the slope of this interaction, we can work out the predicted slope of \(\texttt{B}\) for Experimenter 4 as \[\hat{\beta}_{\texttt{B}|\text{Experimenter 4}} = 90.44 + \tfrac{3}{4} \times (-116.86) \approx 2.79\] As the scale of the dependent variable is in milliseconds, this seems a rather negligible effect. For the other three experimenters, on average, the predicted slope is \[\hat{\beta}_{\texttt{B}|\text{Experimenter 1, 2, or 3}} = 90.44 + (- \tfrac{1}{4})\times (-116.86) \approx 120\] which is more substantial. Apparently, Experimenter 4 was “immune” to the experimenter belief manipulation, whilst the other three experimenters were not.
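The same reasoning gives a predicted slope of \(\texttt{B}\) for each experimenter separately, by also adding the \(\texttt{B} \times \texttt{E}_1\) and \(\texttt{B} \times \texttt{E}_2\) terms. The following is a minimal sketch in Python (my choice of language for illustration, not something this chapter prescribes). It uses the rounded slopes from Table 8.4 and the contrast code values from the coding table, and it evaluates the slopes at \(\texttt{P} = 0\), i.e. aggregating over prime conditions. The names `beta_b`, `beta_b_by_e`, `codes`, and `belief_slope` are made up for this example.

```python
# Rounded slopes taken from Table 8.4
beta_b = 90.44
beta_b_by_e = {"E1": -17.45, "E2": 11.06, "E3": -116.86}

# Values of the three Experimenter contrast codes for each experimenter
codes = {
    "Experimenter 1": {"E1": -1 / 2, "E2": -1 / 3, "E3": -1 / 4},
    "Experimenter 2": {"E1": 1 / 2, "E2": -1 / 3, "E3": -1 / 4},
    "Experimenter 3": {"E1": 0.0, "E2": 2 / 3, "E3": -1 / 4},
    "Experimenter 4": {"E1": 0.0, "E2": 0.0, "E3": 3 / 4},
}


def belief_slope(code):
    """Predicted slope of B for one experimenter, evaluated at P = 0."""
    return beta_b + sum(beta_b_by_e[name] * value for name, value in code.items())


slopes = {name: belief_slope(code) for name, code in codes.items()}

# Average slope over Experimenters 1 to 3; the E1 and E2 terms cancel out here
mean_first_three = sum(slopes[f"Experimenter {k}"] for k in (1, 2, 3)) / 3

for name, slope in slopes.items():
    print(f"{name}: {slope:.1f}")
print(f"Mean of Experimenters 1-3: {mean_first_three:.1f}")
```

Because the inputs are rounded table entries, the output should agree with the values in the text only up to rounding. The individual slopes for Experimenters 1 to 3 all come out above 100, which fits the reading that the belief effect is driven by these three experimenters.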

Note that the omnibus test of the interaction between Belief and Experimenter is not significant, while the test of the \(\texttt{B} \times \texttt{E}_3\) contrast is. It is not uncommon for an individual contrast to be significant while an omnibus test is not. When only a single contrast has a sizeable effect (i.e. it provides a substantial reduction in the Sum of Squared Error), the omnibus test effectively divides the SSR attributable to that contrast over all contrasts that are part of the omnibus test. The omnibus test then has less power than a test of the individual contrast.
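As a rough check of this point, we can recompute both \(F\) statistics from the rounded entries in Table 8.4. The mean squared error of MODEL G is \(\text{SSE}(G)/(n - \text{npar}(G)) = 1.77 \times 10^{7}/384 \approx 46094\). The omnibus test divides the SSR of the three interaction contrasts by 3 degrees of freedom, whereas the test of the single contrast divides its SSR by 1: \[F_{\texttt{B} \times \text{Experimenter}} \approx \frac{2.62 \times 10^{5}/3}{46094} \approx 1.9 \qquad \qquad F_{\texttt{B} \times \texttt{E}_3} \approx \frac{2.56 \times 10^{5}/1}{46094} \approx 5.6\] As these are based on rounded values, they match the \(F\) values in the table only up to rounding. Nearly all of the omnibus SSR stems from the \(\texttt{B} \times \texttt{E}_3\) contrast, but dividing it by 3 rather than 1 reduces the \(F\) statistic to about a third.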

None of the other effects are significant. If the threeway interaction were significant, we’d have the tricky task of interpreting it. Interpreting threeway interactions is not impossible, but it does require effort. For example, imagine that the \(\texttt{P} \times \texttt{B} \times \texttt{E}_3\) interaction were in fact significant. One way to interpret this is as a moderation of a moderation. We have already interpreted the \(\texttt{B} \times \texttt{E}_3\) interaction as indicating that the effect of Belief is reduced for Experimenter 4 compared to the other experimenters. The slope of \(\texttt{P} \times \texttt{B} \times \texttt{E}_3\) is estimated to be positive. That indicates that this reduction in the effect of Belief is smaller for those participants who received a high-power prime, as compared to a low-power prime. To see this, we can use the same method to determine conditional slopes we have used before, but now looking at the conditional slope of the \(\texttt{B} \times \texttt{E}_3\) interaction, for the different levels of Prime. For the high-power prime conditions, we can determine this interaction slope as \[\hat{\beta}_{\texttt{B} \times \texttt{E}_3|\text{high-power prime}} = -116.86 + \tfrac{1}{2} \times (68.24) = -82.7\] For the low-power prime conditions, the slope is \[\hat{\beta}_{\texttt{B} \times \texttt{E}_3|\text{low-power prime}} = -116.86 - \tfrac{1}{2} \times (68.24) = -151\] We can thus conclude that the moderation of the effect of belief by experimenter (4 vs 1, 2, or 3) is larger for those participants who received a low-power prime, as compared to those who received a high-power prime.

We can use these conditional interaction slopes in the same way as before to work out the effect of belief for Experimenter 4 and participants who received a high-power prime: \[\hat{\beta}_{\texttt{B}|\text{Experimenter 4 and high-power prime}} = 90.4 + \tfrac{3}{4} \times (-82.7) = 28.4\] Similarly, the effect of belief for Experimenter 4 and participants who received a low-power prime is: \[\hat{\beta}_{\texttt{B}|\text{Experimenter 4 and low-power prime}} = 90.4 + \tfrac{3}{4} \times (-151) = -22.8\]

8.5 Orthogonal contrast codes and unequal sample sizes

Using orthogonal contrast codes in factorial designs generally leads to interpretable parameters. While interaction effects can be difficult to interpret initially, with practice, you will become better at this. Another benefit is that, if the conditions have equal sample sizes, the contrast-coded predictors will be independent. But when the sample sizes are unequal, this independence between contrast-coded predictors does not hold, even if the contrast codes are orthogonal.

To keep the following discussion relatively straightforward, let’s go back to the two-way factorial ANOVA where we look at the effect of prime condition (\(\texttt{P}\)) and experimenter belief (\(\texttt{B}\)). To analyse this, we used a linear model \[\text{MODEL G:} \quad \texttt{ApproachAdvantage}_i = \beta_0 + \beta_\texttt{P} \times \texttt{P}_i + \beta_\texttt{B} \times \texttt{B}_i + \beta_{\texttt{P} \times \texttt{B}} \times (\texttt{P} \times \texttt{B})_i + \epsilon_i\] When all four conditions have equal sample sizes, i.e. \(n_{\text{PL},\text{EL}} = n_{\text{PL},\text{EH}} = n_{\text{PH},\text{EL}} = n_{\text{PH},\text{EH}}\), then the three predictors \(\texttt{P}\), \(\texttt{B}\), and \((\texttt{P} \times \texttt{B})\) are independent. In a linear model with independent predictors, the estimated slope of one predictor, say \(\texttt{P}\), does not depend on whether the model includes a second predictor (e.g. \(\texttt{B}\)) or not. For example, the slope \(\beta_\texttt{P}\) would be exactly the same in the model above and the model \[\text{MODEL R:} \quad \texttt{ApproachAdvantage}_i = \beta_0 + \beta_\texttt{P} \times \texttt{P}_i + \epsilon_i\] In both models, the estimated slope represents the difference \[\hat{\beta}_\texttt{P} = \frac{\mu_\text{PH,EL} + \mu_\text{PH,EH}}{2} - \frac{\mu_\text{PL,EL} + \mu_\text{PL,EH}}{2}\] When the sample sizes are unequal, however, this would only be the case for MODEL G above. Because of the resulting dependency between the predictors, the estimate \(\hat{\beta}_\texttt{P}\) in MODEL R will be different. It will still represent a difference between averages, but these would now be weighted by sample size.

To make the example dramatic, I have removed participants randomly from each condition, such that the conditions have rather unequal sizes:
PL-EL PL-EH PH-EL PH-EH
\(n\) 20 40 60 80
mean 45.92 7.84 -59.72 77.04

Estimating MODEL G gives the following estimates \[\texttt{ApproachAdvantage}_i = 17.77 - 18.22 \times \texttt{P}_i + 49.34 \times \texttt{B}_i + 87.43 \times (\texttt{P} \times \texttt{B})_i + \hat{\epsilon}_i \] This shows that the estimated slope of \(\texttt{P}\) indeed equals \[\hat{\beta}_\texttt{P} = \frac{-59.72 + 77.04}{2} - \frac{45.92 + 7.836}{2} = -18.22\] For MODEL R, the estimate is \[\texttt{ApproachAdvantage}_i = 19.48 - 2.1 \times \texttt{P}_i + \hat{\epsilon}_i \] Obviously, this is different from the estimate in MODEL G. While the slope still reflects a difference between the high-power and low-power prime conditions, the averages of these conditions are weighted by the sample size as follows: \[\begin{aligned} \hat{\beta}_\texttt{P} &= \frac{n_\text{PH,EL} \times \mu_\text{PH,EL} + n_\text{PH,EH} \times \mu_\text{PH,EH}}{n_\text{PH,EL} + n_\text{PH,EH}} - \frac{n_\text{PL,EL} \times \mu_\text{PL,EL} + n_\text{PL,EH} \times \mu_\text{PL,EH}}{n_\text{PL,EL} + n_\text{PL,EH}} \\ &= \frac{60 \times -59.72 + 80 \times 77.04}{140} - \frac{20 \times 45.92 + 40 \times 7.836}{60} \\ &= -2.1 \end{aligned}\]
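If you want to convince yourself of this, the following sketch (in Python with NumPy, which is my assumption for illustration and not the software used for the analyses in this chapter) reproduces both slopes. I don't reproduce the actual data here. Instead, the sketch builds a placeholder dataset in which every observation equals its condition mean. That is enough, because the least-squares slopes of these models depend only on the condition means and sample sizes. The helper `ols` and all variable names are made up for this example.

```python
import numpy as np

# Contrast codes, sample sizes, and means of the four conditions,
# in the order PL-EL, PL-EH, PH-EL, PH-EH (values as reported in the text)
p_codes = np.array([-0.5, -0.5, 0.5, 0.5])
b_codes = np.array([-0.5, 0.5, -0.5, 0.5])
n_cells = np.array([20, 40, 60, 80])
cell_means = np.array([45.92, 7.836, -59.72, 77.04])

# Placeholder data (NOT the real scores): each observation equals its cell mean
P = np.repeat(p_codes, n_cells)
B = np.repeat(b_codes, n_cells)
y = np.repeat(cell_means, n_cells)


def ols(y, *predictors):
    """Least-squares estimates (intercept first) for the given predictors."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


coef_g = ols(y, P, B, P * B)  # MODEL G: P, B, and their product
coef_r = ols(y, P)            # MODEL R: P only

slope_p_g = coef_g[1]  # unweighted difference between the prime conditions
slope_p_r = coef_r[1]  # sample-size weighted difference

print(f"MODEL G slope of P: {slope_p_g:.2f}")
print(f"MODEL R slope of P: {slope_p_r:.2f}")
```

Only the slope of \(\texttt{P}\) is compared here, because the size of the interaction slope depends on how the product predictor is scaled.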

Why is this important? After all, we’re generally interested in the estimates from MODEL G, and use a restricted MODEL R mainly for the purpose of conducting hypothesis tests. The reason is that the estimates of MODEL R matter when we consider different ways of conducting model comparisons to perform hypothesis tests.

8.5.1 Comparison schemes and SS types

When predictors are dependent, they are partially redundant. Going back to our discussion of multicollinearity, that means that the predictors in a model together can account for more of the variance of the dependent variable than the sum of their unique contributions. In Figure 8.3, as a whole, the model can account for a proportion \(B + C + D\), but the sum of the unique contributions is \(B + C\). This can result in a relative lack of power for the tests of main effects and interactions.

Figure 8.3: Partitioning the variance in a General Linear Model. Each circle represents the variance of a variable. Overlapping regions represent shared variability (e.g. covariance) between variables.

Up to now, we have performed hypothesis tests by comparing a full MODEL G to a restricted MODEL R where some of the effects in MODEL G are fixed to particular values (generally 0). This procedure is, in the context of the General Linear Model, called a Type 3 Sums of Squares procedure. The Sum of Squares attributed to each predictor or set of predictors reflects their unique contribution (i.e., regions \(B\) and \(C\) in Figure 8.3). When there is redundancy, these unique contributions do not add up to the total SS that can be attributed to all predictors in the model (i.e. the reduction in the SSE comparing the full model to an intercept-only model). There are two alternative schemes, which treat the shared part differently. These are, as you might have guessed, Type 1 and Type 2 SS procedures. Of the two, only the Type 1 procedure guarantees that the SS terms add up to this total SS.

In the Type 1 SS procedure, also called sequential SS, you build up the model sequentially. You start by comparing a model which includes just the contrast-coded predictors for the main effect of one of the factors in the design (for instance \(\texttt{P}\) in the simple factorial design) to an intercept-only model. The reduction in the SSE (e.g. \(B + D\) in Figure 8.3) is the SSR assigned to that main effect. You then add all contrast-coded predictors for the main effect of the second factor in the design (e.g. \(\texttt{B}\)), and compare this more general model to the one of the previous step (e.g. the model with only the predictors for \(\texttt{P}\)). The reduction in SSE (e.g. \(C\) in Figure 8.3) is the SSR assigned to that second main effect. If there are more factors in the design, you would then add the contrast-coded predictors for another factor, and compute an SSR for this factor as the reduction in the SSE compared to the model defined in the step before, etc. Once you have done this for all the main effects, you would then continue this procedure for all interactions, until you arrive at the full model. Note that the model comparisons performed on the way are solely to compute SSR terms. You would not perform hypothesis tests at each step. Rather, each SSR term computed is entered in the usual formula to compute the \(F\) statistic, where you’d use the SSE of the full MODEL G: \[F = \frac{\text{SSR}/(\text{npar}(G) - \text{npar}(R))}{\text{SSE}(G)/(n-\text{npar}(G))}\] In this computation, all elements are the same as for the usual Type 3 procedure, apart from SSR, which is the one computed with the sequential procedure.

While this procedure has the benefit of ensuring that all the SSR terms add up to the total SSR, it is important to note that the hypotheses tested are not necessarily those that you expect to test. As was shown earlier, the slope of contrast-coded predictor \(\texttt{P}\), when it is the only predictor in the model, reflects the sample-size weighted difference between the means. The omnibus test for the main effect that is entered first is thus a test that all these sample-size weighted means are equal to each other; it is not a test that the unweighted means are all equal to each other. For the second main effect, the test is also one of sample-size weighted comparisons, although the precise weights are a more complex function of the sample sizes.

If the sample sizes are reflective of the actual proportions in which you might find the various factor levels in the Data Generating Process, then it might make sense to test such sample-size weighted equality between means. However, unequal sample sizes often do not reflect such meaningful differences in the DGP. In that case, the hypotheses tested with a Type 1 procedure might not be meaningful. Also, the results of a Type 1 SS procedure depend on the order in which you enter the main effects. If you start with the main effect of \(\texttt{B}\) rather than \(\texttt{P}\), then the SSR of \(\texttt{B}\) would be \(C+D\) rather than just \(C\) in Figure 8.3. There often isn’t a clear reason to prefer one order over another. Hence, because of these issues, I generally don’t recommend using a Type 1 SS procedure. The reason for mentioning it here is that some statistical software (e.g. base R) uses Type 1 SS procedures by default, and it is important to be aware of this.
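The sequential procedure, and its dependence on the order of entry, can be sketched in a few lines of code. The following is a minimal illustration in Python with NumPy (an assumption on my part; any software that fits linear models would do). The scores are simulated, not the data from the experiment. Only the unequal condition sizes mirror the example above, and the effect sizes used in the simulation are arbitrary. The helpers `sse` and `type1_ssr` are names made up for this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

# Unbalanced 2 x 2 design; order of cells: PL-EL, PL-EH, PH-EL, PH-EH
p_codes = np.array([-0.5, -0.5, 0.5, 0.5])
b_codes = np.array([-0.5, 0.5, -0.5, 0.5])
n_cells = np.array([20, 40, 60, 80])
P = np.repeat(p_codes, n_cells)
B = np.repeat(b_codes, n_cells)

# Simulated scores with arbitrary effects (NOT the data from the experiment)
y = 20 + 50 * B + 80 * P * B + rng.normal(0, 200, size=P.size)


def sse(y, *predictors):
    """Sum of Squared Errors of a linear model with an intercept."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


def type1_ssr(y, ordered_predictors):
    """Sequential SSR: the reduction in SSE each time a predictor is added."""
    ssr, included, previous = {}, [], sse(y)
    for name, x in ordered_predictors:
        included.append(x)
        current = sse(y, *included)
        ssr[name] = previous - current
        previous = current
    return ssr


order_pb = type1_ssr(y, [("P", P), ("B", B), ("PxB", P * B)])
order_bp = type1_ssr(y, [("B", B), ("P", P), ("PxB", P * B)])

# Each SSR term is tested against the SSE of the full MODEL G (4 parameters);
# every effect has a single contrast here, so npar(G) - npar(R) = 1
sse_full = sse(y, P, B, P * B)
total_ssr = sse(y) - sse_full
mse_full = sse_full / (len(y) - 4)
f_order_pb = {name: value / mse_full for name, value in order_pb.items()}

print("P entered first:", order_pb)
print("B entered first:", order_bp)
print("Total SSR:", total_ssr)
```

In both orders the three SSR terms add up to the total SSR, and the interaction, which is entered last either way, gets the same SSR. The SSR terms of the two main effects, however, change with the order of entry.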

The Type 2 SS procedure is (even) more complicated, in terms of the hypotheses that are tested. The comparisons involved in computing the SSR terms for the different effects can be described reasonably clearly though. The idea is to determine the SSR associated with an effect whilst controlling for any effects that do not fully contain that effect. For example, let’s consider the threeway Prime by Belief by Experimenter ANOVA we conducted earlier. To compute the SSR of the main effect of Prime, you would construct a model with the contrast codes corresponding to the main effects of Prime, Belief, and Experimenter, as well as the Belief \(\times\) Experimenter interaction, as the latter does not contain Prime. Because the Prime \(\times\) Belief, Prime \(\times\) Experimenter, and Prime \(\times\) Belief \(\times\) Experimenter interactions do contain Prime, they would not be included. You would then compute the SSR for the main effect of Prime by computing the difference in the SSE of this model and a model that excludes the Prime main effect. Similarly, to compute the SSR for Belief, you would compare a model with the main effects of Prime, Belief, and Experimenter, and the Prime \(\times\) Experimenter interaction, to a model which excludes the Belief main effect. To compute the SSR for e.g. the Prime \(\times\) Belief interaction, you would compare a model which includes the main effects of Prime, Belief, and Experimenter, and the Prime \(\times\) Belief, Prime \(\times\) Experimenter, and Belief \(\times\) Experimenter interactions, to a model which excludes the Prime \(\times\) Belief interaction. All effects included, apart from the Prime \(\times\) Belief interaction itself, do not fully contain this interaction (i.e. do not include both Prime and Belief). The Prime \(\times\) Belief \(\times\) Experimenter interaction is excluded, because it involves both Prime and Belief. As for the Type 1 procedure, these comparisons are solely used to compute the SSR terms for the main effects and interactions. Once these have been obtained, they are used in the usual formula for the \(F\) statistic, where everything else is the same as in the Type 3 procedure.
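The rule for which effects to include can be written down compactly. The sketch below (in Python, chosen by me for illustration) does not fit any models. It only lists, for a given effect, the terms in the two models that are compared, by treating each effect as the set of factors it involves. The function names `type2_models` and `label` are made up for this example.

```python
from itertools import combinations

factors = ("Prime", "Belief", "Experimenter")

# All main effects and interactions of the threeway design, as sets of factors
terms = [
    frozenset(combo)
    for size in range(1, len(factors) + 1)
    for combo in combinations(factors, size)
]


def type2_models(effect):
    """Return the terms of the larger and the smaller model for a Type 2 SSR.

    The larger model drops every term that fully contains the effect and
    involves more factors; the smaller model additionally drops the effect.
    """
    effect = frozenset(effect)
    larger = [term for term in terms if not effect < term]  # '<' is strict subset
    smaller = [term for term in larger if term != effect]
    return larger, smaller


def label(term):
    """Readable name of a term, e.g. 'Prime x Belief'."""
    return " x ".join(factor for factor in factors if factor in term)


for effect in [("Prime",), ("Belief",), ("Prime", "Belief")]:
    larger, smaller = type2_models(effect)
    print(f"SSR for {label(frozenset(effect))}:")
    print("  larger model: ", [label(term) for term in larger])
    print("  smaller model:", [label(term) for term in smaller])
```

The SSR of an effect is then the SSE of the smaller model minus the SSE of the larger model. For the threeway interaction, no other term contains it, so the larger model is simply the full model.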

As for the Type 1 procedure, the hypotheses tested by the Type 2 procedure reflect a (complex) weighting of means by sample size. But, unlike the Type 1 procedure, the results are not dependent on the order in which you include effects. Some authors argue that a Type 2 procedure is preferable to a Type 3 procedure when you don’t expect interactions, as the tests of the main effects are more powerful. That may be so, but interactions are often difficult to rule out on theoretical grounds. Moreover, because the hypothesis tests are generally more straightforward to interpret in the Type 3 procedure, as they involve comparisons between unweighted population means, I would generally advise the use of Type 3 tests, unless you have a strong conviction that there are no interactions, and a good grasp of the hypotheses tested by the Type 2 procedure.

The issue of different testing schemes for unbalanced designs (unequal sample sizes) is a complex one, and opinions differ on which scheme is preferred. I mainly want you to be aware of these different approaches, and you can determine your own preference once you have gained enough experience with the General Linear Model to do so. A more comprehensive treatment of the different schemes is given in Chapter 7 of Maxwell et al. (2017).

8.6 In practice

In practice, a factorial ANOVA is not that different from a oneway ANOVA. So the steps are again:

  1. Explore the data. Check the distribution of the dependent variable in each condition/group (i.e. each combination of experimental manipulations). Are there outlying or otherwise “strange” observations? If so, you may consider removing these from the dataset. Do the distributions look roughly Normal or are the distributions at least similar over the groups? Calculate the sample variances for each group. Is the largest variance no more than 4 times larger than the smallest sample variance? If not, then you may consider an alternative analysis to an ANOVA. If you have doubts about the homogeneity of variance, perform a Levene test. If this test is significant, you may still perform the ANOVA analysis as usual if the largest variance is less than 4 times larger than the smallest sample variance and the sample sizes in the groups are equal.

  2. Define a useful set of contrast codes for the main effects of the manipulations in your study. Aim for these codes to represent the most important comparisons between the levels of the experimental factors. Then compute the interaction contrasts as (pairwise, threeway, fourway, …) products of the main-effects contrasts. Estimate the factorial ANOVA model. Then check again for potential issues in the assumptions with e.g. histograms for the residuals and QQ-plots. If there are clear outliers in the data, remove these, and then re-estimate the model.

  3. Consider both the results of the omnibus tests for the main effects and interactions, and the tests of the individual contrasts. Some people try to interpret significant interactions by conducting follow-up ANOVAs separately for different levels of one of the factors involved in the interaction. For example, an interaction between prime and belief might be investigated by testing the effect of belief twice, once for participants who received a high-power prime, and once for participants who received a low-power prime. If the effect of belief is significant for one subset of the data, but not another, this is then taken to “explain” the interaction. This is not a good procedure, for a number of reasons. Firstly, the power in these separate analyses will be reduced (each subset has fewer observations than the full dataset). Moreover, there will be many situations where the test is significant for multiple subsets. An interaction indicates that the size of an effect of one experimental manipulation is different for the different levels of another experimental manipulation, not that there is an effect for one level but not for another. So you should compare the size of an effect between different levels of another manipulation. This is exactly what the product contrasts do. Hence, to interpret an interaction, you should compute the (predicted) effect for each level of the other factor. If there are more comparisons of interest than those encoded in the contrast-coded predictors, perform additional follow-up tests. If there are many of these tests, consider correcting for this by using e.g. a Scheffé-adjusted critical value.

  4. Report the results. When reporting the results, make sure that you include all relevant statistics. For example, one way to write the results of the analysis of Table 8.2, is as follows:

For each participant, we calculated an approach advantage score as the average difference in reaction time (in milliseconds) between avoid and approach trials. We then assessed the effect of power prime and experimenter belief on approach advantage in a 2 (Prime: high-power, low-power) by 2 (Belief: high-power, low-power) factorial ANOVA. We found no significant effect of Prime, \(F(1, 396) = 0.31\), \(p = .577\), \(\hat{\eta}^2_p = .001\). However, the effect of experimenter Belief was significant, \(F(1, 396) = 17.82\), \(p < .001\), \(\hat{\eta}^2_p = .043\). The approach advantage score was positive when the experimenter believed participants received a high-power prime, \(M = 66.72\), 95% CI \([36.99, 96.45]\), and negative when the experimenter believed they received a low-power prime, \(M = -23.79\), 95% CI \([-53.67, 6.09]\). The interaction between Prime and Belief was not significant, \(F(1, 396) = 0.77\), \(p = .381\), \(\hat{\eta}^2_p = .002\). As such, we found clear evidence for the experimenter-belief hypothesis: whether the experimenter believes participants were given a high- or low-power prime affected participants’ readiness to avoid or approach targets. The prime participants actually received appeared to have no effect. As such, we found no evidence for the social priming hypothesis.
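When writing up results like these, it can be useful to check that the reported effect sizes are consistent with the reported test statistics. The sketch below (in Python, my choice for illustration) assumes the usual definition of partial eta-squared, \(\hat{\eta}^2_p = \text{SSR}/(\text{SSR} + \text{SSE}(G))\), which can be rewritten in terms of the \(F\) statistic and its degrees of freedom. The function name `partial_eta_squared` is made up for this example.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta-squared recovered from an F statistic.

    Uses SSR / (SSR + SSE) = (F * df_effect) / (F * df_effect + df_error).
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F statistics as reported in the example write-up above
reported = {"Prime": 0.31, "Belief": 17.82, "Prime x Belief": 0.77}

effect_sizes = {
    effect: partial_eta_squared(f_value, df_effect=1, df_error=396)
    for effect, f_value in reported.items()
}

for effect, value in effect_sizes.items():
    print(f"{effect}: {value:.3f}")
```

Rounded to three decimals, the results should match the effect sizes in the write-up.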

References

Gilder, T. S. E., & Heerey, E. A. (2018). The role of experimenter belief in social priming. Psychological Science, 29, 403–417.
Maxwell, S. E., Delaney, H. D., & Kelley, K. (2017). Designing experiments and analyzing data: A model comparison perspective. Routledge.

  1. \(\sum_{k=1}^g c_{1,k} = 0\), \(\sum_{k=1}^g c_{2,k} = 0\), and \(\sum_{k=1}^g c_{1,k} \times c_{2,k} = \tfrac{1}{4} - \tfrac{1}{4} - \tfrac{1}{4} + \tfrac{1}{4} = 0\).

  2. Remember that the slope of a predictor reflects the increase in the dependent variable for every one-unit increase in that predictor. Every one-unit increase in \(X_3\) corresponds to a half-unit increase in \(X_3'\). That implies that a one-unit increase in \(X_3'\) is the same as a two-unit increase in \(X_3\), so \(\beta'_{3} = 2 \times \beta_3\).