---
output: html_document
editor_options:
chunk_output_type: console
---
# Contrast coding {#ch-contr}
Whenever one uses a categorical variable as a predictor in a Bayesian regression model, for example when estimating the difference in a dependent variable among two or three experimental conditions, it becomes necessary to code the discrete factor levels into numeric predictor variables. This coding is termed *contrast coding*. For example, in the previous chapter (section \@ref(sec-stroop)), we coded two experimental conditions as $-1$ and $+1$; this is called a sum contrast. Contrasts are the values that we assign to predictor variables to encode specific comparisons between factor levels and to create predictor terms to estimate these comparisons in any type of regression, including Bayesian regressions.
Contrast coding in Bayesian models works more or less the same way as in frequentist models, and the same principles and tools can be used in both cases.
As one important difference, the prior can matter in Bayesian models [especially when working with Bayes factors: @dablander2022puzzle; @rouder2012default]; the prior needs to be adapted to the scaling of contrasts. For example, given two conditions in an experiment, if we code a sum contrast (discussed below) with $\pm 1$ coding, the parameter representing the difference between the two conditions will be half as large as the observed difference in the data, compared to the case when we code the contrasts with $\pm 0.5$ coding. In the latter case, the parameter of interest does represent the observed difference in means between the conditions. The two contrast coding options ($\pm 1$ and $\pm 0.5$) obviously imply different prior specifications for the parameter of interest.
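A quick numeric sketch of this scaling effect, using hypothetical condition means of $0.8$ and $0.4$ sec (these values are made up for illustration):

```{r}
# Hypothetical condition means (for illustration only)
mu <- c(F1 = 0.8, F2 = 0.4)
# Slope implied by each coding: beta = (mu_F2 - mu_F1) / (x_F2 - x_F1)
beta_pm1 <- unname(diff(mu)) / diff(c(-1, 1))       # +/-1 coding
beta_pm05 <- unname(diff(mu)) / diff(c(-0.5, 0.5))  # +/-0.5 coding
beta_pm1   # -0.2: half the observed difference
beta_pm05  # -0.4: the observed difference
```

A prior of, say, $Normal(0, 0.5)$ on the slope thus encodes beliefs about a quantity that is twice as large under $\pm 0.5$ coding as under $\pm 1$ coding.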
This chapter will introduce contrast coding in the context of Bayesian models. The descriptions are in large parts taken from @schadHowCapitalizePriori2020 (which is published under a CC-BY 4.0 license: https://creativecommons.org/licenses/by/4.0/) and adapted for the current context.
Consider a situation where we want to estimate differences in a dependent variable between three levels of a factor. An example could be differences in response times between three levels of word class (noun, verb, adjective). We might be interested in whether word class influences response times. In frequentist statistics, one way to approach this question would be to run an ANOVA and compute an omnibus $F$-test for whether word class explains response times.^[A Bayesian analog to the frequentist $F$-test is Bayesian model comparison (i.e., Bayes factors), where we might compare an alternative model including word class as a predictor term with a null model lacking this predictor. We will discuss such Bayesian model comparison using Bayes factors in chapter \@ref(ch-bf).] However, if based on such omnibus approaches we find support for an influence of word class on response times, it remains unclear where this effect actually comes from, i.e., whether it originated from the nouns, verbs, or adjectives. This is problematic for inference because scientists typically have specific expectations about which groups differ from each other. In this chapter, we will show how to estimate specific comparisons directly in a Bayesian linear model. This gives the researcher a lot of control over Bayesian analyses. Specifically, we show how \index{Planned comparison} planned comparisons between specific conditions \index{Group} (groups) or clusters of conditions are implemented as contrasts. This is a very effective way to align expectations with the statistical model. In Bayesian models, any specific comparisons can also be computed after the model is fit. Nevertheless, coding a priori expectations into contrasts for model fitting will make it much more straightforward to estimate certain comparisons between experimental conditions, and will allow us to perform Bayesian model comparisons using Bayes factors to provide evidence for or against very specific hypotheses.
Knowledge of the matrix formulation of the linear model is not necessary for this and the next chapter, but some exposure to it is desirable for a deeper understanding of contrast coding. We discuss the matrix formulation in the online section \@ref(app-matrixHierachicalModel).
## Basic concepts illustrated using a two-level factor
We first consider the simplest case: suppose we want to compare the means of a dependent variable (DV) such as response times between two groups of subjects. A simulated data set is available in the package `bcogsci` as the data set `df_contrasts1`. The simulations assumed longer response times in condition $F1$ ($\mu_1 = 0.8$ sec) than in condition $F2$ ($\mu_2 = 0.4$ sec). The data from the $10$ simulated subjects are aggregated and summary statistics are computed for the two groups.
\Begin{samepage}
```{r, echo = TRUE}
data("df_contrasts1")
df_contrasts1
```
\End{samepage}
```{r, echo = FALSE}
table1 <- df_contrasts1 %>% group_by(F) %>% # Table for main effect F
summarize(N = n(), M = mean(DV), SD = sd(DV), SE = round(SD / sqrt(N), 1))
table1a <- as.data.frame(table1)
names(table1a) <- c("Factor", "N data", "Est. means", "Std. dev.", "Std. errors")
(GM <- mean(table1$M)) # Grand Mean
```
```{r cTab1Means, echo = FALSE, results = "asis"}
kableExtra::kable(table1a %>%
as.data.frame() %>%
mutate(across(where(is.integer), ~ paste0("$",.x, "$"))) %>%
mutate(across(where(is.double), ~ paste0("$",formatC(round(.x, 1),
format = "f", digits = 1), "$"))), escape = FALSE,align = c("l",rep("r",4)),
booktabs = TRUE, vline = "",
caption = "Summary statistics per condition for the simulated data."
)
```
(ref:cFig1Means) Means and standard errors of the simulated dependent variable (e.g., response times in seconds) in two conditions $F1$ and $F2$.
```{r cFig1Means, fig = TRUE, include = TRUE, echo = FALSE, cache = FALSE, fig.width = 4, fig.height = 2.3, fig.cap = "(ref:cFig1Means)"}
(plot1 <- ggplot(aes(x = F, y = M, group = 1), data = table1) +
   geom_point() + geom_line() +
   geom_errorbar(aes(ymin = M - SE, ymax = M + SE), width = 0) +
   scale_y_continuous(breaks = seq(0, 1, .2)) +
   coord_cartesian(ylim = c(0, 1)) +
   labs(y = "Mean Response Time [sec]", x = "Factor F"))
```
The results, displayed in Figure \@ref(fig:cFig1Means) and in Table \@ref(tab:cTab1Means), show that the assumed true condition means are exactly realized with the simulated data. The numbers are exact because the \index{\texttt{mvrnorm}} `mvrnorm()` function used here (see `?df_contrasts1`) ensures that the data are generated so that the sample mean yields the true means for each level. In real data sets, of course, the sample means will vary from experiment to experiment.
A simple Bayesian linear model of `DV` on $F$ yields a straightforward estimate of the difference between the group means. We use relatively uninformative priors. The estimates for the population-level effects are presented below using the function `fixef()`:
```{r fitmodel, echo = TRUE, message = FALSE, results = "hide"}
fit_F <- brm(DV ~ F,
data = df_contrasts1,
family = gaussian(),
prior = c(prior(normal(0, 2), class = Intercept),
prior(normal(0, 2), class = sigma),
prior(normal(0, 1), class = b)))
```
```{r, echo = TRUE}
fixef(fit_F)
```
Comparing the means for each condition with the coefficients (column `Estimate`) reveals that (i) the intercept ($0.8$) is the mean for condition $F1$, $\hat\mu_1$; and (ii) the slope (`FF2`: $-0.4$) is the difference between the estimated means for the two groups, $\hat\mu_2 - \hat\mu_1$ [@Bolker2018]:
\begin{equation}
\begin{array}{lcl}
\text{Intercept} = & \hat{\mu}_1 & = \text{estimated mean for } F1 \\
\text{Slope (FF2)} = & \hat{\mu}_2 - \hat{\mu}_1 & = \text{estim. mean for } F2 - \text{estim. mean for }F1
\end{array}
(\#eq:betac)
\end{equation}
The new information is the $95$% credible interval for the difference between the two groups.
### Default contrast coding: \index{Treatment contrasts} Treatment contrasts {#treatmentcontrasts}
How does the function `brm()` arrive at these particular values for the intercept and slope? That is, why does the intercept assess the mean of condition $F1$ and how do we know the slope measures the difference in means between $F2$-$F1$? This result is a consequence of the \index{Default contrast coding} default contrast coding of the factor $F$. R assigns treatment contrasts to factors and orders their levels alphabetically. The alphabetically first factor level (here: $F1$) is coded in R by default as $0$ and the second level (here: $F2$) is coded as $1$.^[When using treatment coding, it's important to be aware that the intercept won't be centered. As mentioned in online section \@ref(app-intercept), this means the prior one sets for the intercept in `brms` will actually be assigned to its centered version. Although this distinction may not have any impact when one uses wide priors, it's still worth keeping in mind.] This becomes clear when we inspect the current contrast attribute of the factor using the \index{\texttt{contrasts}} `contrasts()` command:
```{r, echo = TRUE}
contrasts(df_contrasts1$F)
```
Why does this contrast coding yield these particular regression coefficients? Let's take a look at the \index{Regression equation} regression equation.
Let $\alpha$ represent the intercept, and $\beta_1$ the slope. Then, the simple regression above expresses the belief that the expected response time $\hat{y}$ (or $E[Y]$) is a linear function of the factor $F$.
\begin{equation}
E[Y] = \alpha + \beta_1 x
\label{eq:lm1}
\end{equation}
So, if $x = 0$ (condition $F1$), the expectation is $\alpha + \beta_1 \cdot 0 = \alpha$; and if $x = 1$ (condition $F2$), the expectation is $\alpha + \beta_1 \cdot 1 = \alpha + \beta_1$.
Expressing the above in terms of the estimated coefficients:
\begin{equation}
\begin{array}{lccll}
\text{estim. value for }F1 = & \hat{\mu}_1 = & \hat{\alpha} = & \text{Intercept} \\
\text{estim. value for }F2 = & \hat{\mu}_2 = & \hat{\alpha} + \hat{\beta}_1 = & \text{Intercept} + \text{Slope (FF2)}
\end{array}
\label{eq:predVal}
\end{equation}
It is useful to think of such unstandardized regression coefficients as \index{Difference score} difference scores; they express the increase in the dependent variable $y$ associated with a change in the independent variable $x$ by one unit, such as going from $0$ to $1$ in this example. The difference between condition means is $0.4 - 0.8 = -0.4$, which is the estimated regression coefficient $\hat{\beta}_1$. The sign of the slope is negative because we have chosen to subtract the larger mean $F1$ score from the smaller mean $F2$ score.
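Plugging the (rounded) estimates from above into the regression equation recovers the condition means; a quick check:

```{r}
alpha <- 0.8   # intercept estimate (rounded)
beta1 <- -0.4  # slope estimate (rounded)
# Fitted means under treatment coding: x = 0 for F1, x = 1 for F2
c(F1 = alpha + beta1 * 0, F2 = alpha + beta1 * 1)
```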
### Defining comparisons {#inverseMatrix}
The analysis of the regression equation demonstrates that in the treatment contrast the intercept assesses the average response in the baseline condition, whereas the slope estimates the difference between condition means. However, these are just verbal descriptions of what each coefficient assesses. Is it also possible to formally write down what each coefficient assesses?
From the perspective of parameter estimation, the slope represents the effect of main interest, so we consider this first. The treatment contrast specifies that the slope $\beta_1$ estimates the difference in means between the two levels of the factor $F$. This can formally be written as:
\begin{equation}
\beta_1 = \mu_{F2} - \mu_{F1}
\end{equation}
or equivalently:
\begin{equation}
\beta_1 = - 1 \cdot \mu_{F1} + 1 \cdot \mu_{F2}
\end{equation}
The $\pm 1$ weights in the parameter estimation directly express which means are compared by the treatment contrast.
The intercept in the treatment contrast estimates a quantity that is usually of little interest: it estimates the mean in condition $F1$.
Formally, the parameter $\alpha$ estimates the following quantity:
\begin{equation}
\alpha = \mu_{F1}
\end{equation}
\noindent
or equivalently:
\begin{equation}
\alpha = 1 \cdot \mu_{F1} + 0 \cdot \mu_{F2} .
\end{equation}
\noindent
The fact that the intercept term formally estimates the mean of condition $F1$ is in line with our previous derivation (see equation \@ref(eq:betac)).
In R, factor levels are ordered alphabetically and by default the first level is used as the baseline in treatment contrasts. Obviously, this default mapping will depend on the levels' alphabetical ordering. If a different \index{Baseline condition} baseline condition is desired, it is possible to re-order the levels. Here is one way of re-ordering the levels:
```{r, echo = TRUE}
df_contrasts1$Fb <- factor(df_contrasts1$F, levels = c("F2", "F1"))
contrasts(df_contrasts1$Fb)
```
\noindent
This re-ordering did not change any data associated with the factor, only one of its attributes. With this new contrast attribute, a simple Bayesian model yields the following result.
```{r, echo = TRUE, message = FALSE, results = "hide"}
fit_Fb <- brm(DV ~ Fb,
data = df_contrasts1,
family = gaussian(),
prior = c(prior(normal(0, 2), class = Intercept),
prior(normal(0, 2), class = sigma),
prior(normal(0, 1), class = b)))
```
```{r, echo = TRUE}
fixef(fit_Fb)
```
The model now estimates different quantities. The intercept now codes the mean of condition $F2$, and the slope measures the difference in means, $F1$ minus $F2$. This represents an alternative coding of the treatment contrast.
These model posteriors do not allow us to claim that we have evidence for the hypothesis that the effect of factor $F$ is different from zero. If the research focus is on such hypothesis testing, Bayesian hypothesis tests can be carried out using Bayes factors, by comparing a model containing a contrast of interest with a model lacking this contrast. We will discuss details of Bayesian hypothesis testing based on Bayes factors in chapter \@ref(ch-bf). For now, our focus is on obtaining estimates of the relevant comparison(s) of interest when we have two or more conditions.
### \index{Sum contrasts} Sum contrasts {#effectcoding}
Treatment contrasts are only one of many options. It is also possible to use sum contrasts, which code one of the conditions as $-1$ and the other as $+1$, effectively *centering* the effects at the \index{Grand mean} grand mean (GM, i.e., the mean of the two group means). Here, we rescale the contrast to values of $-0.5$ and $+0.5$, which makes the estimated effect the same size as under treatment coding and easier to interpret.
To define this contrast in a linear regression, one way is to use the `contrasts` function (another way is to define a column containing $+0.5$ and $-0.5$ for the corresponding levels of the factor).
```{r, echo = TRUE, message = FALSE, results = "hide"}
contrasts(df_contrasts1$F) <- c(-0.5, +0.5)
fit_mSum <- brm(DV ~ F,
data = df_contrasts1,
family = gaussian(),
prior = c(prior(normal(0, 2), class = Intercept),
prior(normal(0, 2), class = sigma),
prior(normal(0, 1), class = b)))
```
```{r, echo = TRUE}
fixef(fit_mSum)
```
Here, the slope ($F1$) again codes the difference of the groups associated with the first and second factor levels. It has the same value as in the treatment contrast.
One important difference from the treatment contrast is that the intercept now represents the estimate of the average of condition means for $F1$ and $F2$, that is, the grand mean. For the scaled sum contrast:
\begin{equation}
\begin{array}{lcl}
\text{Intercept} = & (\hat{\mu}_1 + \hat{\mu}_2)/2 & = \text{estimated mean of }F1 \text{ and }F2 \\
\text{Slope (F1)} = & \hat{\mu}_2 - \hat{\mu}_1 & = \text{estim. mean for }F2 - \text{estim. mean for }F1
\end{array}
\label{eq:beta2}
\end{equation}
Why does the intercept assess the grand mean and why does the slope estimate the group difference? This is the result of rescaling the sum contrast. The first factor level ($F1$) was coded as $-0.5$, and the second factor level ($F2$) as $+0.5$:
```{r, echo = TRUE}
contrasts(df_contrasts1$F)
```
Look again at the regression equation to better understand what computations are performed. Again, $\alpha$ represents the intercept, $\beta_1$ represents the slope, and the predictor variable $x$ represents the factor $F$. The regression equation is written as:
\begin{equation}
E[Y] = \alpha + \beta_1 x
\label{eq:lm2}
\end{equation}
The group of $F1$ subjects is coded as $-0.5$, so the estimated mean for the $F1$ group is $\alpha + \beta_1 \cdot x_1 = 0.6 + (-0.4) \cdot (-0.5) = 0.8$. By contrast, the $F2$ group is coded as $+0.5$, so the estimated mean for the $F2$ group is $\alpha + \beta_1 \cdot x_2 = 0.6 + (-0.4) \cdot 0.5 = 0.4$.
Expressed in terms of the estimated coefficients:
\begin{equation}
\begin{array}{lccll}
\text{estim. value for }F1 = & \hat{\mu}_1 = & \hat{\alpha} - 0.5 \cdot \hat{\beta}_1 = & \text{Intercept} - 0.5 \cdot \text{Slope (F1)}\\
\text{estim. value for }F2 = & \hat{\mu}_2 = & \hat{\alpha} + 0.5 \cdot \hat{\beta}_1 = & \text{Intercept} + 0.5 \cdot \text{Slope (F1)}
\end{array}
\label{eq:predVal2}
\end{equation}
The \index{Unstandardized regression coefficient} unstandardized regression coefficient is a \index{Difference score} difference score: Taking a step of one unit on the predictor variable $x$, e.g., from $-0.5$ to $+0.5$, reflecting a step from condition $F1$ to $F2$, changes the dependent variable from $0.8$ (for condition $F1$) to $0.4$ (condition $F2$). This reflects a difference of $0.4 - 0.8 = -0.4$; this is again the estimated regression coefficient $\hat{\beta}_1$.
Moreover, as mentioned above, the intercept now assesses the grand mean, i.e., the unweighted mean of the means for conditions $F1$ and $F2$: it is in the middle between condition means for $F1$ and $F2$.
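The same check for the scaled sum contrast, using the rounded estimates (intercept $0.6$, slope $-0.4$), recovers both the condition means and the grand mean:

```{r}
alpha <- 0.6   # intercept estimate: grand mean (rounded)
beta1 <- -0.4  # slope estimate: F2 - F1 (rounded)
# Fitted means under +/-0.5 coding: x = -0.5 for F1, x = +0.5 for F2
mu <- c(F1 = alpha - 0.5 * beta1, F2 = alpha + 0.5 * beta1)
mu        # condition means: 0.8, 0.4
mean(mu)  # grand mean: 0.6
```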
So far we gave verbal statements about what is estimated by the intercept and the slope in the case of the scaled sum contrast. It is possible to write these statements as formal parameter estimates.
In scaled sum contrasts, the slope parameter $\beta_1$ assesses the following quantity:
\begin{equation}
\beta_1 = -1 \cdot \mu_{F1} + 1 \cdot \mu_{F2}
\end{equation}
\noindent
This estimates the same quantity as the slope in the treatment contrast.
The intercept now assesses a different quantity: the average of the two conditions $F1$ and $F2$:
\begin{equation}
\alpha = 1/2 \cdot \mu_{F1} + 1/2 \cdot \mu_{F2} = \frac{\mu_{F1} + \mu_{F2}}{2}
\end{equation}
\noindent
In balanced data, i.e., in data sets where there are no missing data points, the average of the two conditions $F1$ and $F2$ is the \index{Grand mean} grand mean. In unbalanced data sets, where there are missing values, this average is the \index{Weighted grand mean} weighted grand mean.
To illustrate this point, consider an example with fully balanced data: two groups, $F1$ and $F2$, with $5$ subjects each. Here, the grand mean is also the mean across all subjects. Next, consider a highly simplified unbalanced data set, where condition $F1$ has two observations of the dependent variable, with values of $2$ and $3$, and condition $F2$ has only one observation, with a value of $4$. In this data set, the mean across all subjects is $\frac{2 + 3 + 4}{3} = \frac{9}{3} = 3$. However, the (weighted) grand mean as assessed by the intercept in a model using sum contrasts for factor $F$ is obtained by first computing the mean for each group separately (i.e., $\frac{2 + 3}{2} = 2.5$, and $4$), and then averaging across conditions: $\frac{2.5 + 4}{2} = \frac{6.5}{2} = 3.25$. This grand mean of $3.25$ differs from the mean of $3$ computed from all data points at once.
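The arithmetic of this toy unbalanced example can be verified directly in base R:

```{r}
dv  <- c(2, 3, 4)            # two observations in F1, one in F2
grp <- c("F1", "F1", "F2")
mean(dv)                     # mean across all data points: 3
mean(tapply(dv, grp, mean))  # mean of the condition means: 3.25
```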
To summarize, treatment contrasts and sum contrasts are two possible ways to parameterize the difference between two groups; they generally estimate different quantities. Treatment contrasts compare one or more means against a baseline condition, whereas sum contrasts compare a condition's mean to the grand mean (which in the two-group case also implies estimating the difference between the two group means). One question that comes up here is: how does one know what quantities are estimated by a given set of contrasts? (In the context of Bayes factors, the question would be: what hypothesis test does the contrast coding encode?) This question will be discussed in detail below for the general case of any arbitrary contrast coding.
### \index{Cell means parameterization} Cell means parameterization and \index{Posterior comparison} posterior comparisons {#sec-cellMeans}
One alternative option is to use what is called the cell means parameterization (this coding is also called \index{One-hot encoding} "one-hot encoding" in the context of machine learning). In this approach, one does not estimate an intercept term, and then differences between factor levels. Instead, each free parameter is used to simply estimate the mean of one of the factor levels. As a consequence, no comparisons between condition means are estimated, but simply the mean of each experimental condition is estimated. Cell means parameterization is specified by explicitly removing the intercept term (which is added automatically in `brms`) by adding a $-1$ in the regression formula:
```{r, echo = TRUE, message = FALSE, results = "hide"}
fit_mCM <- brm(DV ~ -1 + F,
data = df_contrasts1,
family = gaussian(),
prior = c(prior(normal(0, 2), class = sigma),
prior(normal(0, 2), class = b)))
```
```{r, echo = TRUE}
fixef(fit_mCM)
```
Now, the regression coefficients (see the column labeled `Estimate`) estimate the mean of the first factor level ($0.8$) and the mean of the second factor level ($0.4$). This cell means parameterization usually does not allow us to make inferences about the hypotheses of interest using Bayes factors, as these hypotheses usually relate to differences between conditions rather than to whether each condition differs from zero.
The cell means parameterization provides a good example demonstrating an advantage of Bayesian data analysis: In Bayesian models, it is possible to use the posterior samples to compute new estimates that were not directly contained in the fitted model. To implement this, we first extract the posterior samples from the `brm()` model object:
```{r}
df_postSamp <- as_draws_df(fit_mCM)
```
In a second step, we can then compute comparisons from these posterior samples. For example, we can compute the difference between conditions $F2$ and $F1$. To do so, we simply take the posterior samples for each condition, and compute their difference.
```{r}
df_postSamp <- df_postSamp %>%
mutate(b_dif = b_FF2 - b_FF1)
```
This provides a posterior sample of the difference between conditions. It is possible to investigate this posterior sample by looking at its mean and 95% credible interval:
```{r}
c(Estimate = mean(df_postSamp$b_dif),
quantile(df_postSamp$b_dif, p = c(0.025, 0.975)))
```
The above summary provides the same estimate (roughly $-0.4$) that we obtained previously when using the treatment contrast or the scaled sum contrast.
(Note that if the priors are sufficiently non-informative, the results will be similar. However, given that the priors differ in comparison to a model estimating the difference directly, the posterior may also differ between models.)
Thus, Bayesian models provide a lot of flexibility in computing new comparisons post-hoc from the posterior samples and in obtaining their posterior distributions. However, what these posterior computations do not provide directly are inferences on null hypotheses. That is, just by looking at the credible intervals, we cannot make inferences about whether a null hypothesis can be rejected; an explicit hypothesis test is needed to answer such a question (see chapter \@ref(ch-bf)).
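A convenient shortcut for such post-hoc comparisons is the `hypothesis()` function in `brms`, which computes the posterior of a linear combination of parameters directly from the fitted model object (coefficient names are given without the `b_` prefix). A sketch using the cell means model from above:

```{r, eval = FALSE}
# Posterior estimate and 95% credible interval for F2 - F1,
# computed from the posterior samples of fit_mCM:
hypothesis(fit_mCM, "FF2 - FF1 = 0")
```

This yields essentially the same estimate and credible interval as the manual computation with `as_draws_df()` shown above.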
## The hypothesis matrix illustrated with a three-level factor
Consider an example with the three word classes nouns, verbs, and adjectives. We load simulated data from a lexical decision task with response times as the dependent variable. The research question is: do response times differ as a function of the between-subject factor word class, with the three levels nouns, verbs, and adjectives? Here, just to illustrate the case of a three-level factor, we make the arbitrary assumption that nouns have the longest response times and adjectives the shortest. In cognitive science experiments, word class will usually vary within subjects and between items. Because the within- or between-subjects status of an effect is independent of its contrast coding, we assume a between-subject manipulation for ease of exposition. The concepts presented here extend to the repeated measures designs that are often analyzed using hierarchical Bayesian (linear mixed) models.
Load and display the simulated data.
```{r, echo = TRUE}
data("df_contrasts2")
head(df_contrasts2)
```
```{r, echo = FALSE}
table.word <- df_contrasts2 %>%
group_by(F) %>%
summarise(
N = length(DV), M = mean(DV),
SD = sd(DV), SE = sd(DV) / sqrt(N)
)
table.word1 <- as.data.frame(table.word)
names(table.word1) <- c(
"Factor", "N data",
"Est. means", "Std. dev.", "Std. errors"
)
```
```{r cTab2Means, echo = FALSE, results = "asis"}
kableExtra::kable(table.word1 %>%
as.data.frame() %>%
mutate(across(where(is.integer), ~ paste0("$",.x, "$"))) %>%
mutate(across(where(is.double), ~ paste0("$",formatC(round(.x, 1),format = "f", digits = 1), "$"))), escape = FALSE,align = c("l",rep("r",4)),
booktabs = TRUE, vline = "",
caption = "Summary statistics per condition for the simulated data."
)
```
As shown in Table \@ref(tab:cTab2Means), the estimated means reflect our assumptions about the true means in the data simulation: Response times are longest for nouns and shortest for adjectives.
In the following sections, we use this data set to illustrate sum contrasts. Furthermore, we will use an additional data set to illustrate repeated, Helmert, polynomial, and custom contrasts. In practice, usually only one set of contrasts is selected when the expected pattern of means is formulated during the design of the experiment.
### \index{Sum contrasts} Sum contrasts {#sumcontrasts}
We begin with sum contrasts. Suppose that the expectation is that nouns are responded to slower and adjectives are responded to faster than the grand mean response time. Then, the research question could be: By how much do nouns differ from the grand mean and by how much do adjectives differ from the grand mean? And are the responses slower or faster than the grand mean? We want to estimate the following two quantities:
\begin{equation}
\beta_1 = \mu_1 - \frac{\mu_1+\mu_2+\mu_3}{3} = \mu_1 - GM
\end{equation}
\noindent
and
\begin{equation}
\beta_2 = \mu_2 - \frac{\mu_1+\mu_2+\mu_3}{3} = \mu_2 - GM
\end{equation}
$\beta_1$ can also be written as:
\begin{equation} \label{h01}
\beta_1 = \mu_1 - \frac{\mu_1+\mu_2+\mu_3}{3} = \frac{3\mu_1 - \mu_1 - \mu_2 - \mu_3}{3} = \frac{2}{3} \mu_1 - \frac{1}{3}\mu_2 - \frac{1}{3}\mu_3
\end{equation}
\noindent
Here, the weights $2/3, -1/3, -1/3$ are informative about how to combine the condition means to estimate the linear model coefficient.
$\beta_2$ can likewise be rewritten as:
\begin{align}\label{h02}
\beta_2 = & \mu_2 - \frac{\mu_1+\mu_2+\mu_3}{3} \\
\Leftrightarrow \beta_2 = & -\frac{1}{3}\mu_1 + \frac{2}{3} \mu_2 - \frac{1}{3} \mu_3
\end{align}
\noindent
Here, the weights are $-1/3, 2/3, -1/3$, and they again indicate how to combine the condition means for estimating the regression coefficient.
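As a numerical sanity check with hypothetical condition means (adjectives $0.4$ sec, nouns $0.8$ sec, verbs $0.6$ sec; these values are made up for illustration), the weights reproduce the differences from the grand mean:

```{r}
mu <- c(adjectives = 0.4, nouns = 0.8, verbs = 0.6)  # hypothetical means
GM <- mean(mu)
# Direct definition of the comparisons:
c(beta1 = mu[["adjectives"]] - GM, beta2 = mu[["nouns"]] - GM)
# The same values via the weights:
c(beta1 = sum(c(2/3, -1/3, -1/3) * mu),
  beta2 = sum(c(-1/3, 2/3, -1/3) * mu))
```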
### The \index{Hypothesis matrix} hypothesis matrix
The \index{Weights of the condition means} weights of the condition means are not only useful for defining parameter estimates. They also provide the starting point of a very powerful method that allows the researcher to generate the contrasts needed to estimate these comparisons in a linear model. So far, we have explained several existing contrast codings and the comparisons that they estimate. Given a particular data set, if the goal is to estimate certain comparisons, then the procedure would be to check whether any of the contrasts that we have encountered estimate these comparisons of interest. Sometimes it suffices to use one of these existing contrasts. At other times, our research questions may not correspond exactly to any of the contrasts in the default set of standard contrasts provided in R. For these cases, or for more complex designs, it is very useful to know how contrast matrices are created. Indeed, a relatively simple procedure exists in which we write our comparisons formally, extract the weights of the condition means from the comparisons, and then automatically generate the correct \index{Contrast matrix} contrast matrix that we need in order to estimate these comparisons in a linear model. Using this method, it is not necessary to find a match among the contrast matrices provided by the family of R functions starting with the prefix `contr`.^[These are the functions `contr.treatment()`, `contr.sum()`, `contr.poly()`, `contr.helmert()`, and `contr.sdif()`; the last of these is available in the `MASS` package.] Instead, it is possible to simply define the comparisons that one wants to estimate and to obtain the correct contrast matrix through an automatic procedure. Here, for pedagogical reasons, we show some examples of how to apply this procedure in cases where the comparisons *do* correspond to some of the existing contrasts.
Defining a \index{Custom contrast matrix} custom contrast matrix involves four steps:
1. Write down the estimated comparisons
2. Extract the weights and write them into what we will call a *hypothesis matrix* (and can also be viewed as a \index{Comparison matrix} *comparison matrix*)
3. Apply the \index{Generalized matrix inverse} *generalized matrix inverse* to the hypothesis matrix to create the contrast matrix
4. Assign the contrast matrix to the factor and run the (Bayesian) model
The term hypothesis matrix is used here because contrast coding is often done to carry out an explicit hypothesis test; but one could of course use contrast coding just to compute the estimates of an effect and their uncertainty, without doing a hypothesis test.
Let us apply this four-step procedure to our example of the sum contrast. The first step, writing down the estimated comparisons, was carried out in section \@ref(sumcontrasts). The second step involves writing down the weights that each comparison gives to the condition means. The weights for the first comparison are `wH01 = c(+2/3, -1/3, -1/3)`, and the weights for the second comparison are `wH02 = c(-1/3, +2/3, -1/3)`.
Before writing these into a hypothesis matrix, we also define the estimated quantity for the intercept term. The intercept parameter estimates the mean across all conditions:
\begin{align}
\alpha &= \frac{\mu_1 + \mu_2 + \mu_3}{3} \\
&= \frac{1}{3} \mu_1 + \frac{1}{3}\mu_2 + \frac{1}{3}\mu_3
\end{align}
This estimate has weights of $1/3$ for all condition means.
The weights from all three model parameters that were defined are now combined and written into a matrix that we refer to as the *hypothesis matrix* (`Hc`):
```{r, echo = TRUE}
HcSum <-
rbind(cH00 = c(adjectives = 1 / 3, nouns = 1 / 3, verbs = 1 / 3),
cH01 = c(adjectives = 2 / 3, nouns = -1 / 3, verbs = -1 / 3),
cH02 = c(adjectives = -1 / 3, nouns = 2 / 3, verbs = -1 / 3))
fractions(t(HcSum))
```
Each set of weights is first entered as a row into the matrix (command `rbind()`). We switch rows and columns of the matrix for easier readability using the command `t()` (this transposes the matrix). The command `fractions()` from the `MASS` package turns the decimals into fractions to improve readability.
Now that the condition weights have been written into the hypothesis matrix, the third step of the procedure is implemented: a matrix operation called the *generalized matrix inverse*[^2a] is used to obtain the contrast matrix that is needed to estimate these comparisons in a linear model.
Use the function `ginv2()` from the `bcogsci` package for the next step. It is similar to `ginv()` from `MASS` but provides nicer formatting of the output.
[^2a]: At this point, there is no need to understand in detail what this means. We refer the interested reader to @schadHowCapitalizePriori2020. For a quick overview, we recommend a vignette explaining the generalized inverse in the `matlib` package [https://cran.r-project.org/web/packages/matlib/vignettes/ginv.html; @friendly_matlib].
Applying the generalized inverse to the hypothesis matrix results in the new matrix `XcSum`. This is the contrast matrix $X_c$ that estimates exactly those comparisons that were specified earlier:
```{r, echo = TRUE}
(XcSum <- ginv2(HcSum))
```
This contrast matrix corresponds exactly to the sum contrasts described above. In the case of the sum contrast, the contrast matrix looks very different from the hypothesis matrix. The contrast matrix in sum contrasts codes with $+1$ the condition that is to be compared to the grand mean. The condition that is never compared to the grand mean is coded as $-1$. Without knowing the relationship between the hypothesis matrix and the contrast matrix, the meaning of the coefficients is completely opaque.
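Because the hypothesis matrix here is square and of full rank, its generalized inverse coincides with the ordinary matrix inverse, so the relationship between hypothesis matrix and contrast matrix can be checked in base R with `solve()` (a minimal sketch; the weights are typed in by hand from above):

```{r, echo = TRUE}
## hypothesis matrix: intercept row plus the two comparison rows
H <- rbind(int  = c(1, 1, 1) / 3,
           cH01 = c(2, -1, -1) / 3,
           cH02 = c(-1, 2, -1) / 3)
## inverting it yields the contrast matrix (including the intercept column),
## identical to cbind(1, contr.sum(3)):
X <- solve(H)
X
## inverting the contrast matrix recovers the hypothesis matrix:
round(solve(X), 2)
```

This round trip shows that the two matrices contain the same information, just in inverted form: knowing one, we can always recover the other.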
To verify this custom-made contrast matrix, it is compared to the sum contrast matrix as generated by the R function `contr.sum()` in the `stats` package. The resulting contrast matrix is identical to the result when adding the intercept term, a column of ones, to the contrast matrix:
```{r, echo = TRUE}
fractions(cbind(1, contr.sum(3)))
```
In order to estimate model parameters, step four in our procedure involves assigning sum contrasts to the factor $F$ in our example data, and fitting a (Bayesian) linear model. This allows us to estimate the regression coefficients associated with each contrast. We compare these to the data shown above (Table \@ref(tab:cTab2Means)) to test whether the regression coefficients actually correspond to the differences of condition means, as intended. To define the contrast, it is necessary to remove the intercept term, as this is automatically added by the modeling function `brm()`.
```{r, echo = TRUE, message = FALSE, results = "hide"}
contrasts(df_contrasts2$F) <- XcSum[, 2:3]
fit_Sum <- brm(DV ~ F,
data = df_contrasts2,
family = gaussian(),
prior = c(prior(normal(500, 100), class = Intercept),
prior(normal(0, 100), class = sigma),
prior(normal(0, 100), class = b)))
```
```{r, echo = TRUE}
fixef(fit_Sum)
```
The (Bayesian) linear model regression coefficients show the grand mean response time of $450$ ms in the intercept. Remember that the first regression coefficient `FcH01` was designed to estimate the extent to which adjectives are responded to faster than the grand mean. The regression coefficient `FcH01` (`Estimate`) of $-50$ reflects the difference between adjectives ($400$ ms) and the grand mean of $450$ ms. The second estimate of interest tells us the extent to which response times for nouns differ from the grand mean. The fact that the second regression coefficient `FcH02` is close to $50$ indicates that response times for nouns ($500$ ms) are slower than the grand mean of $450$ ms. Thus, whereas nouns are estimated to be $50$ ms slower than the grand mean, adjectives are $50$ ms faster.
We have now not only derived contrasts, parameter estimates, and comparisons for the sum contrast, we have also used a powerful and highly general procedure that is used to generate contrasts for many kinds of different comparisons and experimental designs.
### Generating contrasts: The \index{\texttt{hypr}} `hypr` package
To work with the four-step procedure, i.e., to flexibly design contrasts to estimate specific comparisons, we have developed the R package `hypr` [@rabe2020hypr]. This package allows the researcher to specify the desired comparisons, and based on these comparisons, it automatically generates contrast matrices that allow us to estimate these comparisons in linear models. The functions available in this package thus considerably simplify the implementation of the four-step procedure outlined above. Because `hypr` was originally written with the frequentist framework in mind, the comparisons are expressed as null hypotheses. In the Bayesian framework, these should be treated as comparisons between (bundles of) condition means.
To illustrate the functionality of the `hypr` package, we will use the two comparisons that we had defined and analyzed in the previous section:
\begin{equation}
\beta_1 = \mu_1 - \frac{\mu_1+\mu_2+\mu_3}{3} = \mu_1 - GM
\end{equation}
\noindent
and
\begin{equation}
\beta_2 = \mu_2 - \frac{\mu_1+\mu_2+\mu_3}{3} = \mu_2 - GM
\end{equation}
These estimates are effectively comparisons between condition means or between bundles of condition means. That is, both $\mu_1$ and $\mu_2$ are compared to the grand mean. These two comparisons can be directly entered into R using the `hypr()` function from the `hypr` package.
To do so, we use some labels to indicate factor levels. E.g., `adjectives`, `nouns`, and `verbs` can represent factor levels $\mu_1$, $\mu_2$, and $\mu_3$. The first comparison specifies that $\mu_1$ is compared to $\frac{\mu_1+\mu_2+\mu_3}{3}$. This can be written as a formula in R: `adjectives ~ (adjectives + nouns + verbs)/3`. The second comparison is that $\mu_2$ is compared to $\frac{\mu_1+\mu_2+\mu_3}{3}$, which can be written as `nouns ~ (adjectives + nouns + verbs)/3`.
```{r loadHypr}
HcSum <- hypr(b1 = adjectives ~ (adjectives + nouns + verbs) / 3,
b2 = nouns ~ (adjectives + nouns + verbs) / 3,
levels = c("adjectives", "nouns", "verbs"))
HcSum
```
The results show that the comparisons between condition means have been re-written into a form where $0$ is coded on the left side of the equation, and the condition means together with associated weights are written on the right side of the equation. This presentation makes it easy to see the weights of the condition means to code a certain comparison. The next part of the results shows the hypothesis matrix, which contains the weights from the condition means. Thus, `hypr` takes formulas coding comparisons between condition means as input, and automatically extracts the corresponding weights and encodes them into the hypothesis matrix. `hypr` moreover applies the generalized matrix inverse to obtain the contrast matrix from the hypothesis matrix. The different steps correspond exactly to the steps we had carried out manually in the preceding section. `hypr` automatically performs these steps for us. We can now extract the contrast matrix by a simple function call: \index{\texttt{contr.hypothesis}}
```{r}
contr.hypothesis(HcSum)
```
We can assign this contrast to our factor as we did before.
```{r, echo = TRUE}
contrasts(df_contrasts2$F) <- contr.hypothesis(HcSum)
```
Now, we could again run the same model. However, since the contrast matrix is now the same as used before, the results would also be exactly the same, and we therefore skip the model fitting for brevity.
The `hypr` package can be used to create contrasts for Bayesian models, where the focus lies on estimation of contrasts that code comparisons between condition means or bundles of condition means. (Of course, one can use contrast coding for carrying out hypothesis tests using the Bayes factor; see chapter \@ref(ch-bf); in this case, the priors specified can lead to very different conclusions.) Thus, the comparison that one specifies implies the estimation of a difference between condition means or bundles of condition means. We see this in the output of the `hypr()` function (see the first section of the results); these formulate the comparison in a way that also illustrates the estimation of model parameters. That is, the comparison (expressed in the `hypr` package's syntax) $\mu_1 \sim \frac{\mu_1+\mu_2+\mu_3}{3}$ corresponds to a parameter estimate of `b1 = 2/3*m1 - 1/3*m2 - 1/3*m3`, where $m1$ to $m3$ are the means for each of the conditions. The resulting contrasts will then allow us to estimate the specified differences between condition means or bundles of condition means.
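As a quick sanity check of this equivalence, we can plug in the condition means from our running example (adjectives $400$ ms, nouns $500$ ms, verbs $450$ ms); this is a sketch with the means typed in by hand:

```{r, echo = TRUE}
m <- c(adjectives = 400, nouns = 500, verbs = 450)
## applying the weights from the hypr output for b1:
b1_weights <- 2 / 3 * m["adjectives"] - 1 / 3 * m["nouns"] - 1 / 3 * m["verbs"]
## direct computation of "adjectives minus grand mean":
b1_direct <- m["adjectives"] - mean(m)
c(b1_weights, b1_direct)  # both are -50
```

Both computations give the same result, confirming that the weights $2/3, -1/3, -1/3$ encode the comparison of $\mu_1$ against the grand mean.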
## Other types of contrasts: illustration with a factor of four levels {#sec-4levelFactor}
Here, we introduce repeated difference, Helmert, and polynomial contrasts.^[The function `bayestestR::contr.equalprior()` can be used when one aims to set equal marginal priors for differences between means across all levels of a factor. This avoids the unequal prior distributions that can result from other contrast methods (e.g., `stats::contr.treatment()`) and supports correct Bayes factor estimation in multi-level factors. It is particularly useful for factors with more than two levels, following the orthogonal-normal contrasts described in @rouder2012default[p. 363]. For more information, see https://easystats.github.io/bayestestR/reference/contr.equalprior.html.]
Consider an experiment with one between-subject factor with four levels (the discussion below does not depend on the factor being between-subjects; the same logic holds for within-subjects designs). We load a corresponding data set of simulated response times with a four-level between-subject factor.
The sample sizes for each level and the means and standard errors are shown in Table \@ref(tab:cTab3Means), and the means and standard errors are also shown graphically in Figure \@ref(fig:helmertsimdatFig).
```{r, echo = TRUE}
data("df_contrasts3")
```
```{r, echo = FALSE, message = FALSE}
table3 <- df_contrasts3 %>%
group_by(F) %>%
summarize(N = length(DV), M = mean(DV), SD = sd(DV), SE = SD / sqrt(N))
GM <- mean(table3$M) # Grand Mean
table3a <- as.data.frame(table3)
names(table3a) <- c("Factor", "N data", "Est. means", "Std. dev.", "Std. errors")
```
```{r helmertsimdatFig, fig = TRUE, include = TRUE, echo = FALSE, cache = FALSE, fig.height = 2.2, fig.cap = "The means and error bars (showing standard errors) for a simulated data set with one between-subjects factor with four levels."}
#(plot2 <- qplot(x = F, y = M, group = 1, data = table3, geom = c("point", "line")) +
(plot2 <- ggplot(aes(x = F, y = M, group = 1), data = table3) +
geom_point() + geom_line() +
geom_errorbar(aes(max = M + SE, min = M - SE), width = 0) +
# scale_y_continuous(breaks = c(250,275,300)) +
labs(y = "Mean DV", x = "Factor F"))
```
```{r echo = FALSE,cTab3Means, results = "asis"}
kableExtra::kable(table3a %>%
as.data.frame() %>%
mutate(across(where(is.integer), ~ paste0("$",.x, "$"))) %>%
mutate(across(where(is.double), ~ paste0("$",formatC(round(.x, 1),format = "f", digits = 1), "$"))), escape = FALSE,align = c("l",rep("r",4)),
booktabs = TRUE, vline = "",
caption = "Summary statistics per condition for the simulated data."
)
```
We assume that the four factor levels $F1$ to $F4$ reflect levels of word frequency, including the levels `low`, `medium-low`, `medium-high`, and `high` frequency words, and that the dependent variable reflects response time.
Qualitatively, the simulated pattern of results is similar to empirically observed values for word frequency effects on single fixation durations in eye tracking [see Figure 5.4 in @heister2012analysing].
Normally, one may expect the pattern to be monotonically decreasing, i.e., higher word frequency should lead to faster reading times. However, we choose this more complex pattern because we think it may be helpful in understanding how polynomial and monotonic contrasts work (see below).
### \index{Repeated contrasts} Repeated contrasts {#repeatedcontrasts}
Arguably, the most popular contrast psychologists and psycholinguists are interested in is the comparison between neighboring levels of a factor. This type of contrast is called the repeated contrast. In our example, our research question might be whether the frequency level `low` leads to slower response times than frequency level `medium-low`, whether frequency level `medium-low` leads to slower response times than frequency level `medium-high`, and whether frequency level `medium-high` leads to slower response times than frequency level `high`.
Repeated contrasts are used to implement these comparisons. Consider first how to derive the contrast matrix for repeated contrasts, starting out by specifying the comparisons that are to be estimated. Importantly, this again applies the general strategy of how to translate (any) comparisons between groups or conditions into a set of contrasts, yielding a powerful tool of great value in many research settings. We follow the four-step procedure outlined above.
The first step is to specify our comparisons, and to write them down in a way such that their weights can be extracted easily. For a four-level factor, the three comparisons are:
\begin{equation}
\beta_{2-1} = -1 \cdot \mu_1 + 1 \cdot \mu_2 + 0 \cdot \mu_3 + 0 \cdot \mu_4
\end{equation}
\begin{equation}
\beta_{3-2} = 0 \cdot \mu_1 - 1 \cdot \mu_2 + 1 \cdot \mu_3 + 0 \cdot \mu_4
\end{equation}
\begin{equation}
\beta_{4-3} = 0 \cdot \mu_1 + 0 \cdot \mu_2 - 1 \cdot \mu_3 + 1 \cdot \mu_4
\end{equation}
Here, the $\mu_x$ are the mean response times in condition $x$. Each regression coefficient gives weights to the different condition means. For example, the first estimate ($\beta_{2-1}$) estimates the difference between condition mean for $F2$ ($\mu_2$) minus the condition mean for $F1$ ($\mu_1$), but ignores condition means for $F3$ and $F4$ ($\mu_3$, $\mu_4$). $\mu_1$ has a weight of $-1$, $\mu_2$ has a weight of $+1$, and $\mu_3$ and $\mu_4$ have weights of $0$.
We can write these comparisons into `hypr`:
```{r, echo = TRUE}
HcRep <- hypr(c2vs1 = F2 ~ F1,
c3vs2 = F3 ~ F2,
c4vs3 = F4 ~ F3,
levels = c("F1", "F2", "F3", "F4"))
HcRep
```
The hypothesis matrix shows exactly the weights that we had written down above. Moreover, we see the contrast matrix. In the case of the repeated contrast, the contrast matrix again looks very different from the hypothesis matrix. In this case, the contrast matrix looks a lot less intuitive than the hypothesis matrix, and if one did not know the associated hypothesis matrix, it seems unclear what the contrast matrix would actually test. To verify this custom-made contrast matrix, we compare it to the repeated contrast matrix as generated by the R function `contr.sdif()` in the `MASS` package [@R-MASS]. The resulting contrast matrix is identical to our result:
```{r, echo = TRUE}
fractions(contr.sdif(4))
```
We can thus use either approach (`hypr()` or \index{\texttt{contr.sdif}} `contr.sdif()`) to obtain the contrast matrix in this case.
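The same result can also be reproduced in base R by inverting the hypothesis matrix directly: with an added intercept row of $1/4$ weights (which estimates the grand mean), the matrix is square and of full rank, so `solve()` suffices (a sketch; the weights are typed in by hand from above):

```{r, echo = TRUE}
library(MASS)  # for fractions() and contr.sdif()
HcRep_full <- rbind(int   = c(1, 1, 1, 1) / 4,
                    c2vs1 = c(-1, 1, 0, 0),
                    c3vs2 = c(0, -1, 1, 0),
                    c4vs3 = c(0, 0, -1, 1))
XcRep <- solve(HcRep_full)
## first column: intercept; the remaining columns match contr.sdif(4)
fractions(XcRep)
```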
Next, we apply the repeated contrasts to the factor $F$ in the example data and run a linear model. This allows us to estimate the regression coefficients associated with each contrast. These are compared to the data in Figure\ \@ref(fig:helmertsimdatFig) to test whether the regression coefficients actually correspond to the differences between successive condition means, as intended.
```{r, echo = TRUE, message = FALSE, results = "hide"}
contrasts(df_contrasts3$F) <- contr.hypothesis(HcRep)
fit_Rep <- brm(DV ~ F,
data = df_contrasts3,
family = gaussian(),
prior = c(prior(normal(20, 50), class = Intercept),
prior(normal(0, 50), class = sigma),
prior(normal(0, 50), class = b)))
```
```{r, echo = TRUE}
fixef(fit_Rep)
```
The results show that, as expected, the regression coefficients reflect the differences that were of interest: the regression coefficient (`Estimate`) `Fc2vs1` has a value of approximately $10$, which corresponds to the difference between the condition mean for $F2$ ($20$) and the condition mean for $F1$ ($10$), i.e., $20 - 10 = 10$. Likewise, the regression coefficient `Fc3vs2` has a value of approximately $-10$, which corresponds to the difference between the condition mean for $F3$ ($10$) and the condition mean for $F2$ ($20$), i.e., $10 - 20 = -10$. Finally, the regression coefficient `Fc4vs3` has a value of roughly $30$, which reflects the difference between condition $F4$ ($40$) and condition $F3$ ($10$), i.e., $40 - 10 = 30$. Thus, the regression coefficients estimate differences between successive or neighboring condition means.
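These estimates can be double-checked by applying the hypothesis-matrix weights directly to the true condition means of the simulated data (a sketch, with the means $10, 20, 10, 40$ typed in by hand):

```{r, echo = TRUE}
mu <- c(F1 = 10, F2 = 20, F3 = 10, F4 = 40)
w <- rbind(int   = c(1, 1, 1, 1) / 4,  # intercept: grand mean
           c2vs1 = c(-1, 1, 0, 0),
           c3vs2 = c(0, -1, 1, 0),
           c4vs3 = c(0, 0, -1, 1))
drop(w %*% mu)  # 20 (grand mean), 10, -10, 30
```

The matrix product reproduces the intercept and the three successive differences estimated by the model.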
### \index{Helmert contrasts} Helmert contrasts {#helmertcontrasts}
Another common contrast is the Helmert contrast. In a Helmert contrast for our four-level factor, the first contrast compares level $F1$ versus $F2$. The second contrast compares level $F3$ to the average of the first two, i.e., `F3 ~ (F1+F2)/2`. The third contrast then compares level $F4$ to the average of the first three. We can code this contrast in `hypr`:
```{r}
HcHel <- hypr(b1 = F2 ~ F1,
b2 = F3 ~ (F1 + F2) / 2,
b3 = F4 ~ (F1 + F2 + F3) / 3,
levels = c("F1", "F2", "F3", "F4"))
HcHel
```
The classical Helmert contrast coded by the function \index{\texttt{contr.helmert}} `contr.helmert()` yields a similar but slightly different result:
```{r}
contr.helmert(4)
```
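The classical matrix is a column-wise rescaling of the custom one generated by `hypr`; this can be verified in base R (a sketch, with the custom contrast matrix typed in by hand from the `hypr` output above):

```{r, echo = TRUE}
XcHel <- cbind(b1 = c(-1/2, 1/2, 0, 0),
               b2 = c(-1/3, -1/3, 2/3, 0),
               b3 = c(-1/4, -1/4, -1/4, 3/4))
## multiplying the columns by 2, 3, and 4 yields the classical contrast:
sweep(XcHel, 2, c(2, 3, 4), `*`)
contr.helmert(4)
```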
These contrasts are scaled versions of our custom Helmert contrast: multiplying the first column of our custom contrast matrix by $2$, the second column by $3$, and the third column by $4$ yields the classical version. In our opinion, the custom Helmert contrast defined using the `hypr` function is more appropriate and intuitive to use. Probably the only reason the classical Helmert contrast uses these scaled differences is that the rescaling yields a simpler contrast matrix, consisting of integers rather than fractions. The estimates from our custom Helmert contrast are also more useful in Bayesian analyses, because the coefficients estimate the differences between (groups of) conditions rather than scaled versions of these differences. That means that we can set the priors intuitively, as priors on differences between (groups of) conditions rather than on scaled differences, and that we can interpret the posterior much more straightforwardly. We apply the custom Helmert contrast here:
```{r, echo = TRUE, message = FALSE}
contrasts(df_contrasts3$F) <- contr.hypothesis(HcHel)
fit_Hel <- brm(DV ~ F,
data = df_contrasts3,
family = gaussian(),
prior = c(prior(normal(20, 50), class = Intercept),
prior(normal(0, 50), class = sigma),
prior(normal(0, 50), class = b)))
```
```{r, echo = TRUE}
fixef(fit_Hel)
```
When we fit the Bayesian model using our custom Helmert contrast, we can see that the estimates reflect the comparisons outlined above. The first estimate `Fb1` has a value of roughly $10$, reflecting the difference between conditions $F2$ ($20$) and $F1$ ($10$). The second estimate `Fb2` has a value of roughly $-5$, which reflects the difference between condition $F3$ ($10$) and the average of the first two conditions ($(10+20)/2 = 15$), i.e., $10 - 15 = -5$. The estimate `Fb3` reflects the difference between $F4$ ($40$) and the average of the first three conditions, which is $(10+20+10)/3 = 13.3$, and is thus roughly $40-13.3 = 26.7$.
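These values follow directly from the comparison weights applied to the true condition means (a sketch, with the simulated means typed in by hand):

```{r, echo = TRUE}
mu <- c(F1 = 10, F2 = 20, F3 = 10, F4 = 40)
b1 <- mu["F2"] - mu["F1"]                            # 10
b2 <- mu["F3"] - (mu["F1"] + mu["F2"]) / 2           # -5
b3 <- mu["F4"] - (mu["F1"] + mu["F2"] + mu["F3"]) / 3
round(c(b1, b2, b3), 1)  # 10, -5, 26.7
```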
Here, we can see that the generalized matrix inverse can be used to create contrasts that are not exactly equivalent to the ones provided by R per default (i.e., here `contr.helmert()`). Another example for this is discussed in online section \@ref(app-cTreatGM), which implements a treatment contrast, where the intercept captures the grand mean.
### Contrasts in linear regression analysis: The design or \index{Model matrix} model matrix
We have discussed how different contrasts are created from the hypothesis matrix. What we have not treated in detail is how exactly contrasts are used in a linear model. Here, we will see that the contrasts for a factor in a linear model are just the same thing as continuous \index{Numeric predictor} numeric predictors (i.e., \index{Covariate} covariates) in a linear/multiple regression analysis. That is, contrasts are the way to encode \index{Discrete factor levels} discrete factor levels into numeric predictor variables to use in linear/multiple regression analysis, by encoding which differences between factor levels are estimated.
The contrast matrix $X_c$ that we have looked at so far has one entry (row) for each experimental condition. For use in a linear model, the contrast matrix is coded into a design or model matrix $X$, where each individual data point has one row. The \index{Design matrix} design matrix $X$ can be extracted using the function \index{\texttt{model.matrix}} `model.matrix()`:
```{r, echo = TRUE}
# contrast matrix:
(contrasts(df_contrasts3$F) <- contr.hypothesis(HcRep))
# design/model matrix:
covars <- model.matrix(~ 1 + F, df_contrasts3)
(covars <- as.data.frame(covars))
```
For each of the $20$ subjects, four numbers are stored in this model matrix: the value $1$ for the intercept term and the values of the three predictor variables used to predict response times in the task. Indeed, this matrix is exactly the design matrix $X$ commonly used in multiple regression analysis, where each column represents one numeric predictor variable (covariate), and the first column codes the intercept term.
To further illustrate this, the covariates are extracted from this design matrix and stored separately as numeric predictor variables in the data frame:
```{r, echo = TRUE}
df_contrasts3[, c("Fc2vs1", "Fc3vs2", "Fc4vs3")] <- covars[, 2:4]
```
They are now used as numeric predictor variables in a multiple regression analysis:
```{r, echo = TRUE, message = FALSE, results = "hide"}
fit_m3 <- brm(DV ~ Fc2vs1 + Fc3vs2 + Fc4vs3,
data = df_contrasts3,
family = gaussian(),
prior = c(prior(normal(20, 50), class = Intercept),
prior(normal(0, 50), class = sigma),
prior(normal(0, 50), class = b)))
```
```{r, echo = TRUE}
fixef(fit_m3)
```
The results show that the regression coefficients are the same as in the contrast-based analysis shown in the previous section (on repeated contrasts). This demonstrates that contrasts serve to code discrete factor levels into a linear/multiple regression analysis by numerically encoding comparisons between specific condition means, or groups of condition means.
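The same equivalence can be illustrated with a small frequentist sketch using `lm()` on simulated data (all names and values here are hypothetical, not the data set used above):

```{r, echo = TRUE}
library(MASS)  # for contr.sdif()
set.seed(123)
f <- factor(rep(c("F1", "F2", "F3", "F4"), each = 5))
y <- rnorm(20, mean = rep(c(10, 20, 10, 40), each = 5), sd = 2)
contrasts(f) <- contr.sdif(4)
## fit on the factor vs. fit on the extracted numeric covariates:
X <- model.matrix(~ 1 + f)
coef_factor <- coef(lm(y ~ f))
coef_covars <- coef(lm(y ~ X[, 2] + X[, 3] + X[, 4]))
cbind(coef_factor, coef_covars)  # identical estimates
```

The two fits return the same coefficient estimates, because the factor with its contrasts and the extracted covariates define the same design matrix.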
### \index{Polynomial contrasts} Polynomial contrasts {#polynomialContrasts}
Polynomial contrasts are another option for analyzing factors. Suppose that we expect a \index{Linear trend} linear trend across conditions, where the response increases by a constant magnitude with each successive factor level. This could be the expectation when four levels of a factor reflect decreasing levels of word frequency (i.e., four factor levels: high, medium-high, medium-low, and low word frequency), where one expects the fastest response for high frequency words, and successively slower responses for lower word frequencies. The effect for each individual level of a factor (e.g., as coded via a repeated contrast) may not be strong enough for detecting it in the statistical model. Specifying a linear trend in a polynomial contrast (see effect `F.L` below) allows us to pool the whole increase (across all four factor levels) into a single coefficient for the linear trend, increasing statistical sensitivity for estimating/detecting the increase. Such a specification constrains the estimate to one interpretable parameter, e.g., a linear increase across factor levels. The larger the number of factor levels, the more parsimonious are polynomial contrasts compared to contrast-based specifications as introduced in the previous sections. Going beyond a linear trend, one may also have expectations about \index{Quadratic trend} quadratic trends (see the estimate for `F.Q` below). For example, one may expect an increase only among very low frequency words, but no difference between high and medium-high frequency words.
Here is an example for how to code polynomial contrasts for a four-level factor. In this case, one can estimate a linear (`F.L`), a quadratic (`F.Q`), and a cubic (`F.C`) trend. If more factor levels are present, then higher order trends can be estimated.
```{r, echo = TRUE}
Xpol <- contr.poly(4)
(contrasts(df_contrasts3$F) <- Xpol)
```
```{r, echo = TRUE, message = FALSE, results = "hide"}
fit_Pol <- brm(DV ~ F,
data = df_contrasts3,
family = gaussian(),
prior = c(prior(normal(20, 50), class = Intercept),
prior(normal(0, 50), class = sigma),
prior(normal(0, 50), class = b)))
```
```{r, echo = TRUE}
fixef(fit_Pol)
```
In this example, condition means increase overall across factor levels: a linear regression line drawn through all condition means has a positive slope, which is what the linear trend `F.L` estimates. However, there may also be quadratic and cubic trends.
However, one difficulty with the `contr.poly()` function is that the scaling of the predictor variables is not very clear, making it difficult to define priors. Therefore, an alternative can be to use custom polynomial contrasts:
```{r, echo = TRUE}
Xpol2 <- data.frame(linear = c(F1 = -3,
F2 = -1,
F3 = 1,
F4 = 3) / 2)
Xpol2$quadratic <- Xpol2$linear^2 - mean(Xpol2$linear^2)
Xpol2$cubic <- Xpol2$linear^3 - mean(Xpol2$linear^3)
```
Because the linear and cubic trends are highly correlated, we orthogonalize the two:
```{r, echo = TRUE}
Xpol2$cubic <- resid(lm(cubic ~ linear, Xpol2))
(contrasts(df_contrasts3$F) <- as.matrix(Xpol2))
```
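The resulting columns can be checked for centering and orthogonality in base R. In this sketch, the residualization step is written as an explicit projection, which is equivalent to the `resid(lm(...))` call above because both columns are centered:

```{r, echo = TRUE}
linear <- c(-3, -1, 1, 3) / 2
quadratic <- linear^2 - mean(linear^2)
cubic_raw <- linear^3 - mean(linear^3)
## project the linear component out of the cubic trend:
cubic <- cubic_raw - linear * sum(cubic_raw * linear) / sum(linear^2)
X <- cbind(linear, quadratic, cubic)
round(colSums(X), 10)    # all zero: each contrast is centered
round(crossprod(X), 10)  # zero off-diagonals: columns are orthogonal
```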
```{r, echo = TRUE, message = FALSE, results = "hide"}
fit_Pol2 <- brm(DV ~ F,
data = df_contrasts3,
family = gaussian(),
prior = c(prior(normal(20, 50), class = Intercept),
prior(normal(0, 50), class = sigma),
prior(normal(0, 50), class = b)))
```
```{r, echo = TRUE}
fixef(fit_Pol2)
```
The effects are now on interpretable scales.
### An alternative to contrasts: \index{Monotonic effects} Monotonic effects
An alternative to specifying contrasts to estimate specific comparisons between factor levels is monotonic effects [https://paul-buerkner.github.io/brms/articles/brms_monotonic.html; @burkner2020modelling]. This approach simply assumes that the dependent variable increases (or decreases) monotonically across levels of an ordered factor. In this kind of analysis, one does not define contrasts specifying differences between (groups of) factor levels. Instead, one estimates a single parameter that captures the average increase (or decrease) in the dependent variable between two neighboring factor levels. Moreover, one estimates the percentages of the overall increase (or decrease) that are associated with each of the differences between neighboring factor levels (i.e., similar to simple difference contrasts, but measured as shares of the overall change, and assuming monotonicity, i.e., that all differences between neighboring levels go in the same direction).
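Before fitting, it may help to see the parameterization itself. As described in the `brms` vignette linked above, a monotonic term contributes $b \cdot D \cdot \sum_{i=1}^{k-1} \zeta_i$ to the mean of level $k$, where $D$ is the number of differences between neighboring levels, $b$ is the average step size, and $\zeta$ is a simplex of step shares. A sketch with hypothetical numbers:

```{r, echo = TRUE}
alpha <- 10                # mean of the lowest factor level (hypothetical)
b     <- 10                # average increase between neighboring levels
zeta  <- c(0.2, 0.1, 0.7)  # shares of the total increase; sum to 1
D     <- length(zeta)
## implied condition means for the four ordered levels:
alpha + b * D * c(0, cumsum(zeta))  # 10, 16, 19, 40
```

Because the simplex entries are non-negative, the implied means can only increase (or, for negative $b$, only decrease) from level to level; the shape of the increase is governed by $\zeta$.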
To implement a monotonic analysis, we first code the factor $F$ as being an ordered factor (i.e., `ordered = TRUE`; remember that the default ordering is alphabetical and may need to be changed). Then, we specify that we want to estimate a monotonic effect of $F$ using the notation `mo(F)` in our call to `brms`:
```{r, echo = TRUE, message = FALSE, results = "hide"}
df_contrasts3$F <- factor(df_contrasts3$F, ordered = TRUE)
fit_mo <- brm(DV ~ 1 + mo(F),
data = df_contrasts3,
family = gaussian(),
prior = c(prior(normal(20, 50), class = Intercept),
prior(normal(0, 50), class = sigma),
prior(normal(0, 50), class = b)))
```
```{r, eval = FALSE}
fit_mo
```
```{r, echo = FALSE}
short_summary(fit_mo)
```
The results show that there is an overall positive population-level effect of the factor $F$ with an estimate (`moF`) of $`r round(fixef(fit_mo)[2,1],2)`$, reflecting an average increase in the dependent variables of $`r round(fixef(fit_mo)[2,1],2)`$ with each level of $F$. The model summary shows estimates for the simplex parameters, which represent the ratios of the overall increase associated with $F$ that can be attributed to each of the differences between neighboring factor levels. The results show that most of the increase is associated with `moF1[3]`, i.e., with the last difference, reflecting the difference between $F3$ and $F4$, whereas the other two differences (`moF1[1]`, reflecting the difference between $F1$ and $F2$, and `moF1[2]`, reflecting the difference between $F2$ and $F3$) are smaller. Comparing conditional effects (i.e., the condition means estimated by the model) between a model using polynomial contrasts and a model assuming monotonic effects makes it clear that the current model "forces" the effects to increase in a monotonic fashion; see Figure \@ref(fig:condmopol). \index{\texttt{conditional\_effects}}
```{r condmopol, out.width = '48%', fig.show = "hold", fig.cap = "Conditional effects using the polynomial contrasts on the left side vs. assuming monotonic effects on the right side.", tidy = FALSE, fig.width = 2.2, fig.height = 2.2, warning = FALSE, message = FALSE}
ppoly <- conditional_effects(fit_Pol)
pmon <- conditional_effects(fit_mo)
plot(ppoly, plot = FALSE)[[1]] + ggtitle("Polynomial contrasts")
plot(pmon, plot = FALSE)[[1]] + ggtitle("Monotonic effects")
```
This is regardless of the information provided in the data; see the posterior predictive checks in Figure \@ref(fig:checkmopol).
\index{Posterior predictive distribution}
```{r checkmopol, out.width = '48%', fig.show = "hold", fig.cap = "Posterior predictive distributions by condition using the polynomial contrasts on the left side vs. assuming monotonic effects on the right side.", tidy = FALSE, fig.width = 2.2, fig.height = 2.7, message = FALSE, warning = FALSE, fold = TRUE}
pp_check(fit_Pol, type = "violin_grouped",
         group = "F", y_draw = "points") +
  theme(legend.position = "bottom") +
  coord_cartesian(ylim = c(-55, 105)) +
  ggtitle("Polynomial contrasts")
pp_check(fit_mo, type = "violin_grouped",
         group = "F", y_draw = "points") +
  theme(legend.position = "bottom") +
  coord_cartesian(ylim = c(-55, 105)) +
  ggtitle("Monotonic effects")
```
The monotonicity assumption is violated in the current data set, since the mean is larger in condition $F2$ than in condition $F3$. The monotonic model thus assumes this (negative) difference is due to chance; see Figure \@ref(fig:checkmopol).
Estimating such monotonic effects provides an alternative to the contrast coding we treat in the rest of this chapter. It may be relevant when the specific differences between factor levels are not of interest, but when instead the goal is to estimate the overall monotonic effect of a factor and when this overall effect is not well approximated by a simple linear trend.
## What makes a good set of contrasts? {#nonOrthogonal}
For a factor with $I$ levels one can make only $I-1$ linearly independent comparisons within a single model. For example, in a design with one factor with two levels, only one comparison is possible (between the two factor levels). The reason for this is that the intercept is also estimated. More generally, if we have a factor with $I_1$ and another factor with $I_2$ levels, then the total number of conditions is $I_1\times I_2 = \nu$ (not $I_1 + I_2$), which implies a maximum of $\nu-1$ contrasts.
For example, in a design with one factor with three levels, $A$, $B$, and $C$, in principle one could make three comparisons ($A$ vs. $B$, $A$ vs. $C$, $B$ vs. $C$).
However, after defining an intercept, only two means can be compared (or two linear combinations of means). Therefore, for a factor with three levels, we define two comparisons within one statistical model.
One critical precondition for contrasts is that they implement different comparisons that are not \index{Collinear} collinear, that is, that none of the contrasts can be generated from the other contrasts by linear combination. For example, the contrast `c1 = c(1, 2, 3)` can be generated from the contrast `c2 = c(3, 4, 5)` simply by computing `c2 - 2`. Therefore, contrasts c1 and c2 cannot be used simultaneously. That is, each contrast needs to encode some independent information about the data.
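This collinearity can be checked directly in base R; the following is a minimal sketch using the vectors `c1` and `c2` from the example above:

```{r}
c1 <- c(1, 2, 3)
c2 <- c(3, 4, 5)
## c2 equals c1 plus a constant, i.e., c1 plus 2 times the intercept
## column of ones; the three columns are therefore linearly dependent,
## and the model matrix has rank 2 instead of 3:
X <- cbind(int = 1, c1, c2)
qr(X)$rank
```

A model matrix whose rank is smaller than its number of columns cannot yield unique estimates for all its coefficients, which is why such contrasts cannot be used simultaneously.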
There are (at least) three criteria to decide what a good contrast is. First, *orthogonal contrasts* have advantages as they estimate mutually independent comparisons in the data [see @dobson2011introduction, section 6.2.5, p. 91 for a detailed explanation of orthogonality]. Second, it is crucial that contrasts are defined in a way such that they answer the research questions. One way to accomplish this second point is to use the hypothesis matrix to generate contrasts (e.g., via the `hypr` package), as this ensures that one uses contrasts that exactly estimate the comparisons of interest in a given study. Third, in the Bayesian context, the scaling of contrasts is important and has to be considered when defining priors: the scaling of the priors has to be tailored to the scaling of each individual contrast. In situations where we define the same prior for all coefficients of a factor, the scaling of the contrasts has to be set accordingly.
### \index{Centered contrasts} Centered contrasts
Contrasts are often constrained to be centered, such that the individual contrast coefficients $c_i$ for different factor levels $i$ sum to $0$: $\sum_{i = 1}^I c_i = 0$. This has advantages when estimating interactions with other factors or covariates (we discuss interactions between factors in the next chapter).
All contrasts discussed here are centered except for the treatment contrast, in which the contrast coefficients for each contrast do not sum to zero:
```{r, echo = TRUE}
colSums(contr.treatment(4))
```
Other contrasts, such as repeated contrasts, are centered and the contrast coefficients for each contrast sum to $0$:
```{r, echo = TRUE}
colSums(contr.sdif(4))
```
The contrast coefficients mentioned above appear in the contrast matrix. By contrast, the weights in the hypothesis matrix are always centered, even for the treatment contrast, because they code comparisons between conditions or bundles of conditions.
The only exception is the intercept, whose weights are all the same and always sum to $1$ in the hypothesis matrix. This ensures that, when applying the generalized matrix inverse, the intercept results in a constant term with values of $1$ in the contrast matrix.
An important question concerns whether (or when) the intercept needs to be considered in the generalized matrix inversion, and whether (or when) it can be ignored. This question is closely related to orthogonal contrasts, a concept we turn to below.
### \index{Orthogonal contrasts} Orthogonal contrasts
Two centered contrasts $c_1$ and $c_2$ are orthogonal to each other if the following condition holds, where $i$ indexes the cells of the vectors representing the contrasts:
\begin{equation}
\sum_{i = 1}^I c_{1,i} \cdot c_{2,i} = 0
\end{equation}
Whether contrasts are orthogonal can often be determined easily by computing the correlation between two contrasts. Orthogonal contrasts have a correlation of $0$.
For example, coding two factors in a $2 \times 2$ design (we return to this case in a section on designs with two factors below) using sum contrasts, these \index{Sum contrasts} sum contrasts and their \index{Interaction} interaction are orthogonal to each other:
```{r, echo = TRUE}
(Xsum <- cbind(F1    = c(1,  1, -1, -1),
               F2    = c(1, -1,  1, -1),
               F1xF2 = c(1, -1, -1,  1)))
cor(Xsum)
```
\noindent
The correlations between the different contrasts (i.e., the off-diagonals) are exactly $0$. Notice that sum contrasts coding one multi-level factor are not orthogonal to each other:
```{r, echo = TRUE}
cor(contr.sum(4))
```
\noindent
Here, the correlations between individual contrasts, which appear in the off-diagonals, deviate from $0$, indicating non-orthogonality. The same is also true for treatment and repeated contrasts:
```{r, echo = TRUE}
cor(contr.sdif(4))
cor(contr.treatment(4))
```
Orthogonality of contrasts plays a critical role when computing the \index{Generalized inverse} generalized inverse. In the inversion operation, orthogonal contrasts are converted independently of each other; that is, the presence or absence of another orthogonal contrast does not change the resulting weights. In fact, for orthogonal contrasts, applying the generalized matrix inverse to the hypothesis matrix simply furnishes a scaled version of the hypothesis matrix in the contrast matrix [for mathematical details see @schadHowCapitalizePriori2020].
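This scaling behavior can be illustrated with a small sketch (not part of the original analyses): we write down a hypothesis matrix for the orthogonal $2 \times 2$ sum contrasts shown above, with the intercept included, and apply `MASS::ginv()` to it.

```{r}
library(MASS) # for ginv()
## hypothesis matrix: rows are the intercept and three orthogonal
## comparisons across the four condition means
Hc <- rbind(int   = c(1,  1,  1,  1) / 4,
            F1    = c(1,  1, -1, -1) / 2,
            F2    = c(1, -1,  1, -1) / 2,
            F1xF2 = c(1, -1, -1,  1) / 2)
## for orthogonal contrasts, the generalized inverse returns a contrast
## matrix whose columns are just rescaled rows of Hc: a column of ones
## for the intercept, and +/- 1/2 weights for the three comparisons
round(ginv(Hc), 2)
```

Each column of the result is proportional to the corresponding row of `Hc`, which is exactly the scaling property described above.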
In Bayesian models, scaling is always important, since we need to interpret the scale in order to define priors or interpret posteriors. Therefore, when working with contrasts in Bayesian models, the generalized matrix inverse is always a good procedure to use.
### The role of the \index{Intercept} intercept in \index{Non-centered contrasts} non-centered contrasts
A related question concerns whether the intercept needs to be considered when computing the generalized inverse for a contrast. This is of key importance when using the generalized matrix inverse to define contrasts: the resulting contrast matrix, and hence the comparisons that the model estimates, can change completely depending on whether or not the intercept is explicitly considered, possibly in unintended ways. Thus, if the definition of the intercept is incorrect, the estimates of the slopes may also be wrong.
More specifically, it turns out that considering the intercept is necessary for contrasts that are not centered. This is the case for treatment contrasts; e.g., for the treatment contrast for two factor levels, `c1vs0 = c(0, 1)`: $\sum_i c_i = 0 + 1 = 1$. One can actually show that the formula to determine whether contrasts are centered (i.e., $\sum_i c_i = 0$) is the same formula as the formula to test whether a contrast is "orthogonal to the intercept." Remember that for the intercept, all contrast coefficients are equal to one: $c_{1,i} = 1$ (here, $c_{1,i}$ indicates the vector of contrast coefficients associated with the intercept). We enter these contrast coefficient values into the formula testing whether a contrast is orthogonal to the intercept (here, $c_{2,i}$ indicates the vector of contrast coefficients associated with some contrast for which we want to test whether it is "orthogonal to the intercept"): $\sum_i c_{1,i} \cdot c_{2,i} = \sum_i 1 \cdot c_{2,i} = \sum_i c_{2,i} = 0$. The resulting formula, $\sum_i c_{2,i} = 0$, is exactly the constraint that needs to be satisfied for a contrast to be centered. Because of this analogy, treatment contrasts can be viewed as "not orthogonal to the intercept." This means that the intercept needs to be considered when computing the generalized inverse for treatment contrasts. As we have discussed above, when the intercept is included in the hypothesis matrix, the weights for this intercept term should sum to one, as this yields a column of ones for the intercept term in the contrast matrix.
We can see that considering the intercept makes a difference for the treatment contrast. First, we define the comparisons involved in a treatment contrast, where two experimental conditions `b` and `c` are each compared to a baseline condition `a` (`b~a` and `c~a`). In addition, we explicitly code the intercept term, which involves a comparison of the baseline to 0 (`a~0`). We take a look at the resulting contrast matrix:
```{r}
hypr(int = a ~ 0, b1 = b ~ a, b2 = c ~ a)
contr.treatment(c("a", "b", "c"))
```
This is the contrast matrix that we know from the treatment contrast: the intercept is coded as a column of ones, and each of the comparisons is coded as a $1$ in the condition that is compared to the baseline and a $0$ in the other conditions.
We can also ignore the intercept in the specification of the comparisons:
```{r}
hypr(b1 = m1 ~ m0, b2 = m2 ~ m0)
```
Notice that the resulting contrast matrix now looks very different from the contrast matrix that we know from the treatment contrast. Indeed, this contrast also estimates a reasonable set of quantities. It again estimates whether the condition mean `m1` differs from the baseline and whether `m2` differs from baseline. However, the intercept now estimates the average dependent variable across all three conditions (i.e., the grand mean). This can be seen by explicitly adding a comparison of the average of all three conditions to $0$:
```{r}
hypr(int = (m0 + m1 + m2) / 3 ~ 0, b1 = m1 ~ m0, b2 = m2 ~ m0)
```
The last two columns of the resulting contrast matrix are now the same as when the intercept was ignored, which confirms that the two columns encode the same comparison.
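What `hypr` computes in these examples can be reproduced by hand via the generalized inverse; the following sketch writes out the weights of the three comparisons `a ~ 0`, `b ~ a`, and `c ~ a` on the condition means `(a, b, c)` explicitly:

```{r}
library(MASS) # for ginv()
## hypothesis matrix: rows are the intercept (a ~ 0) and the two
## comparisons to baseline (b ~ a and c ~ a), as weights on (a, b, c)
Hc <- rbind(int = c( 1, 0, 0),
            b1  = c(-1, 1, 0),
            b2  = c(-1, 0, 1))
## inverting recovers the familiar treatment contrast matrix,
## with a column of ones for the intercept
ginv(Hc)
```

This hand computation matches the `hypr` output above, which illustrates that the package is essentially applying the generalized matrix inverse to the user-specified hypothesis matrix.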
## Computing condition means from estimated contrasts
As mentioned earlier, one advantage of Bayesian modeling is that based on the posterior samples, it is possible to very flexibly compute new comparisons and estimates. Above (see section\ \@ref(sec-cellMeans)), we had discussed the case where the Bayesian model estimated the condition means instead of contrasts by removing the intercept from the `brms` model (the formula in `brms` was: `DV ~ -1 + F`). This allowed us to get posterior samples from each condition mean, and then to compute any possible comparison between condition means by subtracting the corresponding samples.
Importantly, posterior samples for the condition means can also be obtained after fitting a model with contrasts. We illustrate this here for the case of sum contrasts. Let's use our earlier example of a design where we assess response times (in milliseconds, `DV`) for three different word classes (adjectives, nouns, and verbs), that is, for a 3-level factor $F$. In the above example, factor $F$ was coded using a sum contrast, where the first contrast coded the difference of adjectives from the grand mean, and the second contrast coded the difference of nouns from the grand mean. This was the corresponding contrast matrix:
```{r, echo = TRUE}
contrasts(df_contrasts2$F) <- contr.hypothesis(HcSum)
contrasts(df_contrasts2$F)
```
We had estimated a `brms` model for this data. The posterior estimates show results for the intercept (which is estimated to be $450$ ms) and for our two coded comparisons. The effect `FcH01` codes our first comparison, that response times for adjectives differ from the grand mean, and shows an estimate that response times for adjectives are about $50$ ms shorter than the grand mean. Moreover, the effect `FcH02` codes our second comparison, that response times for nouns differ from the grand mean, and shows the estimate that response times for nouns are $50$ ms longer than the grand mean.
```{r, echo = TRUE}
fixef(fit_Sum)
```
Of course other comparisons might be of interest to us as well. For example, we might be interested in estimating how strongly response times for verbs differ from the grand mean.
To do so, one possible first step is to obtain the posteriors for the response times in each of the three conditions. How can this be done? The first step is to again extract the posterior samples from the model:
```{r}
df_postSamp_Sum <- as_draws_df(fit_Sum)
```
We can see the samples for our first contrast (`b_FcH01`) and for our second contrast (`b_FcH02`). How can we now compute the posterior samples for each of the condition means, i.e., for adjectives, nouns, and verbs? For this, we need to take another look at the contrast matrix.
```{r, echo = TRUE}
contrasts(df_contrasts2$F)
```
It tells us how the condition means are computed. For adjectives (see the first row of the contrast matrix), the condition mean is computed by taking $1$ times the coefficient for `b1` (i.e., `FcH01`) and $0$ times the coefficient for `b2` (i.e., `FcH02`); that is, the slope part is simply the samples for the `b1` (i.e., `FcH01`) contrast. The contrast matrix does not show the intercept term, which is implicitly added, so we also have to add the samples for the intercept. The condition mean for adjectives is therefore computed as follows:
```{r}
df_postSamp_Sum <- df_postSamp_Sum %>%
  mutate(b_adjectives = b_Intercept + b_FcH01)
```
Similarly, we can obtain the posterior samples for the response times for nouns. The computation can be seen from the second row of the contrast matrix, which shows that the contrast `b1` (i.e., `FcH01`) has weight $0$ times, whereas the contrast `b2` (i.e., `FcH02`) has weight $1$. Adding the intercept thus gives:
```{r}
df_postSamp_Sum <- df_postSamp_Sum %>%
  mutate(b_nouns = b_Intercept + b_FcH02)
```
Finally, we want to obtain posterior samples for the average response times for verbs. For verbs, the third row of the contrast matrix shows two times a $-1$. Thus, contrasts `b1` (i.e., `FcH01`) and `b2` (i.e., `FcH02`) have to be subtracted from the intercept:
```{r}
df_postSamp_Sum$b_verbs <-
  df_postSamp_Sum$b_Intercept - df_postSamp_Sum$b_FcH01 -
  df_postSamp_Sum$b_FcH02
```
This yields posterior samples for the mean response times for verbs.
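The three computations above can also be carried out in a single matrix multiplication: condition-mean samples are the draws of the coefficients multiplied by the transpose of the contrast matrix (with an explicit intercept column). The following sketch uses simulated draws as stand-ins for the samples extracted from the `brms` fit (the means and standard deviations below are invented for illustration and merely mimic the estimates discussed in the text):

```{r}
set.seed(123)
## contrast matrix for a 3-level sum contrast, plus an explicit
## intercept column; rows correspond to factor levels
XC <- cbind(int = 1, contr.sum(3))
## simulated stand-ins for posterior draws of (Intercept, FcH01, FcH02)
draws <- cbind(rnorm(2000, 450, 5),
               rnorm(2000, -50, 5),
               rnorm(2000, 50, 5))
## one column of condition-mean samples per factor level
cond_means <- draws %*% t(XC)
## posterior means: approximately 400, 500, and 450 ms
round(colMeans(cond_means))
```

This matrix formulation generalizes to any contrast coding: replace `contr.sum(3)` with the contrast matrix actually used in the model, and the rows of `XC` determine how each condition mean is assembled from the coefficients.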
We can now look at the posterior means and 95\% credible intervals for adjectives, nouns, and verbs by computing the means and quantiles across all computed samples.
```{r, results = "hide", message = FALSE}
postTab <- df_postSamp_Sum %>%
  # remove the meta-data:
  as.data.frame() %>%
  select(b_adjectives, b_nouns, b_verbs) %>%
  # transform from wide to long with tidyr:
  pivot_longer(cols = everything(),
               names_to = "condition",
               values_to = "samp") %>%
  group_by(condition) %>%
  summarize(post_mean = mean(samp),
            `2.5%` = quantile(samp, p = 0.025),
            `97.5%` = quantile(samp, p = 0.975))
```
```{r, echo = TRUE}
postTab
```
The results show that, as expected, the posterior mean for adjectives is $400$ ms, for nouns it is $500$ ms, and for verbs it is $450$ ms. Moreover, we now have 95\% credible intervals for each of these estimates.
In fact, `brms` has a very convenient built-in function that computes these conditional effects automatically (`robust = FALSE` shows the posterior mean; by default `brms` shows the posterior median). Notice that you need to add `[]` after the function call; otherwise `brms` will plot the results.
```{r}
conditional_effects(fit_Sum, robust = FALSE)[]
```
The same function allows us to visualize the effects, as shown in Figure \@ref(fig:cFigCond).
```{r cFigCond, fig = TRUE, include = TRUE, echo = TRUE, cache = FALSE, fig.height = 2.2, fig.cap = "Estimated condition means, computed from a `brms` model fitted with a sum contrast."}
conditional_effects(fit_Sum, robust = FALSE)
```
Coming back to our hand-crafted computations, the posterior samples can be used to compute additional comparisons. For example, we might be interested in how much response times for verbs differ from the grand mean. This can be computed based on the samples for the condition means: we first compute the grand mean from the three condition means, and then we compare this to the estimate for verbs.
```{r, echo = TRUE}
df_postSamp_Sum <- df_postSamp_Sum %>%
mutate(GM = (b_adjectives + b_nouns + b_verbs) / 3,
b_FcH03 = b_verbs - GM)
c(post_mean = mean(df_postSamp_Sum$b_FcH03),
quantile(df_postSamp_Sum$b_FcH03, p = c(0.025, 0.975)))
```
The results show that response times for verbs are virtually identical to the grand mean, with a posterior mean estimate for the difference of nearly $0$ ms, and with a 95\% credible interval ranging between $-20$ and $+20$ ms.
The key message here is that based on the contrast matrix, it is possible to compute posterior samples for the condition means, and then to compute any arbitrary further comparisons or contrasts. We want to stress again that just obtaining the posterior distribution of a comparison does not allow us to argue that we have evidence for the effect; to argue that we have evidence for an effect being present/absent, we need Bayes factors. But the approach we outline above does allow us to obtain posterior means and credible intervals for arbitrary comparisons. However, different combinations of contrasts and priors may yield different results, especially when they are used to compute Bayes factors, because Bayes factors are sensitive to the choice of priors.
We briefly show how to compute posterior samples for condition means for one more example contrast, namely for repeated contrasts. Here, the contrast matrix is:
```{r, echo = TRUE}
contrasts(df_contrasts3$F) <- contr.hypothesis(HcRep)
contrasts(df_contrasts3$F)
```
The model estimates were:
```{r, echo = TRUE}
fixef(fit_Rep)
```
We first obtain the posterior samples for the contrasts:
```{r, echo = TRUE}
df_postSamp_Rep <- as.data.frame(fit_Rep)
```
Next, we compute the posterior samples for condition $F1$. The first row of the contrast matrix shows that the condition mean for $F1$ is obtained by adding to the intercept the three contrasts weighted by `c(-3/4, -1/2, -1/4)`. Thus, the posterior samples are computed as follows:
```{r, echo = TRUE}
df_postSamp_Rep <- df_postSamp_Rep %>%
  mutate(b_F1 = b_Intercept +
           -3 / 4 * b_Fc2vs1 +
           -1 / 2 * b_Fc3vs2 +
           -1 / 4 * b_Fc4vs3)