statsres-v2/01-06-LMEM.qmd at main · PsyTeachR/statsres-v2 · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
# Introduction to linear mixed effects models

In this chapter, we will explore linear mixed effects models by simulating data. Experimental designs that sample both subjects and stimuli from a larger population need to account for random effects of both subjects and stimuli using mixed-effects models. However, much of this research is analyzed using analysis of variance on aggregated responses because researchers are not confident specifying and interpreting mixed-effects models. This chapter will explain how to simulate data with random-effects structure (using the faux R package) and analyze the data using linear mixed-effects regression (with the lme4 R package), with a focus on interpreting the output in light of the simulated parameters. You will also learn to calculate power using simulation.

## Learning objectives

By the end of this chapter, you should be able to:

1. Understand how data are structured for a linear mixed effects model

2. Be able to simulate data with a cross-classified structure

3. Understand how to run and interpret a linear mixed effects model

4. Run a power calculation for LMEM

To follow along to this chapter and try the code yourself, please download the Rmd file we will be using in [this zip file](data/LMEM_data.zip).

## Packages and the data sets

We will simulate data using the faux package.

```{r setup, message=FALSE, warning=FALSE}
library(tidyverse)   # for data wrangling, pipes, and good dataviz
library(afex)        # for mixed effect models
library(broom.mixed) # for getting tidy data tables from mixed models
library(faux)        # for simulating correlated variables
```

## Simulation

First, we need to generate some data with the right structure for a mixed effects analysis. Here, we will simulate data for a Stroop task where subjects (`sub`) say the colour of colour words (`stim`) shown in each of two versions (`congruent` and `incongruent`). Subjects are in one of two conditions (`hard` or `easy`). The dependent variable is reaction time (`rt``).

We expect people to have faster reaction times for congruent stimuli than incongruent stimuli (main effect of version) and to be faster in the easy condition than the hard condition (main effect of condition). We'll look at some different interaction patterns below.

### Random Factors

First, set up the overall structure of your data by specifying the number of observations for each random factor. Here, we have a crossed design, so each subject responds to each stimulus. We'll set the numbers to small numbers as a demo first.

```{r}
sub_n  <- 2 # number of subjects in this simulation
stim_n  <- 2 # number of stimuli in this simulation

dat <- add_random(sub = sub_n) |>
  add_random(stim = stim_n)

dat
```

### Fixed Factors

Next, add the fixed factors. Specify if they vary between one of the random factors and specify the names of the levels.

Each subject is in only one condition, so the code below assigns half `easy` and half `hard`. You can change the proportion of subjects assigned each level with the `.prob` argument.

Stimuli are seen in both `congruent` and `incongruent` versions, so this will double the number of rows in our resulting data set.

```{r}
sub_n  <- 2 # number of subjects in this simulation
stim_n  <- 2 # number of stimuli in this simulation

dat <- add_random(sub = sub_n) |>
  add_random(stim = stim_n) |>
  add_between(.by = "sub", condition = c("easy","hard")) |>
  add_within(version = c("congruent", "incongruent"))

dat
```


### Contrast Coding

To be able to calculate the dependent variable, you need to recode categorical variables into numbers. Use the helper function `add_contrast()` for this. The code below creates anova-coded versions of `condition` and `version`. Luckily for us, the factor levels default to a sensible order, with "easy" predicted to have a faster (lower) reactive time than "hard", and "congruent" predicted to have a faster RT than "incongruent", but we can also customise the order of levels with `add_contrast()`; see the [contrasts vignette](https://debruine.github.io/faux/articles/contrasts.html){target="_blank"} for more details.

```{r}
sub_n  <- 2 # number of subjects in this simulation
stim_n  <- 2 # number of stimuli in this simulation

dat <- add_random(sub = sub_n) |>
  add_random(stim = stim_n) |>
  add_between(.by = "sub", condition = c("easy","hard")) |>
  add_within(version = c("congruent", "incongruent")) |>
  add_contrast("condition") |>
  add_contrast("version")

dat
```

The function defaults to very descriptive names that help you interpret the fixed factors. Here, "condition.hard-easy" means the main effect of this factor is interpreted as the RT for hard trials minus the RT for easy trials, and "version.incongruent-congruent" means the main effect of this factor is interpreted as the RT for incongruent trials minus the RT for congruent trials. However, we can change these to simpler labels with the `colnames` argument.


### Random Effects

Now we specify the random effect structure. We'll just add random intercepts to start, but will cover random slopes later.

Each subject will have slightly faster or slower reaction times on average; this is their random intercept (`sub_i`). We'll model it from a normal distribution with a mean of 0 and SD of 100ms.

Each stimulus will have slightly faster or slower reaction times on average; this is their random intercept (`stim_i`). We'll model it from a normal distribution with a mean of 0 and SD of 50ms (it seems reasonable to expect less variability between words than people for this task).

Run this code a few times to see how the random effects change each time. this is because they are **sampled** from populations.

```{r}
sub_n  <- 2 # number of subjects in this simulation
stim_n  <- 2 # number of stimuli in this simulation
sub_sd <- 100 # SD for the subjects' random intercept
stim_sd <- 50 # SD for the stimuli's random intercept

dat <- add_random(sub = sub_n) |>
  add_random(stim = stim_n) |>
  add_between(.by = "sub", condition = c("easy","hard")) |>
  add_within(version = c("congruent", "incongruent")) |>
  add_contrast("condition", colnames = "cond") |>
  add_contrast("version", colnames = "vers") |>
  add_ranef(.by = "sub", sub_i = sub_sd) |>
  add_ranef(.by = "stim", stim_i = stim_sd)

dat
```

### Error Term

Finally, add an error term. This uses the same `add_ranef()` function, just without specifying which random factor it's for with `.by`. In essence, this samples an error value from a normal distribution with a mean of 0 and the specified SD for each trial. We'll also increase the number of subjects and stimuli to more realistic values now.

```{r}
sub_n    <- 200 # number of subjects in this simulation
stim_n   <- 50  # number of stimuli in this simulation
sub_sd   <- 100 # SD for the subjects' random intercept
stim_sd  <- 50  # SD for the stimuli's random intercept
error_sd <- 25  # residual (error) SD

dat <- add_random(sub = sub_n) |>
  add_random(stim = stim_n) |>
  add_between(.by = "sub", condition = c("easy","hard")) |>
  add_within(version = c("congruent", "incongruent")) |>
  add_contrast("condition", colnames = "cond") |>
  add_contrast("version", colnames = "vers") |>
  add_ranef(.by = "sub", sub_i = sub_sd) |>
  add_ranef(.by = "stim", stim_i = stim_sd) |>
  add_ranef(err = error_sd)
```

### Calculate the DV

Now we can calculate the dependent variable (`rt`) by adding together an overall intercept (mean reaction time for all trials), the subject-specific intercept, the stimulus-specific intercept, and an error term, plus the effect of subject condition, the effect of stimulus version, and the interaction between condition and version.

We set these effects in raw units (ms). So when we set the effect of subject condition (`sub_cond_eff`) to 50, that means the average difference between the easy and hard condition is 50ms. `Easy` was coded as -0.5 and `hard` was coded as +0.5, which means that trials in the easy condition have -0.5 \* 50ms (i.e., -25ms) added to their reaction time, while trials in the hard condition have +0.5 \* 50ms (i.e., +25ms) added to their reaction time.

```{r sim-rt}
sub_n         <- 200 # number of subjects in this simulation
stim_n        <- 50  # number of stimuli in this simulation
sub_sd        <- 100 # SD for the subjects' random intercept
stim_sd       <- 50  # SD for the stimuli's random intercept
error_sd      <- 25  # residual (error) SD
grand_i       <- 400 # overall mean rt
cond_eff      <- 50  # mean difference between conditions: hard - easy
vers_eff      <- 50  # mean difference between versions: incongruent - congruent
cond_vers_ixn <-  0  # interaction between version and condition

dat <- add_random(sub = sub_n) |>
  add_random(stim = stim_n) |>
  add_between(.by = "sub", condition = c("easy","hard")) |>
  add_within(version = c("congruent", "incongruent")) |>
  add_contrast("condition", colnames = "cond") |>
  add_contrast("version", colnames = "vers") |>
  add_ranef(.by = "sub", sub_i = sub_sd) |>
  add_ranef(.by = "stim", stim_i = stim_sd) |>
  add_ranef(err = error_sd) |>
  mutate(rt = grand_i + sub_i + stim_i + err +
         (cond * cond_eff) +
         (vers * vers_eff) +
         (cond * vers * cond_vers_ixn) # in this example, this is always 0 and could be omitted
  )
```

As always, graph to make sure you've simulated the general pattern you expected.

```{r plot-rt, fig.cap="Double-check the simulated pattern"}
ggplot(dat, aes(condition, rt, color = version)) +
  geom_hline(yintercept = grand_i) +
  geom_violin(alpha = 0.5) +
  stat_summary(fun = mean,
               fun.min = \(x){mean(x) - sd(x)},
               fun.max = \(x){mean(x) + sd(x)},
               position = position_dodge(width = 0.9)) +
  scale_color_brewer(palette = "Dark2")
```


### Interactions

If you want to simulate an interaction, it can sometimes be tricky to figure out what to set the main effects and interaction effect to. It can often be easier to think about the simple main effects for each cell. Create four new variables and set them to the deviations from the overall mean you'd expect for each condition (so they should add up to 0). Here, we're simulating a small effect of version in the hard condition (50ms difference) and double that effect of version in the easy condition (100ms difference).

```{r sim-simple-main-effects}
# set variables to use in calculations below
easy_congr <- -50
easy_incon <- +50
hard_congr <- -25
hard_incon <- +25
```

Use the code below to transform the simple main effects above into main effects and interactions for use in the equations below. This uses the `sim_design()` function from faux to set up a fixed-design simulated sample with the exact cell means above (using `empirical = TRUE`), runs a linear model on the data, and extracts the coefficients. This only works if you contrast-code the factors in the same way, and use the same formula as your LMEM. It doesn't matter if you set the factors to between or within for this, so it's easiest to set them all to between.

```{r sim-effect-calc}
# simulated the data
fixed_data <- sim_design(
  between = list(condition = c("easy","hard"),
                 version   = c("congruent", "incongruent")),
  mu = c(easy_congr, easy_incon, hard_congr, hard_incon),
  dv = "rt",
  empirical = TRUE,
  plot = FALSE
) |>
  # add the same contrasts you'll use in your LMEM
  add_contrast("condition", colnames = "cond") |>
  add_contrast("version", colnames = "vers")

# run the same formula using lm()
fixed_model <- lm(rt ~ cond * vers, data = fixed_data)

# extract and round the coefficients
fixed_coefs <- coef(fixed_model) |> round(2)

# assign the coefficients to the relevant variables
cond_eff <- fixed_coefs["cond"]
vers_eff <- fixed_coefs["vers"]
cond_vers_ixn <- fixed_coefs["cond:vers"]
```

Then generate the rt the same way we did above, but also add the interaction effect multiplied by the effect-coded subject condition and stimulus version.

```{r sim-ixn}

dat <- add_random(sub = sub_n) |>
  add_random(stim = stim_n) |>
  add_between(.by = "sub", condition = c("easy","hard")) |>
  add_within(version = c("congruent", "incongruent")) |>
  add_contrast("condition", colnames = "cond") |>
  add_contrast("version", colnames = "vers") |>
  add_ranef(.by = "sub", sub_i = sub_sd) |>
  add_ranef(.by = "stim", stim_i = stim_sd) |>
  add_ranef(err = error_sd) |>
  mutate(rt = grand_i + sub_i + stim_i + err +
         (cond * cond_eff) +
         (vers * vers_eff) +
         (cond * vers * cond_vers_ixn)
  )

```

```{r plot-ixn, fig.cap="Double-check the interaction between condition and version"}
ggplot(dat, aes(condition, rt, color = version)) +
  geom_hline(yintercept = grand_i) +
  geom_violin(alpha = 0.5) +
  stat_summary(fun = mean,
               fun.min = \(x){mean(x) - sd(x)},
               fun.max = \(x){mean(x) + sd(x)},
               position = position_dodge(width = 0.9)) +
  scale_color_brewer(palette = "Dark2")
```


```{r table-ixn}
group_by(dat, condition, version) %>%
  summarise(m = mean(rt) - grand_i %>% round(1),
            .groups = "drop") %>%
  pivot_wider(names_from = version,
              values_from = m)
```

## Analysis

Now we will run a linear mixed effects model with `lmer` and look at the summary.

```{r lmer}
mod <- lmer(rt ~ cond * vers +
              (1 | sub) +
              (1 | stim),
            data = dat)

mod.sum <- summary(mod)

mod.sum
```

### Sense checks

First, check that your groups make sense.

* The number of obs should be the total number of trials analysed.
* `sub` should be what we set `sub_n` to above.
* `stim` should be what we set `stim_n` to above.

```{r mod-ngrps, echo = FALSE}
mod.sum$ngrps |>
  as_tibble(rownames = "Random.Fator") |>
  mutate(parameters = c(sub_n, stim_n))
```

Next, look at the random effects.

* The SD for `sub` should be near `sub_sd`.
* The SD for `stim` should be near `stim_sd`.
* The residual SD should be near `error_sd`.

```{r mod-varcor, echo = FALSE}
mod.sum$varcor |>
  as_tibble() |>
  select(Groups = grp, Name = var1, SD = sdcor) |>
  mutate(parameters = c(sub_sd, stim_sd, error_sd),
         SD = round(SD))
```

Finally, look at the fixed effects.

* The estimate for the Intercept should be near the `grand_i`.
* The main effect of `cond` should be near what we calculated for `cond_eff`.
* The main effect of `vers` should be near what we calculated for `vers_eff`.
* The interaction between `cond`:`vers` should be near what we calculated for `cond_vers_ixn`.

```{r mod-coef, echo = FALSE}
mod.sum$coefficients |>
  as_tibble(rownames = "Effect") |>
  select(Effect, Estimate) |>
  mutate(parameters = c(grand_i, cond_eff, vers_eff, cond_vers_ixn),
         Estimate = round(Estimate))
```

### Random effects

Plot the subject intercepts from our code above (`dat$sub_i`) against the subject intercepts calculated by `lmer` (`ranef(mod)$sub_id`).

```{r plot-sub-ranef, fig.cap = "Compare simulated subject random intercepts to those from the model"}
# get simulated random intercept for each subject
sub_sim <- dat |>
  group_by(sub, sub_i) |>
  summarise(.groups = "drop")

# join to calculated random intercept from model
sub_sim_mod <- ranef(mod)$sub |>
  as_tibble(rownames = "sub") |>
  rename(mod_sub_i = `(Intercept)`) |>
  left_join(sub_sim, by = "sub")

# plot to check correspondence
sub_sim_mod |>
  ggplot(aes(sub_i,mod_sub_i)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y~x) +
  xlab("Simulated random intercepts (sub_i)") +
  ylab("Modeled random intercepts")
```

Plot the stimulus intercepts from our code above (`dat$stim_i`) against the stimulus intercepts calculated by `lmer` (`ranef(mod)$stim_id`).

```{r plot-stim-ranef, fig.cap = "Compare simulated stimulus random intercepts to those from the model"}
# get simulated random intercept for each stimulus
stim_sim <- dat |>
  group_by(stim, stim_i) |>
  summarise(.groups = "drop")

# join to calculated random intercept from model
stim_sim_mod <- ranef(mod)$stim |>
  as_tibble(rownames = "stim") |>
  rename(mod_stim_i = `(Intercept)`) |>
  left_join(stim_sim, by = "stim")

# plot to check correspondence
stim_sim_mod |>
  ggplot(aes(stim_i,mod_stim_i)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y~x) +
  xlab("Simulated random intercepts (stim_i)") +
  ylab("Modeled random intercepts")
```


### Function

You can put the code above in a function so you can run it more easily and change the parameters. I removed the plot and set the argument defaults to the same as the example above with all fixed effects set to 0, but you can set them to other patterns.

```{r sim-function}
sim_lmer <- function( sub_n = 200,
                      stim_n = 50,
                      sub_sd = 100,
                      stim_sd = 50,
                      error_sd = 25,
                      grand_i = 400,
                      cond_eff = 0,
                      vers_eff = 0,
                      cond_vers_ixn = 0) {
  dat <- add_random(sub = sub_n) |>
    add_random(stim = stim_n) |>
    add_between(.by = "sub", condition = c("easy","hard")) |>
    add_within(version = c("congruent", "incongruent")) |>
    add_contrast("condition", colnames = "cond") |>
    add_contrast("version", colnames = "vers") |>
    add_ranef(.by = "sub", sub_i = sub_sd) |>
    add_ranef(.by = "stim", stim_i = stim_sd) |>
    add_ranef(err = error_sd) |>
    mutate(rt = grand_i + sub_i + stim_i + err +
           (cond * cond_eff) +
           (vers * vers_eff) +
           (cond * vers * cond_vers_ixn)
    )

  mod <- lmer(rt ~ cond * vers +
                (1 | sub) +
                (1 | stim),
              data = dat)

  return(mod)
}
```

Run the function with the default values (so all fixed effects set to 0).

```{r sim-lmer-default}
sim_lmer() %>% summary()
```

Try changing some variables to simulate different patterns of fixed effects.

```{r sim-lmer-changes}
sim_lmer(cond_eff = 0,
         vers_eff = 75,
         cond_vers_ixn = -50) %>%
  summary()
```

::: {.callout-tip}
#### Try this
How would you modify the function above to take the input of the four cell means (`easy_congr`, `easy_incon` `hard_congr`, `hard_incon`), instead of the coefficients (`cond_eff`, `vers_eff`, `cond_vers_ixn`)?
:::

### Power analysis

Calculating power for a LMEM can seem really tricky, but if you can simulate the data you expect, you can calculate power for anything!

First, wrap your simulation function inside of another function that takes the argument of a replication number, runs a simulated analysis, and returns a data table of the fixed and random effects (made with `broom.mixed::tidy()`). You can use purrr's `map_df()` function to create a data table of results from multiple replications of this function. We're only running 10 replications here in the interests of time, but you'll want to run 100 or more for a proper power calculation.

```{r power1}

sim_lmer_pwr <- function(rep) {
  s <- sim_lmer(cond_eff = 0,
                vers_eff = 75,
                cond_vers_ixn = 50)

  # put just the fixed effects into a data table
  broom.mixed::tidy(s, "fixed") %>%
    mutate(rep = rep) # add a column for which rep
}

my_power <- map_df(1:10, sim_lmer_pwr)

```

You can then plot the distribution of estimates across your simulations.

```{r}
ggplot(my_power, aes(estimate, color = term)) +
  geom_density() +
  facet_wrap(~term, scales = "free")
```

You can also just calculate power as the proportion of p-values less than your alpha.

```{r}
my_power %>%
  group_by(term) %>%
  summarise(power = mean(p.value < 0.05),
            .groups = "drop")
```

## Random slopes

In the example so far we've ignored random variation among subjects or stimuli in the size of the fixed effects (i.e., **random slopes**).

First, let's reset the parameters we set above.

```{r}
sub_n         <- 200 # number of subjects in this simulation
stim_n        <- 50  # number of stimuli in this simulation
sub_sd        <- 100 # SD for the subjects' random intercept
stim_sd       <- 50  # SD for the stimuli's random intercept
error_sd      <- 25  # residual (error) SD
grand_i       <- 400 # overall mean rt
cond_eff      <- 50  # mean difference between conditions: hard - easy
vers_eff      <- 50  # mean difference between versions: incongruent - congruent
cond_vers_ixn <-  0  # interaction between version and condition

```

### Slopes

In addition to generating a random intercept for each subject, now we will also generate a random slope for any within-subject factors. The only within-subject factor in this design is `version`. The main effect of `version` is set to 50 above, but different subjects will show variation in the size of this effect. That's what the random slope captures. We'll set `sub_vers_sd` below to the SD of this variation and use this to calculate the random slope (`sub_version_slope`) for each subject.

Also, it's likely that the variation between subjects in the size of the effect of version is related in some way to between-subject variation in the intercept. So we want the random intercept and slope to be correlated. Here, we'll simulate a case where subjects who have slower (larger) reaction times across the board show a smaller effect of condition, so we set `sub_i_vers_cor` below to a negative number (-0.2).

We just have to edit the first `add_ranef()` to add two variables (`sub_i`, `sub_vers_slope`) that are correlated with r = -0.2, means of 0, and SDs equal to what we set `sub_sd` above and `sub_vers_sd` below.

```{r sim-subject-cor}
sub_vers_sd <- 20
sub_i_vers_cor <- -0.2

dat <- add_random(sub = sub_n) |>
    add_random(stim = stim_n) |>
    add_between(.by = "sub", condition = c("easy","hard")) |>
    add_within(version = c("congruent", "incongruent")) |>
    add_contrast("condition", colnames = "cond") |>
    add_contrast("version", colnames = "vers") |>
    add_ranef(.by = "sub", sub_i = sub_sd,
              sub_vers_slope = sub_vers_sd,
              .cors = sub_i_vers_cor)
```


### Correlated Slopes

In addition to generating a random intercept for each stimulus, we will also generate a random slope for any within-stimulus factors. Both `version` and `condition` are within-stimulus factors (i.e., all stimuli are seen in both `congruent` and `incongruent` versions and both `easy` and `hard` conditions). So the main effects of version and condition (and their interaction) will vary depending on the stimulus.

They will also be correlated, but in a more complex way than above. You need to set the correlations for all pairs of slopes and intercept. Let's set the correlation between the random intercept and each of the slopes to -0.4 and the slopes all correlate with each other +0.2 (You could set each of the six correlations separately if you want, though).


```{r rslope-sim-stimuli}

stim_vers_sd <- 10 # SD for the stimuli's random slope for stim_version
stim_cond_sd <- 30 # SD for the stimuli's random slope for sub_cond
stim_cond_vers_sd <- 15 # SD for the stimuli's random slope for sub_cond:stim_version
stim_i_cor <- -0.4 # correlations between intercept and slopes
stim_s_cor <- +0.2 # correlations among slopes

# specify correlations for rnorm_multi (one of several methods)
stim_cors <- c(stim_i_cor, stim_i_cor, stim_i_cor,
                           stim_s_cor, stim_s_cor,
                                       stim_s_cor)

dat <- add_random(sub = sub_n) |>
    add_random(stim = stim_n) |>
    add_between(.by = "sub", condition = c("easy","hard")) |>
    add_within(version = c("congruent", "incongruent")) |>
    add_contrast("condition", colnames = "cond") |>
    add_contrast("version", colnames = "vers") |>
    add_ranef(.by = "sub", sub_i = sub_sd,
              sub_vers_slope = sub_vers_sd,
              .cors = sub_i_vers_cor) |>
    add_ranef(.by = "stim", stim_i = stim_sd,
              stim_vers_slope = stim_vers_sd,
              stim_cond_slope = stim_cond_sd,
              stim_cond_vers_slope = stim_cond_vers_sd,
              .cors = stim_cors)

```


### Calculate the DV

Now we can calculate the rt by adding together an overall intercept (mean RT for all trials), the subject-specific intercept, the stimulus-specific intercept, the effect of subject condition, the stimulus-specific slope for condition, the effect of stimulus version, the stimulus-specific slope for version, the subject-specific slope for condition, the interaction between condition and version (set to 0 for this example), the stimulus-specific slope for the interaction between condition and version, and an error term.

```{r rslope-sim-rt}
dat <- add_random(sub = sub_n) |>
    add_random(stim = stim_n) |>
    add_between(.by = "sub", condition = c("easy","hard")) |>
    add_within(version = c("congruent", "incongruent")) |>
    add_contrast("condition", colnames = "cond") |>
    add_contrast("version", colnames = "vers") |>
    add_ranef(.by = "sub", sub_i = sub_sd,
              sub_vers_slope = sub_vers_sd,
              .cors = sub_i_vers_cor) |>
    add_ranef(.by = "stim", stim_i = stim_sd,
              stim_vers_slope = stim_vers_sd,
              stim_cond_slope = stim_cond_sd,
              stim_cond_vers_slope = stim_cond_vers_sd,
              .cors = stim_cors) |>
    add_ranef(err = error_sd) |>
  mutate(
    trial_cond_eff = cond_eff + stim_cond_slope,
    trial_vers_eff = vers_eff + sub_vers_slope + stim_vers_slope,
    trial_cond_vers_ixn = cond_vers_ixn + stim_cond_vers_slope,
    rt = grand_i + sub_i + stim_i + err +
         (cond * trial_cond_eff) +
         (vers * trial_vers_eff) +
         (cond * vers * trial_cond_vers_ixn)
  )

```

As always, graph to make sure you've simulated the general pattern you expected.

```{r rslope-plot-rt, fig.cap="Double-check the simulated pattern"}
ggplot(dat, aes(condition, rt, color = version)) +
  geom_hline(yintercept = grand_i) +
  geom_violin(alpha = 0.5) +
  stat_summary(fun = mean,
               fun.min = \(x){mean(x) - sd(x)},
               fun.max = \(x){mean(x) + sd(x)},
               position = position_dodge(width = 0.9)) +
  scale_color_brewer(palette = "Dark2")
```

## Analysis

New we'll run a linear mixed effects model with `lmer` and look at the summary. You specify random slopes by adding the within-level effects to the random intercept specifications. Since the only within-subject factor is version, the random effects specification for subjects is `(1 + vers | sub)`. Since both condition and version are within-stimuli factors, the random effects specification for stimuli is `(1 + vers*cond | stim)`.

This model will take a lot longer to run than one without random slopes specified. This might be a good time for a coffee break.

```{r rslope-lmer}
mod <- lmer(rt ~ cond * vers +
              (1 + vers | sub) +
              (1 + vers*cond | stim),
            data = dat)

mod.sum <- summary(mod)

mod.sum
```

### Sense checks

First, check that your groups make sense.

* `sub` = `sub_n` (`r sub_n`)
* `stim` = `stim_n` (`r stim_n`)

```{r rslope-mod-ngrps, echo = FALSE}
mod.sum$ngrps |>
  as_tibble(rownames = "Random.Fator") |>
  mutate(parameters = c(sub_n, stim_n))
```

Next, look at the SDs for the random effects.

* Group:`sub`
   * `(Intercept)` ~= `sub_sd`
   * `vers` ~= `sub_vers_sd`
* Group: `stim`
   * `(Intercept)` ~= `stim_sd`
   * `vers` ~= `stim_vers_sd`
   * `cond` ~= `stim_cond_sd`
   * `vers:cond` ~= `stim_cond_vers_sd`
* Residual ~= `error_sd`

```{r rslope-mod-varcor, echo = FALSE}
mod.sum$varcor |>
  as_tibble() |>
  filter(is.na(var2)) |>
  select(Groups = grp, Name = var1, SD = sdcor) |>
  mutate(parameters = c(sub_sd, sub_vers_sd, stim_sd, stim_vers_sd, stim_cond_sd, stim_cond_vers_sd, error_sd),
         SD = round(SD))
```

The correlations are a bit more difficult to parse. The first column under `Corr` shows the correlation between the random slope for that row and the random intercept. So for `vers` under `sub`, the correlation should be close to `sub_i_vers_cor`. For all three random slopes under `stim`, the correlation with the random intercept should be near `stim_i_cor` and their correlations with each other should be near `stim_s_cor`.


Finally, look at the fixed effects.

* `(Intercept)` ~= `grand_i`
* `sub_cond.e` ~= `sub_cond_eff`
* `stim_version.e` ~= `stim_vers_eff`
* `sub_cond.e`:`stim_version.e` ~= `cond_vers_ixn`

```{r rslope-mod-coef, echo = FALSE}
mod.sum$coefficients |>
  as_tibble(rownames = "Effect") |>
  select(Effect, Estimate) |>
  mutate(parameters = c(grand_i, cond_eff, vers_eff, cond_vers_ixn),
         Estimate = round(Estimate))
```


### Function

You can put the code above in a function so you can run it more easily and change the parameters. I removed the plot and set the argument defaults to the same as the example above, but you can set them to other patterns.

```{r rslope-sim-function}
sim_lmer_slope <- function( sub_n = 200,
                            stim_n = 50,
                            sub_sd = 100,
                            sub_vers_sd = 20,
                            sub_i_vers_cor = -0.2,
                            stim_sd = 50,
                            stim_vers_sd = 10,
                            stim_cond_sd = 30,
                            stim_cond_vers_sd = 15,
                            stim_i_cor = -0.4,
                            stim_s_cor = +0.2,
                            error_sd = 25,
                            grand_i = 400,
                            sub_cond_eff = 0,
                            stim_vers_eff = 0,
                            cond_vers_ixn = 0) {
  dat <- add_random(sub = sub_n) |>
    add_random(stim = stim_n) |>
    add_between(.by = "sub", condition = c("easy","hard")) |>
    add_within(version = c("congruent", "incongruent")) |>
    add_contrast("condition", colnames = "cond") |>
    add_contrast("version", colnames = "vers") |>
    add_ranef(.by = "sub", sub_i = sub_sd,
              sub_vers_slope = sub_vers_sd,
              .cors = sub_i_vers_cor) |>
    add_ranef(.by = "stim", stim_i = stim_sd,
              stim_vers_slope = stim_vers_sd,
              stim_cond_slope = stim_cond_sd,
              stim_cond_vers_slope = stim_cond_vers_sd,
              .cors = stim_cors) |>
    add_ranef(err = error_sd) |>
    mutate(
      trial_cond_eff = cond_eff + stim_cond_slope,
      trial_vers_eff = vers_eff + sub_vers_slope + stim_vers_slope,
      trial_cond_vers_ixn = cond_vers_ixn + stim_cond_vers_slope,
      rt = grand_i + sub_i + stim_i + err +
           (cond * trial_cond_eff) +
           (vers * trial_vers_eff) +
           (cond * vers * trial_cond_vers_ixn)
    )

  mod <- lmer(rt ~ cond * vers +
                (1 + vers | sub) +
                (1 + vers*cond | stim),
              data = dat)

  return(mod)
}
```

Run the function with the default values (null fixed effects).

```{r rslope-sim-lmer-default}
sim_lmer_slope() %>% summary()
```

Try changing some variables to simulate fixed effects.

```{r rslope-sim-lmer-null}
sim_lmer_slope(sub_cond_eff = 50,
               stim_vers_eff = 50,
               cond_vers_ixn = 0)
```

## Exercises

1. Calculate power for the parameters in the last example using the `sim_lmer_slope()` function.

```{r ex1, include=FALSE}
sim_lmer_slope_pwr <- function(rep) {
  s <- sim_lmer_slope(sub_cond_eff = 50,
                      stim_vers_eff = 50,
                      cond_vers_ixn = 0)

  # put just the fixed effects into a data table
  broom.mixed::tidy(s, "fixed") %>%
    mutate(rep = rep) # add a column for which rep
}

# run it only twice to test first in the interests of time
my_power_s <- map_df(1:2, sim_lmer_slope_pwr)
```


2. Simulate data for the following design:

* 100 raters rate 50 faces from group A and 50 faces from group B
* The rt has a mean value of 50
* Group B values are 5 points higher than group A
* Rater intercepts have an SD of 5
* Face intercepts have an SD of 10
* The residual error has an SD of 8

```{r ex2, include=FALSE}

rater_n <- 50
face_n <- 100
rater_sd <- 5
face_sd <- 10
error_sd <- 8
grand_i <- 50
grp_effect <- 5

dat <- add_random(rater = rater_n) |>
  add_random(face = face_n) |>
  add_between("face", group = c("A", "B")) |>
  add_ranef("rater", rater_i = rater_sd) |>
  add_ranef("face", face_i = face_sd) |>
  add_contrast("group", colnames = "grp") |>
  add_ranef(err = error_sd) |>
  mutate(rt = grand_i + (grp_effect * grp) + err)


ggplot(dat, aes(rt, color = group)) + geom_density()
```

3. For the design from exercise 2, write a function that simulates data and runs a mixed effects analysis on it.

```{r ex3, include=FALSE}

# Lisa write this soon

```

4. The package `faux` has a built-in dataset called `fr4`. Type `?faux::fr4` into the console to view the help for this dataset. Run a mixed effects model on this dataset looking at the effect of `face_sex` on ratings. Remember to include a random slope for the effect of face sex and explicitly add a contrast code.

```{r, include=FALSE}

# code female = -0.5, male = +0.5
fr_coded <- faux::fr4 |>
  select(rater_id, face_id, face_sex, rating) |>
  add_contrast("face_sex", levels = c("female", "male"), colnames = "face_sex.e")

mod <- lmer(rating ~ face_sex.e +
              (1 + face_sex.e | rater_id) +
              (1 | face_id), data = fr_coded)

summary(mod)
```

5. Use the parameters from this analysis to simulate a new dataset with 50 male and 50 female faces, and 100 raters.

```{r, include=FALSE}

# get data into a table
stats <- broom.mixed::tidy(mod)

grand_i <- filter(stats, term == "(Intercept)") %>% pull(estimate)
rater_n <- 100
face_n <- 100
rater_sd <- filter(stats, group == "rater_id", term == "sd__(Intercept)") %>% pull(estimate)
face_sex_sd <- filter(stats, group == "rater_id", term == "sd__face_sex.e") %>% pull(estimate)
face_sd <- filter(stats, group == "face_id", term == "sd__(Intercept)") %>% pull(estimate)
error_sd <- filter(stats, group == "Residual") %>% pull(estimate)
face_sex_eff <- filter(stats, term == "face_sex.e") %>% pull(estimate)

dat <- add_random(rater = rater_n) |>
  add_random(face = face_n) |>
  add_between("face", face_sex = c("female", "male")) |>
  add_ranef("rater", rater_i = rater_sd) |>
  add_ranef("face", face_i = face_sd) |>
  add_contrast("face_sex", levels = c("female", "male"), colnames = "face_sex.e") |>
  add_ranef(err = error_sd) |>
  mutate(rt = grand_i + (face_sex_eff * face_sex.e) + err)

ggplot(dat, aes(rt, fill = face_sex)) +
  geom_histogram(binwidth = 1, color = "black") +
  facet_wrap(~face_sex)

```