---
title: "Permuted Block Randomisation"
author: "Alessia Tosi"
date: "15/01/2018"
always_allow_html: yes
output: github_document

---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)

required_packages <- c('dplyr', 'blockrand', 'ggplot2', 'ggthemes')
lapply(required_packages, require, character.only = TRUE)

```

```{r data, echo=FALSE, include=FALSE}

bsn_df <- read.csv("Data/TrialSample_JanApr_full.csv", header = TRUE, sep = ';', stringsAsFactors = FALSE)

# clean size variable: some bands arrive as date-like strings (e.g. 'Oct-19' for 10-19)
bsn_df <- bsn_df %>%
      mutate(size = case_when(size == 'Oct-19' ~ '1019',
                              size == '05-Sep' ~ '0509',
                              size == '0-4'    ~ '0004',
                              TRUE             ~ size))

```

### Aims

In this script:

1. I perform a stratified randomisation, using month of first selection (January to April) and business size (0-4, 5-9, and 10-19) as prognostic (i.e., stratifying) variables. This produces 4 × 3 = 12 strata, which will then be taken into account in the analysis model.

2. I then use permuted block randomisation with varying block sizes to randomly assign businesses to conditions within each stratum. That means:

    - Businesses are randomly assigned to Control vs. Intervention in a 1:1 ratio.
    - The assignment follows a computer-generated random sequence of permuted blocks of size 2, 4, 6 or 8 within each stratum.

    An example of one such sequence (where B/C stands for BI intervention/Control, and one letter = one business = one assignment): [C B] [B C] [C B B C B C] [C B] [B B C B B C C C] ...
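
To make the idea of a permuted-block sequence concrete, here is a minimal base-R sketch. It is not the code used for the trial (that relies on `blockrand`, see below); the helper name `permuted_block_sequence` and its arguments are hypothetical and only serve as illustration.

```r
# Hypothetical helper (illustration only): build an allocation sequence
# from permuted blocks of randomly varying size.
permuted_block_sequence <- function(n, block_sizes = c(2, 4, 6, 8),
                                    arms = c("C", "B")) {
  allocation <- character(0)
  while (length(allocation) < n) {
    # Pick one block size at random; indexing via sample.int() avoids
    # sample()'s special handling of a single number.
    size <- block_sizes[sample.int(length(block_sizes), 1)]
    # Each block contains every arm equally often, in random order.
    block <- sample(rep(arms, each = size / length(arms)))
    allocation <- c(allocation, block)
  }
  allocation
}

set.seed(1)
alloc_seq <- permuted_block_sequence(20)
table(alloc_seq)  # complete blocks, so both arms appear equally often
```

Because the sketch always finishes the last block, the sequence may be longer than the `n` requested; it is balanced only as long as it is not cut short.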


### Why are these two steps important?

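1. Stratified randomisation guards against baseline imbalance between conditions in the prognostic variables (here, month of first selection and business size), which could otherwise confound the comparison, and

2. Permuted block randomisation ensures that we get an equal, or very nearly equal, number of units (i.e., businesses) in each condition (Control vs. Intervention) within each stratum.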
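
As a rough illustration of the second point, the sketch below contrasts simple (coin-flip) randomisation with permuted blocks of a fixed size of 4 on made-up data. The sample size and block size are assumptions chosen for the example, not values from the trial.

```r
set.seed(2018)
n <- 40  # illustrative sample size

# Simple randomisation: each unit is assigned independently,
# so the two arms can end up with different sizes.
simple <- sample(c("Control", "BI"), n, replace = TRUE)

# Permuted blocks of size 4: every block holds 2 Control and 2 BI
# in random order, so the arms are equal after each complete block.
blocked <- unlist(lapply(seq_len(n / 4), function(i) {
  sample(rep(c("Control", "BI"), each = 2))
}))

table(simple)
table(blocked)

# Running difference between arms; with blocks of 4 it never exceeds 2
running_diff <- cumsum(ifelse(blocked == "Control", 1, -1))
max(abs(running_diff))
```

With simple randomisation the split depends on chance, whereas the blocked sequence is exactly balanced at the end of every block.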


### Data

The data have been pre-processed and fully anonymised. A subset of the tidy anonymised data is available in the Data folder.


### 1. Stratification

Let us first shuffle the rows of the dataset, so that the order of businesses within each stratum is random.

```{r strata_1}

set.seed(42)
bsn_df <- bsn_df[sample(nrow(bsn_df)),]

```


We then split the data into individual stratum datasets (i.e., one data frame for each stratum).

```{r strata_2}

list_of_strata_df <- split(bsn_df, list(factor(bsn_df$newly_selected_month), factor(bsn_df$size)))

```
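
To see what `split` returns when given a list of two factors, here is a small sketch on made-up data. The column names mirror those used above; the values are invented for illustration.

```r
# Toy data with the same stratifying columns as bsn_df (values are made up)
toy <- data.frame(
  id = 1:6,
  newly_selected_month = c("Jan", "Jan", "Feb", "Feb", "Jan", "Feb"),
  size = c("0004", "0509", "0004", "0004", "0509", "0509"),
  stringsAsFactors = FALSE
)

# One data frame per month-by-size combination
toy_strata <- split(toy, list(factor(toy$newly_selected_month), factor(toy$size)))

names(toy_strata)         # names take the form "<month>.<size>"
sapply(toy_strata, nrow)  # number of businesses in each stratum
```

The element names produced here are what the randomisation step later reuses as stratum labels and id prefixes.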

Let's take a look at one:

```{r strata_3, include=TRUE, echo=FALSE}

head(list_of_strata_df[[1]])

```


### 2. Randomisation

In order to achieve stratified randomisation with block random assignment to condition within each stratum (with blocks of varying size), we'll use the `blockrand::blockrand` function and apply it to each element of our list of stratum data frames.

Let's take a look at the arguments that `blockrand::blockrand` requires:

- `n`            The number of businesses to randomise in each stratum
- `num.levels`   The number of conditions to randomise between, 2 in our case
- `levels`       The names of the conditions, so `c("Control","BI")` in our case
- `id.prefix`    Prefix for the id column values, we'll use the stratum name
- `block.prefix` Prefix for the block id column values, again the stratum name
- `stratum`      Character string specifying the stratum generated, again the stratum name

Note that the `blockrand::blockrand` function uses the `sample` function internally to generate the blocks, so we'll use `set.seed` to fix the RNG seed. This lets us reproduce the block sequence in the future, at implementation.

We will also make sure that the randomisation sequence is exactly as long as the number of cases in each stratum. This is necessary because `blockrand` always completes its final block, so the number of randomisations it returns may be more than `n` (see `?blockrand`).
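
The two points above can be checked on a single made-up stratum. This is a small sketch, assuming the `blockrand` package is installed; the stratum label `"Jan.0004"` and `n = 5` are illustrative values only.

```r
library(blockrand)

# Wrap the call so that it can be repeated with the same seed
run_once <- function() {
  set.seed(123)
  blockrand(n = 5, num.levels = 2, levels = c("Control", "BI"),
            id.prefix = "Jan.0004", block.prefix = "Jan.0004",
            stratum = "Jan.0004")
}

toy_rand <- run_once()
nrow(toy_rand)  # at least 5, because the final block is completed

# Keep exactly as many allocations as there are businesses in the stratum
toy_trimmed <- toy_rand[seq_len(5), ]

# Same seed, same sequence
toy_again <- run_once()
identical(toy_rand$treatment, toy_again$treatment)
```

Cutting the sequence short means the last block may be incomplete, so the two conditions can differ slightly in size within a stratum.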


```{r random}

set.seed(123)

list_of_strata_df_2 <- lapply(names(list_of_strata_df), function(x) {
      n_stratum <- nrow(list_of_strata_df[[x]]) # number of businesses in this stratum
      blockrand::blockrand(
            n = n_stratum,
            num.levels = 2,
            levels = c('Control', 'BI'),
            id.prefix = x,
            block.prefix = x,
            stratum = x
      )[seq_len(n_stratum), ] # keep only the first n rows, so the sequence has the right length
})

```


Finally, we bind each randomisation list, column-wise, to the corresponding stratum in the original list, which assigns every business to an experimental condition. We then stack the strata back into a single data frame.

```{r merge}

random_assigned_bsn_df <- NULL

for (l_idx in seq_along(list_of_strata_df)) {
      # pair each stratum with its (equally long) allocation table, then stack the strata
      random_assigned_bsn_df <- rbind(random_assigned_bsn_df,
                                      cbind(list_of_strata_df[[l_idx]], list_of_strata_df_2[[l_idx]]))
}

```
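
The merge relies on each stratum and its allocation table having the same number of rows, so that `cbind` can pair them by position. A compact alternative to the loop, sketched here on made-up data (all object names and values are hypothetical), uses `Map` and `do.call`:

```r
# Two toy strata and their allocation tables (values are made up)
toy_strata <- list(
  a = data.frame(business = c("b1", "b2")),
  b = data.frame(business = c("b3", "b4", "b5"))
)
toy_alloc <- list(
  data.frame(treatment = c("Control", "BI")),
  data.frame(treatment = c("BI", "Control", "BI"))
)

# Pair each stratum with its allocation table by position, then stack.
# unname() stops the list names from being pasted onto the row names.
toy_assigned <- do.call(rbind, unname(Map(cbind, toy_strata, toy_alloc)))
toy_assigned
```

If an allocation table had more rows than its stratum, `cbind` would either fail or recycle rows, which is why the sequences are cut to length beforehand.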


We verify that we obtained a balanced design:

```{r final_check}

with(random_assigned_bsn_df, xtabs(~treatment + newly_selected_month + size))

```
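
Beyond inspecting the table by eye, the imbalance per stratum can be summarised as a single number. The sketch below does this on made-up data; the single `stratum` column is an assumption made for brevity, whereas the real data identify strata through `newly_selected_month` and `size`.

```r
# Toy allocation for two strata (values are made up)
toy <- data.frame(
  stratum   = rep(c("Jan.0004", "Feb.0004"), times = c(5, 4)),
  treatment = c("Control", "BI", "BI", "Control", "BI",
                "BI", "Control", "Control", "BI")
)

# Conditions in rows, strata in columns
counts <- xtabs(~ treatment + stratum, data = toy)
counts

# Absolute difference between the two conditions within each stratum
imbalance <- apply(counts, 2, function(x) abs(diff(x)))
imbalance
```

With blocks of at most 8 and a 1:1 ratio, a sequence cut short mid-block can leave a difference of at most 4 between conditions in any one stratum.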