RCT-ONS-Behavioural-Insights/PowerBySimulation.Rmd at master · exfalsoquodlibet/RCT-ONS-Behavioural-Insights · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
---
title: "A-Priori Power by Simulation"
author: "Alessia Tosi"
date: "15/01/2018"
always_allow_html: yes
output: github_document

---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)

library(plotly)
```


In this script, I calculate the sample size necessary to detect the minimal effect of practical significance given the desired power using simulation.

The behavioural-insights RCT for the ONS Construction Survey uses a two-arm (i.e., between-subject) design. Stratification was used to control for possible confounding factors with Month of First Selection (Jan-May) as stratifying variable.


### Steps

Our data will be binary (1 if the business has responded by the deadline, 0 otherwise) and we'll analyse them using a generalised linear model (GLM) with a logit link function.

We will:

1. Establish the effect that we want to be able to detect (the minimal effect size that is of PRACTICAL significance, i.e. that we want to be able to pick up if it is there) and sample size N

2. Simulate N data for that possible effect, and do the GLM analysis on this simulated data.

3. Collect the significant test result for the intervention effect.

4. Re-run steps 3-4 a number of times (e.g., 1000 repetition). The proportion of times that we'll reject the null hypothesis is the power at that N.

5. To determine the a-priori sample size to detect the desired effect for the desired power, we'll search over possible n values of N, re-run steps 2-4 for each value of N, and find the value that yeilds our desire power.


### Step 1: Determine minimal effect of practical significance

The trial focuses on the April 2018 data for businesses that newly joined the survey between January and April 2018, and May 2018 data for businesses that newly join the survey in that month.

Using historical data of the ONS Construction Survey, we can estimate the baseline (i.e., control-group) pre-deadline response rate. This is the weighted average of the pre-deadline response rates for the month of April and May in previous years, specifically:

- historical April pre-deadline response rate for businesses which newly joined in January, February or March (`apr_prev_p0`)
- historical April pre-deadline response rate for businesses which newly joined in that month (`apr_new_p0`)
- historical May pre-deadline response rate for businesses which newly joined in that month (`may_new_p0`)

```{r baselines, echo=TRUE}
apr_new_p0 = 0.14
apr_prev_p0 = 0.17
may_new_p0 = 0.29

# control-group baseline as weighted average of past response rate by deadline
baseline_p0 <- (apr_new_p0 + apr_prev_p0*3 + may_new_p0)/5

```

The minimum intervention effect can be seen as the minimum odds ratio of practical interest associated with intervention allocation. In our case, we established that an increase of 4 percentage points in the rate of businesses that respond to the survey by the deadline would be considered successful in terms of resources saved from response chasing.

```{r min_effect, echo=TRUE}

# minimum desired effect
min_de <- 0.04

intervention_p1 <- baseline_p0 + min_de

( odds_intervention <-  intervention_p1/(1-intervention_p1) )
( odds_baseline <-  baseline_p0/(1-baseline_p0) )
( odds_ratio <- odds_intervention / odds_baseline )

# minimum desired effect as log-odds given the expected control-group frequency
( beta_logodds <- log(odds_ratio))
```


### Step 2-4: Simulate N data for that possible effect, and do the analyse on this simulated data.

Let us assume a sample size N = 2300 businesses and calculate what the power to detect our minimum desired effect (i.e., an increase of 4 percentage points in the rate of businesses that respond by the deadline) would be.

Set sample size N:

```{r sample_N, echo=TRUE}
N = 2300

# sample size in each group (taking into account strata)
n = N/10
```

Set number of repetition:

```{r reps, echo=TRUE}
repetitions = 1000
```

Create variables refecting the trial design:

```{r vars, echo=TRUE}
var_cond <- rep(c('Inter', 'Control'), each=5)
var_cond <- relevel(factor(var_cond), ref='Control')

var_strata <- rep(c( 'jan', 'feb', 'mar', 'apr', 'may'),2)

var_cond <- rep(var_cond, times=n)
var_strata <- rep(var_strata, times=n)

```


Expected control-group and intervenion-group response rate given the minimum desired effect:

```{r rates}
rates  = c( apr_prev_p0, apr_prev_p0, apr_prev_p0, apr_new_p0, may_new_p0,
      apr_prev_p0+min_de, apr_prev_p0+min_de, apr_prev_p0+min_de, apr_new_p0+min_de,may_new_p0+min_de)
```


Calculation of power:

```{r boh, echo=TRUE}

set.seed(42)

significant = matrix(nrow=repetitions, ncol=7)

startT = proc.time()[3]

for(i in 1:repetitions){

      # simulate responses based on control- and desired inervention-group response rates
      responses          <- rbinom(n=N, size=1, prob=rates)

      # analyse the simulated data
      model              <- glm(responses~var_cond+var_strata, family=binomial(link="logit"))

      # extract the one-sided (right-tail) Wald's z test for intervention
      significant[i,1]   <- (summary(model)$coefficients[2,4]/2 <.05)

      # extract the significance results for the strata/co-variates (we are not interested in them)
      significant[i,2:5] <- (summary(model)$coefficients[3:6,4]<.05)

      significant[i,6]   <- sum(significant[i,1:5])

      modelDev           <- model$null.deviance-model$deviance
      significant[i,7]   <- (1-pchisq(modelDev, 5))<.05

      }

endT = proc.time()[3]
endT-startT

```


If there was an effect of +`r min_de*100` percentage points in timed responses due to the intervention (from baseline `r baseline_p0*100`% to `r intervention_p1*100`% (i.e., odds ratio of `r odds_ratio`) and we had a sample size of `r N` businesses, then we would have `r sum(significant[,1])/repetitions` power to detect that effect.

```{r apriori_power}
sum(significant[,1])/repetitions
```


### Step 5: Determine the sample size to detect the desired effect given the desired power

To determine the a-priori sample size to detect the desired effect for the desired power, we'll search over possible n values of N, re-run steps 2-4 for each value of N, to find the value that yeilds our desire power.

To do this, I created a function, `power_by_sim_fun()` that reproduces steps 2-4 above.

```{r source_fun, echo = TRUE}
source("power_by_sim_fun.R")
```

Let's take a look a its arguments but you can find more info in the .R file.

```{r arg_fun, echo = TRUE}
args(power_by_sim_fun)
```


Let's iterate over possible sample size values N and possible baseline response rates.

```{r iterate_2, echo = FALSE}

# possible sample sizes from 1600 to 2400 by 100
N_pool <- seq(from = 2100, to = 2600, by = 100)

# possible baseline response rates +/- 3 percentage points with respect to historical baselines
bf_pool <- seq(from = -.03, to = .03, by = .01)

```


```{r iterate_N_bf, eacho = FALSE}

# WARNING: it will take a while...

num_rows <- length(bf_pool)

collect_results_perN <- list()    # empty list to collect results

index = 0

set.seed(321)

for (n in N_pool){      # iterate over possible sample size values N
      index = index + 1
      collect_bf_results <- matrix(nrow = num_rows, ncol=3)
      collect_bf_results[, 3] <- n

      for (i in 1:length(bf_pool)){ # iterate over possible baseline response rates
            bf <- bf_pool[i]
            collect_bf_results[i, 1] <- bf
            collect_bf_results[i, 2] <- power_by_sim_fun(rep=1000, N=n, baseline_factor=bf, min_de=0.04)

            }

      collect_results_perN[[index]] <- as.data.frame(collect_bf_results)
      }


power_results_df <- do.call('rbind', collect_results_perN)
colnames(power_results_df) <- c('baseline_factor', 'power', 'sample_size_N')

```

As the plot below shows, a sample size between 2400 and 2600 would give us a power between `r power_results_df[power_results_df$sample_size_N == 2400 & power_results_df$baseline_factor == 0.03, ]$power` and `r power_results_df[power_results_df$sample_size_N == 2600 & power_results_df$baseline_factor == -0.03,]$power` to detect our minimum desired effect, depending on the deviation of the by-deadline response rate in the observed control group compared to the historical rate.

```{r plot, echo = FALSE}

power_results_df$baseline_factor <- factor(power_results_df$baseline_factor)

ggplot(power_results_df, aes(y=sample_size_N, x=power, group=baseline_factor)) +
  geom_point(aes(shape=baseline_factor, color=baseline_factor), size=2)+
  scale_shape_manual(values=c( rep(16, 3), 8, rep(16, 3)))

```


### References

https://stats.stackexchange.com/questions/22406/power-analysis-for-ordinal-logistic-regression/22410#22410

https://stats.stackexchange.com/questions/35940/simulation-of-logistic-regression-power-analysis-designed-experiments/36040#36040

https://stats.stackexchange.com/questions/111308/why-is-power-analysis-with-logistic-regression-so-liberal-compared-to-chi-square?rq=1