library(tidyverse)

d <- haven::read_dta('/Users/fr/Documents/Teaching/SRQM/data/qog2019.dta')

tibble(
  var = names(d),
  # data sources (variable-name prefix up to the first underscore)
  src = str_extract(names(d), ".*?_"),
  # number of non-missing observations per variable
  n = apply(d, 2, function(x) sum(!is.na(x)))
) %>%
  group_by(src) %>%
  summarise(n_vars = n(), min_N = min(n), med_N = median(n), max_N = max(n)) %>%
  arrange(min_N) %>%
  # arbitrary threshold at N = 50
  filter(!is.na(src), min_N < 50) %>%
  print(n = 100)

PSI, EU, OECD, WWBI and a few others are particularly at fault:
# A tibble: 28 x 5
   src     n_vars min_N med_N max_N
   <chr>    <int> <int> <dbl> <int>
 1 psi_         6     1  10.5    20
 2 mad_         4    15    29   163
 3 eu_        277    16    34    48
 4 une_        47    16   146   193
 5 wwbi_       38    17    41    62
 6 oecd_      281    19    37    44
 7 wdi_       278    19   156   192
 8 dev_         4    20    20    20
 9 dpi_        70    26  160.   175
10 bs_          8    28    28    28
11 ess_         9    28    28    28
12 ideavt_      6    28   107   180
13 wel_        36    29    32   189
14 wvs_        42    29    34    34
15 aid_         6    31   139   139
16 cses_        2    31  31.5    32
17 gol_        20    33   127   129
18 wiid_       18    34    35    35
19 ucdp_        2    35    70   105
20 cpds_       49    36    36    36
21 h_          11    37   165   185
22 lis_        23    37    37    37
23 r_           5    40    98   144
24 sgi_        29    41    41    41
25 top_         2    41    41    41
26 nelda_      10    44    45    45
27 vi_         13    45    48    50
28 qs_          9    47   112   115
This is not a bug, but it leads students to build research designs with low sample sizes.
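
If it helps, here is a minimal sketch of how the same check could list the individual variables that fall below the threshold, rather than summarising by source. It runs on a toy data frame, not on the QoG file: the object `toy`, its column names and the `threshold` variable are all made up for illustration, and the cutoff of 50 is the same arbitrary one used above.

```r
library(dplyr)
library(stringr)
library(tibble)

# Toy stand-in for the QoG data (hypothetical variable names, 60 rows)
toy <- tibble(
  cname = sprintf("country_%02d", 1:60),  # no underscore prefix: src will be NA
  psi_a = c(1:10, rep(NA, 50)),           # 10 non-missing values
  wdi_b = c(1:55, rep(NA, 5)),            # 55 non-missing values
  wdi_c = c(1:30, rep(NA, 30))            # 30 non-missing values
)

# Same arbitrary threshold as in the summary above
threshold <- 50

low_n <- tibble(
  var = names(toy),
  # data source: prefix up to the first underscore
  src = str_extract(var, ".*?_"),
  # number of non-missing observations per variable
  n = colSums(!is.na(toy))
) %>%
  filter(!is.na(src), n < threshold) %>%
  arrange(n)

print(low_n)
```

On the real data, replacing `toy` with `d` should give a per-variable list that could be shared with students as variables to avoid or to use with caution.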