paf-inclusion-health/pafs-inclusion-health.qmd at main · ddelcastillof/paf-inclusion-health · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
---
title: "PAFs for inclusion health"
format: docx
author: Rob Aldridge & Darwin Del Castillo
editor_options:
  chunk_output_type: console
---

```{r setup}
#| include: FALSE
knitr::opts_chunk$set(echo = TRUE,
                      message = FALSE,
                      include =  FALSE,
                      warning = FALSE)
```

```{r packages}
pacman::p_load(googlesheets4,
               metafor,
               tidyverse,
               skimr,
               mc2d,
               patchwork,
               rjags,
               coda)
pacman::p_load_gh("JLSteenwyk/ggpubfigs")
```

```{r importing datasets}
#| warning: FALSE

googlesheets4::gs4_deauth()

input_smr_all <- read_sheet("https://docs.google.com/spreadsheets/d/1onGl6q4gkPI4a2dtUvEeqgQEee0NYceAHBBpyOOXs88/edit#gid=570599438", sheet = "input_mortality", col_types = "ccccccccdcddccdcdccccdddccccc??????")
input_country_region_codes <- read_sheet("https://docs.google.com/spreadsheets/d/1onGl6q4gkPI4a2dtUvEeqgQEee0NYceAHBBpyOOXs88/edit#gid=1722368264", sheet = "input_country_and_region_codes")
input_mortality_numbers <- read_sheet("https://docs.google.com/spreadsheets/d/1onGl6q4gkPI4a2dtUvEeqgQEee0NYceAHBBpyOOXs88/edit#gid=1722368264", sheet = "input_IHME-GBD_2019_DATA")
input_population_numbers <- read_sheet("https://docs.google.com/spreadsheets/d/1onGl6q4gkPI4a2dtUvEeqgQEee0NYceAHBBpyOOXs88/edit#gid=1722368264", sheet = "input_world_bank_population_totals")
input_country_name_clean <- read_sheet("https://docs.google.com/spreadsheets/d/1onGl6q4gkPI4a2dtUvEeqgQEee0NYceAHBBpyOOXs88/edit#gid=1722368264", sheet = "input_country_name_clean")
input_homeless_pop <- read_sheet("https://docs.google.com/spreadsheets/d/1onGl6q4gkPI4a2dtUvEeqgQEee0NYceAHBBpyOOXs88/edit#gid=1722368264", sheet = "input_homeless")
input_prison_pop <- read_sheet("https://docs.google.com/spreadsheets/d/1onGl6q4gkPI4a2dtUvEeqgQEee0NYceAHBBpyOOXs88/edit#gid=1722368264", sheet = "input_prison")
input_sud_pop <- read_sheet("https://docs.google.com/spreadsheets/d/1onGl6q4gkPI4a2dtUvEeqgQEee0NYceAHBBpyOOXs88/edit#gid=1722368264", sheet = "input_sud")
input_hic_country_list<- read_sheet("https://docs.google.com/spreadsheets/d/1onGl6q4gkPI4a2dtUvEeqgQEee0NYceAHBBpyOOXs88/edit#gid=1722368264", sheet = "input_hic")

glimpse(input_country_region_codes)
```

```{r additional settings}
## setting the number of iterations
sim_run <- 1000
## disable scientific notation
options(scipen = 999999)
```

```{r basic cleaning}
# filter the data to only include non-excluded or duplicate SMR data
smr_filter_all_cause <- input_smr_all %>%
  filter(Outcome_measure=="SMR", ICD10_chapter_number==0, is.na(Duplicate), is.na(Exclude), !is.na(lci), !is.na(uci)) %>%
  select (lci, uci, Outcome, Population, Country, First_author, Year_of_publication, Female) %>%
  as.data.frame() %>%
  rename (sex = Female)

# create standard error estimates from confidence intervals from study
smr_filter_all_cause$sei <- (log(smr_filter_all_cause$uci/smr_filter_all_cause$lci))/3.92
```

```{r advanced cleaning and monte carlo}
#| warning: FALSE
# clean the total mortality data
glimpse(input_mortality_numbers)

high_income_countries_paf <- input_mortality_numbers %>%
  group_by(country_name, sex_id) %>%
  summarise(
    deaths = sum(val, na.rm = T),
    deaths_lci = sum(lower, na.rm = T),
    deaths_uci = sum(upper, na.rm = T)
  )%>%
  filter (deaths >50000)%>%
  left_join(input_country_name_clean, by = "country_name")%>%
  filter (!is.na(alpha_3_country)) %>%
  ungroup() %>%
  mutate(
    sex = as.character (
      case_when ( sex_id == 1 ~ "Male",
                  sex_id == 2 ~ "Female",
                  sex_id == 3 ~ "Both"))
  ) %>%
  group_by(alpha_3_country, sex) %>%
  mutate(
    id = alpha_3_country,
    idsex = str_c(alpha_3_country, sex, sep = "-")
  ) %>%
  ungroup() %>%
  select (id, deaths, sex, idsex)

# clean the homeless population data
glimpse(input_homeless_pop)

input_homeless_pop <- input_homeless_pop %>%
  select(alpha_3_country, homeless_pop, homeless_pop_lower, homeless_pop_upper) %>%
  mutate(sim_run = sim_run) %>%
  uncount(sim_run)  %>%
  rowwise() %>%
  mutate(
    homeless_mc_se = (homeless_pop_upper - homeless_pop_lower)/(2*1.96),
    homeless_lambda = rnorm(1, mean = homeless_pop, sd = homeless_mc_se),
    homeless_lambda = ifelse(homeless_lambda < 0, 0, homeless_lambda),
    homeless_mc_poisson = rpois(1, lambda = homeless_lambda)) %>%
  group_by(alpha_3_country) %>%
  mutate(
    row_id = as.character(row_number()),
    id = str_c(alpha_3_country, row_id, sep = "-")
  ) %>%
  ungroup() %>%
  select (id, homeless_mc_poisson)

# clean the prison population data
glimpse(input_prison_pop)

input_prison_pop <- input_prison_pop  %>%
  left_join(input_country_name_clean, by = "country_name")  %>%
  right_join(input_hic_country_list, by = "alpha_3_country")   %>%
  filter (hic == 1) %>%
  select(alpha_3_country, prison_pop, prison_pop_lower, prison_pop_upper) %>%
  mutate(sim_run = sim_run) %>%
  uncount(sim_run) %>%
  rowwise() %>%
  mutate(
    prison_mc_se = (prison_pop_upper - prison_pop_lower)/(2*1.96),
    prison_lambda = rnorm(1, mean = prison_pop, sd = prison_mc_se),
    prison_lambda = ifelse(prison_lambda < 0, 0, prison_lambda),
    prison_mc_poisson = rpois(1, lambda = prison_lambda)) %>%
  group_by(alpha_3_country) %>%
  mutate(
    row_id = as.character(row_number()),
    id = str_c(alpha_3_country, row_id, sep = "-")
  ) %>%
  ungroup() %>%
  select (id, prison_mc_poisson)

# clean the sud population data
glimpse(input_sud_pop)

input_sud_pop <- input_sud_pop %>%
  group_by(country_name) %>%
  summarise(
    sud_pop = sum(val, na.rm = T),
    sud_pop_lower = sum(lower, na.rm = T),
    sud_pop_upper = sum(upper, na.rm = T),
  )%>%
  left_join(input_country_name_clean, by = "country_name")%>%
  filter (!is.na(alpha_3_country)) %>%
  left_join(input_hic_country_list, by = "alpha_3_country")   %>%
  filter (hic == 1) %>%
  select(alpha_3_country, sud_pop, sud_pop_lower, sud_pop_upper) %>%
  mutate(sim_run = sim_run) %>%
  uncount(sim_run) %>%
  rowwise() %>%
  mutate(
    sud_mc_se = (sud_pop_upper - sud_pop_lower)/(2*1.96),
    sud_lambda = rnorm(1, mean = sud_pop, sd = sud_mc_se),
    sud_lambda = ifelse(sud_lambda < 0, 0, sud_lambda),
    sud_mc_poisson = rpois(1, lambda = sud_lambda)
  ) %>%
  group_by(alpha_3_country) %>%
  mutate(
    row_id = as.character(row_number()),
    id = str_c(alpha_3_country, row_id, sep = "-")
  ) %>%
  ungroup() %>%
  select (id, sud_mc_poisson)

# clean the total population data
glimpse(input_population_numbers)

population_numbers <-  input_population_numbers %>%
  pivot_longer(
    cols = `1960`:`2021`
  ) %>%
  rename(
    year = name,
    gen_pop_tot = value
  ) %>%
  filter(year == 2019)%>%
  mutate(sim_run = sim_run) %>%
  uncount(sim_run)%>%
  group_by(alpha_3_country) %>%
  mutate(
    row_id = as.character(row_number()),
    id = str_c(alpha_3_country, row_id, sep = "-")
  ) %>%
  ungroup() %>%
  select (id, gen_pop_tot)
```

```{r gibbs sampling}
#| warning: FALSE

# Joint to the dataset
inclusion_health_pop_est <- reduce(list(input_homeless_pop,
                                        input_prison_pop,
                                        input_sud_pop,
                                        population_numbers),
                                   left_join, by = "id") %>% rowwise()

# Creating identifier for strata
inclusion_health_pop_est$country <- substr(inclusion_health_pop_est$id, 1, 3)

# List of countries (strata)
countries <- unique(inclusion_health_pop_est$country)

# Define alpha outside the loop (hyperparameters for the prior Dirichlet distribution)
alpha <- c(6, 4, 4, 4, 21, 9, 9, 3)

# List that contains the data for each country
data_jags_list <- list()

for (i in countries) {
  country_data <- subset(inclusion_health_pop_est, country == i)
  n <- nrow(country_data)

  data_jags <- list(
    N_homeless = country_data$homeless_mc_poisson,
    N_prison = country_data$prison_mc_poisson,
    N_sud = country_data$sud_mc_poisson,
    N_total = country_data$gen_pop_tot,
    n = n,
    alpha = alpha
  )

  data_jags_list[[i]] <- data_jags
}

# Defining JAGS model string
model_string <- "
model {
  for (i in 1:n) {
    # The observed counts are sums over the relevant subsets
    N_sud[i] ~ dpois(mu_sud[i])
    N_prison[i] ~ dpois(mu_prison[i])
    N_homeless[i] ~ dpois(mu_homeless[i])

    # Expected counts from overlaps
    mu_sud[i] <- N_total[i] * (p[2] + p[5] + p[6] + p[8])
    mu_prison[i] <- N_total[i] * (p[3] + p[5] + p[7] + p[8])
    mu_homeless[i] <- N_total[i] * (p[4] + p[6] + p[7] + p[8])
  }

  # Priors for the probabilities (sum to 1)
  p[1:8] ~ ddirich(alpha[])  # Dirichlet prior with alpha reflecting prior information

  # Assign probabilities to meaningful names (renamed)
  p_none <- p[1]             # None of the categories
  p_sud_only <- p[2]         # Only substance drug abusers
  p_prison_only <- p[3]      # Only prisoners
  p_homeless_only <- p[4]    # Only homeless
  p_sud_prison <- p[5]       # Substance drug abusers and prisoners
  p_sud_homeless <- p[6]     # Substance drug abusers and homeless
  p_prison_homeless <- p[7]  # Prisoners and homeless
  p_all_three <- p[8]        # All three categories
}
"

# Parameters to monitor
parameters <- c("p[1]", "p[2]", "p[3]", "p[4]", "p[5]", "p[6]", "p[7]", "p[8]")

# Initialize list to store results
jags_results <- list()

# Run the model for each country
for (c in countries) {
  data_jags <- data_jags_list[[c]]

  # Initial values
  inits <- function() {
    p_init <- as.vector(rdirichlet(1, alpha))
    init_list <- list()
    for (k in 1:8) {
      init_list[[paste0("p[", k, "]")]] <- p_init[k]
    }
    return(init_list)
  }

  # Create JAGS model
  model <- jags.model(textConnection(model_string), data = data_jags, inits = inits,
                      n.chains = 3, n.adapt = 1000)

  # Burn-in
  update(model, 1000)

  # Sample from the posterior
  samples <- coda.samples(model, variable.names = parameters, n.iter = 5000)

  # Store results
  jags_results[[c]] <- samples
}

# Summarizing results of posterior means for each country in a data frame

# Define descriptive names for the parameters
param_names <- c(
  "p_none",
  "p_sud_only",
  "p_prison_only",
  "p_homeless_only",
  "p_sud_prison",
  "p_sud_homeless",
  "p_prison_homeless",
  "p_all_three"
)

# Creating an empty dataframe
country_proportions_inclusion <- data.frame(
  country = character(),
  p_none = numeric(),
  p_sud_only = numeric(),
  p_prison_only = numeric(),
  p_homeless_only = numeric(),
  p_sud_prison = numeric(),
  p_sud_homeless = numeric(),
  p_prison_homeless = numeric(),
  p_all_three = numeric(),
  stringsAsFactors = FALSE
)

# Loop through each country in results_list
for (country in names(jags_results)) {

  # Retrieve the MCMC samples for the current country
  samples <- jags_results[[country]]

  # Summarize the samples to get posterior statistics
  summary_stats <- summary(samples)

  # Construct the parameter labels as they appear in the summary
  param_labels <- paste0("p[", 1:8, "]")

  # Check if all parameters are present
  if (!all(param_labels %in% rownames(summary_stats$statistics))) {
    warning(paste("Missing parameters for country:", country))
    next  # Skip to the next iteration if parameters are missing
  }

  # Extract posterior means for p[1] to p[8]
  posterior_means <- summary_stats$statistics[param_labels, "Mean"]

  # Assign descriptive names to the posterior means
  names(posterior_means) <- param_names

  # Create a dataframe row with the country name and posterior means
  country_row <- data.frame(
    country = country,
    t(as.data.frame(posterior_means)),
    stringsAsFactors = FALSE
  )

  # Append the row to the results dataframe
  country_proportions_inclusion <- rbind(country_proportions_inclusion, country_row)
}
```

```{r final mods in results data frame}
#| warning: FALSE
# Calculating total population affected by any of these conditions per country
## Creating total population column in country_proportions_inclusion data frame

country_pop <- input_population_numbers %>%
  pivot_longer(
    cols = `1960`:`2021`
  ) %>%
  rename(
    year = name,
    gen_pop_tot = value
  ) %>%
  filter(year == 2019) %>%
  group_by(alpha_3_country) %>%
  select (alpha_3_country, gen_pop_tot) %>%
  filter(alpha_3_country %in% input_hic_country_list$alpha_3_country) %>%
  rename(country = alpha_3_country,
         total_pop = gen_pop_tot)

country_inclusion_pop_total <- country_proportions_inclusion %>%
  left_join(country_pop) %>%
  rename(id = country) %>%
  mutate(total_none = round(p_none*total_pop,0),
         total_sud_only = round(p_sud_only*total_pop,0),
         total_prison_only = round(p_prison_only*total_pop,0),
         total_homeless_only = round(p_homeless_only*total_pop,0),
         total_sud_prison = round(p_sud_prison*total_pop,0),
         total_sud_homeless = round(p_sud_homeless*total_pop,0),
         total_prison_homeless = round(p_prison_homeless*total_pop,0),
         total_all_three = round(p_all_three*total_pop,0),
         inclusion_health_pop_tot = rowSums(across(total_sud_only:total_all_three)),
         prop_inc_health = rowSums(across(p_sud_only:p_all_three)))
```

```{r random effects model for SMR}
smr_filter_all_cause_both <- smr_filter_all_cause %>%
  filter(sex=="Both")

res_smr_all_cause_both <- rma.uni(yi = log(smr_filter_all_cause_both$Outcome), sei=smr_filter_all_cause_both$sei, data = smr_filter_all_cause_both, method="REML")
log_smr.point_est_both <- as.numeric(exp(res_smr_all_cause_both$b))
log_smr.se_both <- as.numeric(res_smr_all_cause_both$se)


smr_filter_all_cause_male <- smr_filter_all_cause %>%
  filter(sex=="Male")

res_smr_all_cause_male <- rma.uni(yi = log(smr_filter_all_cause_male$Outcome), sei=smr_filter_all_cause_male$sei, data = smr_filter_all_cause_male, method="REML")
log_smr.point_est_male <- as.numeric(exp(res_smr_all_cause_male$b))
log_smr.se_male <- as.numeric(res_smr_all_cause_male$se)


smr_filter_all_cause_female <- smr_filter_all_cause %>%
  filter(sex=="Female")

res_smr_all_cause_female <- rma.uni(yi = log(smr_filter_all_cause_female$Outcome), sei=smr_filter_all_cause_female$sei, data = smr_filter_all_cause_female, method="REML")
log_smr.point_est_female <- as.numeric(exp(res_smr_all_cause_female$b))
log_smr.se_female <- as.numeric(res_smr_all_cause_female$se)


skim(high_income_countries_paf)

```

```{r calculating PAFs}
#| message: FALSE
#| warning: FALSE

source("functions/functions.R")

high_income_countries_paf_sex <- high_income_countries_paf %>%
  left_join(country_inclusion_pop_total, by = "id") %>%
  mutate(
    inclusion_health_pop_tot = ifelse(sex == "Male", inclusion_health_pop_tot/2, inclusion_health_pop_tot),
    inclusion_health_pop_tot = ifelse(sex == "Female", inclusion_health_pop_tot/2, inclusion_health_pop_tot),
    prop_inc_health = ifelse(sex == "Male", prop_inc_health * 0.8, prop_inc_health),
    prop_inc_health = ifelse(sex == "Female", prop_inc_health * 0.2, prop_inc_health)
  ) %>%
    rowwise() %>%
    mutate(
      smr = ifelse(sex == "Male", rnorm(1, mean=log_smr.point_est_male, sd = log_smr.se_male), 0),
      smr = ifelse(sex == "Female", rnorm(1, mean=log_smr.point_est_female, sd = log_smr.se_female), smr),
      smr = ifelse(sex == "Both", rnorm(1, mean=log_smr.point_est_both, sd = log_smr.se_both), smr),
      attr_deaths = attr_deaths(prop_inc_health, smr, deaths),
      paf = paf (prop_inc_health, smr)
    ) %>%
    ungroup() %>%
    group_by(id, sex)%>%
    summarise(
      paf_50 = quantile(paf, probs=(0.5)),
      paf_025 = quantile(paf, probs=(0.025)),
      paf_975 = quantile(paf, probs=(0.975)),
      prop_inc_health_50 = quantile(prop_inc_health, probs=(0.5)),
      prop_inc_health_025 = quantile(prop_inc_health, probs=(0.025)),
      prop_inc_health_975 = quantile(prop_inc_health, probs=(0.975)),
      smr_50 = quantile(smr, probs=(0.5)),
      smr_025 = quantile(smr, probs=(0.025)),
      smr_975 = quantile(smr, probs=(0.975)),
      attr_deaths_50 = quantile(attr_deaths, probs=(0.5)),
      attr_deaths_025 = quantile(attr_deaths, probs=(0.025)),
      attr_deaths_975 = quantile(attr_deaths, probs=(0.975)),
      deaths_50 = quantile(deaths, probs=(0.5))
    )%>%
    mutate(
      not_attr_deaths_50 = deaths_50 - attr_deaths_50
    )
```

```{r graphs}
#| warning: FALSE
#| message: FALSE

# Total PAF
  p1_1 <- high_income_countries_paf_sex %>%
    filter(sex=="Male") %>%
    ggplot(aes(x=id, y=paf_50))+
    geom_bar(stat="identity", fill="#004488", alpha=0.5) +
    xlab("Countries") + ylab("Population attibutable fraction") +
    ggtitle("Males") +
    ylim(0, 0.15)

  p1_2 <- high_income_countries_paf_sex %>%
    filter(sex=="Female") %>%
    ggplot(aes(x=id, y=paf_50))+
    geom_bar(stat="identity", fill="#BB5566", alpha=0.5) +
    xlab("Countries") + ylab("Population attibutable fraction") +
    ggtitle("Females") +
    ylim(0, 0.15)

  p1_3 <-  high_income_countries_paf_sex %>%
    filter(sex=="Both") %>%
    ggplot(aes(x=id, y=paf_50))+
    geom_bar(stat="identity", fill="#DDAA33", alpha=0.5) +
    xlab("Countries") + ylab("Population attibutable fraction") +
    ggtitle("Both") +
    ylim(0, 0.15)

## Change colors for sex

# Proportion inclusion health
p2 <- high_income_countries_paf_sex %>%
      filter(sex=="Both") %>%
      ggplot(aes(x=id, y=prop_inc_health_50, fill = id))+
      geom_bar(stat="identity", alpha=0.5, fill = "#44AA99") +
      xlab("Countries") + ylab("Proportion inclusion heealth (p50)") +
      ggtitle("Proportion of socially-excluded groups by countries")


# SMR
p3 <-  high_income_countries_paf_sex %>%
       filter(sex=="Both") %>%
       ggplot(aes(x=id, y=smr_50, fill= id )) +
       geom_bar(stat="identity", alpha=0.5, fill ="#44AA99") +
       xlab("Countries") + ylab("Standarized mortality ratios (p50)") +
       ggtitle("Standarized mortality ratios by countries")


#total and attributable deaths

p4_1 <- high_income_countries_paf_sex %>%
    filter(sex=="Male") %>%
    select(id, not_attr_deaths_50, attr_deaths_50)%>%
    pivot_longer(
      cols = not_attr_deaths_50:attr_deaths_50,
      names_to = "deaths",
      values_to = "counts"
    ) %>%
    ggplot(aes(fill=deaths, y=id, x=counts)) +
    xlab("Number of attributable deaths") + ylab("Countries") +
    labs(fill = "Deaths") +
    geom_bar(position="stack", stat="identity") +
    ggtitle("Males") +
    scale_fill_manual(
    values = c("attr_deaths_50" = "#CC79A7", "not_attr_deaths_50" = "#0072B2"),
    labels = c("attr_deaths_50" = "Attributable Deaths (p50)",
               "not_attr_deaths_50" = "Non-Attributable Deaths (p50)")
  )

p4_2 <- high_income_countries_paf_sex %>%
    filter(sex=="Female") %>%
    select(id, not_attr_deaths_50, attr_deaths_50)%>%
    pivot_longer(
      cols = not_attr_deaths_50:attr_deaths_50,
      names_to = "deaths",
      values_to = "counts"
    ) %>%
    ggplot(aes(fill=deaths, y=id, x=counts)) +
    xlab("Number of attributable deaths") + ylab("Countries") +
    labs(fill = "Deaths") +
    geom_bar(position="stack", stat="identity")+
    ggtitle("Females") +
    scale_fill_manual(
    values = c("attr_deaths_50" = "#CC79A7", "not_attr_deaths_50" = "#0072B2"),
    labels = c("attr_deaths_50" = "Attributable Deaths (p50)",
               "not_attr_deaths_50" = "Non-Attributable Deaths (p50)")
  )

p4_3 <- high_income_countries_paf_sex %>%
  filter(sex == "Both") %>%
  select(id, not_attr_deaths_50, attr_deaths_50) %>%
  pivot_longer(
    cols = not_attr_deaths_50:attr_deaths_50,
    names_to = "deaths",
    values_to = "counts"
  ) %>%
  ggplot(aes(fill = deaths, y = id, x = counts)) +
  xlab("Number of attributable deaths") +
  ylab("Countries") +
  labs(fill = "Deaths") +
  geom_bar(position = "stack", stat = "identity") +
  ggtitle("Both") +
  scale_fill_manual(
    values = c("attr_deaths_50" = "#CC79A7", "not_attr_deaths_50" = "#0072B2"),
    labels = c("attr_deaths_50" = "Attributable Deaths (p50)",
               "not_attr_deaths_50" = "Non-Attributable Deaths (p50)")
  )
```

```{r jaccard-heatmap-build}
#| warning: FALSE
#| message: FALSE

# Build a Jaccard heatmap (pairwise overlaps) from current posterior samples
# Assumes `jags_results` exists (as created earlier in the script)

jaccard_from_samples <- function(samples_mcmc_list) {
  mat <- as.matrix(samples_mcmc_list)  # combine chains
  needed <- paste0("p[", 1:8, "]")
  # Stop early if the required columns are missing
  if (!all(needed %in% colnames(mat))) {
    stop("Missing p[1:8] in MCMC samples")
  }
  p2 <- mat[, "p[2]"]; p3 <- mat[, "p[3]"]; p4 <- mat[, "p[4]"]
  p5 <- mat[, "p[5]"]; p6 <- mat[, "p[6]"]; p7 <- mat[, "p[7]"]; p8 <- mat[, "p[8]"]

  # Marginals
  pSUD  <- p2 + p5 + p6 + p8
  pPRIS <- p3 + p5 + p7 + p8
  pHOME <- p4 + p6 + p7 + p8

  # Intersections
  i_SP <- p5 + p8
  i_SH <- p6 + p8
  i_PH <- p7 + p8

  # Jaccard indices (guard against division by ~0)
  J_SP <- i_SP / pmax(pSUD + pPRIS - i_SP, .Machine$double.eps)
  J_SH <- i_SH / pmax(pSUD + pHOME - i_SH, .Machine$double.eps)
  J_PH <- i_PH / pmax(pPRIS + pHOME - i_PH, .Machine$double.eps)

  tibble::tibble(
    pair = c("SUD–Prison", "SUD–Homeless", "Prison–Homeless"),
    J_median = c(stats::median(J_SP), stats::median(J_SH), stats::median(J_PH)),
    J_lci    = c(stats::quantile(J_SP, 0.025), stats::quantile(J_SH, 0.025), stats::quantile(J_PH, 0.025)),
    J_uci    = c(stats::quantile(J_SP, 0.975), stats::quantile(J_SH, 0.975), stats::quantile(J_PH, 0.975))
  )
}

# Compute summary across all countries present in jags_results
jaccard_all <- purrr::imap_dfr(
  jags_results,
  ~ jaccard_from_samples(.x) %>% dplyr::mutate(country = .y),
  .progress = FALSE
)

# Heatmap plot object
p_jaccard_heatmap <- ggplot2::ggplot(jaccard_all, ggplot2::aes(x = pair, y = country, fill = J_median)) +
  ggplot2::geom_tile() +
  ggplot2::geom_text(ggplot2::aes(label = sprintf("%.1f%%", 100 * J_median)), size = 3) +
  ggplot2::scale_fill_gradient(limits = c(0, 1), low = "white", high = "#004488", name = "Legend") +
  ggplot2::labs(
    x = NULL, y = NULL,
    title = "Pairwise overlap by country"
  ) +
  ggplot2::theme_minimal() +
  ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 45, hjust = 1))
```
```

# Summary of analysis
## Packages to be used
The packages used in this project were the following: \
- googlesheets4: to import data from Google Sheets \
- metafor: to calculate the SMR based on the meta-analysis data \
- tidyverse: data wrangling and visualization \
- skimr: to summarize the data after importation or manipulation \
- mc2d: to run Monte Carlo simulations and setting prior distributions for Gibbs sampling \
- patchwork: to arrange multiple plots \
- rjags: to run the Bayesian Gibbs Sampling for estimating the overlapping proportions \
- coda: to summarize the results of the Bayesian Gibbs Sampling \
- ggpubfigs: to use color-blind-friendly palettes \

```{r packages, eval=FALSE}
```

## Data import and basic cleaning
First, I imported the data from the Google Sheets verifying the data types and the structure of the data.
Additionally, I assigned the number of iterations to the vector `sim_run` and deactivated the scientific notation for the analysis.

```{r importing datasets, eval=FALSE}
```

```{r additional settings, eval=FALSE}
```

Then, I filtered the SMR data from the `input_smr_all` object into the new object `smr_filter_all_cause` to include only the non-excluded or duplicate data and calculated the standard error estimates from the confidence intervals.

```{r basic cleaning, eval=FALSE}
```

## Further cleaning and Monte Carlo simulations
I cleaned the total mortality data, the homeless population data, the prison population data, the substance use disorder (SUD) population data, and the total population data. I also ran Monte Carlo simulations for the homeless, prison, and SUD populations to estimate the total population affected by these conditions per country with 1000 iteration to account for uncertainty.

```{r advanced cleaning and monte carlo, eval=FALSE}
```

## Estimating the total inclusion health groups for each country and calculating overlaps between populations
I set the Bayesian network assumptions for the Gibbs Sampling model. The full set of assumptions for the prior distributions of the overlap between inclusion health groups are: \
- 0.2 overlap between homelessness and substance drug abusers \
- 0.4 overlap between substance drug abuse and incarceration \
- 0.2 overlap between incarceration and homelessness \
- 0.05 overlap between the three conditions \
The assumption for the overlap between homelessness and substance drug disorder come from previous literature (\href{https://pubmed.ncbi.nlm.nih.gov/36660275/}{PMID: 36660275}). The other overlaps are based on good faith (no better literature was found). I also assumed independence between populations.

The vector `alpha` contains the hyperparameters to specify the Dirichlet distribution for the prior probabilities of the Gibbs Sampling model. Given the prior probabilities assumed, the counts for the Dirichlet distribution sums 60. Then, the JAGS model string was defined, and the parameters to monitor were set. I ran the Gibbs Sampling model for each country using a loop and stored the results. Finally, I summarized the results of the posterior means for each country in a data frame.

```{r gibbs sampling, eval=FALSE}
```

After obtaining the proportions, I calculated the total population affected by any of these conditions per country and the proportion of the population affected by inclusion health.

```{r final mods in results data frame, eval=FALSE}
```

## Calculating SMRs and PAFs
I also calculated the SMR for each country using data from the systematic review and applying a random effects meta-analysis. The results were stored in the `res_smr_all_cause_both`,

```{r random effects model for SMR, eval=FALSE}
```

I calculated the PAFs for each country based on the inclusion health proportions and the SMR data. \
The functions to calculate the PAFs and the attributable deaths are stored in the `functions.R` file.

```{r calculating PAFs, eval=FALSE}
```

## Graphs
```{r graphs, eval=FALSE}
```

Lastly, I created graphs that will help with the visualization of the results. \
The first set of graphs shows the PAFs for inclusion health for each country stratified by sex.

```{r fig.width=10, fig.height=10, fig.align='center', include=TRUE, echo=FALSE}
p1_1 + p1_2 + plot_layout(ncol = 2, nrow = 1)
```

Second, I created a graph showing the proportion of inclusion health for each country.

```{r fig.width=10, fig.height=10, fig.align='center', include=TRUE, echo=FALSE}
p2
```

Third, I created a graph showing the SMR for each country.

```{r fig.width=10, fig.height=10, fig.align='center', include=TRUE, echo=FALSE}
p3
```

Finally, I created a graph showing the total and attributable deaths for each country stratified by sex.

```{r fig.width=10, fig.height=10, fig.align='center', include=TRUE, echo=FALSE}
p4_1 + p4_2 + p4_3 + plot_layout(ncol = 2)
```

```{r fig.width=10, fig.height=10, fig.align='center', include=TRUE, echo=FALSE}
p_jaccard_heatmap
```