@@ -88,19 +88,19 @@ on the posterior distribution of $\gamma$, and are thus related.
`r Biocpkg("IgGeneUsage")` has a couple of built-in Ig gene usage datasets.
Some were obtained from studies and others were simulated.

- Lets look into the simulated dataset `d_zibb_2`. This dataset was generated
+ Lets look into the simulated dataset `d_zibb_3`. This dataset was generated
by a zero-inflated beta-binomial (ZIBB) model, and `r Biocpkg("IgGeneUsage")`
was designed to fit ZIBB-distributed data.

```{r}
- data("d_zibb_2", package = "IgGeneUsage")
- knitr::kable(head(d_zibb_2))
+ data("d_zibb_3", package = "IgGeneUsage")
+ knitr::kable(head(d_zibb_3))
```

- We can also visualize `d_zibb_2` with `r CRANpkg("ggplot")`:
+ We can also visualize `d_zibb_3` with `r CRANpkg("ggplot")`:

```{r, fig.width=6, fig.height=3.25}
- ggplot(data = d_zibb_2)+
+ ggplot(data = d_zibb_3)+
geom_point(aes(x = gene_name, y = gene_usage_count, col = condition),
position = position_dodge(width = .7), shape = 21)+
theme_bw(base_size = 11)+
@@ -113,10 +113,10 @@ ggplot(data = d_zibb_2)+

## DGU analysis
As main input `r Biocpkg("IgGeneUsage")` uses a data.frame formatted as e.g.
- `d_zibb_2`. Other input parameters allow you to configure specific settings
+ `d_zibb_3`. Other input parameters allow you to configure specific settings
of the `r CRANpkg("rstan")` sampler.

- In this example, we analyze `d_zibb_2` with 3 MCMC chains, 1500 iterations
+ In this example, we analyze `d_zibb_3` with 3 MCMC chains, 1500 iterations
each including 500 warm-ups using a single CPU core (Hint: for parallel
chain execution set parameter `mcmc_cores` = 3). We report for each model
parameter its mean and 95% highest density interval (HDIs).
@@ -129,8 +129,8 @@ issue with a reproducible script at the Bioconductor support site or on
Github[^3].

```{r}
- M <- DGU(ud = d_zibb_2, # input data
- mcmc_warmup = 500, # how many MCMC warm-ups per chain (default: 500)
+ M <- DGU(ud = d_zibb_3, # input data
+ mcmc_warmup = 300, # how many MCMC warm-ups per chain (default: 500)
mcmc_steps = 1500, # how many MCMC steps per chain (default: 1,500)
mcmc_chains = 3, # how many MCMC chain to run (default: 4)
mcmc_cores = 1, # how many PC cores to use? (e.g. parallel chains)
@@ -182,7 +182,7 @@ summary(M)
rstan::check_hmc_diagnostics(M$fit)
```

- * rhat < 1.03 and n_eff > 0
+ * rhat < 1.05 and n_eff > 0


```{r, fig.height = 3, fig.width = 6}
@@ -197,7 +197,7 @@ Error bars show 95% HDI of mean posterior prediction. The predictions can be
compared with the observed data (x-axis). For points near the diagonal
$\rightarrow$ accurate prediction.

- ```{r, fig.height = 3.25, fig.width = 7}
+ ```{r, fig.height = 4, fig.width = 7}
ggplot(data = M$ppc$ppc_rep)+
facet_wrap(facets = ~individual_id, ncol = 5)+
geom_abline(intercept = 0, slope = 1, linetype = "dashed", col = "darkgray")+
@@ -366,7 +366,7 @@ by evaluating their variability for a specific gene.
This analysis can be computationally demanding.

```{r}
- L <- LOO(ud = d_zibb_2, # input data
+ L <- LOO(ud = d_zibb_3, # input data
mcmc_warmup = 500, # how many MCMC warm-ups per chain (default: 500)
mcmc_steps = 1000, # how many MCMC steps per chain (default: 1,500)
mcmc_chains = 1, # how many MCMC chain to run (default: 4)
@@ -376,6 +376,7 @@ L <- LOO(ud = d_zibb_2, # input data
max_treedepth = 10) # tree depth evaluated at each step (default: 12)
```

+
Next, we collected the results (GU and DGU) from each LOO iteration:

```{r}
@@ -388,32 +389,32 @@ L_dgu <- do.call(rbind, lapply(X = L, FUN = function(x){return(x$dgu)}))

## LOO-DGU: variability of effect size $\gamma$

- ```{r, fig.width=6.5, fig.height=4}
+ ```{r, fig.width=6, fig.height=5}
ggplot(data = L_dgu)+
+ facet_wrap(facets = ~contrast, ncol = 1)+
geom_hline(yintercept = 0, linetype = "dashed", col = "gray")+
geom_errorbar(aes(x = gene_name, y = es_mean, ymin = es_L,
ymax = es_H, col = contrast, group = loo_id),
- width = 0.1, position = position_dodge(width = 0.5))+
+ width = 0.1, position = position_dodge(width = 0.75))+
geom_point(aes(x = gene_name, y = es_mean, col = contrast,
group = loo_id), size = 1,
- position = position_dodge(width = 0.5))+
+ position = position_dodge(width = 0.75))+
theme_bw(base_size = 11)+
- theme(legend.position = "top")+
- ylab(expression(gamma))+
- theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.4))
+ theme(legend.position = "none")+
+ ylab(expression(gamma))
```

## LOO-DGU: variability of $\pi$

- ```{r, fig.width=6.5, fig.height=4}
+ ```{r, fig.width=6, fig.height=5}
ggplot(data = L_dgu)+
+ facet_wrap(facets = ~contrast, ncol = 1)+
geom_point(aes(x = gene_name, y = pmax, col = contrast,
group = loo_id), size = 1,
position = position_dodge(width = 0.5))+
theme_bw(base_size = 11)+
- theme(legend.position = "top")+
- ylab(expression(pi))+
- theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.4))
+ theme(legend.position = "none")+
+ ylab(expression(pi))
```


@@ -425,24 +426,16 @@ ggplot(data = L_gu)+
geom_errorbar(aes(x = gene_name, y = prob_mean, ymin = prob_L,
ymax = prob_H, col = condition,
group = interaction(loo_id, condition)),
- width = 0.1, position = position_dodge(width = 0.5))+
+ width = 0.1, position = position_dodge(width = 1))+
geom_point(aes(x = gene_name, y = prob_mean, col = condition,
group = interaction(loo_id, condition)), size = 1,
- position = position_dodge(width = 0.5))+
+ position = position_dodge(width = 1))+
theme_bw(base_size = 11)+
theme(legend.position = "top")+
ylab("GU [probability]")+
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.4))
```

- # Hierarchical clustering analaysis
-
- ```{r, fig.width=6, fig.height=4}
- # x <- M$theta
- x <- acast(individual_id~gene_name, data = M$theta, value.var = "theta_mean")
-
- plot(hclust(dist(x, method = "euclidean"), method = "average"))
- ```


# Case Study B: analyzing IRRs containing biological replicates