@@ -18,7 +18,6 @@ knitr::opts_chunk$set(comment = FALSE,
18
18
```
19
19
20
20
21
-
22
21
``` {r}
23
22
require(IgGeneUsage)
24
23
require(rstan)
@@ -30,6 +29,7 @@ require(reshape2)
30
29
require(patchwork)
31
30
```
32
31
32
+
33
33
# Introduction
34
34
Decoding the properties of immune receptor repertoires (IRRs) is key to
35
35
understanding how our adaptive immune system responds to challenges, such
@@ -92,11 +92,13 @@ Lets look into the simulated dataset `d_zibb_3`. This dataset was generated
92
92
by a zero-inflated beta-binomial (ZIBB) model, and ` r Biocpkg("IgGeneUsage") `
93
93
was designed to fit ZIBB-distributed data.
94
94
95
+
95
96
``` {r}
96
97
data("d_zibb_3", package = "IgGeneUsage")
97
98
knitr::kable(head(d_zibb_3))
98
99
```
99
100
101
+
100
102
We can also visualize ` d_zibb_3 ` with ` r CRANpkg("ggplot") ` :
101
103
102
104
``` {r, fig.width=6, fig.height=3.25}
@@ -128,6 +130,7 @@ adjust the inputs accordingly. If the warnings persist, please submit an
128
130
issue with a reproducible script at the Bioconductor support site or on
129
131
Github[ ^ 3 ] .
130
132
133
+
131
134
``` {r}
132
135
M <- DGU(ud = d_zibb_3, # input data
133
136
mcmc_warmup = 300, # how many MCMC warm-ups per chain (default: 500)
@@ -151,6 +154,7 @@ In the output of DGU, we provide the following objects:
151
154
* ` fit ` : rstan ('stanfit') object of the fitted model $\rightarrow$ used
152
155
for model checks (see section 'Model checking')
153
156
157
+
154
158
``` {r}
155
159
summary(M)
156
160
```
@@ -189,6 +193,7 @@ rstan::check_hmc_diagnostics(M$fit)
189
193
rstan::stan_rhat(object = M$fit)|rstan::stan_ess(object = M$fit)
190
194
```
191
195
196
+
192
197
## PPC: posterior predictive checks
193
198
### PPCs: repertoire-specific
194
199
The model used by ` r Biocpkg("IgGeneUsage") ` is generative, i.e. with the
@@ -248,6 +253,7 @@ deviation (sd), L (low bound of 95% HDI), H (high bound of 95% HDI)
248
253
kable(x = head(M$dgu), row.names = FALSE, digits = 2)
249
254
```
250
255
256
+
251
257
### DGU: differential gene usage
252
258
We know that the values of ` \gamma ` and ` \pi ` are related to each other.
253
259
Lets visualize them for all genes (shown as a point). Names are shown for
@@ -333,6 +339,7 @@ ggplot()+
333
339
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.4))
334
340
```
335
341
342
+
336
343
## GU: gene usage summary
337
344
` r Biocpkg("IgGeneUsage") ` also reports the inferred gene usage (GU)
338
345
probability of individual genes in each condition. For a given gene we
@@ -352,6 +359,7 @@ ggplot(data = M$gu)+
352
359
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.4))
353
360
```
354
361
362
+
355
363
# Leave-one-out (LOO) analysis
356
364
To assert the robustness of the probability of DGU ($\pi$) and the effect
357
365
size ($\gamma$), ` r Biocpkg("IgGeneUsage") ` has a built-in procedure for
@@ -365,6 +373,7 @@ by evaluating their variability for a specific gene.
365
373
366
374
This analysis can be computationally demanding.
367
375
376
+
368
377
``` {r}
369
378
L <- LOO(ud = d_zibb_3, # input data
370
379
mcmc_warmup = 500, # how many MCMC warm-ups per chain (default: 500)
@@ -404,6 +413,7 @@ ggplot(data = L_dgu)+
404
413
ylab(expression(gamma))
405
414
```
406
415
416
+
407
417
## LOO-DGU: variability of $\pi$
408
418
409
419
``` {r, fig.width=6, fig.height=5}
@@ -437,34 +447,35 @@ ggplot(data = L_gu)+
437
447
```
438
448
439
449
440
-
441
450
# Case Study B: analyzing IRRs containing biological replicates
442
451
443
452
``` {r}
444
453
data("d_zibb_4", package = "IgGeneUsage")
445
454
knitr::kable(head(d_zibb_4))
446
455
```
447
456
457
+
448
458
We can also visualize ` d_zibb_4 ` with ` r CRANpkg("ggplot") ` :
449
459
450
- ``` {r, fig.width=6, fig.height=3.25}
460
+ ``` {r, fig.width=6.5 , fig.height=3.25}
451
461
ggplot(data = d_zibb_4)+
452
- geom_point(aes(x = gene_name, y = gene_usage_count, col = condition),
453
- position = position_dodge(width = .7), shape = 21 )+
462
+ geom_point(aes(x = gene_name, y = gene_usage_count, col = condition,
463
+ shape = replicate), position = position_dodge(width = 0.8) )+
454
464
theme_bw(base_size = 11)+
455
465
ylab(label = "Gene usage [count]")+
456
466
xlab(label = '')+
457
467
theme(legend.position = "top")+
458
468
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.4))
459
469
```
460
470
471
+
461
472
## Modeling
462
473
463
474
``` {r}
464
475
M <- DGU(ud = d_zibb_4, # input data
465
476
mcmc_warmup = 500, # how many MCMC warm-ups per chain (default: 500)
466
477
mcmc_steps = 1500, # how many MCMC steps per chain (default: 1,500)
467
- mcmc_chains = 3 , # how many MCMC chain to run (default: 4)
478
+ mcmc_chains = 2 , # how many MCMC chain to run (default: 4)
468
479
mcmc_cores = 1, # how many PC cores to use? (e.g. parallel chains)
469
480
hdi_lvl = 0.95, # highest density interval level (de fault: 0.95)
470
481
adapt_delta = 0.8, # MCMC target acceptance rate (default: 0.95)
@@ -487,11 +498,13 @@ ggplot(data = M$ppc$ppc_rep)+
487
498
ylab(label = "PPC usage [counts]")
488
499
```
489
500
501
+
490
502
## Analysis of estimated effect sizes
491
503
The top panel shows the average gene usage (GU) in different biological
492
504
conditions. The bottom panels shows the differential gene usage (DGU)
493
505
between pairs of biological conditions.
494
506
507
+
495
508
``` {r, fig.weight = 7, fig.height = 4}
496
509
g1 <- ggplot(data = M$gu)+
497
510
geom_errorbar(aes(x = gene_name, y = prob_mean, ymin = prob_L,
@@ -531,8 +544,6 @@ g2 <- ggplot(data = stats)+
531
544
```
532
545
533
546
534
-
535
-
536
547
# Session
537
548
538
549
``` {r}
0 commit comments