Commit b6da936

minor modifications, final
1 parent 3fe8e0a commit b6da936

4 files changed: +23 -12 lines changed

data/d_zibb_4.RData

-1.1 KB
Binary file not shown.

inst/scripts/d_zibb_4.R

+3 -3
@@ -69,10 +69,10 @@ m <- rstan::stan_model(model_code = sim_stan)
 
 # generate data based on the following parameters parameters
 set.seed(1021)
-N_gene <- 10
+N_gene <- 8
 N_replicates <- 4
-N_condition <- 3
-N_individual_per_condition <- 7
+N_condition <- 2
+N_individual_per_condition <- 5
 N_individual <- N_individual_per_condition * N_condition
 N_sample <- N_individual * N_replicates
 condition_id <- rep(1:N_condition, each = N_individual_per_condition)
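
As a quick sanity check of the new simulation settings, the sketch below recomputes the derived sizes from the values on the `+` lines of this hunk. The name `expected_rows` and the "one row per gene per sample" layout are assumptions made for illustration; the diff does not confirm how `d_zibb_4` is laid out.

```r
# Design parameters after this commit (values copied from the "+" lines)
N_gene <- 8
N_replicates <- 4
N_condition <- 2
N_individual_per_condition <- 5

# Derived quantities, computed exactly as in the script's context lines
N_individual <- N_individual_per_condition * N_condition
N_sample <- N_individual * N_replicates
condition_id <- rep(1:N_condition, each = N_individual_per_condition)

# Hypothetical: row count of a long table with one row per gene per sample
expected_rows <- N_gene * N_sample

print(c(N_individual = N_individual,
        N_sample = N_sample,
        expected_rows = expected_rows))
```

With these settings the simulation covers 10 individuals and 40 samples, down from 21 individuals and 84 samples under the previous values (3 conditions, 7 individuals per condition).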

man/d_zibb_4.Rd

+1 -1
@@ -7,7 +7,7 @@
 A small example dataset that has the following features:
 
 \itemize{
-\item 3 conditions
+\item 2 conditions
 \item 7 individuals per condition
 \item 4 replicates per individual
 \item 8 Ig genes

vignettes/User_Manual.Rmd

+19 -8
@@ -18,7 +18,6 @@ knitr::opts_chunk$set(comment = FALSE,
 ```
 
 
-
 ```{r}
 require(IgGeneUsage)
 require(rstan)
@@ -30,6 +29,7 @@ require(reshape2)
 require(patchwork)
 ```
 
+
 # Introduction
 Decoding the properties of immune receptor repertoires (IRRs) is key to
 understanding how our adaptive immune system responds to challenges, such
@@ -92,11 +92,13 @@ Lets look into the simulated dataset `d_zibb_3`. This dataset was generated
 by a zero-inflated beta-binomial (ZIBB) model, and `r Biocpkg("IgGeneUsage")`
 was designed to fit ZIBB-distributed data.
 
+
 ```{r}
 data("d_zibb_3", package = "IgGeneUsage")
 knitr::kable(head(d_zibb_3))
 ```
 
+
 We can also visualize `d_zibb_3` with `r CRANpkg("ggplot")`:
 
 ```{r, fig.width=6, fig.height=3.25}
@@ -128,6 +130,7 @@ adjust the inputs accordingly. If the warnings persist, please submit an
 issue with a reproducible script at the Bioconductor support site or on
 Github[^3].
 
+
 ```{r}
 M <- DGU(ud = d_zibb_3, # input data
          mcmc_warmup = 300, # how many MCMC warm-ups per chain (default: 500)
@@ -151,6 +154,7 @@ In the output of DGU, we provide the following objects:
 * `fit`: rstan ('stanfit') object of the fitted model $\rightarrow$ used
 for model checks (see section 'Model checking')
 
+
 ```{r}
 summary(M)
 ```
@@ -189,6 +193,7 @@ rstan::check_hmc_diagnostics(M$fit)
 rstan::stan_rhat(object = M$fit)|rstan::stan_ess(object = M$fit)
 ```
 
+
 ## PPC: posterior predictive checks
 ### PPCs: repertoire-specific
 The model used by `r Biocpkg("IgGeneUsage")` is generative, i.e. with the
@@ -248,6 +253,7 @@ deviation (sd), L (low bound of 95% HDI), H (high bound of 95% HDI)
 kable(x = head(M$dgu), row.names = FALSE, digits = 2)
 ```
 
+
 ### DGU: differential gene usage
 We know that the values of `\gamma` and `\pi` are related to each other.
 Lets visualize them for all genes (shown as a point). Names are shown for
@@ -333,6 +339,7 @@ ggplot()+
   theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.4))
 ```
 
+
 ## GU: gene usage summary
 `r Biocpkg("IgGeneUsage")` also reports the inferred gene usage (GU)
 probability of individual genes in each condition. For a given gene we
@@ -352,6 +359,7 @@ ggplot(data = M$gu)+
   theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.4))
 ```
 
+
 # Leave-one-out (LOO) analysis
 To assert the robustness of the probability of DGU ($\pi$) and the effect
 size ($\gamma$), `r Biocpkg("IgGeneUsage")` has a built-in procedure for
@@ -365,6 +373,7 @@ by evaluating their variability for a specific gene.
 
 This analysis can be computationally demanding.
 
+
 ```{r}
 L <- LOO(ud = d_zibb_3, # input data
          mcmc_warmup = 500, # how many MCMC warm-ups per chain (default: 500)
@@ -404,6 +413,7 @@ ggplot(data = L_dgu)+
   ylab(expression(gamma))
 ```
 
+
 ## LOO-DGU: variability of $\pi$
 
 ```{r, fig.width=6, fig.height=5}
@@ -437,34 +447,35 @@ ggplot(data = L_gu)+
 ```
 
 
-
 # Case Study B: analyzing IRRs containing biological replicates
 
 ```{r}
 data("d_zibb_4", package = "IgGeneUsage")
 knitr::kable(head(d_zibb_4))
 ```
 
+
 We can also visualize `d_zibb_4` with `r CRANpkg("ggplot")`:
 
-```{r, fig.width=6, fig.height=3.25}
+```{r, fig.width=6.5, fig.height=3.25}
 ggplot(data = d_zibb_4)+
-  geom_point(aes(x = gene_name, y = gene_usage_count, col = condition),
-             position = position_dodge(width = .7), shape = 21)+
+  geom_point(aes(x = gene_name, y = gene_usage_count, col = condition,
+                 shape = replicate), position = position_dodge(width = 0.8))+
   theme_bw(base_size = 11)+
   ylab(label = "Gene usage [count]")+
   xlab(label = '')+
   theme(legend.position = "top")+
   theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.4))
 ```
 
+
 ## Modeling
 
 ```{r}
 M <- DGU(ud = d_zibb_4, # input data
          mcmc_warmup = 500, # how many MCMC warm-ups per chain (default: 500)
          mcmc_steps = 1500, # how many MCMC steps per chain (default: 1,500)
-         mcmc_chains = 3, # how many MCMC chain to run (default: 4)
+         mcmc_chains = 2, # how many MCMC chain to run (default: 4)
          mcmc_cores = 1, # how many PC cores to use? (e.g. parallel chains)
          hdi_lvl = 0.95, # highest density interval level (de fault: 0.95)
          adapt_delta = 0.8, # MCMC target acceptance rate (default: 0.95)
@@ -487,11 +498,13 @@ ggplot(data = M$ppc$ppc_rep)+
   ylab(label = "PPC usage [counts]")
 ```
 
+
 ## Analysis of estimated effect sizes
 The top panel shows the average gene usage (GU) in different biological
 conditions. The bottom panels shows the differential gene usage (DGU)
 between pairs of biological conditions.
 
+
 ```{r, fig.weight = 7, fig.height = 4}
 g1 <- ggplot(data = M$gu)+
   geom_errorbar(aes(x = gene_name, y = prob_mean, ymin = prob_L,
@@ -531,8 +544,6 @@ g2 <- ggplot(data = stats)+
 ```
 
 
-
-
 # Session
 
 ```{r}
