minor modifications, final

snaketron · snaketron · commit b6da936d40d7 · 2024-04-05T17:17:57.000+02:00
diff --git a/data/d_zibb_4.RData b/data/d_zibb_4.RData
diff --git a/inst/scripts/d_zibb_4.R b/inst/scripts/d_zibb_4.R
@@ -69,10 +69,10 @@ m <- rstan::stan_model(model_code = sim_stan)
 
 # generate data based on the following parameters
 set.seed(1021)
-N_gene <- 10
+N_gene <- 8
 N_replicates <- 4
-N_condition <- 3
-N_individual_per_condition <- 7
+N_condition <- 2
+N_individual_per_condition <- 5
 N_individual <- N_individual_per_condition * N_condition
 N_sample <- N_individual * N_replicates
 condition_id <- rep(1:N_condition, each = N_individual_per_condition)
diff --git a/man/d_zibb_4.Rd b/man/d_zibb_4.Rd
@@ -7,7 +7,7 @@
 A small example dataset that has the following features:
 
   \itemize{
-    \item 3 conditions
+    \item 2 conditions
     \item 7 individuals per condition
     \item 4 replicates per individual
     \item 8 Ig genes
diff --git a/vignettes/User_Manual.Rmd b/vignettes/User_Manual.Rmd
@@ -18,7 +18,6 @@ knitr::opts_chunk$set(comment = FALSE,
 ```
 
 
-
 ```{r}
 require(IgGeneUsage)
 require(rstan)
@@ -30,6 +29,7 @@ require(reshape2)
 require(patchwork)
 ```
 
+
 # Introduction
 Decoding the properties of immune receptor repertoires (IRRs) is key to 
 understanding how our adaptive immune system responds to challenges, such 
@@ -92,11 +92,13 @@ Lets look into the simulated dataset `d_zibb_3`. This dataset was generated
 by a zero-inflated beta-binomial (ZIBB) model, and `r Biocpkg("IgGeneUsage")` 
 was designed to fit ZIBB-distributed data.
 
+
 ```{r}
 data("d_zibb_3", package = "IgGeneUsage")
 knitr::kable(head(d_zibb_3))
 ```
 
+
 We can also visualize `d_zibb_3` with `r CRANpkg("ggplot2")`:
 
 ```{r, fig.width=6, fig.height=3.25}
@@ -128,6 +130,7 @@ adjust the inputs accordingly. If the warnings persist, please submit an
 issue with a reproducible script at the Bioconductor support site or on 
 GitHub[^3].
 
+
 ```{r}
 M <- DGU(ud = d_zibb_3, # input data
          mcmc_warmup = 300, # how many MCMC warm-ups per chain (default: 500)
@@ -151,6 +154,7 @@ In the output of DGU, we provide the following objects:
   * `fit`: rstan ('stanfit') object of the fitted model $\rightarrow$ used 
      for model checks (see section 'Model checking')
 
+
 ```{r}
 summary(M)
 ```
@@ -189,6 +193,7 @@ rstan::check_hmc_diagnostics(M$fit)
 rstan::stan_rhat(object = M$fit)|rstan::stan_ess(object = M$fit)
 ```
 
+
 ## PPC: posterior predictive checks
 ### PPCs: repertoire-specific
 The model used by `r Biocpkg("IgGeneUsage")` is generative, i.e. with the 
@@ -248,6 +253,7 @@ deviation (sd), L (low bound of 95% HDI), H (high bound of 95% HDI)
 kable(x = head(M$dgu), row.names = FALSE, digits = 2)
 ```
 
+
 ### DGU: differential gene usage
 We know that the values of $\gamma$ and $\pi$ are related to each other. 
 Let's visualize them for all genes (shown as a point). Names are shown for 
@@ -333,6 +339,7 @@ ggplot()+
   theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.4))
 ```
 
+
 ## GU: gene usage summary
 `r Biocpkg("IgGeneUsage")` also reports the inferred gene usage (GU) 
 probability of individual genes in each condition. For a given gene we 
@@ -352,6 +359,7 @@ ggplot(data = M$gu)+
   theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.4))
 ```
 
+
 # Leave-one-out (LOO) analysis
 To assert the robustness of the probability of DGU ($\pi$) and the effect 
 size ($\gamma$), `r Biocpkg("IgGeneUsage")` has a built-in procedure for 
@@ -365,6 +373,7 @@ by evaluating their variability for a specific gene.
 
 This analysis can be computationally demanding.
 
+
 ```{r}
 L <- LOO(ud = d_zibb_3, # input data
          mcmc_warmup = 500, # how many MCMC warm-ups per chain (default: 500)
@@ -404,6 +413,7 @@ ggplot(data = L_dgu)+
   ylab(expression(gamma))
 ```
 
+
 ## LOO-DGU: variability of $\pi$
 
 ```{r, fig.width=6, fig.height=5}
@@ -437,34 +447,35 @@ ggplot(data = L_gu)+
 ```
 
 
-
 # Case Study B: analyzing IRRs containing biological replicates
 
 ```{r}
 data("d_zibb_4", package = "IgGeneUsage")
 knitr::kable(head(d_zibb_4))
 ```
 
+
 We can also visualize `d_zibb_4` with `r CRANpkg("ggplot2")`:
 
-```{r, fig.width=6, fig.height=3.25}
+```{r, fig.width=6.5, fig.height=3.25}
 ggplot(data = d_zibb_4)+
-  geom_point(aes(x = gene_name, y = gene_usage_count, col = condition),
-             position = position_dodge(width = .7), shape = 21)+
+  geom_point(aes(x = gene_name, y = gene_usage_count, col = condition, 
+                 shape = replicate), position = position_dodge(width = 0.8))+
   theme_bw(base_size = 11)+
   ylab(label = "Gene usage [count]")+
   xlab(label = '')+
   theme(legend.position = "top")+
   theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.4))
 ```
 
+
 ## Modeling
 
 ```{r}
 M <- DGU(ud = d_zibb_4, # input data
          mcmc_warmup = 500, # how many MCMC warm-ups per chain (default: 500)
          mcmc_steps = 1500, # how many MCMC steps per chain (default: 1,500)
-         mcmc_chains = 3, # how many MCMC chain to run (default: 4)
+         mcmc_chains = 2, # how many MCMC chains to run (default: 4)
          mcmc_cores = 1, # how many PC cores to use? (e.g. parallel chains)
          hdi_lvl = 0.95, # highest density interval level (default: 0.95)
          adapt_delta = 0.8, # MCMC target acceptance rate (default: 0.95)
@@ -487,11 +498,13 @@ ggplot(data = M$ppc$ppc_rep)+
   ylab(label = "PPC usage [counts]")
 ```
 
+
 ## Analysis of estimated effect sizes
 The top panel shows the average gene usage (GU) in different biological 
 conditions. The bottom panels show the differential gene usage (DGU) 
 between pairs of biological conditions.
 
+
 ```{r, fig.width = 7, fig.height = 4}
 g1 <- ggplot(data = M$gu)+
   geom_errorbar(aes(x = gene_name, y = prob_mean, ymin = prob_L,
@@ -531,8 +544,6 @@ g2 <- ggplot(data = stats)+
 ```
 
 
-
-
 # Session
 
 ```{r}