Alterations to build system
This page describes the changes to the build system (chiefly the Makefile) to handle Bioconductor packages, to manage R package dependencies, and to reduce build time.
First, I will briefly describe the default build system for generating Carpentries sites, which this repo is based on. It's worth noting that a new system is being designed as of writing (June 2022), but it is uncertain when this will be released. I will then describe the changes I made to this system.
The aims of the changes I made to the build system were as follows:
- To support Bioconductor packages
- To produce a tab-delimited set of dependencies for the management of workshop computational environments (eg, RStudio)
- To reduce the runtime when building the site locally (eg, not re-installing packages every time)
- To ensure that all relevant resources (including data and figures) are rebuilt when their source code is updated
The basic rule for the standard build system is `make site`, which rebuilds the markdown pages that are rendered by Jekyll on GitHub Pages.
For RMarkdown pages, this involves running `rmarkdown::render` on each Rmd file in `_episodes_rmd` and rendering to `_episodes`.
Before building an RMarkdown page, the Makefile runs the `install-rmd-deps` rule on that Rmd file, as follows:

    ## * install-rmd-deps : Install R packages dependencies to build the RMarkdown lesson
    install-rmd-deps:
    	@${SHELL} bin/install_r_deps.sh

which is just:

    Rscript -e "source(file.path('bin', 'dependencies.R')); install_required_packages(); install_dependencies(identify_dependencies())"

This installs the required packages `c("rprojroot", "desc", "remotes", "renv")` (needed to identify and install dependencies), and then runs `identify_dependencies`, which parses the Rmd files to identify all `library` (etc) calls in the code chunks that are run; ie, a chunk with the option `eval=FALSE` won't count as a dependency. It uses `renv::dependencies()` to do that, which means it cannot parse `library()` calls in callout or exercise blocks (due to the leading `> `). It also installs any dependencies of the `bin` directory (as you might expect, we need the deps of the helper functions).
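
To make the chunk rules above concrete, here is a minimal base-R sketch of that kind of scan. It is not what `renv::dependencies()` or `bin/dependencies.R` actually does internally; `find_library_calls` is a hypothetical helper written only to illustrate why `eval=FALSE` chunks and chunks inside `> ` blocks end up ignored.

```r
# Hypothetical, simplified stand-in for the scan: collect packages named in
# library()/require() calls, but only inside evaluated top-level R chunks.
fence <- strrep("`", 3)  # the three-backtick chunk delimiter

find_library_calls <- function(lines) {
  chunk_open  <- paste0("^", fence, "\\{r")
  chunk_close <- paste0("^", fence, "\\s*$")
  call_regex  <- "(?:library|require)\\(\\s*[\"']?([A-Za-z0-9.]+)"

  pkgs <- character(0)
  in_chunk <- FALSE
  evaluated <- FALSE
  for (line in lines) {
    if (!in_chunk && grepl(chunk_open, line, perl = TRUE)) {
      # A chunk header: remember whether the chunk will be evaluated
      in_chunk <- TRUE
      evaluated <- !grepl("eval\\s*=\\s*(FALSE|F)\\b", line, perl = TRUE)
    } else if (in_chunk && grepl(chunk_close, line, perl = TRUE)) {
      in_chunk <- FALSE
    } else if (in_chunk && evaluated) {
      m <- regmatches(line, regexec(call_regex, line, perl = TRUE))[[1]]
      if (length(m) == 2) pkgs <- c(pkgs, m[2])
    }
  }
  unique(pkgs)
}

rmd <- c(
  paste0(fence, "{r}"), "library(ggplot2)", fence,
  paste0(fence, "{r, eval=FALSE}"), "library(dplyr)", fence,
  paste0("> ", fence, "{r}"), "> library(limma)", paste0("> ", fence)
)
find_library_calls(rmd)  # "ggplot2"
```

Only `ggplot2` is reported: `dplyr` sits in an unevaluated chunk, and the `limma` chunk is never recognised as a chunk because every line starts with `> `.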
We run `install_dependencies` on this list, which internally dumps the list of dependencies into a mock `DESCRIPTION` file, and runs `remotes::install_deps()`. `install_deps()` then thinks we're in an R package directory and tries to install all the dependencies we've listed in the mock `DESCRIPTION`.
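
The mock `DESCRIPTION` trick can be sketched roughly as follows. This is an illustration under assumptions, not the code in `bin/dependencies.R` (which uses the `desc` package): `write_mock_description` and the `mockpkg` name are hypothetical, and the install call is left commented out so the snippet has no side effects.

```r
# Hypothetical helper: write a throwaway DESCRIPTION listing the packages
# as Imports, so that tooling treats the directory as an R package.
write_mock_description <- function(pkgs, dir = tempfile("mock-pkg-")) {
  dir.create(dir)
  fields <- data.frame(
    Package = "mockpkg",  # placeholder name, never installed
    Version = "0.0.1",
    Imports = paste(sort(unique(pkgs)), collapse = ", "),
    stringsAsFactors = FALSE
  )
  write.dcf(fields, file = file.path(dir, "DESCRIPTION"))
  dir
}

pkg_dir <- write_mock_description(c("ggplot2", "dplyr"))
cat(readLines(file.path(pkg_dir, "DESCRIPTION")), sep = "\n")

# Not run here: this would install the listed Imports from the configured repos.
# remotes::install_deps(pkg_dir)
```

Because the directory is temporary, the dependency list disappears with it.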
In an ideal world this means we've got all our dependencies installed, which means we can now render the Rmd file:

    	@mkdir -p _episodes
    	@bin/knit_lessons.sh $< $@

Now, how did I change this? Let's recall the motivations:
- To support Bioconductor packages
- To produce a tab-delimited set of dependencies for the management of workshop computational environments (eg, RStudio)
- To reduce the runtime when building the site locally (eg, not re-installing packages every time)
- To ensure that all relevant resources (including data and figures) are rebuilt when their source code is updated
Why are these problems?
- `remotes::install_deps()` can't tell when we've specified Bioconductor packages, so unless `options("repos")` is set, we won't be able to install any bioc packages.
- We dump the dependencies into a mock `DESCRIPTION` file, but this isn't a full list! It's only the deps for the package being listed, and doesn't have recursive dependencies. Plus, the mock `DESCRIPTION` is a build artifact that's removed after the deps are installed.
- Every time we try to build an Rmd, we try to rebuild the dependencies. That means a lot of wasted time when maybe all the packages are already installed.
- `make site` assumes that whatever data/figures we use outside of those directly made by the Rmd are fixed, rather than maybe being generated by other scripts in the repo (like `data-raw` in an R package).
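
The last point comes down to how make decides what is stale: a target is only regenerated when it is missing or older than one of the prerequisites listed for it, so a figure or dataset that isn't listed anywhere never gets rebuilt. As a rough illustration in R (the `needs_rebuild` helper is hypothetical and only mimics that timestamp check on temporary files):

```r
# Hypothetical helper mirroring make's rule: rebuild when the target is
# missing or older than any of its prerequisites.
needs_rebuild <- function(target, prerequisites) {
  if (!file.exists(target)) return(TRUE)
  any(file.mtime(prerequisites) > file.mtime(target))
}

work <- tempfile("make-demo-")
dir.create(work)
script <- file.path(work, "kmeans.R")
figure <- file.path(work, "kmeans.gif")
writeLines("# plotting code", script)

needs_rebuild(figure, script)  # TRUE: the figure does not exist yet

writeLines("placeholder", figure)
Sys.setFileTime(script, Sys.time() - 60)  # script is older than the figure
needs_rebuild(figure, script)  # FALSE

Sys.setFileTime(script, Sys.time() + 60)  # script edited after the figure was made
needs_rebuild(figure, script)  # TRUE
```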
So how did we solve these?
- Change `remotes::install_deps` to use `BiocManager::install()`. This means we can't dump everything into a `DESCRIPTION` any more.
- We add a step to create a plain text list of dependencies, `dependencies.csv`:

      dependencies.csv: _episodes_rmd/*.Rmd
      	@${SHELL} bin/list_r_deps.sh

  This rule is similar to the one I described earlier, but here we dump the list of dependencies identified using `identify_dependencies` into `dependencies.csv`.
- We also run `renv::dependencies` on the `fig` directory, in case we have any R scripts there that also need dependencies installed.
- We add rules to create any figures from R scripts, eg:

      fig/pendulum.gif: fig/pca-animation.R
      	Rscript $<
      fig/kmeans.gif: fig/kmeans.R
      	Rscript $<

  We also need to ensure that all these figures are prerequisites of the `site` rule.
- We also added rules for generating data from similarly-named R scripts:

      data/%.rds: data/%.R
      	Rscript $<

  We also need to ensure that we re-generate all the data before rendering the Rmds, so we define a list of all the datasets as `DATA_DST` and ensure that this is listed as a prerequisite of the `site` rule (similar to figures).
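
As an illustration of the switch to `BiocManager::install()` combined with not re-installing packages, here is a minimal sketch. It is an assumption about how one might do this, not a copy of the repo's `bin/dependencies.R`; `missing_packages` and `install_missing` are hypothetical names.

```r
# Hypothetical helper: which of the requested packages are not installed yet?
missing_packages <- function(pkgs, installed = rownames(installed.packages())) {
  setdiff(unique(pkgs), installed)
}

# Hypothetical helper: install only what is missing. BiocManager::install()
# handles both CRAN and Bioconductor packages.
install_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    if (!requireNamespace("BiocManager", quietly = TRUE)) {
      install.packages("BiocManager")
    }
    # update = FALSE and ask = FALSE avoid touching already-installed packages
    # and keep the call non-interactive.
    BiocManager::install(to_install, update = FALSE, ask = FALSE)
  }
  invisible(to_install)
}

missing_packages(c("stats", "utils"))  # character(0): both ship with R
```

With something like this, a second `make site` on a machine that already has every package does no installation work at all.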
That summarises most of the changes made to the build system. If anything is unclear please get in touch with me (Alan) or open an issue on this repo.
There's also some functionality for building slides automatically from the lesson material, which I have not covered here because, as far as I know, it's not currently used.