Challenges with reproducing analyses

Data are not available
Data are in a format that is not readable (Lotus123 spreadsheet?)
Code is not available to run analysis (only the method section is available)
Correct software is not available to read data or run analysis (i.e, |> requires R version 4.1 or higher)
R packages used in analysis have changed and are not backward compatible (how long will dplyr support spread and gather?)
Computer architecture doesn't exist anymore
The world as we know it doesn't exist anymore

The reproducibility highway

The following are steps on the reproducibility highway. You do not need to drive the full length of the highway for all projects (in fact #7 above suggests that the highway never ends - how do we prepare for the end of the world?). It is important to figure out which highway exit is suitable for your project.

Exit 1: Reproductiblity from GitHub

Add your code to GitHub and have others clone and run your code. Since you don't know the computational environment that other are using you have to be careful about R versioning and package dependencies. For straightforward analyses that use common R packages that maintain backward compatibility, then putting your code on GitHub is a solid foundation for reproducibility.

It is best if you make your GitHub repo an R project so that Rstudio understand relative pathing better.

You would then archive your GitHub repo on Zenodo (https://zenodo.org) so that it persists. Here is information about linking your GitHub repo to Zenodo and then creating a release of your code that gets automatically upload to Zenodo. You then got into Zenodo to update the metadata. See more here: https://coderefinery.github.io/github-without-command-line/doi/

Exit 2: Reproducibility from GitHub + renv

Renv is a package manager for R (https://rstudio.github.io/renv/articles/renv.html) that controls for the version of R packages. I have yet to have a project that didn't involve me throwing a can, bottle, or other nearby object at my computer screen because an renv.lock file gets cross wired. Renv is awesome in principle but not in practice.

Exit 3: Reproducibility using Rocker + GitHub

Starting from a Docker container is the closest thing you can do to starting from a fresh computer that isn't filled with all the packages and data sets that you already have on your computer. A fresh computer forces you to be explicit about package installs, etc. and will make it easier for others to successfully reproduce your analysis.

A Rocker is a Docker container with R (potentially also with Rstudio and other R packages). The one with Rstudio makes it easy to use a Docker container in a familiar interface.

Download and install Docker Desktop on your computer. The instructions are at https://www.docker.com/.
Launch Docker by starting the Docker application.
Find the command line for your computer (terminal in Mac or command prompt in Windows)
In the command line, try to run docker run --rm -ti -e PASSWORD=yourpassword -p 8787:8787 rocker/rstudio that is documented here: https://rocker-project.org/
Point your browser to localhost:8787. Log in with user = rstudio, passwoard = yourpassword.
Start a new project in your rocker container (Exits 3 & 4) from your forked GitHub repo. (new project -> Version Control -> Git)
Test your code

Exit 4: Reproductiblity using Docker + GitHub

In the above example, you start from the rstudio rocker and then install packages on top of it. To be more explicit about the users environment, you can provide your own docker container. You would do this if you have a particular Rocker version that you want to start with or you are worried about packages changing.

Update the Dockerfile in this repo to match your repo
Create user account on hub.docker.com
At the terminal navigate to your github repo on your computer. Then run the following code to create, tag, and push your Docker container

docker build https://github.com/rqthomas/reproducibility-demo.git#main -t thomas_demo#main -t thomas_demo
docker image tag thomas_demo rqthomas/thomas_demo:latest
docker image push rqthomas/thomas_demo:latest

Someone performing a code review or reproducing your analysis will follow the steps in Exit 3, except for changing the docker "image" that is used from "rocker/rstudio" to the custom image ("rqthomas/thomas_demo" in the example)

docker run --rm -ti -e PASSWORD=yourpassword -p 8787:8787 thomas_demo:latest

Point your browser to localhost:8787. Log in with user = rstudio, passwoard = yourpassword.
Start a new project in your rocker container (Exits 3 & 4) from your forked GitHub repo. (new project -> Version Control -> Git)
Test your code

Performing a code review

On GitHub.com, create a fork of the repo that you will review into your GitHub organization. For this example the GitHub is this one: https://github.com/rqthomas/reproducibility-demo. Find the fork button, click it, and then select your GitHub organization.
Start a new project in the on your computer (Exit 1) or rocker container (Exits 3 & 4) from your forked GitHub repo. (new project -> Version Control -> Git)
The following checklist is a list of things to review in the code:

Does the GitHub repo have a Readme?
Does the Readme tell you how to run the analysis?
Can you run the analysis successfully out-of-the-box?
Does the repo have an install.R script that includes all the necessary packages beyond the recommended Rocker container?
Is there a clear step for downloading data that works?
Are custom function in an R subdirectory?
Is there extra code that isn't used in the analysis
Is there a ton of commented-out code in the analysis script that others will be looking at? That should be cleaned up!
Does the Readme list the authors and manuscript title? Does it have a license? (GNU General Public License v2.0 is one that may be appropriate, see: https://www.cmu.edu/cttec/forms/opensourcelicensegridv1.pdf)
Is there repeated code that should be in functions?
Are there single-use functions in R subdirectory that would be better in the main analysis script for improving read ability?

If you find any issues that you can fix, fix them in the code!
Commit and Push the updates to your fork. If you are using a Docker container you will need to set up your GitHub credentials on the container following: https://github.com/frec-3044/git-rmd-intro-template/blob/main/assignment/instructions.md
Go to your fork on GitHub.com and select "Contribute". Open a PR. In the discussion of the PR, describe the key fixes that you to addressed.

Guidelines when requesting a code review from a collaborator

Consider that requesting a collaborator to review your code is similar to asking that person to review a manuscript. As a result, similar to when you are drafting a manuscript, it's best practice to make sure your code is well-organized, concise, and generally interpretable by someone else before you ask for a review, as a professional courtesy. Below are provided a few guidelines to consider when requesting a code review from a collaborator.

Clarify your review objectives when you request a review. Is your primary objective merely to ensure that the code runs with no errors? Or are there sections of it that you would like the reviewer to screen more carefully, such as a particular analysis that you want to make sure is free of mistakes? Again, similar to when you request a manuscript review, giving the reviewer some guidance about the type of feedback you are seeking can both make the reviewer's life simpler and help you accomplish your goals.
Ensure your code repository is well-organized. Is it clear to the reviewer where to begin, and which scripts need to be run in what order? Is there an informative README.md? Are scripts well-annotated so the reviewer can quickly navigate the code and follow along?
Omit needless code. Similar to how you generate an initial draft of text and then revisit and revise it to improve the wording, consider that revising your code to make sure it is concise and efficient can make a reviewer's life much easier. I doubt any of us want to read 1000 lines of code if the same tasks can be executed in 100 lines! While we are ecologists first, not programmers, a little code etiquette can really help a reviewer (and you later on!). If you have code that isn't currently needed but that you aren't ready to part with, consider storing it in an archive folder that can be ignored by the reviewer.
Consider using functions for repeated tasks. The benefits to this approach are many! To name a few: 1) once you get beyond a few hundred lines of code in a script, writing functions makes it easier to navigate your code, as you can "follow the trail" of functions back to the line of code you are looking for rather than scrolling through thousands of lines of code; 2) following on (1), it makes code easier to troubleshoot, as you can pin down which function is misbehaving; 3) it helps avoid errors due to repeated copying and pasting of code. If you find that you are repeatedly copying and pasting very similar code with just a few modifications (e.g., only changing the year of the dataset, the study site, the study organism, the value of the model parameter, etc.), think about writing a function that takes the thing(s) you need to change (year, site, etc.) as arguments.
Scan your code for 'readability'. Just as a page of densely formatted text in 10 pt font with no paragraph breaks can be a little intimidating to tackle, code that is written with little spacing or extremely long lines running off the page can be difficult to read. Have a heart for your reviewer's eyes (and your own!).

Additional resources (please add to this list!):

https://www.browserstack.com/guide/coding-standards-best-practices https://www.michaelagreiler.com/code-review-best-practices/

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
download-data.R		download-data.R
install.R		install.R
manuscript-analysis.Rmd		manuscript-analysis.Rmd
reproducibility-demo.Rproj		reproducibility-demo.Rproj
runtime.txt		runtime.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Challenges with reproducing analyses

The reproducibility highway

Exit 1: Reproductiblity from GitHub

Exit 2: Reproducibility from GitHub + renv

Exit 3: Reproducibility using Rocker + GitHub

Exit 4: Reproductiblity using Docker + GitHub

Performing a code review

Guidelines when requesting a code review from a collaborator

About

Uh oh!

Releases

Packages

Contributors 3

Uh oh!

Languages

License

rqthomas/reproducibility-demo

Folders and files

Latest commit

History

Repository files navigation

Challenges with reproducing analyses

The reproducibility highway

Exit 1: Reproductiblity from GitHub

Exit 2: Reproducibility from GitHub + renv

Exit 3: Reproducibility using Rocker + GitHub

Exit 4: Reproductiblity using Docker + GitHub

Performing a code review

Guidelines when requesting a code review from a collaborator

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Uh oh!

Languages

Packages