update README

grst · grst · commit d3f0823dfd01 · 2022-10-27T14:09:51.000+02:00
diff --git a/README.md b/README.md
@@ -1,92 +1,99 @@
-# schneeberger-liver-transfusion
+# Immune cell dynamics deconvoluted by single-cell RNA sequencing in normothermic machine perfusion of the liver
 
-Analysis of scRNA-seq data for the Liver transfusion project by Stefan Schneeberger
+This repository contains a [nextflow](https://github.com/nextflow-io/nextflow/) workflow to reproduce the single-cell analysis of
 
-## Getting started
+> Hautz, Salcher, Fodor et al. (2022), Immune cell dynamics deconvoluted by single-cell RNA sequencing in normothermic machine perfusion of the liver. 
 
-To make it easy for you to get started with GitLab, here's a list of recommended next steps.
+Raw sequencing data is available from GEO ([GSE216584](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE216584)).
+The preprocessed data and singularity containers required to run this workflow are available [from zenodo](https://doi.org/10.5281/zenodo.7249006). The results generated by this workflow (i.e. executed jupyter notebooks, plots, etc.) can also be downloaded from zenodo.
 
-Already a pro? Just edit this README.md and make it your own. Want to make it easy? [Use the template at the bottom](#editing-this-readme)!
 
-## Add your files
+## Launching the workflows
 
-- [ ] [Create](https://docs.gitlab.com/ee/user/project/repository/web_editor.html#create-a-file) or [upload](https://docs.gitlab.com/ee/user/project/repository/web_editor.html#upload-a-file) files
-- [ ] [Add files using the command line](https://docs.gitlab.com/ee/gitlab-basics/add-file.html#add-a-file-using-the-command-line) or push an existing Git repository with the following command:
+### 1. Prerequisites
 
-```
-cd existing_repo
-git remote add origin https://gitlab.i-med.ac.at/icbi-lab/data-analyses/schneeberger-liver-transfusion.git
-git branch -M main
-git push -uf origin main
-```
-
-## Integrate with your tools
+* [Nextflow](https://www.nextflow.io/index.html#GetStarted), version 22.04.5 or higher
+* [Singularity/Apptainer](https://apptainer.org/), version 3.7 or higher (tested with 3.7.0-1.el7)
 
-- [ ] [Set up project integrations](https://gitlab.i-med.ac.at/icbi-lab/data-analyses/schneeberger-liver-transfusion/-/settings/integrations)
+### 2. Obtain data
 
-## Collaborate with your team
+Before launching the workflow, you need to obtain input data and singularity containers from zenodo.
+First of all, clone this repository:
 
-- [ ] [Invite team members and collaborators](https://docs.gitlab.com/ee/user/project/members/)
-- [ ] [Create a new merge request](https://docs.gitlab.com/ee/user/project/merge_requests/creating_merge_requests.html)
-- [ ] [Automatically close issues from merge requests](https://docs.gitlab.com/ee/user/project/issues/managing_issues.html#closing-issues-automatically)
-- [ ] [Enable merge request approvals](https://docs.gitlab.com/ee/user/project/merge_requests/approvals/)
-- [ ] [Automatically merge when pipeline succeeds](https://docs.gitlab.com/ee/user/project/merge_requests/merge_when_pipeline_succeeds.html)
+```bash
+git clone https://github.com/icbi-lab/nmp-liver.git
+cd nmp-liver
+```
 
-## Test and Deploy
+Then, within the repository, download the data archives and extract them to the corresponding directories:
 
-Use the built-in continuous integration in GitLab.
+```bash
+# singularity containers (-O sets the local file name so that `unzip` below finds it)
+wget -O containers.zip "https://zenodo.org/record/7249006/files/containers.zip?download=1"
 
-- [ ] [Get started with GitLab CI/CD](https://docs.gitlab.com/ee/ci/quick_start/index.html)
-- [ ] [Analyze your code for known vulnerabilities with Static Application Security Testing(SAST)](https://docs.gitlab.com/ee/user/application_security/sast/)
-- [ ] [Deploy to Kubernetes, Amazon EC2, or Amazon ECS using Auto Deploy](https://docs.gitlab.com/ee/topics/autodevops/requirements.html)
-- [ ] [Use pull-based deployments for improved Kubernetes management](https://docs.gitlab.com/ee/user/clusters/agent/)
-- [ ] [Set up protected environments](https://docs.gitlab.com/ee/ci/environments/protected_environments.html)
+# input data
+wget -O input_data.zip "https://zenodo.org/record/7249006/files/input_data.zip?download=1"
 
-***
+unzip containers.zip
+unzip input_data.zip
+```
 
-# Editing this README
+### 3. Configure nextflow
 
-When you're ready to make this README your own, just edit this file and use the handy template below (or feel free to structure it however you want - this is just a starting point!).  Thank you to [makeareadme.com](https://www.makeareadme.com/) for this template.
+Depending on your HPC/cloud setup, you will need to adjust the nextflow profile in `nextflow.config` to tell
+nextflow how to submit the jobs. Using a `withName:...` directive, special
+resources may be assigned to GPU jobs. For an example, see the `icbi_liver` profile, which we used to run the
+workflow on our on-premises cluster.
 
-## Suggestions for a good README
-Every project is different, so consider which of these sections apply to yours. The sections used in the template are suggestions for most open source projects. Also keep in mind that while a README can be too long and detailed, too long is better than too short. If you think your README is too long, consider utilizing another form of documentation rather than cutting out information.
+### 4. Launch the workflows
 
-## Name
-Choose a self-explaining name for your project.
+```bash
+nextflow run main.nf -resume -profile <YOUR_PROFILE> \
+    --outdir "./data/results"
+```
 
-## Description
-Let people know what your project can do specifically. Provide context and add a link to any reference visitors might be unfamiliar with. A list of Features or a Background subsection can also be added here. If there are alternatives to your project, this is a good place to list differentiating factors.
+## Structure of this repository
 
-## Badges
-On some READMEs, you may see small images that convey metadata, such as whether or not all the tests are passing for the project. You can use Shields to add some to your README. Many services also have instructions for adding a badge.
+* `analyses`: place for e.g. jupyter/rmarkdown notebooks, grouped by their respective (sub-)workflows.
+* `bin`: executable scripts called by the workflow
+* `conf`: nextflow configuration files for all processes
+* `containers`: place for singularity image files. Not part of the git repo; created by the download commands.
+* `data`: place for input data and results in different subfolders. Populated by the download commands and by running the workflows.
+* `lib`: custom libraries and helper functions
+* `modules`: nextflow DSL2 modules
+* `tables`: contains static content that should be under version control (e.g. manually created tables)
 
-## Visuals
-Depending on what you are making, it can be a good idea to include screenshots or even a video (you'll frequently see GIFs rather than actual videos). Tools like ttygif can help, but check out Asciinema for a more sophisticated method.
 
-## Installation
-Within a particular ecosystem, there may be a common way of installing things, such as using Yarn, NuGet, or Homebrew. However, consider the possibility that whoever is reading your README is a novice and would like more guidance. Listing specific steps helps remove ambiguity and gets people to using your project as quickly as possible. If it only runs in a specific context like a particular programming language version or operating system or has dependencies that have to be installed manually, also add a Requirements subsection.
+## Workflow description
 
-## Usage
-Use examples liberally, and show the expected output if you can. It's helpful to have inline the smallest example of usage that you can demonstrate, while providing links to more sophisticated examples if they are too long to reasonably include in the README.
+The analysis workflow comprises the following steps:
+ * QC of the unfiltered input data
+ * Cell-type annotation
+ * Pseudobulk generation and DE analysis with DESeq2
+ * Subcluster analysis of Macrophages/Monocytes and Neutrophils
+ * Comparison of timepoints T0 vs T1
 
-## Support
-Tell people where they can go to for help. It can be any combination of an issue tracker, a chat room, an email address, etc.
+## Contact
 
-## Roadmap
-If you have ideas for releases in the future, it is a good idea to list them in the README.
+For reproducibility issues or any other requests regarding single-cell data analysis, please use the [issue tracker](https://github.com/icbi-lab/nmp-liver/issues). For anything else, you can reach out to the corresponding author(s) as indicated in the manuscript.
 
-## Contributing
-State if you are open to contributions and what your requirements are for accepting them.
+## Notes on reproducibility
 
-For people who want to make changes to your project, it's helpful to have some documentation on how to get started. Perhaps there is a script that they should run or some environment variables that they need to set. Make these steps explicit. These instructions could also be useful to your future self.
+We aimed to make this workflow reproducible by providing all input data, containerizing all software
+dependencies and integrating all analysis steps into a nextflow workflow.
+In theory, this allows the workflow to be executed on any system that can run nextflow and singularity.
+Unfortunately, some single cell analysis algorithms (in particular scVI and UMAP) will yield
+slightly different results on different hardware, trading off computational reproducibility for a
+significantly faster runtime. In particular, results will differ when changing the number of cores, or
+when running on a CPU/GPU of a different architecture. See also https://github.com/scverse/scanpy/issues/2014 for a discussion.
 
-You can also document commands to lint the code or run tests. These steps help to ensure high code quality and reduce the likelihood that the changes inadvertently break something. Having instructions for running tests is especially helpful if it requires external setup, such as starting a Selenium server for testing in a browser.
+Since the cell-type annotation depends on clustering, and the clustering depends on the neighborhood graph,
+which in turn depends on the scVI embedding, running the workflow on a different machine will likely break the cell-type labels.
 
-## Authors and acknowledgment
-Show your appreciation to those who have contributed to the project.
+Below is the hardware we used to execute the workflow. Theoretically,
+any CPU/GPU of the same generation should produce identical results, but we have not had the chance to test this yet.
 
-## License
-For open source projects, say how it is licensed.
+ * Compute node CPU: `Intel(R) Xeon(R) CPU E5-2699A v4 @ 2.40GHz` (2x)
+ * GPU node CPU: `EPYC 7352 24-Core` (2x)
+ * GPU node GPU: `Nvidia Quadro RTX 8000 GPU`
 
-## Project status
-If you have run out of energy or time for your project, put a note at the top of the README saying that development has slowed down or stopped completely. Someone may choose to fork your project or volunteer to step in as a maintainer or owner, allowing your project to keep going. You can also make an explicit request for maintainers.