# metaGOflow: A workflow for marine Genomic Observatories' data analysis

## An EOSC-Life project

The workflows developed in the framework of this project are based on `pipeline-v5` of the MGnify resource.

> This branch is a child of the [`pipeline_5.1`](https://github.com/hariszaf/pipeline-v5/tree/pipeline_5.1) branch
> that contains all CWL descriptions of the MGnify pipeline version 5.1.

## Dependencies

To run metaGOflow, make sure you have the following set up on your computing environment first:

- python3 [v 3.8+]
- [Docker](https://www.docker.com) [v 19.+] or [Singularity](https://apptainer.org) [v 3.7.+] / [Apptainer](https://apptainer.org) [v 1.+]
- [cwltool](https://github.com/common-workflow-language/cwltool) [v 3.+]
- [rdflib](https://rdflib.readthedocs.io/en/stable/) [v 6.+]
- [rdflib-jsonld](https://pypi.org/project/rdflib-jsonld/) [v 0.6.2]
- [ro-crate-py](https://github.com/ResearchObject/ro-crate-py) [v 0.7.0]
- [pyyaml](https://pypi.org/project/PyYAML/) [v 6.0]
- [Node.js](https://nodejs.org/) [v 10.24.0+]
- ~235 GB of available storage for the databases

### Storage while running

Disk requirements vary depending on the analysis you are about to run.
Indicatively, you may have a look at the metaGOflow publication for the computing resources used in various cases.

## Installation

### Get the EOSC-Life marine GOs workflow

```bash
git clone https://github.com/emo-bon/MetaGOflow
cd MetaGOflow
```

### Download necessary databases (~235GB)

You can download the databases for the EOSC-Life GOs workflow by running the
`download_dbs.sh` script under the `Installation` folder.

```bash
bash Installation/download_dbs.sh -f [Output Directory e.g. ref-dbs]
```

If you already have one or more of them on your system, create a symbolic link pointing
at the `ref-dbs` folder or at one of its subfolders/files.

The final structure of the DB directory should look like the following:

```bash
user@server:~/MetaGOflow: ls ref-dbs/
db_kofam/ diamond/ eggnog/ GO-slim/ interproscan-5.57-90.0/ kegg_pathways/ kofam_ko_desc.tsv Rfam/ silva_lsu/ silva_ssu/
```

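For example, a symbolic link for a single database you already have could be created as follows. The source path `/data/shared/databases/silva_ssu` is purely illustrative; point it at wherever the database actually lives on your system:

```shell
# Hypothetical example: reuse an existing local copy of the SILVA SSU database
# instead of re-downloading it. -sfn replaces any stale link from a previous run.
mkdir -p ref-dbs
ln -sfn /data/shared/databases/silva_ssu ref-dbs/silva_ssu
ls -l ref-dbs/
```

The workflow will then follow the link transparently, as if the database had been downloaded into `ref-dbs/`.
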
## How to run

### Ensure that `Node.js` is installed on your system before running metaGOflow

If you have root access on your system, you can run the commands below to install it:

##### Debian/Ubuntu
```bash
sudo apt-get update -y
sudo apt-get install -y nodejs
```

##### RH/CentOS
```bash
# Pick the stream version you need, e.g. rh-nodejs10
sudo yum install rh-nodejs10
```

### Set up the environment

#### Run once - set up the environment

```bash
conda create -n EOSC-CWL python=3.8
conda activate EOSC-CWL
pip install cwlref-runner cwltool[all] rdflib-jsonld rocrate pyyaml
```

#### Run every time

```bash
conda activate EOSC-CWL
```

### Run the workflow

- Edit the `config.yml` file to set the parameter values of your choice. To run all the steps, set the variables in lines [2-6] to `true`.

#### Using Singularity

##### Standalone

- run:
  ```bash
  ./run_wf.sh -s -n osd-short -d short-test-case -f test_input/wgs-paired-SRR1620013_1.fastq.gz -r test_input/wgs-paired-SRR1620013_2.fastq.gz
  ```

##### Using a cluster with a queueing system (e.g. SLURM)

- Create a job file (e.g., an SBATCH file)
- Enable Singularity (e.g. `module load Singularity`) and all other dependencies
- Add the run line to the job file

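Put together, such a job file might look like the following minimal sketch. The job name, resource values, and module name are assumptions; adjust them to your cluster's conventions:

```bash
#!/bin/bash
#SBATCH --job-name=metagoflow-test    # hypothetical values; adapt to your cluster
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G
#SBATCH --time=24:00:00

# Enable Singularity; the exact module name may differ on your system
module load Singularity

# The run line, as in the standalone example above
./run_wf.sh -s -n osd-short -d short-test-case \
    -f test_input/wgs-paired-SRR1620013_1.fastq.gz \
    -r test_input/wgs-paired-SRR1620013_2.fastq.gz
```

You would then submit it with `sbatch <job file>` from the repository root.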
#### Using Docker

##### Standalone

- run:
  ```bash
  ./run_wf.sh -n osd-short -d short-test-case -f test_input/wgs-paired-SRR1620013_1.fastq.gz -r test_input/wgs-paired-SRR1620013_2.fastq.gz
  ```
  HINT: If you are using Docker, run the above command without the `-s` flag.

## Testing samples

The samples are available in the `test_input` folder.

We provide metaGOflow with partial samples from the Human Metagenome Project ([SRR1620013](https://www.ebi.ac.uk/ena/browser/view/SRR1620013) and [SRR1620014](https://www.ebi.ac.uk/ena/browser/view/SRR1620014)).
They are partial in the sense that only a small fraction of their sequences has been kept, so that the pipeline can be tested quickly.

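If you want to see how small the truncated files are, you can count FASTQ records (a record is four lines: header, sequence, separator, qualities). The snippet below builds a tiny two-read example file rather than assuming the repository files are present; on a real checkout you would point it at e.g. `test_input/wgs-paired-SRR1620013_1.fastq.gz`:

```shell
# Build a toy gzipped FASTQ with two reads, then count its records.
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGG\n+\nIIII\n' | gzip > tiny.fastq.gz

# A FASTQ record is 4 lines, so reads = total lines / 4.
echo "$(( $(gunzip -c tiny.fastq.gz | wc -l) / 4 )) reads"   # → 2 reads
```

The same one-liner applied to the real test files shows why the pipeline finishes quickly on them.
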
## Hints and tips

1. In case you are using Docker, it is strongly recommended to **avoid** installing it through `snap`.

2. `RuntimeError`: slurm currently does not support shared caching, because it does not support cleaning up a worker
   after the last job finishes.
   Set the `--disableCaching` flag if you want to use this batch system.

3. In case you are getting errors like:

   ```
   cwltool.errors.WorkflowException: Singularity is not available for this tool
   ```

   you may run the following command:

   ```
   singularity pull --force --name debian:stable-slim.sif docker://debian:stable-slim
   ```

## Contribution

To make contributing to the project a bit easier, all the MGnify `conditionals` and `subworkflows` under
the `workflows/` directory that are not used in the metaGOflow framework have been removed.
However, all the MGnify `tools/` and `utils/` are available in this repo, even if they are not invoked in the current
version of metaGOflow.
This way, we hope to encourage people to implement their own `conditionals` and/or `subworkflows` by exploiting the
currently supported `tools` and `utils`, as well as by developing new `tools` and/or `utils`.

<!-- cwltool --print-dot my-wf.cwl | dot -Tsvg > my-wf.svg -->