This repository hosts the benchmarking framework, built upon Snakemake, for DelSIEVE.
This framework contains scripts written in Python 3 and R. Apart from an environment containing Snakemake (we recommend using Conda), the following packages should be installed before running the pipeline:

- For Python 3:
  - paramiko >= 2.8.0
- For R:
  - base >= 4.0
  - stringr
  - scales
  - dplyr
  - optparse
  - ape
  - phangorn
  - tools
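Before launching the pipeline, it can help to confirm the Python requirement above. The following is a minimal sketch, not part of this repository: the function names `parse_version` and `check_paramiko` are hypothetical, and the check relies only on the standard library (`importlib.metadata`, Python 3.8+). The R packages still need to be checked separately.

```python
"""Hypothetical pre-flight check (not shipped with this repo): is paramiko >= 2.8.0 installed?"""
import re
from importlib import metadata

# Minimum version taken from the requirement list in this README.
REQUIRED_PARAMIKO = (2, 8, 0)


def parse_version(version: str) -> tuple:
    """Turn '2.11.0' into (2, 11, 0); suffixes like 'rc1' or '.post1' are ignored."""
    numbers = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    # Pad so that '2.8' compares equal to (2, 8, 0) instead of smaller.
    numbers += [0] * (3 - len(numbers))
    return tuple(numbers)


def check_paramiko(required: tuple = REQUIRED_PARAMIKO) -> bool:
    """Report whether the installed paramiko meets the minimum version."""
    try:
        installed = metadata.version("paramiko")
    except metadata.PackageNotFoundError:
        print("paramiko: not installed")
        return False
    ok = parse_version(installed) >= required
    print(f"paramiko {installed}: {'OK' if ok else 'too old'}")
    return ok


if __name__ == "__main__":
    check_paramiko()
```

Versions are compared as integer tuples rather than strings, so `2.10.1` correctly counts as newer than `2.8.0`; pre-release suffixes are dropped for simplicity, which is a deliberate approximation.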
A few files should be configured before running.

- The pipeline is mainly configured in `config.yaml`. Entries to be filled in by users are marked by the phrase `# TO BE SET`, followed either by `[MANDATORY]` (must be set) or by `[OPTIONAL]` (can be ignored). Users can search for this phrase to set everything up efficiently.
- The template configuration files for SIEVE are under `templates/`.
- For the key `[configFiles][DelSIEVE_simulator]`, a configuration file for the data simulator DelSIEVE_simulator should be specified. The simulated scenarios used in the SIEVE paper are listed in `simulation_configs/`. For details, check the paper.
- In `run/run.sh`, users can set a few things, e.g., the name of the conda environment containing Snakemake (by default, `snakemake`), the number of cores to use and their ranges, etc.
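Instead of searching by hand, the marked entries can also be listed with a short script. This is a sketch under an assumption: it expects `# TO BE SET` and its `[MANDATORY]`/`[OPTIONAL]` tag to sit on the same line, which may not hold for every entry in the real `config.yaml`. The helpers `find_markers` and `report` are hypothetical, not part of the framework.

```python
"""Hypothetical helper (not shipped with this repo): list entries marked '# TO BE SET'."""
import re
import sys
from pathlib import Path

# Assumption: the tag follows the phrase on the same line, e.g.
#   key: value  # TO BE SET [MANDATORY]
MARKER = re.compile(r"#\s*TO BE SET\b.*?\[(MANDATORY|OPTIONAL)\]")


def find_markers(text: str) -> list:
    """Return (line number, tag, stripped line) for every marked line, in file order."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = MARKER.search(line)
        if match:
            hits.append((lineno, match.group(1), line.strip()))
    return hits


def report(path: Path) -> None:
    """Print mandatory entries first, then optional ones, as file:line: [TAG] text."""
    hits = find_markers(path.read_text())
    # sorted() is stable, so file order is kept within each group.
    for lineno, tag, line in sorted(hits, key=lambda hit: hit[1] != "MANDATORY"):
        print(f"{path}:{lineno}: [{tag}] {line}")


if __name__ == "__main__":
    config = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("config.yaml")
    if config.is_file():
        report(config)
    else:
        print(f"{config} not found; run this from the repository root.", file=sys.stderr)
```

A plain `grep -n "TO BE SET" config.yaml` gives a similar listing; the script only adds grouping of mandatory entries before optional ones.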
Since SiFit requires a large amount of memory even when working on a small dataset, the framework supports running SiFit alone on another server (referred to as the remote server) with the help of git. For this to work, a few things should be noted and configured:

- The machine on which you plan to run the pipeline (referred to as the local machine) and the remote server should meet one of the following conditions:
  - They are in the same local network.
  - If they are not in the same local network, the local machine must have a public IP address for the remote server to clone the git repo. However, the remote server can sit behind a gate server with a public IP address, specified by the key `[servers][jumpServerOfRemote]` in `config.yaml`.
- In `Snakefile`, comment out `include: "sifit_local.snake"` and uncomment `# include: "sifit_remote.snake"`.
- Set up `run/run_remote_sifit_true_monovar.sh` similarly to `run/run.sh`.
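The `Snakefile` change described above can be scripted. This is a minimal sketch, assuming both `include` directives appear exactly as quoted in this README, one per line; `use_remote_sifit` is a hypothetical helper, not part of the framework. It prints the edited text instead of overwriting the file, so the result can be reviewed first.

```python
"""Hypothetical helper (not shipped with this repo): switch the Snakefile to remote SiFit."""
import re
import sys
from pathlib import Path

LOCAL = 'include: "sifit_local.snake"'
REMOTE = 'include: "sifit_remote.snake"'
# A '#' plus optional spaces, immediately followed by the remote include directive.
REMOTE_COMMENTED = re.compile(r"#\s*(?=" + re.escape(REMOTE) + ")")


def use_remote_sifit(snakefile_text: str) -> str:
    """Comment out the local SiFit include and uncomment the remote one."""
    edited = []
    for line in snakefile_text.splitlines(keepends=True):
        body = line.lstrip()
        indent = line[: len(line) - len(body)]
        if body.startswith(LOCAL):
            # Comment out the local rule file, keeping any indentation.
            line = indent + "# " + body
        elif REMOTE_COMMENTED.match(body):
            # Drop the leading '#' (and following spaces) to enable remote SiFit.
            line = indent + REMOTE_COMMENTED.sub("", body, count=1)
        edited.append(line)
    return "".join(edited)


if __name__ == "__main__":
    snakefile = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("Snakefile")
    if snakefile.is_file():
        print(use_remote_sifit(snakefile.read_text()), end="")
    else:
        print(f"{snakefile} not found; run this from the repository root.", file=sys.stderr)
```

For example, redirect the output to a new file, inspect it, and then move it into place. Applying the function twice leaves the text unchanged, so re-running it is harmless.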
With everything set up, users can run the pipeline from the root directory of this repo simply with

    $ source run/run.sh Snakefile

or manually with

    $ conda activate snakemake
    $ snakemake --use-conda --cores {NUM} -kp

This project received support from:
- the Polish National Science Centre SONATA BIS grant No. 2020/38/E/NZ2/00305,
- the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 766030, as well as
- the European Research Council (ERC-617457-PHYLOCANCER), the Spanish Ministry of Science and Innovation (PID2019-106247GB-I00), and Xunta de Galicia.