This repository contains the code for running a high-performance computing (HPC) pipeline that predicts the probability of mosquito bites using the BRSM model.
The pipeline is orchestrated with Snakemake, which automates data preparation, model training, and prediction.
The model needs data from the following data sources:
- Mosquito bite reports from Mosquito Alert: the variable used for training and the one to be predicted.
- Sampling effort data from Mosquito Alert.
- ERA5-Land: historical climate reanalysis data.
- CORINE Land Cover: land use and land cover information for Europe.
Please download the CORINE Land Cover dataset manually and place it in the following path:
data/corine_landcover/U2018_CLC2018_V2020_20u1.tif
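Before launching the pipeline, it can help to confirm that the raster is where the workflow expects it. The following is a minimal sketch, not part of the repository: the function name `corine_raster_present` is hypothetical, and it only assumes that the path above is relative to the project working directory.

```python
from pathlib import Path

# Path taken from the instructions above, relative to the project working directory.
CORINE_PATH = Path("data/corine_landcover/U2018_CLC2018_V2020_20u1.tif")


def corine_raster_present(workdir: Path = Path(".")) -> bool:
    """Return True if the CORINE raster exists at the expected path and is not empty."""
    target = workdir / CORINE_PATH
    return target.is_file() and target.stat().st_size > 0


if __name__ == "__main__":
    if corine_raster_present():
        print("CORINE raster found")
    else:
        print(f"Missing or empty: {CORINE_PATH}")
```

The check only looks at presence and size; it does not verify that the file is a valid GeoTIFF.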
Provide a GeoPackage file containing polygons, with:
- Only one layer
- Two columns: code (unique ID) and name
IMPORTANT: Update config/config.yaml to reflect the path to this file.
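The GeoPackage requirements above can be checked before a run. Since a GeoPackage is an SQLite database, the sketch below uses only the Python standard library. It is an illustration under stated assumptions, not the workflow's own validation: the function name `check_regions_gpkg` is hypothetical, "one layer" is interpreted as exactly one `features` entry in the standard `gpkg_contents` table, and extra columns such as the geometry and feature ID are tolerated.

```python
import os
import sqlite3


def check_regions_gpkg(path: str) -> list[str]:
    """Return a list of problems found; an empty list means the file looks usable."""
    if not os.path.isfile(path):
        return [f"file not found: {path}"]

    problems = []
    con = sqlite3.connect(path)
    try:
        # Feature layers are registered in gpkg_contents with data_type 'features'.
        layers = [row[0] for row in con.execute(
            "SELECT table_name FROM gpkg_contents WHERE data_type = 'features'")]
        if len(layers) != 1:
            return [f"expected exactly one layer, found {len(layers)}"]

        # Quote the table name so unusual layer names do not break the SQL.
        quoted = '"' + layers[0].replace('"', '""') + '"'
        columns = {row[1] for row in con.execute(f"PRAGMA table_info({quoted})")}
        for required in ("code", "name"):
            if required not in columns:
                problems.append(f"missing column: {required}")

        if "code" in columns:
            # COUNT(*) includes NULLs, COUNT(DISTINCT code) does not,
            # so any mismatch means a missing or repeated code.
            total, distinct = con.execute(
                f"SELECT COUNT(*), COUNT(DISTINCT code) FROM {quoted}").fetchone()
            if total != distinct:
                problems.append("'code' has missing or duplicate values")
    finally:
        con.close()
    return problems
```

Calling `check_regions_gpkg("path/to/regions.gpkg")` (a placeholder path) returns an empty list when all three conditions hold. The check does not confirm that the geometries are polygons.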
To access ERA5-Land data, you'll need an API key from the Copernicus Climate Data Store. Follow the instructions at: https://cds.climate.copernicus.eu/how-to-api
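The CDS instructions describe storing the credentials in a `.cdsapirc` file in your home directory, with one `url:` line and one `key:` line. Assuming this workflow reads its credentials from that file (not confirmed here), a quick sanity check could look like the sketch below. The helper names `read_cdsapirc` and `credentials_look_complete` are hypothetical.

```python
from pathlib import Path


def read_cdsapirc(path: Path = Path.home() / ".cdsapirc") -> dict[str, str]:
    """Parse 'name: value' lines from a CDS API credentials file."""
    settings = {}
    for line in path.read_text().splitlines():
        if ":" in line and not line.lstrip().startswith("#"):
            # Split on the first colon only, so URLs keep their 'https://' part.
            name, value = line.split(":", 1)
            settings[name.strip()] = value.strip()
    return settings


def credentials_look_complete(settings: dict[str, str]) -> bool:
    """Return True if both the 'url' and 'key' entries are present and non-empty."""
    return bool(settings.get("url")) and bool(settings.get("key"))
```

This only confirms that both entries exist; it does not contact the Climate Data Store or test whether the key is valid.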
We recommend using Mamba (a faster drop-in replacement for Conda). If you don't have Conda or Mamba installed, consider installing Miniforge.
Install Snakemake, Snakedeploy, and necessary plugins:
mamba create -c conda-forge -c bioconda --name snakemake snakemake=9.8.1 snakedeploy=0.11.0
If you're running on an HPC with SLURM, install additional plugins:
mamba install -n snakemake -c bioconda snakemake-executor-plugin-slurm=1.5.0 snakemake-storage-plugin-fs=1.1.2
Activate the environment:
conda activate snakemake
Create and move into a project directory:
mkdir -p path/to/project-workdir
cd path/to/project-workdir
Deploy the workflow using Snakedeploy:
snakedeploy deploy-workflow <URL_TO_THIS_REPO> . --tag <DESIRED_TAG>
This will create two directories:
- workflow/: contains the deployed Snakemake module
- config/: contains configuration files
Edit config/config.yaml to specify your settings (paths, parameters, etc.) according to your data and environment.
Run the workflow locally:
snakemake --cores all --sdm conda
On an HPC with SLURM, use the provided SLURM profile:
snakemake --cores all --sdm conda --profile slurm
For advanced features such as cluster execution, cloud deployments, and workflow customization, see the Snakemake documentation.