Maria Koromina edited this page Jul 15, 2025 · 17 revisions

πŸš€ Welcome to the SAFFARI wiki!

Statistical And Functional Fine-mApping of GWAS Risk locI: SAFFARI

  • A comprehensive statistical and functional fine-mapping pipeline incorporating four methods (SuSiE, FINEMAP, Polyfun+SuSiE, Polyfun+FINEMAP), two LD reference panels (precomputed UKB matrices, or an LD panel in PLINK format), and different ranges of fine-mapping windows.

  • This workflow takes as input GWAS summary statistics, a top loci file, and LD reference estimates, and outputs results for each method/reference panel/window combination. Two options are provided for the LD reference panel: the precomputed UKB LD matrices (from 337K unrelated British-ancestry individuals in the UK Biobank) or the Haplotype Reference Consortium (HRC) LD reference panel.

  • We strongly recommend using the main branch. The other two branches exist for specific research groups that further customize the SAFFARI pipeline within their own workflows.

The pipeline comprises the following two Snakemake modules:

  • selecting the correct UKB LD matrix for each locus to be fine-mapped, while formatting the top loci file accordingly (fetch_UKB_LD_names),
  • running the fine-mapping pipeline using GWAS summary statistics and LD reference panels (fine-mapping_multiple or fine-mapping_HRC_multiple, depending on whether the UKB or HRC LD reference panel is used).
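The first module's task of picking a precomputed LD matrix for each locus can be sketched as follows. This is a minimal illustration, not SAFFARI's actual code: the filename scheme (chr{c}_{start}_{end}) and the 3 Mb windows on a 1 Mb grid follow the convention used by the PolyFun-distributed UKB matrices, and the function name is hypothetical.

```python
def ukb_ld_matrix_name(chrom, pos, window=3_000_000, step=1_000_000):
    """Pick the (assumed) precomputed UKB LD matrix whose 3 Mb window
    best centres the locus at `pos`; window starts fall on a 1 Mb grid
    (1, 1_000_001, 2_000_001, ...). The naming scheme is illustrative."""
    start = max(1, ((pos - window // 2) // step) * step + 1)
    return f"chr{chrom}_{start}_{start + window}"

# A locus near the start of the chromosome falls in the first window:
print(ukb_ld_matrix_name(1, 500_000))    # chr1_1_3000001
print(ukb_ld_matrix_name(1, 2_500_000))  # chr1_1000001_4000001
```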

Current Snakemake version used: 7.6.2.

  • To download the latest SAFFARI version, simply do: git clone https://github.com/mkoromina/SAFFARI

πŸ’‘ Introduction to Snakemake

You can find the full documentation of Snakemake here. In this section, I’ll provide a brief overview and highlight some particularly useful command-line options. Snakemake is a Python-based pipeline tool. A workflow consists of a series of rules, each acting as a set of instructions that tells Snakemake how to generate specific outputs from given inputs. When a user requests an output, Snakemake executes all the rules necessary to produce it.
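A rule in a Snakefile looks roughly like this (a minimal sketch for orientation only; the rule name, file paths, and shell command are illustrative and not part of SAFFARI):

```snakemake
rule sort_sumstats:
    input:
        "resources/{trait}_sumstats.tsv"       # hypothetical input file
    output:
        "output/{trait}/sumstats_sorted.tsv"
    conda:
        "envs/polyfun.yml"                     # per-rule conda environment
    shell:
        "sort -k1,1 -k2,2n {input} > {output}"
```

Requesting, say, output/scz/sumstats_sorted.tsv on the command line would make Snakemake run this rule (and any upstream rules) to produce that file.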

--use-conda

This flag tells Snakemake to create and use the conda environment specified for each rule. It is a handy, reproducible way of installing and running code in a tightly controlled software environment, and it should always be used when running SAFFARI.
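A per-rule conda environment file (such as envs/polyfun.yml) typically looks like the sketch below. The contents here are illustrative only; the actual SAFFARI environment file pins the exact packages and versions the pipeline needs.

```yaml
# Illustrative conda environment file; not the real envs/polyfun.yml.
name: polyfun
channels:
  - conda-forge
  - bioconda
dependencies:
  - python=3.10
  - numpy
  - pandas
```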

-np

This flag performs a dry run: Snakemake prints all the jobs it would run without actually running them. This is particularly useful for previewing what would happen if you requested a certain output or rule, and it helps avoid accidentally triggering hundreds of unwanted jobs.

--cores

This option specifies the number of cores requested to run the SAFFARI pipeline. You can increase the number of requested cores for more computationally intensive steps.

--configfile

This parameter specifies the .yaml file you want Snakemake to use as the configuration file. By default, Snakemake reads the config.yaml file located in the pipeline directory to obtain its parameters. This file is described in detail below (see here).

You can run the pipeline like this: snakemake --profile slurm --configfile config.yaml --use-conda.


πŸ“ Directory Structure

workflow/
  └── finemapping_multiple        # Main Snakefile and rules
resources/
  β”œβ”€β”€ UKBB_LD/                    # LD reference panels (bgen format)
  β”œβ”€β”€ UKBB_priors/                # LD scores & SNP weights
  β”œβ”€β”€ {trait}_loci_ranges.tsv     # TSV of fine-mapping loci per trait
output/
  └── {trait}/
      β”œβ”€β”€ priors/
      β”œβ”€β”€ polyfun_susie_UKB_finemap/
      β”œβ”€β”€ polyfun_finemap_UKB_finemap/
      β”œβ”€β”€ only_susie_UKB_finemap/
      └── only_finemap_UKB_finemap/
envs/
  └── polyfun.yml                 # Conda environment file
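The per-trait loci file ({trait}_loci_ranges.tsv) is a tab-separated table listing the regions to fine-map. The exact columns are defined by the pipeline; the sketch below is only a guess at what such a file might contain, with hypothetical column names and coordinates:

```
LOCUS  CHR  START     END
1      1    2000001   2500000
2      3    50000001  50400000
```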

πŸ“ Important notes

  • You will need to activate the snakemake conda/mamba environment prior to the pipeline execution.

  • Make sure to also follow the directory and file structure as found on the GitHub page. The main directories within SAFFARI are "workflow", "resources", and "polyfun"; "scripts" and "envs" are subdirectories within "workflow". Please check the READMEs in these directories too.

  • To run the Snakemake pipeline, I strongly recommend setting up a cluster profile. Fully detailed instructions for configuring profiles to run Snakemake jobs can be found here. For example, you can set up a slurm or an lsf profile, which allows parallel job submission and execution. In that case, a simple job submission looks like this: snakemake --profile slurm --configfile config.yaml --use-conda

  • --configfile config.yaml: make sure the correct top loci file is listed for each Snakefile used. The fetch_UKB_names_LD_multiple Snakefile requires config_alt.yaml, while finemapping_multiple and finemapping_HRC_multiple require config.yaml.

!! Please note that the option --cluster-config is deprecated in the latest Snakemake versions.

  • Options for fine-mapping windows: (i) range.right ranges, representing the GWS (genome-wide significant) locus windows, or (ii) beginning and end, representing a 3 Mb window (optional).

Both are output as part of the fetch_UKB_LD_names Snakemake module, and either can be passed directly to the --start and --end flags of the fine-mapping rules within the Snakefile.
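Option (ii) can be sketched as a simple window computation. This is an illustration only: the function name is hypothetical, and SAFFARI's exact definition of the 3 Mb window (e.g. how it is centred and clamped) may differ.

```python
def three_mb_window(top_snp_pos, size=3_000_000):
    """Centre a `size`-bp fine-mapping window on the top SNP,
    clamping the start to position 1. A sketch only; SAFFARI's
    actual window definition may differ."""
    start = max(1, top_snp_pos - size // 2)
    return start, start + size - 1

# The resulting coordinates could be fed to the --start/--end flags:
print(three_mb_window(5_000_000))  # (3500000, 6499999)
print(three_mb_window(100_000))    # (1, 3000000)
```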


πŸ™ Credits - Acknowledgements

This work would not have been feasible without the contribution and wonderful work of other researchers:

  • Omer Weissbrod,
  • Jonathan Coleman,
  • Alice Braun,
  • Ashvin Ravi,
  • Brian Fulton-Howard,
  • Brian Schilder.

Issues

Should any issues occur when running the pipeline, please feel free to open an issue on GitHub, providing a minimal reproducible example. Contributions are also more than welcome!

