Configuration_files
Two configuration files are required for this pipeline:
- `config.yaml` contains command line arguments, reference paths, and system options. This is a [YAML](https://en.wikipedia.org/wiki/YAML) file.
- `design.tsv` contains sample identifiers and paths.
We suggest that you use the provided script to build the configuration files, and then modify them if needed. Most of the time, this script will be enough:
- `prepare_pipeline.py`
However, you can also build them manually: every part of these files is described below.
The script `prepare_pipeline.py` is your friend during the tedious step of pipeline customization: it builds both the config file and the design file. By default, this script will not overwrite any existing file.
You can test `prepare_pipeline.py` by running `make all-unit-test`. See the "Testing" section of this documentation for more information.
All available arguments of `prepare_pipeline.py` are listed by its `--help` argument:
# Activate conda environment
conda activate vcf-annotate-snpeff-snpsift
# Read help
python3.8 prepare_pipeline.py --help
Please find some usage examples below:
# In case I want all default parameters, and my VCF files are in vcf_dir:
python3.8 prepare_pipeline.py vcf_dir path/GWASCat.tsv path/GeneSets.gmt path/dbNSFP.tsv
# Same case as above, but
# - I want snpeff not to run with pre-installed genomes
# - I want to search recursively in vcf_dir for VCF files
python3.8 prepare_pipeline.py vcf_dir \
    path/GWASCat.tsv \
    path/GeneSets.gmt \
    path/dbNSFP.tsv \
    --snpeff-extra '-no-genome' \
    --recursive
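
To illustrate what the `--recursive` option changes, here is a minimal Python sketch of a flat versus a recursive search for VCF files. It is not the code of `prepare_pipeline.py`: the helper name `find_vcf_files` and the accepted extensions (`.vcf` and `.vcf.gz`) are assumptions made for this example only.

```python
from pathlib import Path
from typing import List


def find_vcf_files(vcf_dir: str, recursive: bool = False) -> List[Path]:
    """Return the VCF files found in vcf_dir, sorted by path (hypothetical helper)."""
    root = Path(vcf_dir)
    # rglob descends into sub-directories, glob stays at the top level
    search = root.rglob if recursive else root.glob
    found = set()
    for pattern in ("*.vcf", "*.vcf.gz"):  # assumed extensions
        found.update(path for path in search(pattern) if path.is_file())
    return sorted(found)
```

With `recursive=False`, only files located directly in `vcf_dir` are returned; with `recursive=True`, files in nested directories are returned as well.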
`config.yaml` is a YAML file. The following keys are required (in any order):
# As simple key: value
design: /path/to/design_file.tsv (string)
workdir: /path/to/workdir (string)
threads: maximum number of threads (integer)
singularity_docker_image: name of a docker/singularity image (string)
# As key: list of values
cold_storage:
  - /path/to/cold_storage1 (string)
  - /path/to/cold_storage2 (string)
  ...
# As nested key: key: value
ref:
  GWASCat: /path/to/gwascat.tsv
  GeneSets: /path/to/GeneSets.gmt
  dbNSFP: /path/to/dbNSFP.tsv
params:
  snpeff_extra: Extra parameters (string) for SnpEff
  snpsift_varType_extra: Extra parameters (string) for SnpSift
  snpsift_GWASCat_extra: Extra parameters (string) for SnpSift
  snpsift_GeneSets_extra: Extra parameters (string) for SnpSift
  snpsift_dbNSFP_extra: Extra parameters (string) for SnpSift
workflow:
  multiqc: whether to run multiqc or not (boolean)
A complete config.yaml file would look like this:
design: design.tsv
workdir: .
threads: 1
singularity_docker_image: docker://continuumio/miniconda3:4.4.10
cold_storage:
  - /media
ref:
  GWASCat: /path/to/gwascat.tsv
  GeneSets: /path/to/GeneSets.gmt
  dbNSFP: /path/to/dbNSFP.tsv
workflow:
  multiqc: true
params:
  copy_extra: --parents --verbose
  snpeff_extra: -v
  snpsift_varType_extra: ""
  snpsift_GWASCat_extra: ""
  snpsift_GeneSets_extra: ""
  snpsift_dbNSFP_extra: "-v"
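
When a configuration file is written by hand, it is easy to forget a key. The following Python sketch checks that a parsed configuration contains every required key listed in this documentation. It is only an illustration, not part of the pipeline: the names `REQUIRED_KEYS` and `missing_keys` are hypothetical, and the configuration is written as a Python dictionary so that the example runs without any extra package.

```python
from typing import Any, Dict, List

# Required keys as listed in this documentation; nested keys use dotted paths.
REQUIRED_KEYS = [
    "design", "workdir", "threads", "singularity_docker_image", "cold_storage",
    "ref.GWASCat", "ref.GeneSets", "ref.dbNSFP",
    "params.snpeff_extra", "params.snpsift_varType_extra",
    "params.snpsift_GWASCat_extra", "params.snpsift_GeneSets_extra",
    "params.snpsift_dbNSFP_extra",
    "workflow.multiqc",
]


def missing_keys(config: Dict[str, Any]) -> List[str]:
    """Return the dotted paths of required keys absent from config."""
    missing = []
    for dotted in REQUIRED_KEYS:
        node: Any = config
        for part in dotted.split("."):
            if not isinstance(node, dict) or part not in node:
                missing.append(dotted)
                break
            node = node[part]
    return missing


# Same content as the complete example, written as a Python dictionary.
example = {
    "design": "design.tsv",
    "workdir": ".",
    "threads": 1,
    "singularity_docker_image": "docker://continuumio/miniconda3:4.4.10",
    "cold_storage": ["/media"],
    "ref": {
        "GWASCat": "/path/to/gwascat.tsv",
        "GeneSets": "/path/to/GeneSets.gmt",
        "dbNSFP": "/path/to/dbNSFP.tsv",
    },
    "workflow": {"multiqc": True},
    "params": {
        "copy_extra": "--parents --verbose",
        "snpeff_extra": "-v",
        "snpsift_varType_extra": "",
        "snpsift_GWASCat_extra": "",
        "snpsift_GeneSets_extra": "",
        "snpsift_dbNSFP_extra": "-v",
    },
}
print(missing_keys(example))  # prints []
```

If PyYAML is installed, the dictionary can instead be obtained from the file itself with `yaml.safe_load`, and then passed to the same function.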
`design.tsv` is a TSV file describing the analysis. The column order is not relevant. If you want to build it manually, use your favorite tabular-file editor.
It must contain the following columns:
* Sample_id: the name of each sample
* VCF_File: path to the upstream VCF file
The optional columns are:
* VCF_Index: path to the tbi index of the VCF file
* Any other information
A minimal example would be:
Sample_id | VCF_File |
---|---|
Sample 1 | /path/to/file1.vcf |
Sample 2 | /path/to/file2.vcf |
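
Since the design file is tab-separated and its column order is free, it is convenient to read it by column name. The sketch below parses a design and checks that the two mandatory columns are present. The names `REQUIRED_COLUMNS` and `read_design` are hypothetical and the pipeline may read its design file differently.

```python
import csv
import io
from typing import Dict, List

REQUIRED_COLUMNS = {"Sample_id", "VCF_File"}


def read_design(handle) -> List[Dict[str, str]]:
    """Parse a tab-separated design file and check its required columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Missing column(s) in design file: {sorted(missing)}")
    return list(reader)


# In-memory stand-in for design.tsv; columns are separated by tabs.
design_text = (
    "Sample_id\tVCF_File\n"
    "Sample 1\t/path/to/file1.vcf\n"
    "Sample 2\t/path/to/file2.vcf\n"
)
rows = read_design(io.StringIO(design_text))
print(rows[0]["VCF_File"])  # prints /path/to/file1.vcf
```

To check a real file, the same function can be given `open("design.tsv", newline="")` instead of the in-memory text.
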
Typo corrections and issues are welcome.