A program for mapping FASTQ files to a (bowtie2 indexed) reference genome with bwa-mem, then converting the output SAM file to a sorted, indexed BAM file. The output file is suitable for viewing in the UCSC Genome Browser or IGV, or for passing on to other analysis pipelines that need sorted, indexed BAM files.
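For orientation, the map, sort, and index steps described above can be sketched by hand as follows. This is only an illustration, not the pipeline's actual rules: the function name `map_and_sort`, the file names, and the single-end input are assumptions, and the reference must be in whatever form `bwa mem` accepts on your system.

```shell
# Minimal sketch of the mapping steps (hypothetical names, single-end reads assumed).
map_and_sort() {
    ref="$1"            # reference prefix as expected by `bwa mem`
    reads="$2"          # FASTQ file with the reads
    out="$3"            # path of the sorted BAM file to write
    threads="${4:-4}"   # number of threads, defaults to 4

    # Align the reads and coordinate-sort the SAM stream directly into a BAM file.
    # Note: in plain sh only a failure of `samtools sort` is caught here.
    bwa mem -t "$threads" "$ref" "$reads" \
        | samtools sort -@ "$threads" -o "$out" - || return 1

    # Write the index next to the BAM so genome browsers can load it.
    samtools index "$out"
}
```

A call might look like `map_and_sort genome.fa sample.fastq.gz sample.sorted.bam 4`, where all three file names are placeholders.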
This pipeline uses conda environments to standardize the software used for each run.
- Install conda: https://github.com/conda-forge/miniforge#mambaforge
- Create a conda environment and install snakemake there:
conda create -c conda-forge -c bioconda -n snakemake snakemake
conda activate snakemake
- Set conda to use strict channel priority:
conda config --set channel_priority strict
- Change directory to a folder where you want to run the analysis
- Clone this git repository into that folder.
- Edit the config.yaml file using the instructions in its comments. Use a text editor that writes Unix line endings (e.g. VS Code, Notepad++, gedit, micro, emacs, vim, or vi).
- If snakemake is not your active conda environment, activate snakemake with:
conda activate snakemake
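Before running anything, it can help to confirm that the setup steps above took effect. The sketch below is one way to do that; the function name `check_setup` is hypothetical, and it assumes `conda config --show channel_priority` prints a line of the form `channel_priority: strict`.

```shell
# Minimal setup check (hypothetical helper, not part of the repository).
check_setup() {
    # Confirm that snakemake is available in the active environment.
    if ! command -v snakemake >/dev/null 2>&1; then
        echo "snakemake not found; run: conda activate snakemake" >&2
        return 1
    fi

    # Read the configured channel priority (second field of the printed line).
    priority=$(conda config --show channel_priority | awk '{print $2}')
    if [ "$priority" != "strict" ]; then
        echo "channel_priority is '$priority', expected 'strict'" >&2
        return 1
    fi

    echo "setup looks OK"
}
```

Calling `check_setup` prints `setup looks OK` when both checks pass and returns a non-zero status otherwise.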
You can use this pipeline with locally installed copies of bwa and samtools, or you can have snakemake download copies of bwa and samtools for you automatically. If you let snakemake download them, it installs the programs under your current folder, in .snakemake/conda/verylongrandomstring
- To run the script with your locally installed programs, use:
snakemake -s map_fastqs.smk --cores 4
- To have snakemake use mamba (its default) to download programs, use:
snakemake -s map_fastqs.smk --cores 4 --use-conda
- Or, if your conda installation does not provide mamba, use:
snakemake -s map_fastqs.smk --cores 4 --use-conda --conda-frontend conda
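Whichever of the commands above you choose, a dry run first shows which jobs snakemake plans to execute without running them. The wrapper below is a sketch under that idea; the name `run_mapping` is hypothetical, and `-n` is snakemake's dry-run flag.

```shell
# Dry-run first, then run for real (hypothetical wrapper).
# Extra arguments such as --use-conda are passed through to both calls.
run_mapping() {
    # List the planned jobs without executing them; stop if this fails.
    snakemake -s map_fastqs.smk --cores 4 -n "$@" || return 1

    # Run the pipeline with the same options.
    snakemake -s map_fastqs.smk --cores 4 "$@"
}
```

For example, `run_mapping` uses locally installed programs, while `run_mapping --use-conda --conda-frontend conda` has snakemake download them with conda.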