This repository maintains the source code of the ICGC ARGO Pre Alignment QC Workflow for DNA/RNA Seq. The workflow is written in Nextflow workflow language using DSLv2, with modules imported from other ICGC ARGO GitHub repositories. Specifically, here are repositories maintaining various of tools/modules:
- https://github.com/icgc-argo-workflows/argo-qc-tools
- https://github.com/icgc-argo-workflows/dna-seq-processing-tools
- https://github.com/icgc-argo-workflows/data-processing-utility-tools
- https://github.com/icgc-argo-workflows/nextflow-dna-seq-processing-tools
Each Nextflow module (including associated container image which is registered in ghcr.io) is strictly version controlled and released independently. To ensure reproducibility the pipeline declares explicitly which specific version of a module is to be imported.
- download input sequencing metadata/data from data center using SONG/SCORE client tools
- preprocess input sequencing reads (in
FASTQorBAM) intoFASTQfile(s) per read group - perform
FastQCanalysis forFASTQfile(s) per read group - perform
Cutadaptanalysis forFASTQfile(s) per read group - perform
MultiQCanalysis to generate aggregated results - generate
SONGmetadata for all collected QC metrics files and upload them toSONG/SCORE
- study_id
- analysis_metadata
- sequencing_files
- study_id
- analysis_id
- song_url
- score_url
- api_token
- file_pair_map: CSV file with each row contains 3 columns:
read_group_id,file_r1,file_r2, which represent information for each read group - payload: Payload contains metadata for all QC files
- multiqc_report: HTML report file
multiqc_report.html
To run the pipeline, please follow instruction here to install Nextflow (version 20.10 or higher) first.
With inputs prepared, you should be able to run the workflow using the following command. Please replace the params file with a real one (with all required parameters and input files). Example params file example-params.json can be found in the workflow root folder.
Run 0.1.0 version of the pipeline:
nextflow run icgc-argo-workflows/pre-alignment-qc/main.nf -r 0.1.0 -params-file <your_params_file.json>
You may need to run nextflow pull https://github.com/icgc-argo-workflows/pre-alignment-qc if the version 0.1.0 is new since last time the pipeline was run.
Please note that SONG/SCORE services need to be available and you have appropriate API token.