Main tool: fastp
Code repository: https://github.com/OpenGene/fastp
Additional tools:
- jq: 1.7
Basic information on how to use this tool:
- executable: fastp
- help: -?, --help
- version: -v, --version
- description: A tool designed to provide ultrafast all-in-one preprocessing and quality control for FASTQ data.
Additional information:
This tool is not intended for long-read data (e.g. Nanopore, PacBio, Cyclone). It is meant for processing short-read FASTQ files generated by platforms such as Illumina NovaSeq and MGI.
Input can be supplied as files, individually or in a batch, or read from STDIN. Output can be written to a file or to STDOUT.
The number of worker threads is set with -w, --thread. The default is 3.
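As a rough illustration of how these input, output, and threading choices combine on the command line, the sketch below assembles a fastp argument list without running it. The helper name build_fastp_command is hypothetical. The --stdin and --stdout flags are taken from fastp's option list and should be checked against fastp --help for the installed version.

```python
def build_fastp_command(in1=None, out1=None, in2=None, out2=None,
                        threads=3, html=None, json_report=None):
    """Assemble a fastp argv list (hypothetical helper; nothing is executed)."""
    if in2 is not None and in1 is None:
        raise ValueError("a second input file requires a first input file")
    cmd = ["fastp"]
    # Input: a file via -i, otherwise read from STDIN (assumed flag: --stdin).
    if in1 is None:
        cmd.append("--stdin")
    else:
        cmd += ["-i", in1]
    if in2 is not None:
        cmd += ["-I", in2]
    # Output: a file via -o, otherwise write to STDOUT (assumed flag: --stdout).
    if out1 is None:
        cmd.append("--stdout")
    else:
        cmd += ["-o", out1]
    if out2 is not None:
        cmd += ["-O", out2]
    # Worker threads; 3 mirrors the documented default.
    cmd += ["-w", str(threads)]
    if html is not None:
        cmd += ["-h", html]
    if json_report is not None:
        cmd += ["-j", json_report]
    return cmd


print(build_fastp_command("in.fq", "out.fq", threads=8))
```

The returned list can be passed to subprocess.run once fastp is available in PATH.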
Shifu Chen. 2023. Ultrafast one-pass FASTQ data preprocessing, quality control, and deduplication using fastp. iMeta 2: e107. https://doi.org/10.1002/imt2.107
Shifu Chen, Yanqing Zhou, Yaru Chen, Jia Gu. 2018. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34(17): i884–i890. https://doi.org/10.1093/bioinformatics/bty560
Full documentation: https://github.com/OpenGene/fastp
All options as presented in --help: https://github.com/opengene/fastp?tab=readme-ov-file#all-options
Example reports can be seen here:
- HTML report: http://opengene.org/fastp/fastp.html
- JSON report: http://opengene.org/fastp/fastp.json
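The JSON report can also be read programmatically. The following is a minimal sketch that pulls read counts before and after filtering from a report. The key names (summary, before_filtering, after_filtering, total_reads) are assumed from the layout of the example JSON report and may differ between fastp versions. The function name summarize_report is hypothetical, and the inline values are made up purely for illustration.

```python
import json


def summarize_report(report):
    """Return read counts before/after filtering from a parsed fastp JSON report.

    Key names are an assumption based on the example report layout.
    """
    before = report["summary"]["before_filtering"]["total_reads"]
    after = report["summary"]["after_filtering"]["total_reads"]
    # Guard against an empty input file to avoid dividing by zero.
    retained = after / before if before else 0.0
    return {
        "reads_before": before,
        "reads_after": after,
        "fraction_retained": retained,
    }


# Illustrative stand-in for a real report; the numbers are invented.
example = json.loads(
    '{"summary": {"before_filtering": {"total_reads": 1000},'
    ' "after_filtering": {"total_reads": 900}}}'
)
print(summarize_report(example))
```

For a real run, the dictionary would come from json.load on the file written with -j.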
Example commands:
fastp -i in.fq -o out.fq
fastp -i SRR13957123_1.fastq.gz -I SRR13957123_2.fastq.gz -o SRR13957123_PE1.fastq.gz -O SRR13957123_PE2.fastq.gz -h SRR13957123_fastp.html -j SRR13957123_fastp.json
python parallel.py -i /path/to/input/folder -o /path/to/output/folder -r /path/to/reports/folder -a '-f 3 -t 2'
The parallel.py command means to:
- process all the FASTQ data in /path/to/input/folder
- using fastp in PATH
- with arguments -f 3 and -t 2, which means trimming 3 bp from the head and 2 bp from the tail of each read
- output all clean data to /path/to/output/folder
- output all HTML and JSON reports to /path/to/reports/folder
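The batch behaviour listed above can be sketched as one fastp command per input file. This is not the implementation of parallel.py, only an illustration under stated assumptions: single-end files only, output files keep the input file name, and reports are named after the sample (the naming scheme and the helper name plan_batch are hypothetical). Commands are built but not executed.

```python
import shlex
from pathlib import PurePosixPath


def plan_batch(fastq_names, in_dir, out_dir, report_dir, extra_args=""):
    """Return one fastp argv list per FASTQ file name (hypothetical helper)."""
    commands = []
    for name in sorted(fastq_names):
        # Derive a sample name by stripping .gz, then .fastq or .fq.
        stem = name
        for suffix in (".gz", ".fastq", ".fq"):
            if stem.endswith(suffix):
                stem = stem[: -len(suffix)]
        cmd = [
            "fastp",
            "-i", str(PurePosixPath(in_dir) / name),
            "-o", str(PurePosixPath(out_dir) / name),
            "-h", str(PurePosixPath(report_dir) / f"{stem}.html"),
            "-j", str(PurePosixPath(report_dir) / f"{stem}.json"),
        ]
        # Extra arguments such as '-f 3 -t 2' are split shell-style and appended.
        cmd += shlex.split(extra_args)
        commands.append(cmd)
    return commands


for command in plan_batch(["b.fq", "a.fastq.gz"], "/path/to/input/folder",
                          "/path/to/output/folder", "/path/to/reports/folder",
                          "-f 3 -t 2"):
    print(shlex.join(command))
```

Each argument list could then be handed to subprocess.run, or to a process pool if several files should be processed at once.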